Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1]
Marc André Tanner

May 30, 2014

Abstract

This report contains two main sections: Section 1 motivates and introduces the cache-oblivious computational model. Section 2 describes the main contribution of the original paper, an optimal cache-oblivious priority queue.

1 Background and Models of Computation

Memory systems of modern computers consist of complex multilevel memory hierarchies with several layers of cache, main memory and disk. Since access times between different layers of the hierarchy can vary by several orders of magnitude, it is becoming increasingly important to obtain high data locality in memory access patterns. These developments have led to new theoretical models of computation which allow algorithms to be analyzed with regard to their memory behaviour.

1.1 Random Access Machine (RAM) Model

In the Random Access Machine (RAM) model memory is assumed to be infinite, i.e. the relevant data always fits into main memory. Furthermore, the memory is considered to have uniform access time, which is why it is also referred to as flat memory. Clearly these assumptions are not suitable for use cases in which the memory system becomes the bottleneck, especially in a Big Data context with graphs consisting of millions of vertices and billions of edges. However, developing a model which is both simple and realistic is a challenging task. The external memory model, which has gained widespread use due to its simplicity, is described next.
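The contrast between flat memory and a blocked memory hierarchy can be made concrete with a tiny simulator. The sketch below is purely illustrative (the `count_transfers` helper and all parameters are assumptions made for this example, and an LRU cache stands in for the model's ideal cache): a sequential scan of n items incurs only n/B transfers, while an adversarial access pattern can miss on every single access.

```python
from collections import OrderedDict

def count_transfers(accesses, m_blocks, b):
    """Count block transfers for a sequence of word addresses under an
    LRU cache holding m_blocks blocks of b words each."""
    cache = OrderedDict()
    transfers = 0
    for addr in accesses:
        blk = addr // b
        if blk in cache:
            cache.move_to_end(blk)      # LRU bookkeeping: mark as recently used
        else:
            transfers += 1              # block must be fetched from "disk"
            cache[blk] = True
            if len(cache) > m_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return transfers

n, b, m_blocks = 4096, 64, 8
# A sequential scan touches each of the n/b = 64 blocks exactly once.
print(count_transfers(range(n), m_blocks, b))
# A "column-major" pattern cycling through 64 blocks with only 8 cached
# blocks misses on every access.
col = [(i % 64) * 64 + i // 64 for i in range(n)]
print(count_transfers(col, m_blocks, b))
```

The same number of RAM-model operations thus yields transfer counts differing by a factor of B, which is exactly what the I/O model of the next section captures.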
1.2 External Memory Model

This model is also known as the I/O model or the cache-aware model (to contrast it with cache obliviousness) and was introduced in 1988 by Aggarwal and Vitter[2]. In order to avoid the complications of multilevel memory models, the memory hierarchy consists of only two levels: an internal memory (often called cache) of size M which is fast to access but limited in space, and an arbitrarily large external memory (referred to as disk), partitioned into blocks of size B, with significantly slower access times. The efficiency of an algorithm is measured in terms of the number of block or memory transfers it performs between these two levels. What follows are a few important bounds which characterize the I/O model and will be used later on in the analysis of the priority queue.

- Linear or scanning bound: scan(N) = Θ(N/B) is the number of memory transfers needed to read N contiguous items from disk.
- Sorting bound: sort(N) = Θ((N/B) log_{M/B}(N/B)) memory transfers are both necessary and sufficient to sort N elements.
- Finding the median of N elements is possible in O(N/B) memory transfers.
- Searching bound: the number of memory transfers needed to search for an element among a set of N elements is Ω(log_B N).

The searching bound is matched by the B-tree, which also supports updates in O(log_B N) memory transfers. Notice however that, unlike in the RAM model, one cannot sort optimally with a search tree: inserting N elements into a B-tree takes O(N log_B N) memory transfers, which is a factor of (B log_B N)/(log_{M/B}(N/B)) from optimal. While the external memory model is reasonably simple, its algorithms still crucially depend on the parametrization of M and B. Furthermore, the algorithms have (at least in principle) to explicitly issue read/write requests to the disk as well as explicitly manage the cache.
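The gap between B-tree insertion and the sorting bound can be illustrated numerically. The following sketch is an assumption-laden illustration (the function names and the concrete values of N, M and B are chosen freely, not taken from the paper):

```python
import math

def scan_bound(n, b):
    """Scanning bound: Theta(n / B) transfers to read n contiguous items."""
    return n / b

def sort_bound(n, m, b):
    """Sorting bound: Theta((n / B) * log_{M/B}(n / B)) transfers."""
    return (n / b) * math.log(n / b, m / b)

def btree_insert_bound(n, b):
    """Inserting n elements one by one into a B-tree: O(n * log_B n)."""
    return n * math.log(n, b)

# Example: 2^30 elements, internal memory of 2^20 words, blocks of 2^10 words.
n, m, b = 2**30, 2**20, 2**10
# Sorting by repeated B-tree insertion is over a thousand times more
# expensive than the sorting bound for these parameters.
print(btree_insert_bound(n, b) / sort_bound(n, m, b))
```

For these parameters the ratio is (B log_B N)/(log_{M/B}(N/B)) = (1024 · 3)/2 = 1536, matching the factor stated above.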
1.3 Cache-Oblivious Model

The main idea of the cache-oblivious model, introduced in 1999 by Frigo et al.[3], is to design and analyze algorithms in the external memory model, but without using the parameters M and B in the algorithm description, thus combining the simplicity of a two-level model with the realism of more complicated hierarchical models. The cache-oblivious model is based on a few assumptions which might seem unrealistic at first, but Frigo et al. showed with a couple of reductions, based on the least recently used (LRU) replacement strategy and 2-universal hash functions, that such a model can be simulated by essentially any memory system with only a small constant-factor overhead.
These assumptions are:

- Exactly two levels of memory exist.
- Optimal paging strategy: if main memory is full, the ideal block is evicted, i.e. the block which will be accessed the farthest in the future.
- Automatic replacement: when an algorithm accesses an element that is not currently stored in main memory, the relevant block is automatically fetched with a memory transfer.
- Full associativity: any block can be stored anywhere in memory.
- Tall-cache assumption: the number of blocks M/B is larger than the size of each block B, or equivalently M ≥ B^2.

It is important to realize that if a cache-oblivious algorithm performs well between two levels of the hierarchy, then it must automatically work well between any two adjacent levels of the memory hierarchy. This is implied by the fact that a cache-oblivious algorithm depends neither on the memory size nor on the block size. Therefore, if the algorithm is optimal in the two-level model, it is optimal on all levels of the hierarchy. This is the reason why the cache-oblivious model is useful: it allows convenient algorithm analysis in a simple two-level model while still deriving reasonable conclusions about the much more complex, multilayer memory systems found in contemporary computers.

2 Optimal Cache-Oblivious Priority Queue

2.1 Background and Motivation

A priority queue maintains a set of elements, each with a priority (or key), under the operations insert, delete and deletemin, where a deletemin operation finds and deletes the minimum key element in the queue. The goal is to design such a data structure which is both optimal (that is, the number of memory transfers matches the sorting bound) and cache-oblivious (i.e. it should work without any knowledge of the memory and block size). These criteria are not satisfied by an implementation based on a heap or a balanced search tree as known from the RAM model.
The authors point out that even though several efficient priority queues for the I/O model are known, none of them can readily be made cache-oblivious. Since there exist cache-oblivious B-tree implementations supporting all operations in O(log_B N) memory transfers, this immediately implies the existence of an O(log_B N) cache-oblivious priority queue. However, as discussed above, in order to sort optimally a data structure performing all operations in O((1/B) log_{M/B}(N/B)) amortized memory transfers and O(log_2 N) amortized computation time is needed. This is exactly what the presented cache-oblivious priority queue achieves.
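As a point of reference for these bounds, the RAM-model interface the structure must support can be sketched with a binary heap. This is not the paper's data structure, merely a baseline: it meets the O(log N) comparison bound, but its scattered memory accesses give it no useful locality in the I/O model.

```python
import heapq

class RAMPriorityQueue:
    """Binary-heap priority queue: O(log N) time per operation in the RAM
    model, but with no locality guarantees, so in the I/O model each of
    the scattered heap accesses may cost a separate memory transfer."""
    def __init__(self):
        self._heap = []

    def insert(self, key):
        heapq.heappush(self._heap, key)

    def deletemin(self):
        return heapq.heappop(self._heap)

pq = RAMPriorityQueue()
for k in [5, 1, 4, 2, 3]:
    pq.insert(k)
print([pq.deletemin() for _ in range(5)])  # [1, 2, 3, 4, 5]
```

The cache-oblivious structure described next matches this interface while replacing the pointer-chasing heap with sequentially processed buffers.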
2.2 Structure

In order to minimize the number of memory transfers, a technique which could be described as lazy evaluation using buffers is employed. The idea is to keep keys with similar priorities grouped into buffers in such a way that random I/O is avoided as much as possible. Intuitively, a buffer holds a certain interval of elements and is used to move elements between levels. As a consequence all elements of a buffer can be processed sequentially, in one operation, thus amortizing the cache misses among all involved elements. To efficiently support the deletemin operation, an order among the elements - or at least among the buffers - needs to be maintained. Therefore the priority queue is structured in terms of levels which contain various buffers. The general idea is that smaller elements are stored in lower levels, and as the levels grow the contained elements do likewise. In particular, all insert and deletemin operations are performed on the lowest level, and over time the larger elements rise up whereas the smaller ones trickle down. During this process a level might reach its maximum capacity and elements of a buffer need to be pushed one level up. Similarly, if a level becomes too empty, elements are pulled from the next higher level. As will be shown, the structure is carefully designed in such a way that these operations can be performed efficiently. The whole data structure is statically pre-allocated and is completely rebuilt after a certain number of operations. The following sections formally introduce this multilevel structure, the contained buffers, as well as the maintained invariants.

2.2.1 Levels

The priority queue is built of Θ(log log N) levels. The largest level has size N and all subsequent levels decrease by the power of 2/3 until a constant size c is reached. The levels are referred to by their respective sizes, and the i-th level from above has size N^((2/3)^(i-1)).
Thus the levels from largest to smallest are: level N, level N^(2/3), ..., level X^(3/2), level X, level X^(2/3), ..., level c^(3/2), level c.

2.2.2 Buffers

In order to efficiently move elements between different levels there exist two types of buffers. Up buffers store elements which are on their way up (i.e. they have not yet found the buffer they belong to) and will be stored in one of the down buffers higher up in the hierarchy. Similarly, down buffers store elements which are on their way down. Their size is chosen in such a way that the up buffer one level down can quickly be filled with the smallest elements among the down buffers of this level. Level X consists of one up buffer u^X that can store up to X elements, and at most X^(1/3) down buffers d^X_1, ..., d^X_{X^(1/3)}, each containing between (1/2)X^(2/3) and 2X^(2/3) elements. Notice that this means that each down buffer is at all times at least a quarter full. The element of each down buffer with the largest
key is called the pivot element.

Figure 1: Levels X^(3/2), X and X^(2/3) of the priority queue data structure with some example elements illustrating Invariants 1-3. Pivot elements are highlighted. (Level X^(3/2): up buffer u^(X^(3/2)) and down buffers d^(X^(3/2))_1, ..., d^(X^(3/2))_{X^(1/2)}; level X: up buffer of size X and at most X^(1/3) down buffers each of size X^(2/3); level X^(2/3): up buffer of size X^(2/3) and at most X^(2/9) down buffers each of size X^(4/9).)

In total the maximum capacity of level X is X + X^(1/3) · 2X^(2/3) = 3X. The size of the down buffers is twice the size of the up buffer one level down. As an example, consider the down buffers on level X^(3/2), which have size 2X^((3/2)·(2/3)) = 2X, whereas the up buffer u^X one level below has size X. Furthermore, the following three invariants about the relationship between the elements in buffers of various levels are maintained.

Invariant 1. At level X, elements are sorted among the down buffers, that is, elements in d^X_i have smaller keys than elements in d^X_{i+1}, but the elements within a down buffer d^X_i are unordered.

Invariant 2. At level X, the elements in the down buffers have smaller keys than the elements in the up buffer.

Invariant 3. The elements in the down buffers at level X have smaller keys than the elements in the down buffers at the next higher level X^(3/2).

These invariants define intervals for the various buffers and ensure that the elements get larger as the levels grow.

2.2.3 Layout

The priority queue is stored in a contiguous array, where levels are placed consecutively from smallest to largest. Each level reserves space for its maximal capacity of 3X. The up buffer is stored first, followed by all down buffers in an arbitrary order, but linked together to form an ordered linked list.
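The Θ(log log N) level count and the linear space reservation can be checked numerically. The sketch below is illustrative only; the helper name and the cutoff constant c = 2 are assumptions, not from the paper:

```python
def level_sizes(n, c=2):
    """Level sizes N, N^(2/3), N^(4/9), ... down to the constant cutoff c."""
    sizes = []
    x = float(n)
    while x > c:
        sizes.append(x)
        x = x ** (2.0 / 3.0)   # each level is the 2/3 power of the previous
    sizes.append(x)
    return sizes

sizes = level_sizes(2 ** 30)
# Theta(log log N) levels: for N = 2^30 only ten levels exist.
print(len(sizes))
# Each level reserves 3X space; the total is dominated by the largest
# level, so it stays linear in N (just above 3N here).
print(sum(3 * x for x in sizes) / 2 ** 30)
```

Because the level sizes decrease doubly exponentially, the sum of 3X over all levels is within a whisker of 3N, which is the content of Lemma 1 below.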
Figure 2: Physical storage layout of level X, which has size X + X^(1/3) · 2X^(2/3) = 3X. The up buffer u^X of size X comes first, followed by the down buffers d^X_1, d^X_3, d^X_2, d^X_4, ..., each of size 2X^(2/3). Notice the arbitrary, but linked together, order of the down buffers.

Summing up over all levels,

sum_{i=0}^{log_{3/2} log_c N} 3N^((2/3)^i) = O(N),

which leads to the following space requirement.

Lemma 1. The cache-oblivious priority queue uses O(N) space.

2.3 Operations

The priority queue works with two main operations: push, which inserts X elements into the next higher level X^(3/2), and pull, which moves the X elements with smallest keys from level X^(3/2) to the next lower one. Thus inserting an element into the priority queue corresponds to a push into the lowest level c. Similarly, a deletemin is implemented by a pull from the lowest level.

2.3.1 Push

A push is used when level X is full, in which case the largest X elements are moved from level X into the level above, X^(3/2). As a first step the X elements which are to be inserted into level X^(3/2) are sorted cache-obliviously using O(1 + (X/B) log_{M/B}(X/B)) memory transfers. Now that these X elements are sorted, they are distributed into the X^(1/2) down buffers of the next higher level X^(3/2). Remember that the elements are sorted among the down buffers (Invariant 1), and each down buffer stores its largest element as a pivot element. Therefore, distributing the elements works by visiting the down buffers in linked order and appending elements as long as they are smaller than the current down buffer's pivot element. Elements with keys larger than the pivot element of the last down buffer d^(X^(3/2))_{X^(1/2)} are inserted into the up buffer u^(X^(3/2)) of the same level. While this process is fairly straightforward, a few corner cases need to be handled carefully:

Down buffer overflows: Remember that a down buffer on level X^(3/2) is twice as large as the up buffer on level X and thus has a maximal capacity of 2X.
If during the distribution of elements this capacity is reached, the down buffer is split into two new down buffers and the elements are evenly distributed between them such that both contain X elements.
Algorithm 1 Push X elements into level X^(3/2)
Input: an array A of size X

  sort(A)
  B := d^(X^(3/2))_1
  for all e in A do
      {find the correct down buffer to insert the current element}
      while B ≠ nil and B ≠ u^(X^(3/2)) and e > pivot(B) do
          B := B.next
      end while
      if B = nil then                       {the element is too large for the down buffers}
          B := u^(X^(3/2))                  {hence prepare insertion into the up buffer}
      end if
      if B = u^(X^(3/2)) then
          insert-into-up-buffer({e})        {see Algorithm 2}
      else if |B| = 2X then                 {down buffer full}
          {check whether there is space left for a new down buffer}
          if number-of-down-buffers-on-level(X^(3/2)) = X^(1/2) then
              {if not, move the content of the last down buffer to the up buffer}
              insert-into-up-buffer(d^(X^(3/2))_{X^(1/2)})   {see Algorithm 2}
              d^(X^(3/2))_{X^(1/2)} := ∅
          end if
          B_new := ∅                        {allocate a new down buffer}
          m := median(B)
          for all e' in B do                {split the current buffer based on its median}
              if e' > m then
                  B := B \ {e'}
                  B_new := B_new ∪ {e'}
              end if
          end for
          {chain the new buffer into the linked list}
          B_new.next := B.next
          B.next := B_new
          {make sure the current element will be placed into the correct buffer}
          if e > pivot(B) then
              B := B_new
          end if
      end if
      B := B ∪ {e}                          {add the element to the buffer}
  end for
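A simplified, in-memory version of this distribution step might look as follows. This is only a sketch of the technique, not the paper's implementation: it ignores the up-buffer overflow and buffer-count limits of Algorithms 1 and 2, and the `DownBuffer` class, the `capacity` parameter, and the use of a full sort in place of an O(scan) median are all assumptions made for illustration.

```python
class DownBuffer:
    def __init__(self, elems):
        self.elems = list(elems)        # unsorted within the buffer
        self.pivot = max(elems)         # largest key acts as the pivot

def distribute(sorted_elems, buffers, capacity):
    """Append each element of the sorted input to the first down buffer
    whose pivot is >= it, splitting a buffer at its median when it is
    full. Elements larger than every pivot go to the up buffer (returned)."""
    up_buffer = []
    i = 0
    for e in sorted_elems:
        while i < len(buffers) and e > buffers[i].pivot:
            i += 1                      # visit down buffers in linked order
        if i == len(buffers):
            up_buffer.append(e)         # too large for any down buffer
            continue
        buf = buffers[i]
        if len(buf.elems) >= capacity:  # down buffer overflow: split at median
            s = sorted(buf.elems)       # (the paper uses an O(scan) median instead)
            lower, upper = s[:len(s) // 2], s[len(s) // 2:]
            buf.elems, buf.pivot = lower, max(lower)
            buffers.insert(i + 1, DownBuffer(upper))
            if e > buf.pivot:           # place e into the correct half
                i += 1
                buf = buffers[i]
        buf.elems.append(e)
    return up_buffer

bufs = [DownBuffer([2, 1, 3]), DownBuffer([7, 5])]   # pivots 3 and 7
print(distribute([2, 4, 6, 9], bufs, capacity=4))    # [9] goes to the up buffer
```

The key point mirrored from the text is that both the input and the visit of the down buffers proceed strictly left to right, so every buffer is touched sequentially at most once per push.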
Algorithm 2 Insert a set of elements into the up buffer of level X^(3/2), used by push
Input: an array A

  for all e in A do
      if |u^(X^(3/2))| = X^(3/2) then   {check whether the up buffer is full}
          push(u^(X^(3/2)))             {if so, push all its elements into the next higher level X^(9/4)}
          u^(X^(3/2)) := ∅
      end if
      u^(X^(3/2)) := u^(X^(3/2)) ∪ {e}
  end for

This splitting step is a two-phase process: first the median of the elements is calculated, based on which the elements are partitioned into their respective buffers in a simple scan. When calculating the median it is assumed that the priority queue contains no duplicate keys, that is, no elements with the same priority. Since the down buffers are linked together to form an ordered list, the newly allocated buffer can be placed at the end (after all already existing down buffers), where space is reserved. All in all this case can be handled in median(X) + scan(X) + O(1) memory transfers.

Level X^(3/2) already contains the maximum number X^(1/2) of down buffers: This case is problematic if the above splitting procedure happens when there is no space left to allocate a new down buffer. In this case the fewer than 2X elements of the last down buffer d^(X^(3/2))_{X^(1/2)}, which by Invariant 1 are larger than all elements of the other down buffers, are moved into the up buffer u^(X^(3/2)). Since the number of elements involved is bounded by 2X, this case can be handled in scan(X) + O(1) memory transfers.

Up buffer u^(X^(3/2)) overflows: If the up buffer reaches its maximum capacity of X^(3/2), all of its elements are recursively pushed into the next higher level. Notice that after such a recursive push the up buffer is empty and X^(3/2) elements have to be inserted before another recursive push is needed. The cost of this recursive push is ignored for now; it will be taken into account when doing an amortized analysis over all levels.

Let us now analyze the number of memory transfers needed to perform such a push operation.
First the X elements are sorted; during the distribution step X elements are scanned and in the worst case each of the X^(1/2) down buffers is visited. The special cases listed above can all be dealt with in scan(X) memory transfers, which means a push can be performed in

O(1) + sort(X) + scan(X) + X^(1/2) = O(1 + (X/B) log_{M/B}(X/B) + X^(1/2))
memory transfers. However, upon closer inspection the X^(1/2) term, which stems from the fact that during the distribution step non-full buffers might have to be written back, can be eliminated. To see this, a case distinction on X, the number of elements involved, is performed.

B^2 < X: or equivalently X^(1/2) < X/B, which immediately leads to O(1 + (X/B) log_{M/B}(X/B)).

B ≤ X ≤ B^2: in this case the X^(1/2) term could possibly dominate. The problem is that during the distribution step a down buffer could have to be written back even though its data does not amount to a full block. However, since X^(1/2) ≤ B ≤ M/B, where the second inequality is justified by the tall-cache assumption (M ≥ B^2), a block for each of the X^(1/2) down buffers can fit into memory. Notice that the operations take place on level X^(3/2), and since B^(3/2) ≤ X^(3/2) ≤ B^3 there exists only one such level. Therefore a fraction of the main memory can be reserved to hold such partially filled blocks until they become full and are written back to disk. Since the assumed optimal paging strategy will perform at least as well as the strategy outlined above, the X^(1/2) term can be eliminated.

X < B: this case induces no cost since all levels smaller than B^(3/2) can be kept in main memory at all times.

Ignoring the cost of the recursion for now, it can be concluded that:

Lemma 2. A push of X elements from level X up to level X^(3/2) can be performed in O(1 + (X/B) log_{M/B}(X/B)) memory transfers amortized, while maintaining Invariants 1-3.

2.3.2 Pull

A pull operation removes the X elements with smallest keys from level X^(3/2) and returns them in sorted order. It is used when there are not enough elements in the down buffers of level X. Recall that each down buffer needs to be at least 1/4 full at all times, which at level X^(3/2) amounts to X/2 elements. During a pull X elements will be removed, but this invariant still has to be fulfilled. Therefore, a case distinction on the number of elements contained in all down buffers is performed.
In the first case it is assumed that the down buffers contain at least (3/2)X elements. Since the maximal capacity of a down buffer at level X^(3/2) is 2X, the first three down buffers contain the smallest between (3/2)X and 6X elements. These elements are sorted using O(1 + (X/B) log_{M/B}(X/B)) memory transfers. The smallest X elements are removed, while the remaining between X/2 and 5X elements are left in one, two, or three down buffers of size between X/2 and 2X. These buffers can be constructed in O(1 + X/B) memory transfers, which means the sorting dominates. This procedure maintains Invariants 1-3.
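This first case can be sketched in memory as follows. The sketch is illustrative only: buffers are plain Python lists, and the even repacking of the leftovers is one possible choice satisfying the stated size bounds, not the paper's exact procedure.

```python
import math

def pull_first_case(down_buffers, x):
    """Sort the first three down buffers (ordered between buffers by
    Invariant 1, unordered within), remove and return the X smallest
    elements, and repack the leftovers evenly into up to three buffers."""
    merged = sorted(down_buffers[0] + down_buffers[1] + down_buffers[2])
    smallest, rest = merged[:x], merged[x:]
    # Repack the between X/2 and 5X remaining elements into one to three
    # buffers of size between X/2 and 2X, keeping each at least 1/4 full.
    k = max(1, math.ceil(len(rest) / (2 * x)))
    size = max(1, math.ceil(len(rest) / k))
    down_buffers[:3] = [rest[i:i + size] for i in range(0, len(rest), size)]
    return smallest

down = [[2, 1], [4, 3], [6, 5], [8, 7]]   # ordered between buffers only
print(pull_first_case(down, 2))            # the two smallest: [1, 2]
print(down)                                # leftovers repacked in front
```

Only three buffers are ever read and rewritten, so apart from the sort the whole case is a constant number of sequential scans.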
Algorithm 3 Pull from level X^(3/2): remove the X smallest elements
Output: the X smallest elements in sorted order

  {check whether the first three down buffers contain enough elements}
  if |d^(X^(3/2))_1 ∪ d^(X^(3/2))_2 ∪ d^(X^(3/2))_3| < (3/2)X then
      P := pull from level X^(9/4)          {if not, pull X^(3/2) elements from the level above}
      U := u^(X^(3/2))
      M := merge(P, sort(U))
      u^(X^(3/2)) := the |U| largest elements of M
      {distribute the remaining smaller elements into the down buffers}
      d^(X^(3/2))_i := M \ u^(X^(3/2))
  end if
  {sort the first three down buffers, return the X smallest elements and distribute the remaining ones}
  T := sort(d^(X^(3/2))_1 ∪ d^(X^(3/2))_2 ∪ d^(X^(3/2))_3)
  S := the X smallest elements of T
  d^(X^(3/2))_1, d^(X^(3/2))_2, d^(X^(3/2))_3 := T \ S
  return S

In the second case, where the down buffers contain fewer than (3/2)X elements, a recursive pull of X^(3/2) elements is performed on the next higher level. Recall that the keys of the elements in an up buffer are unordered relative to the keys of the elements in the down buffers one level up. Assume the up buffer u^(X^(3/2)) contains U elements; these are sorted and then merged with the already sorted elements pulled from the level above. Now that all elements are sorted, the U elements with largest keys are inserted into the up buffer, thus the number of elements in the up buffer is the same as before the pull operation. The remaining between X^(3/2) and X^(3/2) + (3/2)X elements are distributed into the X^(1/2) down buffers, each receiving between X and X + (3/2)X^(1/2) elements. This procedure maintains the three invariants, and the down buffers now contain at least X^(3/2) elements, which means the previously discussed first case applies. As for the cost, it requires one sort and one scan of X^(3/2) elements, which is negligible compared to the cost of the recursive pull operation on the next level up. Ignoring these costs for now, it can be concluded that:

Lemma 3.
A pull of X elements from level X^(3/2) down to level X can be performed in O(1 + (X/B) log_{M/B}(X/B)) memory transfers amortized, while maintaining Invariants 1-3.

2.3.3 Total cost

In order to analyze the amortized cost of an insert or deletemin, a sequence of N/2 operations is considered with regard to the memory transfers performed in their respective push and pull invocations. After N/2 operations the structure is
completely rebuilt such that all up buffers are empty and level X has X^(1/3) down buffers, each containing X^(2/3) elements. Notice that this ensures that the largest level N is always of size Θ(N). The rebuilding can be performed in a sorting step using sort(N) memory transfers, or O((1/B) log_{M/B}(N/B)) transfers per operation. A push of X elements from level X up to level X^(3/2) is charged to level X, because after such a push the up buffer u^X is completely empty and X elements will have to be inserted before another recursive push is needed. Similarly, a pull of X elements from level X^(3/2) down to level X is charged to level X, because X elements will have to be deleted from level X before another recursive pull is needed. During the N/2 operations at most O(N/X) pushes and pulls are charged to level X. According to Lemmas 2 and 3, a push or pull charged to level X uses O(1 + (X/B) log_{M/B}(X/B)) memory transfers. Altogether, the amortized memory transfers during the N/2 operations charged to level X are bounded by O(1 + (1/B) log_{M/B}(X/B)). Thus, summing up over all levels, the total amortized transfer cost of an insert or deletemin operation in the sequence of N/2 such operations is:

sum_i O((1/B) log_{M/B}(N^((2/3)^i)/B)) = O((1/B) log_{M/B}(N/B))

The paper briefly mentions that a delete operation can be implemented in the same bounds and concludes with:

Theorem 1. A set of N elements can be maintained in a linear-space cache-oblivious priority queue data structure supporting each insert, deletemin and delete operation in O((1/B) log_{M/B}(N/B)) amortized memory transfers and O(log_2 N) amortized computing time.

3 Conclusion

On a personal note, I find the simplicity of the cache-oblivious model quite appealing. It is remarkable that such a universal, hardware-independent model can be used to predict certain algorithm characteristics of real-world systems. More concretely, the main insight I got from the paper is the idea of lazy evaluation using buffers.
That is, the technique of carefully crafting a data structure in such a way that just about enough data can be kept in memory, and organizing the data such that the required operations can always be performed in sequential fashion, thus yielding excellent I/O behaviour regardless of the underlying memory system. As for further information, the interested reader can find a more detailed description and analysis of the cache-oblivious priority queue in a follow-up paper by the same authors[4].
References

[1] L. Arge, M. A. Bender, E. D. Demaine, B. Holland-Minkley, and J. I. Munro. Cache-Oblivious Priority Queue and Graph Algorithm Applications. In Proceedings of the 34th Annual ACM Symposium on Theory of Computing, 2002.

[2] A. Aggarwal and J. S. Vitter. The Input/Output Complexity of Sorting and Related Problems. Communications of the ACM, 31(9), 1988.

[3] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-Oblivious Algorithms. In Proceedings of the IEEE Symposium on Foundations of Computer Science, 1999.

[4] L. Arge, M. A. Bender, E. D. Demaine, B. Holland-Minkley, and J. I. Munro. An Optimal Cache-Oblivious Priority Queue and its Application to Graph Algorithms. SIAM Journal on Computing, 36(6), 2007.
More informationExternal Sorting. Why We Need New Algorithms
1 External Sorting All the internal sorting algorithms require that the input fit into main memory. There are, however, applications where the input is much too large to fit into memory. For those external
More informationTreaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19
CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationAlgorithms for dealing with massive data
Computer Science Department Federal University of Rio Grande do Sul Porto Alegre, Brazil Outline of the talk Introduction Outline of the talk Algorithms models for dealing with massive datasets : Motivation,
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationI/O Model. Cache-Oblivious Algorithms : Algorithms in the Real World. Advantages of Cache-Oblivious Algorithms 4/9/13
I/O Model 15-853: Algorithms in the Real World Locality II: Cache-oblivious algorithms Matrix multiplication Distribution sort Static searching Abstracts a single level of the memory hierarchy Fast memory
More informationSimple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching
Simple and Semi-Dynamic Structures for Cache-Oblivious Planar Orthogonal Range Searching ABSTRACT Lars Arge Department of Computer Science University of Aarhus IT-Parken, Aabogade 34 DK-8200 Aarhus N Denmark
More informationThe History of I/O Models Erik Demaine
The History of I/O Models Erik Demaine MASSACHUSETTS INSTITUTE OF TECHNOLOGY Memory Hierarchies in Practice CPU 750 ps Registers 100B Level 1 Cache 100KB Level 2 Cache 1MB 10GB 14 ns Main Memory 1EB-1ZB
More informationEffect of memory latency
CACHE AWARENESS Effect of memory latency Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns. Assume that the processor has two ALU units and it is capable
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationFinal Exam in Algorithms and Data Structures 1 (1DL210)
Final Exam in Algorithms and Data Structures 1 (1DL210) Department of Information Technology Uppsala University February 0th, 2012 Lecturers: Parosh Aziz Abdulla, Jonathan Cederberg and Jari Stenman Location:
More informationLecture 7 8 March, 2012
6.851: Advanced Data Structures Spring 2012 Lecture 7 8 arch, 2012 Prof. Erik Demaine Scribe: Claudio A Andreoni 2012, Sebastien Dabdoub 2012, Usman asood 2012, Eric Liu 2010, Aditya Rathnam 2007 1 emory
More informationCOMPUTER SCIENCE 4500 OPERATING SYSTEMS
Last update: 3/28/2017 COMPUTER SCIENCE 4500 OPERATING SYSTEMS 2017 Stanley Wileman Module 9: Memory Management Part 1 In This Module 2! Memory management functions! Types of memory and typical uses! Simple
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationCache Oblivious Matrix Transposition: Simulation and Experiment
Cache Oblivious Matrix Transposition: Simulation and Experiment Dimitrios Tsifakis, Alistair P. Rendell * and Peter E. Strazdins Department of Computer Science Australian National University Canberra ACT0200,
More information* (4.1) A more exact setting will be specified later. The side lengthsj are determined such that
D D Chapter 4 xtensions of the CUB MTOD e present several generalizations of the CUB MTOD In section 41 we analyze the query algorithm GOINGCUB The difference to the CUB MTOD occurs in the case when the
More informationCache-Adaptive Analysis
Cache-Adaptive Analysis Michael A. Bender1 Erik Demaine4 Roozbeh Ebrahimi1 Jeremy T. Fineman3 Rob Johnson1 Andrea Lincoln4 Jayson Lynch4 Samuel McCauley1 1 3 4 Available Memory Can Fluctuate in Real Systems
More informationCS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics
CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to
More informationCache-oblivious comparison-based algorithms on multisets
Cache-oblivious comparison-based algorithms on multisets Arash Farzan 1, Paolo Ferragina 2, Gianni Franceschini 2, and J. Ian unro 1 1 {afarzan, imunro}@uwaterloo.ca School of Computer Science, University
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationUNIT III BALANCED SEARCH TREES AND INDEXING
UNIT III BALANCED SEARCH TREES AND INDEXING OBJECTIVE The implementation of hash tables is frequently called hashing. Hashing is a technique used for performing insertions, deletions and finds in constant
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationScan and its Uses. 1 Scan. 1.1 Contraction CSE341T/CSE549T 09/17/2014. Lecture 8
CSE341T/CSE549T 09/17/2014 Lecture 8 Scan and its Uses 1 Scan Today, we start by learning a very useful primitive. First, lets start by thinking about what other primitives we have learned so far? The
More informationII (Sorting and) Order Statistics
II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison
More informationRemoving Belady s Anomaly from Caches with Prefetch Data
Removing Belady s Anomaly from Caches with Prefetch Data Elizabeth Varki University of New Hampshire varki@cs.unh.edu Abstract Belady s anomaly occurs when a small cache gets more hits than a larger cache,
More informationSoft Heaps And Minimum Spanning Trees
Soft And George Mason University ibanerje@gmu.edu October 27, 2016 GMU October 27, 2016 1 / 34 A (min)-heap is a data structure which stores a set of keys (with an underlying total order) on which following
More informationHashing Based Dictionaries in Different Memory Models. Zhewei Wei
Hashing Based Dictionaries in Different Memory Models Zhewei Wei Outline Introduction RAM model I/O model Cache-oblivious model Open questions Outline Introduction RAM model I/O model Cache-oblivious model
More informationDatabase Technology. Topic 7: Data Structures for Databases. Olaf Hartig.
Topic 7: Data Structures for Databases Olaf Hartig olaf.hartig@liu.se Database System 2 Storage Hierarchy Traditional Storage Hierarchy CPU Cache memory Main memory Primary storage Disk Tape Secondary
More information1 Motivation for Improving Matrix Multiplication
CS170 Spring 2007 Lecture 7 Feb 6 1 Motivation for Improving Matrix Multiplication Now we will just consider the best way to implement the usual algorithm for matrix multiplication, the one that take 2n
More informationAdvanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret
Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely
More informationCHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang
CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365
More informationFOUR EDGE-INDEPENDENT SPANNING TREES 1
FOUR EDGE-INDEPENDENT SPANNING TREES 1 Alexander Hoyer and Robin Thomas School of Mathematics Georgia Institute of Technology Atlanta, Georgia 30332-0160, USA ABSTRACT We prove an ear-decomposition theorem
More informationOn the Relationships between Zero Forcing Numbers and Certain Graph Coverings
On the Relationships between Zero Forcing Numbers and Certain Graph Coverings Fatemeh Alinaghipour Taklimi, Shaun Fallat 1,, Karen Meagher 2 Department of Mathematics and Statistics, University of Regina,
More informationSeminar on. A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm
Seminar on A Coarse-Grain Parallel Formulation of Multilevel k-way Graph Partitioning Algorithm Mohammad Iftakher Uddin & Mohammad Mahfuzur Rahman Matrikel Nr: 9003357 Matrikel Nr : 9003358 Masters of
More informationComputational Geometry in the Parallel External Memory Model
Computational Geometry in the Parallel External Memory Model Nodari Sitchinava Institute for Theoretical Informatics Karlsruhe Institute of Technology nodari@ira.uka.de 1 Introduction Continued advances
More informationSAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 6. Sorting Algorithms
SAMPLE OF THE STUDY MATERIAL PART OF CHAPTER 6 6.0 Introduction Sorting algorithms used in computer science are often classified by: Computational complexity (worst, average and best behavior) of element
More informationCACHE-OBLIVIOUS MAPS. Edward Kmett McGraw Hill Financial. Saturday, October 26, 13
CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Edward Kmett McGraw Hill Financial CACHE-OBLIVIOUS MAPS Indexing and Machine Models Cache-Oblivious Lookahead Arrays Amortization
More informationExternal Memory. Philip Bille
External Memory Philip Bille Outline Computationals models Modern computers (word) RAM I/O Cache-oblivious Shortest path in implicit grid graphs RAM algorithm I/O algorithms Cache-oblivious algorithm Computational
More informationL9: Storage Manager Physical Data Organization
L9: Storage Manager Physical Data Organization Disks and files Record and file organization Indexing Tree-based index: B+-tree Hash-based index c.f. Fig 1.3 in [RG] and Fig 2.3 in [EN] Functional Components
More informationAlgorithms and Data Structures
Algorithms and Data Structures Spring 2019 Alexis Maciel Department of Computer Science Clarkson University Copyright c 2019 Alexis Maciel ii Contents 1 Analysis of Algorithms 1 1.1 Introduction.................................
More informationarxiv: v1 [cs.ds] 1 May 2015
Strictly Implicit Priority Queues: On the Number of Moves and Worst-Case Time Gerth Stølting Brodal, Jesper Sindahl Nielsen, and Jakob Truelsen arxiv:1505.00147v1 [cs.ds] 1 May 2015 MADALGO, Department
More informationBasic Data Structures (Version 7) Name:
Prerequisite Concepts for Analysis of Algorithms Basic Data Structures (Version 7) Name: Email: Concept: mathematics notation 1. log 2 n is: Code: 21481 (A) o(log 10 n) (B) ω(log 10 n) (C) Θ(log 10 n)
More informationSorting and Selection
Sorting and Selection Introduction Divide and Conquer Merge-Sort Quick-Sort Radix-Sort Bucket-Sort 10-1 Introduction Assuming we have a sequence S storing a list of keyelement entries. The key of the element
More informationOptimal Parallel Randomized Renaming
Optimal Parallel Randomized Renaming Martin Farach S. Muthukrishnan September 11, 1995 Abstract We consider the Renaming Problem, a basic processing step in string algorithms, for which we give a simultaneously
More informationICS 691: Advanced Data Structures Spring Lecture 8
ICS 691: Advanced Data Structures Spring 2016 Prof. odari Sitchinava Lecture 8 Scribe: Ben Karsin 1 Overview In the last lecture we continued looking at arborally satisfied sets and their equivalence to
More informationWorst-case running time for RANDOMIZED-SELECT
Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationCS Operating Systems
CS 4500 - Operating Systems Module 9: Memory Management - Part 1 Stanley Wileman Department of Computer Science University of Nebraska at Omaha Omaha, NE 68182-0500, USA June 9, 2017 In This Module...
More informationCS Operating Systems
CS 4500 - Operating Systems Module 9: Memory Management - Part 1 Stanley Wileman Department of Computer Science University of Nebraska at Omaha Omaha, NE 68182-0500, USA June 9, 2017 In This Module...
More informationCHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang
CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247
More informationCache-Oblivious Algorithms and Data Structures
Cache-Oblivious Algorithms and Data Structures Erik D. Demaine MIT Laboratory for Computer Science, 200 Technology Square, Cambridge, MA 02139, USA, edemaine@mit.edu Abstract. A recent direction in the
More informationCSE 638: Advanced Algorithms. Lectures 18 & 19 ( Cache-efficient Searching and Sorting )
CSE 638: Advanced Algorithms Lectures 18 & 19 ( Cache-efficient Searching and Sorting ) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2013 Searching ( Static B-Trees ) A Static
More informationCSE 332: Data Structures & Parallelism Lecture 12: Comparison Sorting. Ruth Anderson Winter 2019
CSE 332: Data Structures & Parallelism Lecture 12: Comparison Sorting Ruth Anderson Winter 2019 Today Sorting Comparison sorting 2/08/2019 2 Introduction to sorting Stacks, queues, priority queues, and
More information3 Competitive Dynamic BSTs (January 31 and February 2)
3 Competitive Dynamic BSTs (January 31 and February ) In their original paper on splay trees [3], Danny Sleator and Bob Tarjan conjectured that the cost of sequence of searches in a splay tree is within
More informationI/O-Algorithms Lars Arge Aarhus University
I/O-Algorithms Aarhus University April 10, 2008 I/O-Model Block I/O D Parameters N = # elements in problem instance B = # elements that fits in disk block M = # elements that fits in main memory M T =
More informationChapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction
Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.
More informationHeap-on-Top Priority Queues. March Abstract. We introduce the heap-on-top (hot) priority queue data structure that combines the
Heap-on-Top Priority Queues Boris V. Cherkassky Central Economics and Mathematics Institute Krasikova St. 32 117418, Moscow, Russia cher@cemi.msk.su Andrew V. Goldberg NEC Research Institute 4 Independence
More information3.2 Cache Oblivious Algorithms
3.2 Cache Oblivious Algorithms Cache-Oblivious Algorithms by Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. In the 40th Annual Symposium on Foundations of Computer Science,
More informationFaster parameterized algorithms for Minimum Fill-In
Faster parameterized algorithms for Minimum Fill-In Hans L. Bodlaender Pinar Heggernes Yngve Villanger Technical Report UU-CS-2008-042 December 2008 Department of Information and Computing Sciences Utrecht
More informationApplied Algorithm Design Lecture 3
Applied Algorithm Design Lecture 3 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 3 1 / 75 PART I : GREEDY ALGORITHMS Pietro Michiardi (Eurecom) Applied Algorithm
More informationFrom Static to Dynamic Routing: Efficient Transformations of Store-and-Forward Protocols
SIAM Journal on Computing to appear From Static to Dynamic Routing: Efficient Transformations of StoreandForward Protocols Christian Scheideler Berthold Vöcking Abstract We investigate how static storeandforward
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 15, March 15, 2015 Mohammad Hammoud Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+ Tree) and Hash-based (i.e., Extendible
More informationFile Structures and Indexing
File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures
More informationMassive Data Algorithmics. Lecture 1: Introduction
. Massive Data Massive datasets are being collected everywhere Storage management software is billion-dollar industry . Examples Phone: AT&T 20TB phone call database, wireless tracking Consumer: WalMart
More informationApproximation Algorithms
Chapter 8 Approximation Algorithms Algorithm Theory WS 2016/17 Fabian Kuhn Approximation Algorithms Optimization appears everywhere in computer science We have seen many examples, e.g.: scheduling jobs
More information