As an additional safeguard on the total buffer size required we might further require that no superblock be larger than some certain size. Variable length superblocks would then require the reintroduction into the upper index of the R array, again with only a small overall impact on the total index size. For practical use we would not, however, advocate that the B array and Hirschberg's searching strategy be used, since simple binary search in an R array of even 1,000,000 integers would require at most perhaps 0.1 milliseconds. This is the indexing scheme that we have used in our compressed full-text retrieval scheme [7]. Although it provides fast access within modest memory requirements, the index (as described in this section) does not conform to the exact detail of Sections 2 and 3, and so is asymptotically inefficient in a worst-case bit-access model of computation. Nevertheless, the coded lower index is all that is necessary to generate an index that provides fast average-case random access to a large file of variable length records.

Acknowledgements

The authors gratefully acknowledge the assistance of Guy Jacobson. This work was supported by the Australian Research Council.

References

[1] R.G. Gallager and D.C. Van Voorhis. Optimal source codes for geometrically distributed integer alphabets. IEEE Transactions on Information Theory, IT-21:228-230, March 1975.
[2] S.W. Golomb. Run-length encodings. IEEE Transactions on Information Theory, IT-12:399-401, July 1966.
[3] D.S. Hirschberg. On the complexity of searching a set of vectors. SIAM Journal on Computing, 9:126-129, February 1980.
[4] G. Jacobson. Random access in Huffman-coded files. In J.A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conference, pages 368-377. IEEE Computer Society Press, Los Alamitos, CA, March 1992.
[5] M.D. McIlroy. Development of a spelling list. IEEE Transactions on Communications, COM-30:91-99, January 1982.
[6] A. Moffat. Economical inversion of large text files. Computing Systems, 5:125-139, Spring 1992.
[7] A. Moffat and J. Zobel. Coding for compression in full-text retrieval systems. In J.A. Storer and M. Cohn, editors, Proc. IEEE Data Compression Conference, pages 72-81. IEEE Computer Society Press, Los Alamitos, CA, March 1992.
[8] O. Petersson. Personal communication.

Stored naively, a record-level index would require 10^6 × 30 bits, about 3.6 Mbyte. Access to any record could then be effected in a total of 20 milliseconds, assuming that the index can be held in memory. Decoding of the desired record would take an additional millisecond, for a total of 21 milliseconds. Since skipping over the average record by decoding until a record terminator is found requires 1 millisecond, an index that points to only every b'th record will require (10^6/b) × 30 index bits, and 20 + b milliseconds will be required per record access. Suppose further that we must provide, on average, access to any record within 30 milliseconds. Then, in this simple scheme, we should choose b = 10, and thereby require 366 Kbyte of memory for the index.

Let us now consider the space required by the two level approach given the same constraints. If blocks of exactly b' records are formed they will have average length 1000b' bits, and the difference fields will on average require 12 + log b' bits each, for a total lower index of (10^6/b')(12 + log b') bits. Moreover, if superblocks each of s' records are formed then the total access time will be 20 + b' + (s'/b')(12 + log b')/1000 milliseconds. In this case we should choose b' = 9 and s' = 600; with this choice we will still be able to access any record within 30 milliseconds, provided that the lower index can be held in memory. The total size of the lower index will then be about 199 Kbyte. The upper index will require 10^6/s' entries, each containing two pointers: one to the start of the superblock in the primary file on disk; and one to the memory location of the lower index records describing the block lengths within this superblock. The upper index will thus consume at most an additional 11 Kbyte. Even stored as 32-bit integers for ease of access the upper index only requires a modest amount of memory.
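The arithmetic of this example can be checked with a few lines of Python. The figures and formulas are those of the paragraph above; the script is only a sanity check, and the 199 Kbyte and 11 Kbyte totals involve rounding choices not spelled out in the text, so it reports approximate values:

```python
from math import log2

# Parameters of the worked example: p = 10^6 records, 30-bit pointers,
# 20 ms per disk operation, decoding at 1 Kbit per millisecond.
p = 10**6

# Simple scheme: point to every b'th record.
b = 10
simple_kbyte = (p // b) * 30 / 8 / 1024          # (10^6/b) * 30 bits
simple_ms = 20 + b                               # seek, then skip b records

# Two-level scheme: blocks of b2 records, superblocks of s2 records.
b2, s2 = 9, 600
lower_kbyte = (p / b2) * (12 + log2(b2)) / 8 / 1024
access_ms = 20 + b2 + (s2 / b2) * (12 + log2(b2)) / 1000

print(round(simple_kbyte), simple_ms)            # about 366 Kbyte, 30 ms
print(round(lower_kbyte), round(access_ms, 1))   # roughly 200 Kbyte, ~30 ms
```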
If main memory is scarce then the lower index records can be interleaved on disk with the blocks of the primary file that they measure the length of. In this case a single pointer suffices, and just 6 Kbyte of main memory is enough to obtain an average random access time of 30 milliseconds. With s' = 600 and an interleaved lower index each superblock will be on average 75 Kbyte long, and a buffer of this size must be allocated to allow entire superblocks of the main file to be read. If this memory is unavailable, or if the physical characteristics of the disk are such that reading large blocks is uneconomic, s' should be reduced to allow a smaller block and buffer size. This will have only a marginal effect on the overall size of the index, since the bulk of the space is in the lower index, which is unaffected by the choice of s'.

4. process bits in the primary file starting at (P_p[block] + d), continuing until the start of the (m − R[block] − s)'th record (within the block) is found;

5. return the current bit location; it is the address of record m.

By construction, both the upper and lower indexes require O(N_1) bits, and since N_1 = O((N log log N)/log N), we have thus met our stated objective for index size. We must make a further slight modification to obtain the desired bound on the running time. Step 2 must now locate one entry in a sub-table of R that contains as many as p/p_2 = O(log^2 N) items, and the linear method of Hirschberg is no longer sufficient. However we apply his algorithm first to a collection of at most O(log N) entries, selecting every log N'th item in the set R[low ... high]. This first search will identify which region of log N items contains the desired value; we then apply the same technique a second time on the items of this region, again spending O(log N) bit accesses. These two applications of the searching strategy (see footnote 3) will require in total O(log N) bit accesses, and the overall O(log N) bound can be preserved.

As was pointed out by Jacobson, O(log log N/log N) = o(1) and tends toward zero, but only extremely slowly, and the constant factors ignored in the asymptotic analysis are very important. In the next section we consider the actual savings that are possible in one typical database application.

4 Practical Application

In practice the primary file and at least some of the index will be stored on disk, and we are more interested in the size of the memory resident component of the index and the number of disk accesses consumed than the asymptotic number of bit operations. In this section we consider one application involving large files of variable length records, and describe the performance obtained by a 'stripped down' two level indexing scheme of the type described above.
We suppose that we have a file of p = 10^6 records of total length N = 10^9 bits, i.e., 120 Mbyte. We also suppose that records in the lower index and primary file can be accessed at a rate of 1 Mbit/second, that is, 1 Kbit per millisecond; and that any disk operations require 20 milliseconds. These values are all typical for the compressed main text of a full-text retrieval system when implemented on a Sun SparcStation 2 [7]; the relatively slow access to the index and primary file is the rate at which they can be decompressed rather than a disk transfer speed.

Footnote 3: In general, for a table of M items, each of K symbols, applying Hirschberg's algorithm recursively on K items at a time will require O((K log M)/log K) symbol comparisons [8].
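The two-stage refinement of step 2 (search a subsample of every log N'th entry, then search only the selected region) can be modelled in Python. This sketch uses ordinary integer comparisons rather than Hirschberg's bit-access method, so it illustrates only the two-pass structure; all names are illustrative:

```python
from bisect import bisect_right

def two_stage_predecessor(R, m, g):
    # First pass: probe only every g'th entry of the sorted sub-table R,
    # choosing a region of g entries; second pass: scan that region.
    # Assumes R[0] <= m.
    sample = R[::g]
    lo = (bisect_right(sample, m) - 1) * g
    block = lo
    for i in range(lo, min(lo + g, len(R))):
        if R[i] <= m:
            block = i
    return block

R = [0, 4, 9, 15, 22, 30, 41, 55, 70, 88, 110, 135]
for m in range(0, 140):
    # agrees with a direct predecessor search over the whole table
    assert two_stage_predecessor(R, m, 3) == bisect_right(R, m) - 1
```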

[Figure 2: Two level structure. Upper index (arrays B, R, P_l, P_p): p_2 entries, O(N_1) bits; lower index: p_1 records, O(N_1) bits; primary file: p records, N bits.]

bits. With each pointer difference we must also store the number of records in the block in a blocksize field, but this too can be represented using the same code, and will require at most an additional N_1 = O((N log log N)/log N) bits. This lower index is now a file of variable length records, each storing a difference and a blocksize field, and random access is not possible. On top of this file of O(N_1) bits we build an index of the form described in Section 2, requiring O(N_1) additional bits. In this upper index the R and B arrays are as before, but indicate 'superblocks': blocks in the lower index each record of which corresponds to a block of records in the primary file. The P array must be cloned, with one set of pointers P_p pointing to the primary file, and one set of pointers P_l pointing into the lower level index (Figure 2). The new access sequence is described below. It is assumed that there are p_2 records in the upper index, and that p/p_2 is the nominal size of each superblock.

1. low ← B[⌊m/(p/p_2)⌋], high ← B[⌈m/(p/p_2)⌉];

2. determine block in the range low ... high such that R[block] ≤ m < R[block + 1];

3. process bits in the lower index starting at P_l[block], accumulating decompressed blocksize and difference fields until the sum of the blocksize fields is as large as possible without becoming greater than (m − R[block]); let d ← the sum of the difference fields and s ← the sum of the blocksize fields;
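The five-step access sequence can be exercised end-to-end on a self-contained toy of the two-level structure. Records are simulated by their bit lengths, and every container name here is illustrative rather than taken from the text:

```python
# Toy model of the two-level structure.
lengths = [37, 5, 90, 12, 44, 7, 63, 28, 51, 9, 70, 15]  # record sizes, bits
starts, addr = [], 0
for n in lengths:
    starts.append(addr)
    addr += n

BLK, SUP = 2, 3          # records per block, blocks per superblock
blocks = [list(range(i, min(i + BLK, len(lengths))))
          for i in range(0, len(lengths), BLK)]

# Lower index: one (blocksize, difference) pair per block.
lower = [(len(blk), sum(lengths[r] for r in blk)) for blk in blocks]

# Upper index, per superblock: first record number R, bit address P_p of
# the superblock in the primary file, position P_l in the lower index.
R, P_p, P_l = [], [], []
for sb in range(0, len(blocks), SUP):
    R.append(blocks[sb][0])
    P_p.append(starts[blocks[sb][0]])
    P_l.append(sb)

def locate(m):
    sb = max(i for i in range(len(R)) if R[i] <= m)       # steps 1-2
    d = cnt = 0
    for size, diff in lower[P_l[sb]:P_l[sb] + SUP]:       # step 3
        if cnt + size > m - R[sb]:
            break
        cnt += size
        d += diff
    addr = P_p[sb] + d                                    # step 4: scan the
    for r in range(R[sb] + cnt, m):                       # remaining records
        addr += lengths[r]
    return addr                                           # step 5

assert all(locate(m) == starts[m] for m in range(len(lengths)))
```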

code. This results in an index file of variable length records which is then indexed using the technique of the previous section. The key to reducing the space required by this lower index is the encoding used to represent the differences between successive pointers. The method we use was first described by Golomb [2] and Gallager and Van Voorhis [1], and has subsequently been used to good effect by McIlroy [5], Moffat [6], and Moffat and Zobel [7]. The code is controlled by a single parameter b. To code integer x ≥ 1 we first code (x − 1) div b in unary, and then code d = (x − 1) mod b in binary, using ⌊log b⌋ bits if d < 2^⌈log b⌉ − b, or ⌈log b⌉ bits if d ≥ 2^⌈log b⌉ − b. Some example codes for small values of b are shown in Table 1. The comma is indicative only, and does not appear in the output codeword.

    x    b = 2    b = 3    b = 5    b = 10
    1    0,0      0,0      0,00     0,000
    2    0,1      0,10     0,01     0,001
    3    10,0     0,11     0,10     0,010
    4    10,1     10,0     0,110    0,011
    5    110,0    10,10    0,111    0,100
    6    110,1    10,11    10,00    0,101
    7    1110,0   110,0    10,01    0,1100

Table 1: Example codes for differences

The use of this code to store a list of p integers summing to N or less requires B ≤ p(log(N/p) + 2) bits [6, 7], provided that b = 2^⌊log((N−p)/p)⌋; and so, as before, the lower level index will store p_1 ≤ N/(k log N) pointers, but now will require at most N_1 ≤ (N/(k log N))(log(k log N) + 2)
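The coding rule just defined translates directly into Python. The helper names `golomb_encode` and `golomb_decode` are hypothetical, but the rule implemented is exactly the unary-plus-truncated-binary code above, and the assertions reproduce entries of Table 1:

```python
def golomb_encode(x, b):
    # unary((x-1) div b), then d = (x-1) mod b in floor(log b) bits if
    # d < 2^ceil(log b) - b, else as d + (2^ceil(log b) - b) in
    # ceil(log b) bits.  Requires x >= 1 and b >= 2.
    q, d = divmod(x - 1, b)
    c = (b - 1).bit_length()          # ceil(log2 b) for b >= 2
    t = (1 << c) - b
    if d < t:
        tail = format(d, "b").zfill(c - 1)
    else:
        tail = format(d + t, "b").zfill(c)
    return "1" * q + "0" + tail

def golomb_decode(bits, b):
    q = 0                             # unary part
    while bits[q] == "1":
        q += 1
    i = q + 1
    c = (b - 1).bit_length()
    t = (1 << c) - b
    d = int(bits[i:i + c - 1], 2) if c > 1 else 0
    if d >= t:                        # long codeword: read one more bit
        d = int(bits[i:i + c], 2) - t
    return q * b + d + 1

# Rows of Table 1 (the comma is omitted in the real codeword):
assert golomb_encode(7, 5) == "1001"       # 10,01
assert golomb_encode(7, 10) == "01100"     # 0,1100
assert all(golomb_decode(golomb_encode(x, b), b) == x
           for b in (2, 3, 5, 10) for x in range(1, 50))
```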

[Figure 1: Lexicographic searching.]

2. while a ≤ z and A[a, c] < S[c] do a ← a + 1; while a ≤ z and A[z, c] > S[c] do z ← z − 1; c ← c + 1;

3. repeat step 2 until either c = K or a > z;

4. return z; it is the index in A of the string lexicographically preceding S.

This algorithm is a variant [8] of a method presented by Hirschberg [3], and solves the K-dimensional searching problem in O(K + M) time. When applied to the searching step 2 of the lookup algorithm above, we have M ≤ p/p_1 = O(log N) and K = ⌈log N⌉, with a total searching cost of O(log N) bit accesses. Since every step of the lookup process can be implemented in O(log N) bit accesses, we have thus met our first goal: provision of an O(N) space index that allows random access using O(log N) bit accesses.

3 Adding a Second Level

To further reduce the space required by the index we interpose an additional level of blocking between the primary file and the index of the previous section. As before, we break the primary file into blocks so that each record is within k log N bits of the start of its block. Then, rather than store these pointers absolutely, we code the differences between pointers using a prefix-free variable length
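Steps 1-4 of the searching algorithm translate directly into Python. This toy operates on a sorted array of equal-length strings and assumes, as the figure does, that S is no smaller than A[0]:

```python
def lex_search(A, S):
    # Variant of Hirschberg's method: find z with A[z] == S, or the index
    # of the lexicographic predecessor of S in the sorted array A of
    # equal-length strings; O(K + M) symbol comparisons in total.
    M, K = len(A), len(A[0])
    c, a, z = 0, 0, M - 1                     # step 1
    while c < K and a <= z:                   # step 3: repeat step 2
        while a <= z and A[a][c] < S[c]:      # step 2
            a += 1
        while a <= z and A[z][c] > S[c]:
            z -= 1
        c += 1
    return z                                  # step 4

A = ["0011", "0101", "0110", "1001", "1100"]
assert lex_search(A, "0110") == 2      # exact match
assert lex_search(A, "1000") == 2      # predecessor of "1000"
```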

the block containing the m'th record has at least been roughly located. The full access sequence to determine the location of record m is then:

1. low ← B[⌊m/(p/p_1)⌋], high ← B[⌈m/(p/p_1)⌉];

2. determine block in the range low ... high such that R[block] ≤ m < R[block + 1];

3. process bits in the primary file starting at P[block] until the start of the (m − R[block])'th record (within the block) is found;

4. return the current bit location; it is the address of record m.

We will defer discussion of step 2 for the moment. Instead, let us count the space requirements of this structure. Array P contains at most N/(k log N) pointers, each requiring ⌈log N⌉ bits. This totals N/k bits, which is O(N). Array R contains the same number of record numbers, each requiring ⌈log p⌉ ≤ ⌈log N⌉ bits, contributing in total not more than N/k bits. Finally, each item in array B is in the range 1 ... p_1, with p_1 ≤ N/(k log N) < N. In total, no more than 3N/k bits are required by the three index arrays.

Now let us return to the time requirements. Step 3 requires at most k log N bit accesses, by design. Steps 1 and 4 do not dominate this, leaving step 2 as the only problem. In this step we must search an ordered set of record numbers, where each record number requires at most ⌈log N⌉ bits. As noted above, high − low ≤ p/p_1 ≤ k log N, but this restriction is still not enough to allow a normal binary search, since Θ(log log N) probes would be required, each necessitating Θ(log N) bit accesses. Jacobson [4] solved a similar problem by noting that R[high] − R[low] ≤ k log N, and thus that only the low order ⌈log(k log N)⌉ bits needed to be accessed in each probe of the binary search, resulting in a total step 2 cost of O((log log N)^2) = O(log N). We suggest an alternative approach, described by the following algorithm.
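The four steps can be exercised on a toy file in which records are represented only by their bit lengths. The blocking of three records per block is arbitrary, and the B array of step 1 is omitted; the sketch simply scans all of R for step 2:

```python
# Toy model of the one-level index: arrays P (bit addresses) and R
# (record numbers of the first record in each block).
lengths = [40, 8, 75, 20, 33, 60, 11, 52]   # record sizes in bits
starts, addr = [], 0
for n in lengths:
    starts.append(addr)
    addr += n

P = [starts[r] for r in range(0, len(lengths), 3)]
R = [r for r in range(0, len(lengths), 3)]

def access(m):
    block = max(i for i in range(len(R)) if R[i] <= m)  # step 2
    addr = P[block]                                     # step 3: skip the
    for r in range(R[block], m):                        # (m - R[block])
        addr += lengths[r]                              # preceding records
    return addr                                         # step 4

assert all(access(m) == starts[m] for m in range(len(lengths)))
```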
We suppose that A[i, j] is an array of symbols, 0 ≤ i < M, 0 ≤ j < K for some M and K, and that S[j], 0 ≤ j < K is a string of K symbols we wish to search for in A, to find either an exact match z for which A[z, j] = S[j], 0 ≤ j < K, or the entry A[z] that is the lexicographic predecessor of S (Figure 1):

1. c ← 0, a ← 0, z ← M − 1;

bit of the next subsequent record, and so on down the file. This rule ensures that no record starts further than k log N bits from the pointer to the block containing the record, and is sufficient to guarantee that the total number of pointers in the index is less than or equal to N/(k log N). With this structure, to access record m we must identify the 'last pointer' prior to m, and then access at most k log N bits within that block to locate the start of record m.

Identifying the block containing record m requires some care. If we associate with each pointer P[i] the ordinal record number R[i] in the primary file that it points to, then a search of the (sorted) array R will suffice to locate the correct pointer into the primary file. However a straightforward binary search may require Θ(log(N/(k log N))) = Θ(log N) probes, each accessing a record number of Θ(log p) bits. When p = Θ(N) this is Θ(log^2 N), and the search becomes too expensive. An alternative would be to make each block a fixed number of records so that direct access to the index (rather than requiring a search) would be possible. But in this case a long record in the primary file might mean that other records in the same block lie more than k log N bits away from their indexing pointer. Jacobson [4] described a third possibility in which each block contains an exact number of bits. This makes the pointers P unnecessary, but extra synchronising information is required for each block to allow the starting point of the first record to be found, and the equivalent of the R array must still be retained. Moreover, in this approach extra information is required to handle long records that span more than one block.

Here we choose to allow blocks to contain both a variable number of records and a variable number of bits. To speed the search in the R array we add another value to each pointer. As before, let the i'th pointer be P[i] and the corresponding record number be R[i].
Suppose that the index contains p_1 pointers, p_1 ≤ N/(k log N), and, without loss of generality, that p_1 divides evenly into p. The 'block number' field B[i] stores the number of the block that contains the i·(p/p_1)'th record of the file. For example, B[0] = 0, to indicate that the first record (i.e., record number zero) of the file is in block zero; B[1] stores the number of the block that contains record p/p_1, and so on. Note that, since every block contains at least one record, B[i + 1] ≤ B[i] + (p/p_1) for all i. This bound will be used below.

Suppose now that we must locate the m'th record. Rather than search the whole of the R array, the search can be constrained to those entries R[i] in the range

B[⌊m/(p/p_1)⌋] ≤ i ≤ B[⌈m/(p/p_1)⌉].

That is, if (p/p_1) divides m evenly the correct block number has been found. If not,
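A small numeric illustration of the B array and the constrained search range; the particular assignment of records to blocks (`block_of`) is hypothetical, chosen only so that every block is nonempty:

```python
p, p1 = 12, 4
step = p // p1
# Hypothetical assignment of the 12 records to 6 blocks.
block_of = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4, 4, 5]
# B[i] = block containing record i*(p/p1); one extra entry covers the tail.
B = [block_of[min(i * step, p - 1)] for i in range(p1 + 1)]

def search_range(m):
    # The search for record m is confined to the blocks
    # B[floor(m/(p/p1))] .. B[ceil(m/(p/p1))].
    return B[m // step], B[-(-m // step)]

for m in range(p):
    lo, hi = search_range(m)
    assert lo <= block_of[m] <= hi          # the range always contains m
assert all(B[i + 1] <= B[i] + step for i in range(p1))   # the stated bound
```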

a few tens or hundreds of bits long, an index that contains the address of every record will constitute a significant overhead. Moreover, even when the average record size is Θ(log N) bits, as will be the case, for example, when each record contains a unique key, the O(N) space required by a simple index might still be an unacceptable overhead. Here we consider implementations of the index that allow the O(log N) bit-access cost to be retained, but require less space. Our results closely parallel those given by Jacobson [4], but the structure we describe is simpler to implement. The next section describes a one level index that allows random access in O(log N) bit accesses and requires O(N) bits of overhead space. Section 3 then shows how the addition of a second compressed index level allows the space to be reduced to O((N log log N)/log N) = o(N). Finally, Section 4 gives the results of applying the technique to a practical situation.

2 One Level Indexing

Suppose we are required to determine the bit address at which the m'th record of a file begins. The cost of accessing this record has two components: first, we spend some time consulting the index, and second, we spend some time consulting the primary file. The only constraint is that the total number of bit accesses be O(log N), and so O(log N) time can be spent in each of the index and the primary file. The inefficiency of the simple approach is that only O(1) time is spent accessing the primary file, forcing the index to contain too many pointers. In fact there is no need for the index to list every record address, and the records can be grouped into blocks, provided only that no record starts further than k log N bits from the pointer that marks the start of the block, for some constant k. Note that we assume that each record stores its own length, either implicitly or explicitly.
For example, in the Huffman coding case discussed by Jacobson [4], the prefix-free nature of a Huffman code means that the decoder can always know when the end of a symbol (record) has been reached. More generally, variable length records must either contain an explicit field storing the total length of the record, or must contain a unique 'end-of-record' symbol, since without either even sequential processing would not be possible.

Suppose then that we store pointers to records roughly k log N bits apart, for some constant k. (All logarithms are binary.) The first pointer always points at the first bit of the first record in the file. To establish the next pointer, we skip over k log N bits and index the first
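The pointer-placement rule of this paragraph can be sketched on a toy file; k and the record lengths here are arbitrary choices made for the demonstration:

```python
from math import log2

lengths = [120, 30, 45, 300, 18, 60, 95, 10, 210, 40]  # record sizes, bits
N = sum(lengths)
k = 20
gap = k * log2(N)           # skip this many bits between pointers

starts, addr = [], 0
for n in lengths:
    starts.append(addr)
    addr += n

P, R = [], []               # block pointers and their record numbers
next_at = 0                 # first pointer: first bit of the first record
for r, s in enumerate(starts):
    if s >= next_at:
        P.append(s)
        R.append(r)
        next_at = s + gap   # skip k log N bits, then index the next record

# Every record starts within k log N bits of its block pointer, and
# consecutive pointers are at least k log N bits apart, so there are at
# most N/(k log N) of them (plus the final one in this toy).
for r, s in enumerate(starts):
    assert s - max(x for x in P if x <= s) <= gap
assert len(P) <= N / gap + 1
```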

Supporting Random Access in Files of Variable Length Records

Alistair Moffat, Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.
Justin Zobel, Department of Computer Science, Royal Melbourne Institute of Technology, GPO Box 2476V, Melbourne 3001, Australia.

Abstract: We consider the problem of providing a random access index to a file of variable length records. For a file of N bits and Θ(N) records the index we describe requires O((N log log N)/log N) = o(N) bits, and access to any record is possible after O(log N) bit accesses in the index and the file itself. This compares favourably with the Θ(N log N) space that would be required by a conventional index with the same access bound. Jacobson has also presented an O((N log log N)/log N) space method of indexing; our method is simpler and leads to an implementation that is suitable for practical applications.

Keywords: data structures, file structures, analysis of algorithms.

1 Introduction

We suppose that a file of variable length records contains a total of N bits and p records; that the records are numbered sequentially from zero to p − 1; and that it is necessary to be able to efficiently access any record of the file given only an ordinal record number. The problem we consider is this: how should an index to the file be constructed so that random access is fast, but the index is small? One simple indexing scheme would be to maintain the address of each record in an index array of fixed size pointers, allowing access to any record of the primary file using just one pointer and thus O(log N) bit accesses (footnote 1). However in this case the index consumes Θ(p log N) bits, and when the average record size is Θ(1) and p is Θ(N), the index will asymptotically dominate the size of the original file. Of course, for practical purposes, a 32-bit pointer will index files of up to 4 Gbit, and, if that is not enough, 64-bit pointers could be used.
Nevertheless, if the average record is just

Footnote 1: We assume a model of computation where the unit cost operation is the reading of a single bit from either the index or the primary file.


More information

Multiway Blockwise In-place Merging

Multiway Blockwise In-place Merging Multiway Blockwise In-place Merging Viliam Geffert and Jozef Gajdoš Institute of Computer Science, P.J.Šafárik University, Faculty of Science Jesenná 5, 041 54 Košice, Slovak Republic viliam.geffert@upjs.sk,

More information

III Data Structures. Dynamic sets

III Data Structures. Dynamic sets III Data Structures Elementary Data Structures Hash Tables Binary Search Trees Red-Black Trees Dynamic sets Sets are fundamental to computer science Algorithms may require several different types of operations

More information

8 Integer encoding. scritto da: Tiziano De Matteis

8 Integer encoding. scritto da: Tiziano De Matteis 8 Integer encoding scritto da: Tiziano De Matteis 8.1 Unary code... 8-2 8.2 Elias codes: γ andδ... 8-2 8.3 Rice code... 8-3 8.4 Interpolative coding... 8-4 8.5 Variable-byte codes and (s,c)-dense codes...

More information

Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES

Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES DESIGN AND ANALYSIS OF ALGORITHMS Unit 1 Chapter 4 ITERATIVE ALGORITHM DESIGN ISSUES http://milanvachhani.blogspot.in USE OF LOOPS As we break down algorithm into sub-algorithms, sooner or later we shall

More information

Introduction to the Analysis of Algorithms. Algorithm

Introduction to the Analysis of Algorithms. Algorithm Introduction to the Analysis of Algorithms Based on the notes from David Fernandez-Baca Bryn Mawr College CS206 Intro to Data Structures Algorithm An algorithm is a strategy (well-defined computational

More information

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code Entropy Coding } different probabilities for the appearing of single symbols are used - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic

More information

Compression of Inverted Indexes For Fast Query Evaluation

Compression of Inverted Indexes For Fast Query Evaluation Compression of Inverted Indexes For Fast Query Evaluation Falk Scholer Hugh E. Williams John Yiannis Justin Zobel School of Computer Science and Information Technology RMIT University, GPO Box 2476V Melbourne,

More information

Melbourne University at the 2006 Terabyte Track

Melbourne University at the 2006 Terabyte Track Melbourne University at the 2006 Terabyte Track Vo Ngoc Anh William Webber Alistair Moffat Department of Computer Science and Software Engineering The University of Melbourne Victoria 3010, Australia Abstract:

More information

An Order-2 Context Model for Data Compression. With Reduced Time and Space Requirements. Technical Report No

An Order-2 Context Model for Data Compression. With Reduced Time and Space Requirements. Technical Report No An Order-2 Context Model for Data Compression With Reduced Time and Space Requirements Debra A. Lelewer and Daniel S. Hirschberg Technical Report No. 90-33 Abstract Context modeling has emerged as the

More information

Data Compression Techniques

Data Compression Techniques Data Compression Techniques Part 1: Entropy Coding Lecture 1: Introduction and Huffman Coding Juha Kärkkäinen 31.10.2017 1 / 21 Introduction Data compression deals with encoding information in as few bits

More information

Understand how to deal with collisions

Understand how to deal with collisions Understand the basic structure of a hash table and its associated hash function Understand what makes a good (and a bad) hash function Understand how to deal with collisions Open addressing Separate chaining

More information

Compressing Integers for Fast File Access

Compressing Integers for Fast File Access Compressing Integers for Fast File Access Hugh E. Williams Justin Zobel Benjamin Tripp COSI 175a: Data Compression October 23, 2006 Introduction Many data processing applications depend on access to integer

More information

DATA STRUCTURES/UNIT 3

DATA STRUCTURES/UNIT 3 UNIT III SORTING AND SEARCHING 9 General Background Exchange sorts Selection and Tree Sorting Insertion Sorts Merge and Radix Sorts Basic Search Techniques Tree Searching General Search Trees- Hashing.

More information

So the actual cost is 2 Handout 3: Problem Set 1 Solutions the mark counter reaches c, a cascading cut is performed and the mark counter is reset to 0

So the actual cost is 2 Handout 3: Problem Set 1 Solutions the mark counter reaches c, a cascading cut is performed and the mark counter is reset to 0 Massachusetts Institute of Technology Handout 3 6854/18415: Advanced Algorithms September 14, 1999 David Karger Problem Set 1 Solutions Problem 1 Suppose that we have a chain of n 1 nodes in a Fibonacci

More information

Index Compression. David Kauchak cs160 Fall 2009 adapted from:

Index Compression. David Kauchak cs160 Fall 2009 adapted from: Index Compression David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture5-indexcompression.ppt Administrative Homework 2 Assignment 1 Assignment 2 Pair programming?

More information

Lecture 9 March 4, 2010

Lecture 9 March 4, 2010 6.851: Advanced Data Structures Spring 010 Dr. André Schulz Lecture 9 March 4, 010 1 Overview Last lecture we defined the Least Common Ancestor (LCA) and Range Min Query (RMQ) problems. Recall that an

More information

1 Introduction to generation and random generation

1 Introduction to generation and random generation Contents 1 Introduction to generation and random generation 1 1.1 Features we might want in an exhaustive generation algorithm............... 1 1.2 What about random generation?.................................

More information

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and

CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims. Lecture 10: Asymptotic Complexity and CS/ENGRD 2110 Object-Oriented Programming and Data Structures Spring 2012 Thorsten Joachims Lecture 10: Asymptotic Complexity and What Makes a Good Algorithm? Suppose you have two possible algorithms or

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

Context Modeling for Text Compression. Daniel S. Hirschbergy and Debra A. Lelewerz. Abstract

Context Modeling for Text Compression. Daniel S. Hirschbergy and Debra A. Lelewerz. Abstract Context Modeling for Text Compression Daniel S. Hirschbergy and Debra A. Lelewerz Abstract Adaptive context modeling has emerged as one of the most promising new approaches to compressing text. A nite-context

More information

NOI 2012 TASKS OVERVIEW

NOI 2012 TASKS OVERVIEW NOI 2012 TASKS OVERVIEW Tasks Task 1: MODSUM Task 2: PANCAKE Task 3: FORENSIC Task 4: WALKING Notes: 1. Each task is worth 25 marks. 2. Each task will be tested on a few sets of input instances. Each set

More information

CMa simple C Abstract Machine

CMa simple C Abstract Machine CMa simple C Abstract Machine CMa architecture An abstract machine has set of instructions which can be executed in an abstract hardware. The abstract hardware may be seen as a collection of certain data

More information

Let the dynamic table support the operations TABLE-INSERT and TABLE-DELETE It is convenient to use the load factor ( )

Let the dynamic table support the operations TABLE-INSERT and TABLE-DELETE It is convenient to use the load factor ( ) 17.4 Dynamic tables Let us now study the problem of dynamically expanding and contracting a table We show that the amortized cost of insertion/ deletion is only (1) Though the actual cost of an operation

More information

The Compositional C++ Language. Denition. Abstract. This document gives a concise denition of the syntax and semantics

The Compositional C++ Language. Denition. Abstract. This document gives a concise denition of the syntax and semantics The Compositional C++ Language Denition Peter Carlin Mani Chandy Carl Kesselman March 12, 1993 Revision 0.95 3/12/93, Comments welcome. Abstract This document gives a concise denition of the syntax and

More information

Solutions to Exam Data structures (X and NV)

Solutions to Exam Data structures (X and NV) Solutions to Exam Data structures X and NV 2005102. 1. a Insert the keys 9, 6, 2,, 97, 1 into a binary search tree BST. Draw the final tree. See Figure 1. b Add NIL nodes to the tree of 1a and color it

More information

Horn Formulae. CS124 Course Notes 8 Spring 2018

Horn Formulae. CS124 Course Notes 8 Spring 2018 CS124 Course Notes 8 Spring 2018 In today s lecture we will be looking a bit more closely at the Greedy approach to designing algorithms. As we will see, sometimes it works, and sometimes even when it

More information

Writing Parallel Programs; Cost Model.

Writing Parallel Programs; Cost Model. CSE341T 08/30/2017 Lecture 2 Writing Parallel Programs; Cost Model. Due to physical and economical constraints, a typical machine we can buy now has 4 to 8 computing cores, and soon this number will be

More information

An NC Algorithm for Sorting Real Numbers

An NC Algorithm for Sorting Real Numbers EPiC Series in Computing Volume 58, 2019, Pages 93 98 Proceedings of 34th International Conference on Computers and Their Applications An NC Algorithm for Sorting Real Numbers in O( nlogn loglogn ) Operations

More information

18.3 Deleting a key from a B-tree

18.3 Deleting a key from a B-tree 18.3 Deleting a key from a B-tree B-TREE-DELETE deletes the key from the subtree rooted at We design it to guarantee that whenever it calls itself recursively on a node, the number of keys in is at least

More information

2 The Service Provision Problem The formulation given here can also be found in Tomasgard et al. [6]. That paper also details the background of the mo

2 The Service Provision Problem The formulation given here can also be found in Tomasgard et al. [6]. That paper also details the background of the mo Two-Stage Service Provision by Branch and Bound Shane Dye Department ofmanagement University of Canterbury Christchurch, New Zealand s.dye@mang.canterbury.ac.nz Asgeir Tomasgard SINTEF, Trondheim, Norway

More information

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See  for conditions on re-use Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files Static

More information

Sorting. Introduction. Classification

Sorting. Introduction. Classification Sorting Introduction In many applications it is necessary to order give objects as per an attribute. For example, arranging a list of student information in increasing order of their roll numbers or arranging

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree

More information

CSE 417 Branch & Bound (pt 4) Branch & Bound

CSE 417 Branch & Bound (pt 4) Branch & Bound CSE 417 Branch & Bound (pt 4) Branch & Bound Reminders > HW8 due today > HW9 will be posted tomorrow start early program will be slow, so debugging will be slow... Review of previous lectures > Complexity

More information

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli

Eect of fan-out on the Performance of a. Single-message cancellation scheme. Atul Prakash (Contact Author) Gwo-baw Wu. Seema Jetli Eect of fan-out on the Performance of a Single-message cancellation scheme Atul Prakash (Contact Author) Gwo-baw Wu Seema Jetli Department of Electrical Engineering and Computer Science University of Michigan,

More information

A Simplied NP-complete MAXSAT Problem. Abstract. It is shown that the MAX2SAT problem is NP-complete even if every variable

A Simplied NP-complete MAXSAT Problem. Abstract. It is shown that the MAX2SAT problem is NP-complete even if every variable A Simplied NP-complete MAXSAT Problem Venkatesh Raman 1, B. Ravikumar 2 and S. Srinivasa Rao 1 1 The Institute of Mathematical Sciences, C. I. T. Campus, Chennai 600 113. India 2 Department of Computer

More information

MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS

MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS MAXIMAL PLANAR SUBGRAPHS OF FIXED GIRTH IN RANDOM GRAPHS MANUEL FERNÁNDEZ, NICHOLAS SIEGER, AND MICHAEL TAIT Abstract. In 99, Bollobás and Frieze showed that the threshold for G n,p to contain a spanning

More information

Code generation scheme for RCMA

Code generation scheme for RCMA Code generation scheme for RCMA Axel Simon July 5th, 2010 1 Revised Specification of the R-CMa We detail what constitutes the Register C-Machine (R-CMa ) and its operations in turn We then detail how the

More information

( ( ( ( ) ( ( ) ( ( ( ) ( ( ) ) ) ) ( ) ( ( ) ) ) ( ) ( ) ( ) ( ( ) ) ) ) ( ) ( ( ( ( ) ) ) ( ( ) ) ) )

( ( ( ( ) ( ( ) ( ( ( ) ( ( ) ) ) ) ( ) ( ( ) ) ) ( ) ( ) ( ) ( ( ) ) ) ) ( ) ( ( ( ( ) ) ) ( ( ) ) ) ) Representing Trees of Higher Degree David Benoit 1;2, Erik D. Demaine 2, J. Ian Munro 2, and Venkatesh Raman 3 1 InfoInteractive Inc., Suite 604, 1550 Bedford Hwy., Bedford, N.S. B4A 1E6, Canada 2 Dept.

More information

Searchable Compressed Representations of Very Sparse Bitmaps (extended abstract)

Searchable Compressed Representations of Very Sparse Bitmaps (extended abstract) Searchable Compressed Representations of Very Sparse Bitmaps (extended abstract) Steven Pigeon 1 pigeon@iro.umontreal.ca McMaster University Hamilton, Ontario Xiaolin Wu 2 xwu@poly.edu Polytechnic University

More information

A Simple Lossless Compression Heuristic for Grey Scale Images

A Simple Lossless Compression Heuristic for Grey Scale Images L. Cinque 1, S. De Agostino 1, F. Liberati 1 and B. Westgeest 2 1 Computer Science Department University La Sapienza Via Salaria 113, 00198 Rome, Italy e-mail: deagostino@di.uniroma1.it 2 Computer Science

More information

Indexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze

Indexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze Indexing UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze All slides Addison Wesley, 2008 Table of Content Inverted index with positional information

More information

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination

University of Illinois at Urbana-Champaign Department of Computer Science. Final Examination University of Illinois at Urbana-Champaign Department of Computer Science Final Examination CS 225 Data Structures and Software Principles Spring 2010 7-10p, Wednesday, May 12 Name: NetID: Lab Section

More information

A Secondary storage Algorithms and Data Structures Supplementary Questions and Exercises

A Secondary storage Algorithms and Data Structures Supplementary Questions and Exercises 308-420A Secondary storage Algorithms and Data Structures Supplementary Questions and Exercises Section 1.2 4, Logarithmic Files Logarithmic Files 1. A B-tree of height 6 contains 170,000 nodes with an

More information

Annex A (Informative) Collected syntax The nonterminal symbols pointer-type, program, signed-number, simple-type, special-symbol, and structured-type

Annex A (Informative) Collected syntax The nonterminal symbols pointer-type, program, signed-number, simple-type, special-symbol, and structured-type Pascal ISO 7185:1990 This online copy of the unextended Pascal standard is provided only as an aid to standardization. In the case of dierences between this online version and the printed version, the

More information

CSC630/CSC730 Parallel & Distributed Computing

CSC630/CSC730 Parallel & Distributed Computing CSC630/CSC730 Parallel & Distributed Computing Analytical Modeling of Parallel Programs Chapter 5 1 Contents Sources of Parallel Overhead Performance Metrics Granularity and Data Mapping Scalability 2

More information

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g)

Quiz 1 Solutions. (a) f(n) = n g(n) = log n Circle all that apply: f = O(g) f = Θ(g) f = Ω(g) Introduction to Algorithms March 11, 2009 Massachusetts Institute of Technology 6.006 Spring 2009 Professors Sivan Toledo and Alan Edelman Quiz 1 Solutions Problem 1. Quiz 1 Solutions Asymptotic orders

More information

Course: Operating Systems Instructor: M Umair. M Umair

Course: Operating Systems Instructor: M Umair. M Umair Course: Operating Systems Instructor: M Umair Process The Process A process is a program in execution. A program is a passive entity, such as a file containing a list of instructions stored on disk (often

More information

Extensions to RTP to support Mobile Networking: Brown, Singh 2 within the cell. In our proposed architecture [3], we add a third level to this hierarc

Extensions to RTP to support Mobile Networking: Brown, Singh 2 within the cell. In our proposed architecture [3], we add a third level to this hierarc Extensions to RTP to support Mobile Networking Kevin Brown Suresh Singh Department of Computer Science Department of Computer Science University of South Carolina Department of South Carolina Columbia,

More information

COMP171. Hashing.

COMP171. Hashing. COMP171 Hashing Hashing 2 Hashing Again, a (dynamic) set of elements in which we do search, insert, and delete Linear ones: lists, stacks, queues, Nonlinear ones: trees, graphs (relations between elements

More information

Algorithmic "imperative" language

Algorithmic imperative language Algorithmic "imperative" language Undergraduate years Epita November 2014 The aim of this document is to introduce breiy the "imperative algorithmic" language used in the courses and tutorials during the

More information

Chapter 2: Complexity Analysis

Chapter 2: Complexity Analysis Chapter 2: Complexity Analysis Objectives Looking ahead in this chapter, we ll consider: Computational and Asymptotic Complexity Big-O Notation Properties of the Big-O Notation Ω and Θ Notations Possible

More information

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77

CS 493: Algorithms for Massive Data Sets Dictionary-based compression February 14, 2002 Scribe: Tony Wirth LZ77 CS 493: Algorithms for Massive Data Sets February 14, 2002 Dictionary-based compression Scribe: Tony Wirth This lecture will explore two adaptive dictionary compression schemes: LZ77 and LZ78. We use the

More information

Extra-High Speed Matrix Multiplication on the Cray-2. David H. Bailey. September 2, 1987

Extra-High Speed Matrix Multiplication on the Cray-2. David H. Bailey. September 2, 1987 Extra-High Speed Matrix Multiplication on the Cray-2 David H. Bailey September 2, 1987 Ref: SIAM J. on Scientic and Statistical Computing, vol. 9, no. 3, (May 1988), pg. 603{607 Abstract The Cray-2 is

More information

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION DESIGN AND ANALYSIS OF ALGORITHMS Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION http://milanvachhani.blogspot.in EXAMPLES FROM THE SORTING WORLD Sorting provides a good set of examples for analyzing

More information

CS Fast Progressive Lossless Image Compression. Paul G. Howard, Jeffrey S. Vitter. Department of Computer Science.

CS Fast Progressive Lossless Image Compression. Paul G. Howard, Jeffrey S. Vitter. Department of Computer Science. CS--1994--14 Fast Progressive Lossless Image Compression Paul G Howard, Jeffrey S Vitter Department of Computer Science Duke University Durham, North Carolina 27708-0129 March 25, 1994 Fast Progressive

More information

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a

The Global Standard for Mobility (GSM) (see, e.g., [6], [4], [5]) yields a Preprint 0 (2000)?{? 1 Approximation of a direction of N d in bounded coordinates Jean-Christophe Novelli a Gilles Schaeer b Florent Hivert a a Universite Paris 7 { LIAFA 2, place Jussieu - 75251 Paris

More information

Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department o

Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department o Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y, Jang-Ping Sheu y, and Chua-Huang Huang z y Department of Computer Science and Information Engineering National

More information

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc. David Rappaport School of Computing Queen s University CANADA Copyright, 1996 Dale Carnegie & Associates, Inc. Data Compression There are two broad categories of data compression: Lossless Compression

More information

Lecture 3 February 23, 2012

Lecture 3 February 23, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 3 February 23, 2012 1 Overview In the last lecture we saw the concepts of persistence and retroactivity as well as several data structures

More information

Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh

Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented which, for a large-dimensional exponential family G,

More information

where is a constant, 0 < <. In other words, the ratio between the shortest and longest paths from a node to a leaf is at least. An BB-tree allows ecie

where is a constant, 0 < <. In other words, the ratio between the shortest and longest paths from a node to a leaf is at least. An BB-tree allows ecie Maintaining -balanced Trees by Partial Rebuilding Arne Andersson Department of Computer Science Lund University Box 8 S-22 00 Lund Sweden Abstract The balance criterion dening the class of -balanced trees

More information

Exercise 1 : B-Trees [ =17pts]

Exercise 1 : B-Trees [ =17pts] CS - Fall 003 Assignment Due : Thu November 7 (written part), Tue Dec 0 (programming part) Exercise : B-Trees [+++3+=7pts] 3 0 3 3 3 0 Figure : B-Tree. Consider the B-Tree of figure.. What are the values

More information

Recognition. Clark F. Olson. Cornell University. work on separate feature sets can be performed in

Recognition. Clark F. Olson. Cornell University. work on separate feature sets can be performed in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 907-912, 1996. Connectionist Networks for Feature Indexing and Object Recognition Clark F. Olson Department of Computer

More information

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Data Structures Hashing Structures. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Data Structures Hashing Structures Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Hashing Structures I. Motivation and Review II. Hash Functions III. HashTables I. Implementations

More information

CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms

CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms Additional sources for this lecture include: 1. Umut Acar and Guy Blelloch, Algorithm Design: Parallel and Sequential

More information