Bitmap Index Partition Techniques for Continuous and High Cardinality Discrete Attributes

Songrit Maneewongvatana
Department of Computer Engineering
King Mongkut's University of Technology Thonburi, Thailand
songrit@cpe.kmutt.ac.th

For submission to InTech 2003

Abstract

Bitmap indexing is a technique for indexing data. Its main advantage is that Boolean operations on bitmaps are very fast, which is essential for queries in OLAP applications. Bitmap indexing is typically used for low-cardinality attributes, since the overall space requirement grows with the cardinality. For high-cardinality attributes, the space requirement is generally reduced by associating a range of contiguous values with a single bitmap. This technique requires an additional step, the candidate check, which examines the actual records to verify whether they satisfy the query condition. In this paper, we study techniques for partitioning the attribute domain into intervals, each assigned to one bitmap, with the goal of minimizing the candidate check cost. We propose two partitioning techniques. The first targets the situation in which the query distribution is similar to the data distribution in the database; we prove that this partitioning scheme is optimal when the query and data distributions are the same. The second technique uses a set of training queries in addition to the data in the database, and the partition it generates has the minimal candidate check cost with respect to the training query set. We consider both equality queries and range queries.

1 Introduction

Indexing is a well-known methodology for optimizing query performance on large databases. Many indexing data structures have been proposed for traditional database applications, but the most popular is the B-tree family [3, 6]. The B-tree and its variants provide a fast access method and relatively efficient index maintenance.
Both properties are essential for traditional OLTP (OnLine Transaction Processing) applications, where the system must handle many concurrent insert, delete and update operations. However, a B-tree takes up a lot of space, which can slow down retrieval if it cannot fit in main memory. Databases are no longer used only for OLTP applications; OLAP (OnLine Analytical Processing) and data warehousing applications are increasingly common. In data warehousing applications, the data are relatively static, with periodic bulk inserts, and selection criteria are usually complex, involving many attributes combined by Boolean operations. For these access patterns, the B-tree and its variants may not be the best index structure, and bitmap indexing usually provides superior performance. Bitmap indexing is an alternative to the B-tree for indexing attribute values: instead of a list of RIDs (record IDs), a string of bits, or bitmap, records which records contain a particular attribute value. The main advantage is that Boolean operations on a set of bitmaps are very fast. This is essential for data warehousing applications, since most queries involve multiple attributes connected by logical operators. Another benefit of bitmap indexing is that it requires less space for attributes with low or medium cardinality.

However, high-cardinality attributes, such as typical scientific data stored in floating-point format, are not suitable for a simple bitmap index, because the large space requirement outweighs its other strengths. A common technique to accommodate high-cardinality attributes is to assign a bitmap to a set of attribute values [10]. The set of all possible values is partitioned into k subsets, each with an associated bitmap. For categorical attributes, each set can be expressed as an enumeration of attribute values; for continuous attributes, each set can be a contiguous range of values. A bit in a bitmap is set if the corresponding record contains one of the values associated with the bitmap. False hits are therefore possible: a set bit in an accessed bitmap may belong to a record whose value lies in the bitmap's interval but does not satisfy the query. An additional stage is thus required during retrieval to filter the set bits so that only records with the desired attribute values are retrieved. This filtering stage accesses the actual database on disk and is therefore time-consuming. Choosing the right partition can reduce this overhead.

The data distribution (the distribution of attribute values in the database) determines the size of the attribute set and how it should be partitioned; the attribute set must contain all values in the data distribution. The query distribution also affects how the attribute set should be partitioned. In some applications, the query distribution or a training set of queries can be obtained, and this information can be used to optimize the partition. The goal is to minimize the cost with respect to the data and query distributions. In this paper, we focus on continuous and high-cardinality discrete attributes and improve the partitioning algorithm used to create bitmap indexes for such attributes.
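A minimal sketch of a binned bitmap index, assuming uncompressed bitmaps stored as Python integers (bit r corresponds to row r) and half-open intervals between sorted split points. The names `build_binned_index` and `equality_query` are hypothetical, not taken from [10] or any library. The sketch shows that a set bit in the hit bitmap only marks a candidate, which must then be checked against the stored value.

```python
from bisect import bisect_right


def build_binned_index(values, split_points):
    """Build one bitmap per interval; bit r of bitmaps[b] is set if row r falls in interval b.

    split_points are sorted interior boundaries; interval b is
    [split_points[b-1], split_points[b]) under the half-open convention assumed here.
    """
    bitmaps = [0] * (len(split_points) + 1)
    for row, v in enumerate(values):
        bitmaps[bisect_right(split_points, v)] |= 1 << row
    return bitmaps


def equality_query(values, split_points, bitmaps, target):
    """Answer A == target; return (matching rows, number of candidate-checked rows)."""
    hit = bitmaps[bisect_right(split_points, target)]
    candidates = [row for row in range(len(values)) if (hit >> row) & 1]
    # Candidate check: read the stored values to discard false hits.
    matches = [row for row in candidates if values[row] == target]
    return matches, len(candidates)


values = [42.0, 47.5, 40.0, 12.0, 42.0, 55.0]
split_points = [40.0, 50.0]
bitmaps = build_binned_index(values, split_points)
print(equality_query(values, split_points, bitmaps, 42.0))  # ([0, 4], 4)
```

Rows 1 and 2 (values 47.5 and 40.0) share the bitmap of 42.0, so they are read and rejected during the candidate check.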
Our contributions include:
- We present a method to partition the attribute set when the actual query distribution is unknown but assumed to be the same as the data distribution, and we prove that this partition is optimal with respect to the data and query distributions.
- We present a partitioning algorithm for the case in which a query training set is available. This algorithm extends an existing algorithm for partitioning bitmap indexes of high-cardinality discrete attributes [10]. Its major improvements are: (1) it is applicable to non-discrete attributes; (2) it reduces the number of candidate partition points; and (3) it supports range queries as well as equality queries. We also prove that our algorithm remains optimal even though it uses a smaller set of candidate partition points.

2 Related Work

A bitmap index is an index structure consisting of a collection of bitmaps. It has been used in numerous applications, including commercial database systems such as Oracle, Informix and Sybase [7]. Bitmap indexing was first introduced by O'Neil for the Model 204 DBMS [11], and several improvements have been proposed since [12, 5, 15, 13, 2]. A major advantage of bitmap indexing is that complex selections can be performed very quickly using bitwise Boolean operations such as AND, OR, XOR and NOT. Its small space requirement, especially for low-cardinality attributes, is another benefit.

In [12], the authors reviewed simple bitmap indexing and introduced two approaches for encoding bitmaps. The first, the projection index, stores the sequence of attribute values in tuple-id order. The projection index is particularly efficient when the query result consists of the values of the indexed attribute for all tuples that satisfy the query criteria, since scanning the smaller projection index is faster than scanning the full table. The second method is the bit-sliced index, whose organization is somewhat orthogonal to that of the projection index.
Each bit-slice holds only the bits from a single position of the encoded attribute value. For example, if k bits are required to encode all possible attribute values (that is, there can be up to $2^k$ attribute values), then there are k bit-slices. Wu and Buchmann extended the bit-sliced index in [15] and called it encoded bitmap indexing. Encoded bitmap indexing has the same space requirement as the bit-sliced index but adds flexibility in how the attribute values are encoded. An encoded bitmap index has a separate mapping table that maps each attribute value to a unique k-bit vector, and each bit in this vector corresponds to one bitmap. The mapping function can be chosen to minimize the number of bitmaps accessed by some queries. The bit-sliced index is a special case of encoded bitmap indexing in which the mapping function maps each attribute value to its own binary representation.

Chan and Ioannidis generalized bitmap encoding with a two-dimensional framework: bitmap index decomposition and bitmap encoding scheme [4]. Bitmap index decomposition determines how a collection of bitmaps relates to the set of attribute values. For example, in a simple bitmap scheme it is a one-to-one mapping between bitmaps and attribute values, so the number of bitmaps equals the cardinality of the attribute. In more space-economical schemes, bitmaps are divided into groups; each bitmap in a group identifies a unique group value, and the combination of the group values of all groups identifies a particular attribute value. The bitmap encoding scheme decides which bit(s) in the bit vector are set to 1. In the equality encoding scheme, only the bitmap(s) corresponding to the attribute value are set. The same authors also introduced the range encoding and interval encoding schemes [4, 5], each suited to different types of queries.

Another way to reduce the space requirement of bitmap indexing is compression, in which each bitmap is compressed separately. Compressed bitmaps are generally much smaller than uncompressed ones, especially when the bitmaps are sparse. A typical compression method is run-length encoding (RLE), which stores the distances between adjacent set ('1') bits. Other compression algorithms can also be used, for example gzip/zlib [8], byte-aligned bitmap code (BBC) [2] or word-aligned hybrid code (WAH) [14].
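Two small sketches of the encodings discussed above, assuming uncompressed bitmaps are Python integers (bit r corresponds to row r) and attribute values are non-negative integer codes. `build_bit_slices` and `bit_sliced_equality` illustrate a bit-sliced index, where an equality query ANDs each slice or its complement; `gap_encode` and `gap_decode` show the basic run-length idea of storing distances between set bits. All four are hypothetical helpers: they are not BBC, WAH or any product's on-disk format, and they do not operate on compressed bitmaps directly.

```python
def build_bit_slices(values, nbits):
    """Slice j is a bitmap whose bit r equals bit j of values[r]."""
    slices = [0] * nbits
    for row, v in enumerate(values):
        for j in range(nbits):
            if (v >> j) & 1:
                slices[j] |= 1 << row
    return slices


def bit_sliced_equality(slices, n, target):
    """Rows whose value equals target: AND each slice (bit 1) or its complement (bit 0)."""
    all_rows = (1 << n) - 1
    result = all_rows
    for j, s in enumerate(slices):
        result &= s if (target >> j) & 1 else ~s & all_rows
    return [row for row in range(n) if (result >> row) & 1]


def gap_encode(bitmap):
    """Encode a bitmap as the distances between successive set bits."""
    gaps, prev, pos = [], -1, 0
    while bitmap:
        if bitmap & 1:
            gaps.append(pos - prev)
            prev = pos
        bitmap >>= 1
        pos += 1
    return gaps


def gap_decode(gaps):
    """Inverse of gap_encode."""
    bitmap, pos = 0, -1
    for g in gaps:
        pos += g
        bitmap |= 1 << pos
    return bitmap


values = [5, 3, 5, 0, 7]  # codes below 2**3, so 3 slices suffice
slices = build_bit_slices(values, 3)
print(bit_sliced_equality(slices, len(values), 5))  # [0, 2]

sparse = (1 << 0) | (1 << 3) | (1 << 4) | (1 << 10)
print(gap_encode(sparse))  # [1, 3, 1, 6]
```

The gap list grows with the number of set bits rather than the number of rows, which is why this kind of encoding pays off mainly for sparse bitmaps.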
The main goal of compression for bitmap indexes is to make the bitmaps as small as possible (smaller bitmaps mean faster disk scans) while preserving fast logical operations, the main strength of bitmap indexing. Specifically designed algorithms such as BBC allow logical operations to be performed directly on the compressed bitmaps, so their speed is their main advantage. The performance of compressed bitmaps is discussed in [9, 1, 14].

Most bitmap indexes were designed for low-cardinality discrete attributes. However, bitmap indexes can also be applied to non-discrete attributes, which are common in scientific applications [13]. Since it is impractical to assign a bitmap to each possible value of a typical floating-point variable, one solution is to assign a bitmap to a set of attribute values. [13] proposed such a method for continuous attributes: the attribute domain is partitioned uniformly and each interval is covered by one bitmap. The related task of finding an optimal partition with respect to the data and query distributions is addressed in [10].

3 Problem Definition

For an attribute A of a table T with n rows, a bitmap index is a set of k bitmaps. The value of k is determined by the cardinality of A, the encoding scheme, the decomposition and the domain partitioning. Each bitmap B is a string of n bits, $b_1, b_2, \ldots, b_n$, where bit $b_i$ corresponds to the value of A in row i. The attribute value and the encoding scheme determine whether a bit is set (1) or reset (0). For simplicity, we focus only on the indexed attribute A and ignore the remaining attributes; the value of a record therefore means its value of A. The algorithms are presented in the context of simple bitmaps, but they can generally be applied to other bitmap schemes.

We now discuss other concepts related to domain partitioning. The attribute domain X is the set of all possible attribute values.
We assume that the attribute is at least on an ordinal scale. We denote by $x_{min}$ and $x_{max}$ the lowest and highest values. X can be expressed in interval form as $\{x_{min}, x_{max}\}$, where each brace stands for either a closed bound ([ or ]) or an open bound (( or )). The domain partitioning problem is to partition $\{x_{min}, x_{max}\}$ into k disjoint intervals $I_i$, $1 \le i \le k$. Let $\{s_i, e_i\}$ denote the interval $I_i$, which starts at $s_i$ and ends at $e_i$. When a query q is processed, the system identifies the range of attribute values that must be accessed, and all bitmaps whose intervals intersect the query range are examined. However, some records associated with set bits may store values outside the query range. For example, suppose that bitmap $B_i$ covers the interval [40, 50] and a query retrieves all records with A = 42.

For every set bit in $B_i$, the value of the corresponding record could be anything from 40 to 50, so those records must be checked. This stage is referred to as the candidate check. We model the candidate check cost as the number of records covered by the accessed bitmaps. The candidate check cost typically dominates the query time; in some cases it accounts for more than 80% of the total I/O [13]. The choice of bitmap partition affects the performance of the index because it determines how many records each bitmap covers. The basic idea is that frequently accessed bitmaps should cover relatively few records and thus have a lower candidate check cost. In the next two sections, we present methods for finding a partition with minimal candidate check cost in different situations.

The query pattern also influences how the partition should be made. There are several types of queries. An equality query seeks the records that contain one particular value. A range query specifies a query range, e.g., retrieving all records whose values lie between 20 and an upper bound given in the query.

4 Partitioning without Query Training Set

A straightforward partitioning was discussed in [13]: the attribute domain is partitioned uniformly so that the interval of each bitmap has the same size (the number of unique attribute values for discrete attributes, or the length of the interval for continuous attributes). We refer to this method as equal-interval partitioning. It can be formulated as

$$X = I_1 \cup I_2 \cup \cdots \cup I_k, \quad I_i = \{s_i, e_i\}, \quad e_i - s_i = \frac{x_{max} - x_{min}}{k} \text{ for all } i,$$
$$s_1 = x_{min}, \quad e_k = x_{max}, \quad s_{i+1} = e_i \text{ for } 1 \le i < k, \quad I_i \cap I_j = \emptyset \text{ if } i \neq j.$$

One variant of equal-interval partitioning uses the range of the data in T to determine the interval of each bitmap (except the first and last intervals, which must extend to $x_{min}$ and $x_{max}$). This variant is more efficient when the data occupy only a small portion of the attribute domain.
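The sketch below computes equal-interval split points in both forms (from the domain $[x_{min}, x_{max}]$ and from the data range $[d_{min}, d_{max}]$), finds the bitmaps a query range must examine, and counts the candidate check cost under the model above. It assumes half-open intervals, so values below the first split or above the last one fall into the outer intervals automatically; the text allows either open or closed bounds. All function names are hypothetical.

```python
from bisect import bisect_right


def equal_interval_splits(lo, hi, k):
    """Interior boundaries of k equal-length intervals over [lo, hi]."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def bitmaps_to_examine(split_points, lo, hi):
    """0-based ids of the intervals that intersect the query range [lo, hi].

    Interval b is [split_points[b-1], split_points[b]) in this sketch.
    """
    first = bisect_right(split_points, lo)
    last = bisect_right(split_points, hi)
    return list(range(first, last + 1))


def candidate_check_cost(data, split_points, lo, hi):
    """Number of records covered by the bitmaps that the query [lo, hi] examines."""
    touched = set(bitmaps_to_examine(split_points, lo, hi))
    return sum(1 for d in data if bisect_right(split_points, d) in touched)


data = [21.0, 22.5, 24.0, 24.5, 30.0, 33.0]
domain_splits = equal_interval_splits(0, 60, 6)               # first form
data_splits = equal_interval_splits(min(data), max(data), 4)  # second form
print(domain_splits)  # [10.0, 20.0, 30.0, 40.0, 50.0]
print(data_splits)    # [24.0, 27.0, 30.0]
print(candidate_check_cost(data, domain_splits, 24.0, 24.0))  # 4
print(candidate_check_cost(data, data_splits, 24.0, 24.0))    # 2
```

For data concentrated in a narrow part of the domain, the data-range form yields narrower bitmaps where the records are, which halves the candidate check cost of this equality query.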
Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the data set in table T, and let $d_{min}$ and $d_{max}$ denote the entries with the minimum and maximum values. The size of the intervals (except $I_1$ and $I_k$) then becomes $(d_{max} - d_{min})/k$. The intervals of the first form of equal-interval partitioning are easy to find, since $x_{min}$ and $x_{max}$ are known; for the second form, the table must be scanned to obtain $d_{min}$ and $d_{max}$.

In most real-world situations, the data distribution is not uniform, and in some cases the query distribution Q resembles the data distribution. In this paper, we propose a partitioning method called equal-density partitioning. Equal-density partitioning is optimal when the data (D) and the queries come from the same distribution, and when the query distribution is uniform, its candidate check cost is the same as that of equal-interval partitioning. In equal-density partitioning, the attribute domain is partitioned so that the number of records (the density) in each interval is roughly equal; in other words, every bitmap has about the same number of set bits. It is not always possible to make the densities identical, because n may not be divisible by k or because there are duplicate values around the partition points (all records with the same value must fall in the same bitmap's interval). We therefore use the criterion

$$\min \sum_{i=1}^{k} \sum_{j=1}^{k} \left( |I_i| - |I_j| \right)^2,$$

where $|I_i|$ is the number of data records whose values fall into the interval of bitmap i. Fig. 1 shows the equal-interval and equal-density partitionings. Equal-density partitioning is sensitive to the data set: intervals are smaller where the points are dense, e.g., the interval [24, 25] in the figure. We now show that the bitmap index obtained from equal-density partitioning has the minimal candidate check cost when the data and query distributions are the same.
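A minimal sketch of equal-density partitioning, assuming half-open intervals and a simple quantile rule: the value at rank $\lfloor i n / k \rfloor$ of the sorted data starts interval i. This rule is a heuristic chosen for illustration; it keeps duplicate values together but is not guaranteed to minimize the balance criterion above in every case, and it may return fewer than k intervals when many duplicates coincide. The helper names are hypothetical.

```python
from bisect import bisect_right


def equal_density_splits(data, k):
    """Heuristic equal-density split points (at most k intervals).

    The value at rank (i * n) // k starts interval i. Because intervals are
    half-open, all copies of a value land in the same interval; split values
    that equal the minimum or repeat the previous split are skipped.
    """
    xs = sorted(data)
    n = len(xs)
    splits = []
    for i in range(1, k):
        t = xs[(i * n) // k]
        if t > xs[0] and (not splits or t > splits[-1]):
            splits.append(t)
    return splits


def bin_sizes(data, splits):
    """|I_i| for every interval."""
    sizes = [0] * (len(splits) + 1)
    for d in data:
        sizes[bisect_right(splits, d)] += 1
    return sizes


def imbalance(sizes):
    """The criterion sum_i sum_j (|I_i| - |I_j|)^2."""
    return sum((a - b) ** 2 for a in sizes for b in sizes)


skewed = [1, 1, 1, 1, 2, 3, 50, 100]
density = equal_density_splits(skewed, 2)
print(density, bin_sizes(skewed, density))  # [2] [4, 4]
# An equal-interval split over [1, 100] lies at 50.5 and is badly unbalanced.
print(bin_sizes(skewed, [50.5]), imbalance(bin_sizes(skewed, [50.5])))  # [7, 1] 72
```

On this skewed sample both bitmaps of the equal-density partition hold four records, while the equal-interval split puts seven of the eight records into one bitmap.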
Lemma 4.1 Given a data set D and a query training set Q, both drawn from the same distribution, the minimal candidate check cost is obtained when the intervals of all bitmaps have the same density.

Proof sketch: Assume that n is divisible by k, so that we would like to assign n/k records to each bitmap. Let $p_i$ be the fraction of data records falling in the interval $I_i$ of bitmap $B_i$. Then $p_i = 1/k$ for all i, because every interval has the same record density. Since Q has the same distribution as D, the probability $q_i$ that a query falls in $I_i$ is $q_i = p_i = 1/k$ for all i. For each query that falls in $I_i$, the candidate check cost is the number of records associated with $B_i$, which is $n p_i$; for simplicity, we drop the constant factor n. The expected candidate check cost is

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} (p_i)^2 = \sum_{i=1}^{k} \left(\frac{1}{k}\right)^2 = \frac{1}{k}. \qquad (1)$$

We claim that this expected candidate check cost is the minimum. For any other set of intervals with $p_i \neq 1/k$ for some i, we can write $p_i = 1/k + \delta_i$, where $\delta_i \neq 0$ for at least one i. Since $\sum_{i=1}^{k} p_i = 1$, we have $\sum_{i=1}^{k} \delta_i = 0$. Because the queries still follow the data distribution, $q_i = p_i$, and the expected candidate check cost is

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} (p_i)^2 = \sum_{i=1}^{k} \left(\frac{1}{k} + \delta_i\right)^2 = \sum_{i=1}^{k} \left( \left(\frac{1}{k}\right)^2 + \frac{2\delta_i}{k} + (\delta_i)^2 \right). \qquad (2)$$

The first term in Eq. 2 equals Eq. 1; the second term is $\sum_{i=1}^{k} 2\delta_i/k = (2/k)\sum_{i=1}^{k} \delta_i = 0$; and the last term satisfies $\sum_{i=1}^{k} (\delta_i)^2 > 0$ because some $\delta_i$ is not equal to 0. Therefore the expected candidate check cost in Eq. 2 is greater than that in Eq. 1, which completes the proof.

[Figure 1: Equal-interval and equal-density partitionings. Equal-interval partitioning: [0,10), [10,20), [20,30), [30,40), [40,50), [50,60]. Equal-density partitioning: [0,15], (15,24), [24,25], (25,32), [32,38), [38,60].]

Moreover, equal-density partitioning has the same expected candidate check cost as equal-interval partitioning when the query distribution is uniform, as the next lemma shows.

Lemma 4.2 Given a data set D and a uniform query distribution, the expected candidate check costs of the bitmap indexes obtained from equal-density and equal-interval partitioning are identical.

Proof sketch: We first formulate the expected candidate check cost for a uniform query distribution with equal-density partitioning, where $p_i = 1/k$:

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} \frac{q_i}{k} = \frac{1}{k} \sum_{i=1}^{k} q_i. \qquad (3)$$

Since $\sum_{i=1}^{k} q_i = 1$, the candidate check cost is $1/k$. For equal-interval partitioning, the candidate check cost can be derived in the same way as Eq. 3, but now $q_i = 1/k$ for all i (uniform queries over intervals of equal length) and $\sum_{i=1}^{k} p_i = 1$:

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} \frac{p_i}{k} = \frac{1}{k}. \qquad (4)$$

Therefore, equal-density and equal-interval partitioning result in the same expected candidate check cost. However, in this case neither method necessarily yields the minimal candidate check cost. For example, if the data distribution contains a few tight clusters that are far from one another, the partition with the minimal candidate check cost is the one whose intervals snugly contain each cluster. This situation, however, seldom occurs in the real world.

5 Partitioning with Query Training Set

The overall efficiency of a bitmap partition also depends on the queries. For example, if most queries are known to fall in a specific range, that range should be partitioned into intervals smaller than average, which reduces the overall candidate check cost. In this section, we consider the situation in which a query training set Q, reflecting the actual query distribution, is available during partitioning. The size of Q affects the performance and the quality of the partitioning: Q should be as compact as possible while still capturing the major characteristics of the query distribution. For large data tables, Q is usually much smaller than D.

We extend a partitioning algorithm based on dynamic programming presented by Koudas in [10]. We first briefly describe the original algorithm, which is designed for high-cardinality discrete domains and supports only equality queries. Let $p_x$ and $q_x$ denote the numbers of records and queries, respectively, that contain attribute value x. The goal of the algorithm is to create a partition that minimizes the total number of false hits, based on $p_x$ and $q_x$ for all x in the attribute domain. The number of false hits $F_i$ associated with the interval of bitmap $B_i$ is

$$F_i = \sum_{j=s_i}^{e_i} q_j \sum_{\substack{s_i \le l \le e_i \\ l \neq j}} p_l,$$

and the total number of false hits is $F = \sum_{i=1}^{k} F_i$. Since the attribute is discrete, its cardinality m is finite, and dynamic programming is used to efficiently find the partition into k intervals that minimizes the total number of false hits.

Suppose that the attribute domain is laid out on a horizontal line with the attribute values sorted from left to right. The problem of finding the optimal k − 1 partition points can be stated recursively. First, find the optimal split point $split_1$ that partitions the attribute domain into an interval on the left, $[x_{min}, split_1)$, and a collection of k − 1 intervals on the right, $(split_1, x_{max}]$:

$$split_1 = \mathrm{findoptimalsplit}(x_{min}, x_{max}, D, Q, S),$$

where S is the set of split point candidates (that is, the attribute domain). We then split the range on the right side:

$$split_2 = \mathrm{findoptimalsplit}(split_1, x_{max}, D - D_1, Q - Q_1, S - S_1),$$

where $D_1$, $Q_1$ and $S_1$ are the data records, queries and split point candidates in the first (leftmost) interval. The range on the right side is split recursively until k intervals are found. A naive recursive algorithm tests all possible sets of k unique values and reports the one with the lowest total number of false hits; this requires $O(m^k)$ time. Dynamic programming reduces the running time to $O(m^2 k)$ by solving smaller subproblems first and reusing their solutions when solving larger subproblems. The details of the dynamic programming technique for partitioning the attribute domain are given in [10].

Our algorithm uses a slightly different criterion: the candidate check cost instead of the number of false hits. The candidate check cost includes both true hits and false hits. Either criterion yields the same partition, since the number of true hits of a particular query is constant. The candidate check cost, however, more closely reflects actual performance, since it is proportional to the number of accessed records.

Theoretically, the cardinality of a continuous attribute is infinite. But since such attributes are typically stored as fixed-length variables, the cardinality of the stored data is very large but finite. Even with dynamic programming, it is inefficient to consider every unique value in findoptimalsplit(). Note that any attribute value x with $p_x = q_x = 0$ can be removed from S, because the candidate check cost depends only on $p_x$ and $q_x$. With this reduction, finding the optimal partition can be significantly faster than with the original algorithm in [10] if D and Q are not spread uniformly over the attribute domain, which is common for both continuous and high-cardinality discrete attributes.

We can further reduce the size of S and speed up the algorithm by removing from S any attribute value x with $q_x = 0$; we prove below that there exists an optimal partition that does not use such a value as a split point. However, we require two split point candidates for each value x with $q_x \neq 0$, because x can be assigned either to the interval on its left (which then has a closed end) or to the interval on its right (which then has a closed start). Fig. 2 illustrates this concept and the markers $x^-$ and $x^+$. The set S can now be expressed as $S = \{x^-, x^+ \mid q_x \neq 0\}$. As discussed above, continuous attributes must be discretized before being stored in the system. Hence, in an actual implementation, we can use x to represent $x^-$ and the smallest representable value greater than x to represent $x^+$. If two query values x and y are adjacent and x < y, $x^+$ and $y^-$ can be combined and represented by y. We now prove that the set $S = \{x^-, x^+ \mid q_x \neq 0\}$ is sufficient for the algorithm to find an optimal partition.
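The sketch below puts these pieces together for equality queries only. `interval_costs` computes both the false-hit count F of [10] and the candidate check cost $\sum_i |q_{I_i}|\,|p_{I_i}|$ of a given partition, and `optimal_partition` finds a minimal-cost partition with at most k intervals while considering only the reduced candidate set $\{x^-, x^+ \mid q_x \neq 0\}$. It assumes the distinct values with nonzero $p_x$ or $q_x$ are already sorted and identified by their index t; a cut c between index c − 1 and c plays the role of $x^+$ for value c − 1 and $x^-$ for value c. It is a standard prefix dynamic program running in $O(|S|^2 k)$ time, not necessarily the exact recursion of [10], and both function names are hypothetical.

```python
from itertools import accumulate


def interval_costs(p, q, cuts):
    """Return (false hits F, candidate check cost) of a partition for equality queries.

    p[t], q[t]: record and query counts of the t-th sorted distinct value.
    cuts: sorted indices c; a cut c separates value c-1 from value c.
    """
    bounds = [0, *cuts, len(p)]
    false_hits = check_cost = 0
    for a, b in zip(bounds, bounds[1:]):
        records, queries = sum(p[a:b]), sum(q[a:b])
        check_cost += records * queries
        false_hits += sum(q[j] * (records - p[j]) for j in range(a, b))
    return false_hits, check_cost


def optimal_partition(p, q, k):
    """Minimal candidate check cost with at most k intervals; returns (cost, cuts)."""
    m = len(p)
    P = [0, *accumulate(p)]  # prefix sums of record counts
    Q = [0, *accumulate(q)]  # prefix sums of query counts

    def cost(a, b):  # candidate check cost of one interval holding values a..b-1
        return (P[b] - P[a]) * (Q[b] - Q[a])

    # Reduced candidate set: cuts just before (x^-) or just after (x^+) each queried value.
    cands = sorted({c for t in range(m) if q[t] for c in (t, t + 1) if 0 < c < m})
    ends = cands + [m]
    inf = float("inf")
    # best[r][e]: minimal cost of covering values 0..ends[e]-1 with exactly r intervals.
    best = [[inf] * len(ends) for _ in range(k + 1)]
    back = [[None] * len(ends) for _ in range(k + 1)]
    for e, b in enumerate(ends):
        best[1][e] = cost(0, b)
    for r in range(2, k + 1):
        for e, b in enumerate(ends):
            for d in range(e):
                c = best[r - 1][d] + cost(ends[d], b)
                if c < best[r][e]:
                    best[r][e], back[r][e] = c, d
    last = len(ends) - 1
    # Splitting an interval never raises the cost, but few candidates may allow < k intervals.
    r = min(range(1, k + 1), key=lambda rr: best[rr][last])
    total, cuts, e = best[r][last], [], last
    while r > 1:
        e = back[r][e]
        cuts.append(ends[e])
        r -= 1
    return total, sorted(cuts)


p = [10, 1, 1, 1, 1, 10]  # records per distinct value v1..v6
q = [0, 3, 0, 0, 3, 0]    # training queries per value
total, cuts = optimal_partition(p, q, 3)
print(total, cuts)                 # 24 [1, 5]  -> intervals {v1}, {v2..v5}, {v6}
print(interval_costs(p, q, cuts))  # (18, 24)
```

The difference between the two costs, 24 − 18 = 6, is the number of true hits $\sum_x q_x p_x$, which does not depend on the partition; this is why minimizing either criterion gives the same partition.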
[Figure 2: Two split points for a value x, $x^-$ and $x^+$. Splitting at $x^-$ yields the intervals [a,x) and [x,b]; splitting at $x^+$ yields [a,x] and (x,b].]

Lemma 5.1 Given a data set D, a query training set Q and the set of split point candidates $S = \{x^-, x^+ \mid q_x \neq 0\}$, it is possible to find a partition I that has the minimum candidate check cost with respect to D and Q using only split points in S.

Proof sketch: Suppose that a partition I has a split point $s \notin S$. We show that the split point can be moved from s to a nearby point $s' \in S$ without additional candidate check cost. Let $s_l$ and $s_r$ be the nearest query values on the left and right of s, so that $s_l^+$ and $s_r^-$ are the nearest points of S. Since s is a split point, let $I_l$ and $I_r$ be the intervals on the left and right of s, and let $|p_{I_l}|$ and $|q_{I_l}|$ be the numbers of data records and queries that fall in $I_l$ (similar definitions apply to $I_r$). The candidate check cost of $I_l$ and $I_r$ is

$$|p_{I_l}|\,|q_{I_l}| + |p_{I_r}|\,|q_{I_r}|. \qquad (5)$$

If $|q_{I_l}| > |q_{I_r}|$, the split point can be moved to $s_l^+$. Let $I'_l$ and $I'_r$ be the new intervals on the left and right of $s_l^+$. The candidate check cost of $I'_l$ and $I'_r$ is

$$|p_{I'_l}|\,|q_{I'_l}| + |p_{I'_r}|\,|q_{I'_r}|. \qquad (6)$$

Since there is no query point between $s_l^+$ and s, $|q_{I'_l}| = |q_{I_l}|$ and $|q_{I'_r}| = |q_{I_r}|$. However, there might be some c data records in the interval $(s_l, s)$, so $|p_{I'_l}| + c = |p_{I_l}|$ and $|p_{I'_r}| - c = |p_{I_r}|$ for some $c \ge 0$. Eq. 6 can therefore be rewritten as

$$(|p_{I_l}| - c)\,|q_{I_l}| + (|p_{I_r}| + c)\,|q_{I_r}|, \qquad (7)$$

which differs from Eq. 5 by $c\,(|q_{I_r}| - |q_{I_l}|) \le 0$ and is therefore not greater than Eq. 5. Hence the split point can be moved from s to $s_l^+$ without increasing the candidate check cost. A similar argument applies when $|q_{I_l}| < |q_{I_r}|$: the split point can be moved from s to $s_r^-$ without additional candidate check cost. If $|q_{I_l}| = |q_{I_r}|$, the candidate check cost is the same for the split points s, $s_l^+$ and $s_r^-$.

The candidate check cost for equality queries $C_i^E$ associated with the interval of bitmap $B_i$ is $C_i^E = |q_{I_i}|\,|p_{I_i}|$, and the overall candidate check cost for equality queries is the sum of $C_i^E$ over all intervals.

So far, the queries in the training set have been equality queries. We now extend the algorithm to range queries. For a range query q = [s, e], any data values that fall in intervals intersecting [s, e] can match the query and are subject to the candidate check. However, if the query range fully covers an interval, all data records in the bitmap associated with that interval are true hits, so no candidate check is required for fully covered intervals. The candidate check cost for range queries $C_i^R$ of interval $I_i$ is then

$$C_i^R = w_i\,|p_{I_i}|,$$

where $w_i$ is the number of range queries whose range intersects, but does not contain, interval $I_i$. Note that the candidate check cost changes only when the split points are at certain values: the start and end points of query ranges. If [s, e] is a query range, $s^-$ and $e^+$ are added to the set S of split point candidates. $s^+$ needs to be added to S only if s is an end point of another query range or if $p_s \neq 0$ (there is at least one data record with value s). With respect to this particular range query, placing a split point at $s^+$ never gives a better candidate check cost than placing it at $s^-$, since the bitmap whose interval ends at s ({..., s]) must be scanned. The same reasoning applies when adding $e^-$ to S.

6 Conclusions

In this paper, we presented two techniques for finding the partition of a bitmap index.
The first is based on the assumption that the query and data distributions are the same: the attribute domain is divided so that each interval has an equal density of data records. We proved that, under this assumption, the candidate check cost of the bitmap index with equal-density intervals is minimal. In addition, we showed that when the query distribution is uniform, the candidate check cost of the equal-density method is the same as that of the previously proposed equal-interval method. The second partitioning technique is based on an existing dynamic programming technique. We extended it so that it can be used for continuous attributes and accepts range queries, which are common in OLAP applications. We also reduced the size of the split point candidate set without compromising optimality, which shortens the execution time of the algorithm.

It remains to be seen how much speedup the bitmap indexes generated by these partitioning techniques can achieve. In future work, we plan to conduct experiments comparing the query times of bitmaps generated by our methods and by the conventional method. We would also like to extend this work to more general multi-dimensional range queries.

References

[1] S. Amer-Yahia and T. Johnson. Optimizing queries on compressed bitmaps. In Proc. 26th Int. Conf. Very Large Data Bases (VLDB).
[2] G. Antoshenkov. Byte-aligned bitmap compression. Technical report, Oracle Corp.
[3] R. Bayer and E. McCreight. Organization and maintenance of large ordered indices. Acta Informatica, 1(3).
[4] C.-Y. Chan and Y. Ioannidis. Bitmap index design and evaluation. In Proc. ACM SIGMOD Int. Conf. on Management of Data.
[5] C.-Y. Chan and Y. Ioannidis. An efficient bitmap encoding scheme for selection queries. In Proc. ACM SIGMOD Int. Conf. on Management of Data.
[6] D. Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2).
[7] H. Edelstein. Faster data warehouses. Information Week, pages 77-88, December.
[8] J.-L. Gailly and M. Adler. Zlib home page.
[9] T. Johnson. Performance measurements of compressed bitmap indices. In Proc. 25th Int. Conf. Very Large Data Bases (VLDB).
[10] N. Koudas. Space efficient bitmap indexing. In Conf. Information and Knowledge Management.
[11] P. O'Neil. Model 204 architecture and performance. In Int. Workshop on High Performance Transactions Systems, pages 40-59.
[12] P. O'Neil and D. Quass. Improved query performance with variant indexes. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 38-49.
[13] K. Stockinger. Design and implementation of bitmap indices for scientific data. In Int. Database Engineering and Application Sympos., pages 47-57.
[14] K. Wu, E. Otoo, and A. Shoshani. Compressing bitmap indexes for faster search operations. In Proc. 14th Int. Conf. Scientific and Statistical Database Management.
[15] M.-C. Wu and A. Buchmann. Encoded bitmap indexing for data warehouses. In Proc. 14th IEEE Int. Conf. Data Engineering.


More information

Keynote: About Bitmap Indexes

Keynote: About Bitmap Indexes The ifth International Conference on Advances in Databases, Knowledge, and Data Applications January 27 - ebruary, 23 - Seville, Spain Keynote: About Bitmap Indexes Andreas Schmidt (andreas.schmidt@kit.edu)

More information

Efficient Iceberg Query Evaluation on Multiple Attributes using Set Representation

Efficient Iceberg Query Evaluation on Multiple Attributes using Set Representation Efficient Iceberg Query Evaluation on Multiple Attributes using Set Representation V. Chandra Shekhar Rao 1 and P. Sammulal 2 1 Research Scalar, JNTUH, Associate Professor of Computer Science & Engg.,

More information

Sorting Improves Bitmap Indexes

Sorting Improves Bitmap Indexes Joint work (presented at BDA 08 and DOLAP 08) with Daniel Lemire and Kamel Aouiche, UQAM. December 4, 2008 Database Indexes Databases use precomputed indexes (auxiliary data structures) to speed processing.

More information

Processing of Very Large Data

Processing of Very Large Data Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first

More information

Introduction to Spatial Database Systems

Introduction to Spatial Database Systems Introduction to Spatial Database Systems by Cyrus Shahabi from Ralf Hart Hartmut Guting s VLDB Journal v3, n4, October 1994 Data Structures & Algorithms 1. Implementation of spatial algebra in an integrated

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation

More information

Parallel, In Situ Indexing for Data-intensive Computing. Introduction

Parallel, In Situ Indexing for Data-intensive Computing. Introduction FastQuery - LDAV /24/ Parallel, In Situ Indexing for Data-intensive Computing October 24, 2 Jinoh Kim, Hasan Abbasi, Luis Chacon, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani,

More information

Module 4: Tree-Structured Indexing

Module 4: Tree-Structured Indexing Module 4: Tree-Structured Indexing Module Outline 4.1 B + trees 4.2 Structure of B + trees 4.3 Operations on B + trees 4.4 Extensions 4.5 Generalized Access Path 4.6 ORACLE Clusters Web Forms Transaction

More information

Enhancing Bitmap Indices

Enhancing Bitmap Indices Enhancing Bitmap Indices Guadalupe Canahuate The Ohio State University Advisor: Hakan Ferhatosmanoglu Introduction n Bitmap indices: Widely used in data warehouses and read-only domains Implemented in

More information

DATA WAREHOUSING II. CS121: Relational Databases Fall 2017 Lecture 23

DATA WAREHOUSING II. CS121: Relational Databases Fall 2017 Lecture 23 DATA WAREHOUSING II CS121: Relational Databases Fall 2017 Lecture 23 Last Time: Data Warehousing 2 Last time introduced the topic of decision support systems (DSS) and data warehousing Very large DBs used

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Module 9: Selectivity Estimation

Module 9: Selectivity Estimation Module 9: Selectivity Estimation Module Outline 9.1 Query Cost and Selectivity Estimation 9.2 Database profiles 9.3 Sampling 9.4 Statistics maintained by commercial DBMS Web Forms Transaction Manager Lock

More information

Binary Encoded Attribute-Pairing Technique for Database Compression

Binary Encoded Attribute-Pairing Technique for Database Compression Binary Encoded Attribute-Pairing Technique for Database Compression Akanksha Baid and Swetha Krishnan Computer Sciences Department University of Wisconsin, Madison baid,swetha@cs.wisc.edu Abstract Data

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:

More information

Chapter 5. Indexing for DWH

Chapter 5. Indexing for DWH Chapter 5. Indexing for DWH D1 Facts D2 Prof. Bayer, DWH, Ch.5, SS 2000 1 dimension Time with composite key K1 according to hierarchy key K1 = (year int, month int, day int) dimension Region with composite

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan Institute of Artificial Intelligence, Zhejiang University Hangzhou, 310027,

More information

Lecture 3 February 9, 2010

Lecture 3 February 9, 2010 6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation

Data Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization

More information

Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps

Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps Tan Apaydin The Ohio State University apaydin@cse.ohio-state.edu Hakan Ferhatosmanoglu The Ohio State University hakan@cse.ohio-state.edu

More information

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:

Summary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week: Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,

More information

Multi-resolution Bitmap Indexes for Scientific Data

Multi-resolution Bitmap Indexes for Scientific Data Multi-resolution Bitmap Indexes for Scientific Data RISHI RAKESH SINHA and MARIANNE WINSLETT University of Illinois at Urbana-Champaign The unique characteristics of scientific data and queries cause traditional

More information

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes

More information

Data Warehousing & Data Mining

Data Warehousing & Data Mining Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,

More information

CMSC 754 Computational Geometry 1

CMSC 754 Computational Geometry 1 CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Building Intelligent Learning Database Systems

Building Intelligent Learning Database Systems Building Intelligent Learning Database Systems 1. Intelligent Learning Database Systems: A Definition (Wu 1995, Wu 2000) 2. Induction: Mining Knowledge from Data Decision tree construction (ID3 and C4.5)

More information

Hash-Based Indexing 165

Hash-Based Indexing 165 Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19

More information

Department of Industrial Engineering. Sharif University of Technology. Operational and enterprises systems. Exciting directions in systems

Department of Industrial Engineering. Sharif University of Technology. Operational and enterprises systems. Exciting directions in systems Department of Industrial Engineering Sharif University of Technology Session# 9 Contents: The role of managers in Information Technology (IT) Organizational Issues Information Technology Operational and

More information

Parameterized graph separation problems

Parameterized graph separation problems Parameterized graph separation problems Dániel Marx Department of Computer Science and Information Theory, Budapest University of Technology and Economics Budapest, H-1521, Hungary, dmarx@cs.bme.hu Abstract.

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15 Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables

More information

International Journal of Modern Trends in Engineering and Research. A Survey on Iceberg Query Evaluation strategies

International Journal of Modern Trends in Engineering and Research. A Survey on Iceberg Query Evaluation strategies International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 A Survey on Iceberg Query Evaluation strategies Kale Sarika Prakash 1, P.M.J.Prathap

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.

2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data. Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss

More information

Multi-Stack Boundary Labeling Problems

Multi-Stack Boundary Labeling Problems Multi-Stack Boundary Labeling Problems Michael A. Bekos 1, Michael Kaufmann 2, Katerina Potika 1, and Antonios Symvonis 1 1 National Technical University of Athens, School of Applied Mathematical & Physical

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)

CS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures) CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm

More information

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from

More information

Compressing Bitmap Indexes for Faster Search Operations

Compressing Bitmap Indexes for Faster Search Operations LBNL-49627 Compressing Bitmap Indexes for Faster Search Operations Kesheng Wu, Ekow J. Otoo and Arie Shoshani Lawrence Berkeley National Laboratory Berkeley, CA 94720, USA Email: fkwu, ejotoo, ashoshanig@lbl.gov

More information

Data Warehousing (Special Indexing Techniques)

Data Warehousing (Special Indexing Techniques) Data Warehousing (Special Indexing Techniques) Naveed Iqbal, Assistant Professor NUCES, Islamabad Campus (Lecture Slides Weeks # 13&14) Special Index Structures Inverted index Bitmap index Cluster index

More information

Data Access Paths for Frequent Itemsets Discovery

Data Access Paths for Frequent Itemsets Discovery Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number

More information

To prove something about all Boolean expressions, we will need the following induction principle: Axiom 7.1 (Induction over Boolean expressions):

To prove something about all Boolean expressions, we will need the following induction principle: Axiom 7.1 (Induction over Boolean expressions): CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 7 This lecture returns to the topic of propositional logic. Whereas in Lecture 1 we studied this topic as a way of understanding proper reasoning

More information

Bitmap Index Design and Evaluation

Bitmap Index Design and Evaluation Bitmap Index Design and Evaluation Chee-Yong Chan Department of Computer Sciences University of Wisconsin-Madison cychan@cs.wisc.edu Yannis E. Ioannidis y Department of Computer Sciences University of

More information

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Benchmarking a B-tree compression method

Benchmarking a B-tree compression method Benchmarking a B-tree compression method Filip Křižka, Michal Krátký, and Radim Bača Department of Computer Science, Technical University of Ostrava, Czech Republic {filip.krizka,michal.kratky,radim.baca}@vsb.cz

More information

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique

A Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 A Real Time GIS Approximation Approach for Multiphase

More information

Special Issue of IJCIM Proceedings of the

Special Issue of IJCIM Proceedings of the Special Issue of IJCIM Proceedings of the Eigh th-.. '' '. Jnte ' '... : matio'....' ' nal'.. '.. -... ;p ~--.' :'.... :... ej.lci! -1'--: "'..f(~aa, D-.,.,a...l~ OR elattmng tot.~av-~e-ijajil:u. ~~ Pta~.,

More information

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY

INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY [Agrawal, 2(4): April, 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY An Horizontal Aggregation Approach for Preparation of Data Sets in Data Mining Mayur

More information

Max-Count Aggregation Estimation for Moving Points

Max-Count Aggregation Estimation for Moving Points Max-Count Aggregation Estimation for Moving Points Yi Chen Peter Revesz Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA Abstract Many interesting problems

More information

CS122 Lecture 15 Winter Term,

CS122 Lecture 15 Winter Term, CS122 Lecture 15 Winter Term, 2014-2015 2 Index Op)miza)ons So far, only discussed implementing relational algebra operations to directly access heap Biles Indexes present an alternate access path for

More information

Core Membership Computation for Succinct Representations of Coalitional Games

Core Membership Computation for Succinct Representations of Coalitional Games Core Membership Computation for Succinct Representations of Coalitional Games Xi Alice Gao May 11, 2009 Abstract In this paper, I compare and contrast two formal results on the computational complexity

More information

16 Greedy Algorithms

16 Greedy Algorithms 16 Greedy Algorithms Optimization algorithms typically go through a sequence of steps, with a set of choices at each For many optimization problems, using dynamic programming to determine the best choices

More information

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14

Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Scan Converting Lines, Circles and Ellipses Hello everybody, welcome again

More information

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4

Summary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4 Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

All About Bitmap Indexes... And Sorting Them

All About Bitmap Indexes... And Sorting Them http://www.daniel-lemire.com/ Joint work (presented at BDA 08 and DOLAP 08) with Owen Kaser (UNB) and Kamel Aouiche (post-doc). February 12, 2009 Database Indexes Databases use precomputed indexes (auxiliary

More information

ATYPICAL RELATIONAL QUERY OPTIMIZER

ATYPICAL RELATIONAL QUERY OPTIMIZER 14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Data Warehousing Lecture 8. Toon Calders

Data Warehousing Lecture 8. Toon Calders Data Warehousing Lecture 8 Toon Calders toon.calders@ulb.ac.be 1 Summary How is the data stored? Relational database (ROLAP) Specialized structures (MOLAP) How can we speed up computation? Materialized

More information

Interleaving Schemes on Circulant Graphs with Two Offsets

Interleaving Schemes on Circulant Graphs with Two Offsets Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

An Eternal Domination Problem in Grids

An Eternal Domination Problem in Grids Theory and Applications of Graphs Volume Issue 1 Article 2 2017 An Eternal Domination Problem in Grids William Klostermeyer University of North Florida, klostermeyer@hotmail.com Margaret-Ellen Messinger

More information

Using Bitmap Indexing Technology for Combined Numerical and Text Queries

Using Bitmap Indexing Technology for Combined Numerical and Text Queries Using Bitmap Indexing Technology for Combined Numerical and Text Queries Kurt Stockinger John Cieslewicz Kesheng Wu Doron Rotem Arie Shoshani Abstract In this paper, we describe a strategy of using compressed

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

CS 664 Segmentation. Daniel Huttenlocher

CS 664 Segmentation. Daniel Huttenlocher CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical

More information

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

DATA WAREHOUING UNIT I

DATA WAREHOUING UNIT I BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009

More information

CSC Advanced Scientific Computing, Fall Numpy

CSC Advanced Scientific Computing, Fall Numpy CSC 223 - Advanced Scientific Computing, Fall 2017 Numpy Numpy Numpy (Numerical Python) provides an interface, called an array, to operate on dense data buffers. Numpy arrays are at the core of most Python

More information

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems Query Processing with Indexes CPS 216 Advanced Database Systems Announcements (February 24) 2 More reading assignment for next week Buffer management (due next Wednesday) Homework #2 due next Thursday

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem

Carnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem

More information

Treewidth and graph minors

Treewidth and graph minors Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under

More information

Spatial Index Keyword Search in Multi- Dimensional Database

Spatial Index Keyword Search in Multi- Dimensional Database Spatial Index Keyword Search in Multi- Dimensional Database Sushma Ahirrao M. E Student, Department of Computer Engineering, GHRIEM, Jalgaon, India ABSTRACT: Nearest neighbor search in multimedia databases

More information

Compression of the Stream Array Data Structure

Compression of the Stream Array Data Structure Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Data Analytics and Boolean Algebras

Data Analytics and Boolean Algebras Data Analytics and Boolean Algebras Hans van Thiel November 28, 2012 c Muitovar 2012 KvK Amsterdam 34350608 Passeerdersstraat 76 1016 XZ Amsterdam The Netherlands T: + 31 20 6247137 E: hthiel@muitovar.com

More information

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A

Data Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business

More information

TELCOM2125: Network Science and Analysis

TELCOM2125: Network Science and Analysis School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning

More information

Striped Grid Files: An Alternative for Highdimensional

Striped Grid Files: An Alternative for Highdimensional Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,

More information

Kathleen Durant PhD Northeastern University CS Indexes

Kathleen Durant PhD Northeastern University CS Indexes Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical

More information

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 HICAMP Bitmap A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 Database Indexing Databases use precomputed indexes

More information

1 Computer arithmetic with unsigned integers

1 Computer arithmetic with unsigned integers 1 Computer arithmetic with unsigned integers All numbers are w-bit unsigned integers unless otherwise noted. A w-bit unsigned integer x can be written out in binary as x x x w 2...x 2 x 1 x 0, where x

More information