Bitmap Index Partition Techniques for Continuous and High Cardinality Discrete Attributes
Songrit Maneewongvatana
Department of Computer Engineering, King Mongkut's University of Technology Thonburi, Thailand
songrit@cpe.kmutt.ac.th
For submission to InTech 2003

Abstract

Bitmap indexing is a technique to index data. The main advantage of bitmap indexing is that boolean operations on bitmaps are very fast, which is essential for queries in OLAP applications. Typically, bitmap indexing is used for low cardinality attributes, since the overall space requirement depends on the cardinality. For high cardinality attributes, a range of contiguous values is generally associated with a single bitmap to reduce the space requirement. This technique requires an additional step, the candidate check, which inspects the actual records to verify whether they satisfy the query condition. In this paper, we study techniques for partitioning the attribute domain into intervals, each of which is assigned to a bitmap. The goal is to minimize the candidate check cost. We propose two partitioning techniques. The first technique is for the situation in which the query distribution is similar to the data distribution in the database. We prove that this partitioning scheme is optimal when the query and data distributions are the same. The second technique uses a set of training queries in addition to the data in the database. The partition generated by this technique has the minimal candidate check cost with respect to the training query set. We consider both equality queries and range queries.

1 Introduction

Indexing is a well-known methodology for optimizing query performance on large databases. In traditional database applications, many indexing data structures have been proposed, but the most popular one is the B-tree family [3, 6]. The B-tree and its variants provide a fast access method and relatively efficient index maintenance.
Both properties are essential for traditional OLTP (OnLine Transaction Processing) applications, where the systems have to handle many concurrent insert, delete and update operations. However, a B-tree takes up a lot of space, which can slow down retrieval if the index cannot fit in main memory.

Recent uses of databases are not limited to OLTP applications; there has been increasing use of OLAP (OnLine Analytical Processing) and data warehousing applications. In data warehousing applications, the data are relatively static, with periodic bulk inserts. Selection criteria are usually complex, consisting of conditions on many attributes joined by boolean operations. With these access patterns, the B-tree or its variants might not be the best indexing structure.

Bitmap indexing usually provides superior performance in such scenarios. It provides an alternative to the B-tree for indexing attribute values. Instead of the list of RIDs (Record IDs) used in a B-tree, a string of bits, or bitmap, stores the information about which records contain a particular attribute value. The main advantage is that boolean operations on a set of bitmaps are very fast. This is essential for data warehousing applications, since most queries involve multiple attributes connected by logical operators. Another benefit of bitmap
indexing is that it requires less space for attributes with low or medium cardinality. However, attributes with high cardinality, such as typical scientific data stored in floating-point format, are not suitable for a simple bitmap index scheme, since the large space requirement outweighs the other strengths.

A common technique to accommodate large cardinality attributes is to assign a bitmap to a set of attribute values [10]. The set of all possible values is partitioned into k subsets, each with an associated bitmap. For categorical attributes, each set can be expressed as an enumeration of attribute values. For continuous attributes, we can assign a continuous range of values to a set. A bit in a bitmap is set if the corresponding record contains one of the values associated with the bitmap. False hits are therefore possible: a set bit in an accessed bitmap may belong to a record whose value lies in the bitmap's interval but does not satisfy the query. An additional stage is required during retrieval to filter the set bits so that only records with the desired attribute value are retrieved. The filtering stage involves accessing the actual database on disk and is thus time-consuming. Choosing the right partition can reduce this overhead.

The data distribution (the distribution of the attribute values in the database) determines the size of the attribute set and how it should be partitioned. The attribute set must contain all values in the data distribution. The query distribution also affects the choice of how to partition the attribute set. In some applications, it is possible to obtain the query distribution or a training set of queries. This information can be used for optimization during the partitioning process. The goal is to have the minimal cost with respect to the data and query distributions.

In this paper we focus on continuous and high cardinality discrete attributes. We improve the partition algorithm used to create the bitmap index for such attributes.
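The binning and filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the list-of-bits representation and the half-open interval convention are assumptions made for the example, not details taken from the paper.

```python
from bisect import bisect_right

def build_binned_bitmaps(values, boundaries):
    """Build one bitmap per interval [boundaries[i], boundaries[i+1]).

    The last interval is treated as closed so that the maximum value fits.
    """
    k = len(boundaries) - 1
    bitmaps = [[0] * len(values) for _ in range(k)]
    for row, v in enumerate(values):
        i = min(bisect_right(boundaries, v) - 1, k - 1)
        bitmaps[i][row] = 1
    return bitmaps

def equality_query(values, boundaries, bitmaps, target):
    """Return (matching rows, number of records read by the candidate check)."""
    k = len(boundaries) - 1
    i = min(bisect_right(boundaries, target) - 1, k - 1)
    # Every set bit is only a candidate; the record itself must be inspected.
    candidates = [row for row, bit in enumerate(bitmaps[i]) if bit]
    hits = [row for row in candidates if values[row] == target]
    return hits, len(candidates)

values = [3.5, 42.0, 47.1, 42.0, 18.2, 40.0]
boundaries = [0, 20, 40, 60]   # three bitmaps: [0,20), [20,40), [40,60]
bitmaps = build_binned_bitmaps(values, boundaries)
hits, checked = equality_query(values, boundaries, bitmaps, 42.0)
```

In this toy data set the query for 42.0 inspects every record of the third bitmap, although only some of them match; the difference between the two counts is the number of false hits.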
Our contributions include:

We present a method to partition the attribute set when the actual query distribution is unknown but assumed to be the same as the data distribution. We also prove that this partition is optimal with respect to the data and query distributions.

We present a partitioning algorithm for the case in which a query training set is available. This algorithm is an extension of an existing algorithm for finding a partition for the bitmap index of high-cardinality discrete attributes [10]. Its major improvements are: 1) it is applicable to non-discrete attributes; 2) it reduces the number of candidate partition points; 3) it allows range queries as well as equality queries. We also prove that our algorithm, even though it uses a smaller set of candidate partition points, is still optimal.

2 Related Work

A bitmap index is an index structure consisting of a collection of bitmaps. It has been used in numerous applications, including commercial database systems such as Oracle, Informix and Sybase [7]. Bitmap indexing was first introduced by O'Neil for the Model 204 DBMS [11]. Since then, several improvements have been proposed [12, 5, 15, 13, 2]. A major advantage of bitmap indexing is that complex bitmap selections can be performed very quickly using bitwise boolean operations such as AND, OR, XOR and NOT. Its small space requirement, especially for low cardinality attributes, is another benefit.

In [12], the authors reviewed simple bitmap indexing and introduced two approaches for encoding the bitmaps. The first method, the projection index, stores a sequence of attribute values in tuple-id order. The projection index is particularly efficient when the query results are the values of the indexed attribute for all tuples that satisfy the query criteria, since it is faster to scan the smaller projection index than to scan the full table. The bit-sliced index is the second method. Its organization is somewhat orthogonal to the structure of the projection index.
Each bit-slice holds only the bits from a single position of the encoding of the attribute value. For example, if k bits are required to encode all possible attribute values (that is, the number of possible attribute values can be up to $2^k$), then the number of bit-slices is k. Wu and Buchmann extended the features of the bit-sliced index in [15] and called the result encoded bitmap indexing. Encoded bitmap indexing has the same space requirement as the bit-sliced index but adds flexibility on how
to encode the attribute values. An encoded bitmap index has a separate mapping table that maps each attribute value to a unique k-bit vector. Each bit in this vector corresponds to a bitmap. The mapping function can be adjusted so that the number of bitmaps accessed is minimized for some queries. The bit-sliced index is a special case of encoded bitmap indexing in which the mapping function maps each attribute value to its own binary representation.

Chan and Ioannidis generalized bitmap encoding with a two-dimensional framework: bitmap index decomposition and bitmap encoding scheme [4]. Bitmap index decomposition determines how a collection of bitmaps relates to a set of attribute values. For example, in a simple bitmap scheme, it is a one-to-one mapping from bitmaps to attribute values, and the number of bitmaps required equals the cardinality of the attribute. In more space-economical schemes, bitmaps can be divided into groups. Each bitmap in a group determines a unique group value, and the combination of the group values of all groups identifies a particular attribute value. The bitmap encoding scheme decides which bit(s) in the bit vector should be set to 1. In the equality encoding scheme, only the bitmap(s) corresponding to the attribute value are set. The same authors also introduced two new encoding schemes, range encoding and interval encoding [4, 5]. Each of these encoding schemes is suitable for different types of queries.

Another way to reduce the space requirement of bitmap indexing is compression. Each bitmap is compressed separately. The compressed bitmaps are generally much smaller than the uncompressed ones, especially for sparse bitmaps. The typical compression method is run-length encoding (RLE), which keeps the distances between adjacent set bits (1 bits in this case). Other compression algorithms can also be used, for example gzip/zlib [8], byte-aligned bitmap code (BBC) [2] or word-aligned hybrid code (WAH) [14].
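As a toy illustration of the run-length idea mentioned above, the following Python sketch stores, for each set bit, the number of 0 bits since the previous set bit. It is not BBC or WAH, which use their own byte-aligned and word-aligned formats; the function names are hypothetical.

```python
def encode_gaps(bits):
    """Return (gaps, n): one gap per set bit, plus the bitmap length."""
    gaps, run = [], 0
    for b in bits:
        if b:
            gaps.append(run)   # zeros seen since the previous set bit
            run = 0
        else:
            run += 1
    return gaps, len(bits)

def decode_gaps(gaps, n):
    """Rebuild the original bitmap of length n from its gap list."""
    bits = []
    for g in gaps:
        bits.extend([0] * g)
        bits.append(1)
    bits.extend([0] * (n - len(bits)))   # trailing zeros after the last set bit
    return bits

bitmap = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
gaps, length = encode_gaps(bitmap)
restored = decode_gaps(gaps, length)
```

For a sparse bitmap the gap list has one entry per set bit, which is why the saving grows as the bitmap becomes sparser.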
The main goal of compression for bitmap indices is to reduce the size of the bitmaps as much as possible (because a smaller bitmap means a faster disk scan) while maintaining fast logical operations, the main strength of bitmap indexing. Specifically designed algorithms such as BBC allow the logical operations to be performed directly on the compressed bitmaps, so speed is their main advantage. The performance of compressed bitmaps is discussed in [9, 1, 14].

Most bitmap indices were designed for low cardinality discrete attributes. However, bitmap indices can also be applied to non-discrete attributes, which are common in scientific communities [13]. It is impractical to assign a bitmap to each possible value of a typical floating-point variable. One approach to this problem is to assign a bitmap to a set of attribute values. A method for continuous attributes is proposed in [13]: the attribute domain is partitioned uniformly and each interval is covered by a bitmap. The related task of finding an optimal partition with respect to the data and query distributions is addressed in [10].

3 Problem Definition

For an attribute A of a table T of size n, a bitmap index is a set of k bitmaps. The value of k is determined by the cardinality of A, the encoding scheme, the decomposition and the domain partitioning. Each bitmap B is a string of n bits, $b_1, b_2, \ldots, b_n$. Bit $b_i$ corresponds to the value of A in row i. The attribute value and the encoding scheme determine whether a bit is set (1) or reset (0).

For simplicity, we focus only on the indexed attribute A and ignore the remaining attributes. With this in mind, the record value implies the attribute value. Also, the algorithm is presented in the simple bitmap context, but it can generally be applied to other bitmap schemes.

Let us discuss other concepts related to domain partitioning. The attribute domain X is the set of all possible attribute values.
We assume that the attribute is at least on an ordinal scale. We denote by $x_{min}$ and $x_{max}$ the lowest and highest values. X can be expressed in interval form as $\{x_{min}, x_{max}\}$, where $\{\cdot,\cdot\}$ can be substituted by either a closed interval $[\cdot,\cdot]$ or an open interval $(\cdot,\cdot)$. The domain partitioning problem is to partition $\{x_{min}, x_{max}\}$ into k disjoint intervals $I_i$, $1 \le i \le k$. Let $\{s_i, e_i\}$ denote the interval $I_i$, which starts at $s_i$ and ends at $e_i$.

While a query q is processed, the system identifies the range of attribute values that needs to be accessed. All bitmaps whose interval intersects the query range are examined. However, some records associated with set bits may store a value that is outside the query range. For example, suppose bitmap $B_i$ covers the interval [40, 50] and a query extracts all records with A = 42.
For every set bit in $B_i$, the value of the corresponding record could be anything from 40 to 50. Thus those records must be checked. This stage is referred to as the candidate check. We model the candidate check cost as the number of records covered by the accessed bitmap. The candidate check cost typically dominates the query time; in some cases it consumes more than 80% of the total I/O [13].

The choice of the bitmap partition affects the performance of the index because it decides how many records each bitmap covers. The basic idea is that frequently accessed bitmaps should cover relatively few records and thus have a lower candidate check cost. In the next two sections, we present methods for finding the partition with minimal candidate check cost in different situations.

The query pattern also influences how the partition should be made. There are several types of queries. An equality query seeks records that contain a certain value only. A range query specifies a query range, e.g. to retrieve all records that contain any value from 20 to [...].

4 Partitioning without Query Training Set

In [13], a straightforward partitioning was discussed. The attribute domain is partitioned uniformly so that the interval of each bitmap has the same size (the size is the number of unique attribute values for discrete attributes, or the length of the interval for continuous attributes). We refer to this method as equal-interval partitioning. We can formulate the partition as

$$X = I_1 \cup I_2 \cup \cdots \cup I_k,$$

where $I_i = \{s_i, e_i\}$, $e_i - s_i = \frac{x_{max} - x_{min}}{k}$ for all $i$, $s_1 = x_{min}$, $e_k = x_{max}$, $s_{i+1} = e_i$ for $1 \le i \le k-1$, and $I_i \cap I_j = \emptyset$ if $i \ne j$.

One variant of equal-interval partitioning uses the range of the data in T to determine the interval of each bitmap (except the first and the last, which must extend to $x_{min}$ and $x_{max}$). This is more efficient when the data distribution spreads over a small portion of the attribute domain.
Let $D = \{d_1, d_2, \ldots, d_n\}$ denote the data set in table T, and let $d_{min}$, $d_{max}$ denote the entries with the minimum and maximum values. The size of the intervals (except $I_1$ and $I_k$) changes to $\frac{d_{max} - d_{min}}{k}$. It is easy to find the intervals in the first form of equal-interval partitioning since $x_{min}$ and $x_{max}$ are known. For the second form, the table needs to be scanned to obtain $d_{min}$ and $d_{max}$.

In most real-world situations, the data distribution is not uniform. In some cases, the query distribution Q resembles the data distribution. In this paper, we propose a partitioning method called equal-density partitioning. Equal-density partitioning is optimal when the data (D) and the queries come from the same distribution. Also, when the query distribution is uniform, its candidate check cost is the same as that of equal-interval partitioning.

In equal-density partitioning, the attribute domain is partitioned such that the number of records (the density) falling in each interval is roughly equal. In other words, the number of set bits in each bitmap is about the same. Sometimes it is not possible to make the density of every bitmap identical, because n is not divisible by k or because there are duplicate values around the partition points (all records with the same value must fall in the same bitmap's interval). We use the criterion

$$\min \sum_{i} \sum_{j} \left( |I_i| - |I_j| \right)^2,$$

where $|I_i|$ is the number of data records whose value falls into the interval of bitmap i. Fig. 1 shows the equal-interval and equal-density partitionings. Equal-density partitioning is sensitive to the data set. An interval is smaller, e.g. interval [24,25] in the figure, where the points are dense.

We now show that the bitmap index obtained from equal-density partitioning has the minimal candidate check cost when the data and query distributions are the same.
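The two partitioning schemes described above can be compared on a small data set with the following Python sketch. It is a minimal illustration under stated assumptions: the helper names are hypothetical, intervals are taken as half-open `[cut[i-1], cut[i])`, and the duplicate handling (moving a cut forward past equal values) is one simple choice, not necessarily the one the author used. The expected cost is computed as the sum of $p_i q_i$ with $q_i = p_i$, i.e. queries drawn from the data distribution.

```python
from bisect import bisect_right

def equal_interval_cuts(lo, hi, k):
    """Inner cut points that split [lo, hi] into k intervals of equal length."""
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_density_cuts(data, k):
    """Inner cut points chosen so that each interval holds about n/k records."""
    d = sorted(data)
    n = len(d)
    cuts = []
    for j in range(1, k):
        pos = (j * n) // k
        # Keep duplicates together: only cut where the value changes.
        while 0 < pos < n and d[pos] == d[pos - 1]:
            pos += 1
        if 0 < pos < n and (not cuts or d[pos] > cuts[-1]):
            cuts.append(d[pos])
    return cuts

def interval_counts(data, cuts):
    """Number of records falling into each interval defined by the cuts."""
    counts = [0] * (len(cuts) + 1)
    for v in data:
        counts[bisect_right(cuts, v)] += 1
    return counts

def expected_cost_same_distribution(counts):
    """Sum of p_i * q_i with q_i = p_i (constant factor n omitted)."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

data = [1, 2, 2, 3, 10, 11, 12, 13, 50, 51, 52, 53]
density_counts = interval_counts(data, equal_density_cuts(data, 3))
interval_counts_ = interval_counts(data, equal_interval_cuts(min(data), max(data), 3))
density_cost = expected_cost_same_distribution(density_counts)
interval_cost = expected_cost_same_distribution(interval_counts_)
```

On this clustered data set the equal-density cuts put four records in each bitmap, while the equal-interval cuts leave one bitmap empty and another with eight records, which gives a higher expected cost when queries follow the data.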
Figure 1: Equal-interval and equal-density partitionings. Equal-interval partitioning: [0,10) [10,20) [20,30) [30,40) [40,50) [50,60]. Equal-density partitioning: [0,15] (15,24) (25,32) [32,38) [38,60].

Lemma 4.1 Given a data set D and a query training set Q, both drawn from the same distribution, the minimal candidate check cost is obtained when the intervals of all bitmaps have the same density.

Proof sketch: Assume that n is divisible by k. We would like to assign $\frac{n}{k}$ data records to each bitmap. Let $p_i$ be the fraction of data records falling in the interval $I_i$ of bitmap $B_i$. Then $p_i = \frac{1}{k}$ for all $i$, because each interval has the same record density. Since the distribution of Q is the same as that of D, the probability $q_i$ that a query falls in $I_i$ is $q_i = p_i = \frac{1}{k}$ for all $i$. For each query that falls in $I_i$, the candidate check cost is the number of records associated with $B_i$, which is $n p_i$. For simplicity, the constant factor n is omitted. The expected candidate check cost is

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} (p_i)^2 = \sum_{i=1}^{k} \left(\frac{1}{k}\right)^2 = \frac{1}{k}. \quad (1)$$

We claim that this expected candidate check cost is the minimum. For any other set of intervals in which $p_i \ne \frac{1}{k}$ for some $i$, we can rewrite $p_i$ as $p_i = \frac{1}{k} + \delta_i$, where $\delta_i \ne 0$ for at least one $i$. Since $\sum_{i=1}^{k} p_i = 1$, we have $\sum_{i=1}^{k} \delta_i = 0$. Because the queries follow the data distribution, $q_i = p_i$ still holds, and the expected candidate check cost is

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} (p_i)^2 = \sum_{i=1}^{k} \left(\frac{1}{k} + \delta_i\right)^2 = \sum_{i=1}^{k} \left( \left(\frac{1}{k}\right)^2 + \frac{2\delta_i}{k} + (\delta_i)^2 \right). \quad (2)$$

The first term in Eq. 2 is the same as Eq. 1, the second term is $\sum_{i=1}^{k} \frac{2\delta_i}{k} = \frac{2}{k} \sum_{i=1}^{k} \delta_i = 0$, and the last term is $\sum_{i=1}^{k} (\delta_i)^2 > 0$ because some $\delta_i$ is not equal to 0. Therefore the expected candidate check cost of Eq. 2 is greater than that of Eq. 1, which completes the proof.

Moreover, equal-density partitioning has the same expected candidate check cost as equal-interval partitioning when the query distribution is uniform. We show this in the next lemma.

Lemma 4.2 Given a data set D and a uniform query distribution, the expected candidate check cost of the bitmap index obtained from equal-density partitioning is identical to that obtained from equal-interval partitioning.

Proof sketch: We first formulate the expected candidate check cost for a uniform query distribution with equal-density partitioning, where $p_i = \frac{1}{k}$:

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} \frac{q_i}{k} = \frac{1}{k} \sum_{i=1}^{k} q_i. \quad (3)$$

Since $\sum_{i=1}^{k} q_i = 1$, the candidate check cost is $\frac{1}{k}$. For equal-interval partitioning, the candidate check cost can be derived using the same method as in Eq. 3, but now $q_i = \frac{1}{k}$ for all $i$ and
$\sum_{i=1}^{k} p_i = 1$:

$$\sum_{i=1}^{k} p_i q_i = \sum_{i=1}^{k} \frac{p_i}{k} = \frac{1}{k}. \quad (4)$$

Therefore, the equal-density and equal-interval partitioning methods result in the same expected candidate check cost. However, in this case, neither method is guaranteed to yield the minimal candidate check cost. For example, if the data distribution contains a few tight clusters that are far away from one another, the partition that yields the minimal candidate check cost is the one whose intervals snugly contain each cluster. But this situation seldom occurs in the real world.

5 Partitioning with Query Training Set

The overall efficiency of a bitmap partition also depends on the queries. For example, if it is known that most of the queries fall in a specific range, that range should be partitioned into intervals that are smaller than average. This reduces the overall candidate check cost. In this section, we consider the situation in which a query training set Q, which reflects the actual query distribution, is available during partitioning. The size of Q affects the performance and the quality of the partitioning. Basically, Q should be as compact as possible but still capture the major characteristics of the query distribution. For a large data table, the size of Q is usually much smaller than the size of D.

In this section, we extend a partitioning algorithm based on the dynamic programming technique presented by Koudas in [10]. We now briefly discuss the original algorithm. It is designed for large cardinality discrete domains and supports only equality queries. Let $p_x$ and $q_x$ denote the number of records and the number of queries that contain attribute value x, respectively. The goal of the algorithm is to create a partition that minimizes the total number of false hits based on the sets of $p_x$ and $q_x$, for all x in the attribute domain. The number of false hits $F_i$ associated with the interval of bitmap $B_i$ can be defined as

$$F_i = \sum_{j=s_i}^{e_i} q_j \sum_{\substack{s_i \le k \le e_i \\ k \ne j}} p_k,$$

where the inner index k ranges over the attribute values of the interval. The total number of false hits is $F = \sum_{i=1}^{k} F_i$, where k is the number of bitmaps.
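The false-hit count defined above can be evaluated directly from the per-value counts. The Python sketch below assumes the attribute values have been mapped to positions 0..m-1 in sorted order, with `p[x]` and `q[x]` holding the record and query counts for position x; these names and the inclusive `start`/`end` convention are assumptions for the example.

```python
def false_hits(p, q, start, end):
    """F_i for the interval covering positions start..end (inclusive).

    Each of the q[j] queries for value j reads every record in the
    interval; the records whose value differs from j are false hits.
    """
    total_p = sum(p[start:end + 1])
    return sum(q[j] * (total_p - p[j]) for j in range(start, end + 1))

def total_false_hits(p, q, intervals):
    """F: the sum of F_i over a list of (start, end) intervals."""
    return sum(false_hits(p, q, s, e) for s, e in intervals)

p = [5, 0, 2, 7]   # records per attribute value
q = [1, 3, 0, 2]   # training queries per attribute value
two_bitmaps = total_false_hits(p, q, [(0, 1), (2, 3)])
one_bitmap = total_false_hits(p, q, [(0, 3)])
```

With this toy input, splitting the domain into two bitmaps gives far fewer false hits than covering it with a single bitmap, which is the quantity the partitioning algorithm tries to minimize.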
Since the attribute is discrete, the cardinality m is finite. Dynamic programming is used to efficiently find the partition with k intervals that minimizes the total number of false hits. Suppose that the attribute domain is embedded in a horizontal line and the attribute values are sorted from left to right. We can state the problem of finding the optimal $k-1$ partition points in a recursive form. First, find the optimal split point, split1, that partitions the attribute domain into an interval on the left, $[x_{min}, split1)$, and a collection of $k-1$ intervals on the right, $(split1, x_{max}]$: split1 = findoptimalsplit($x_{min}$, $x_{max}$, D, Q, S), where S is the set of split point candidates (that is, the set of attribute domain values). We then split the range on the right side: split2 = findoptimalsplit(split1, $x_{max}$, $D \setminus D_1$, $Q \setminus Q_1$, $S \setminus S_1$), where $D_1$, $Q_1$ and $S_1$ are the data records, queries and split point candidates that are in the first (leftmost) interval. The range on the right side is recursively split until k intervals are found.

A naive recursive algorithm tests all possible sets of k unique values and reports the one with the lowest total number of false hits. This method requires $O(m^k)$ time. Dynamic programming can significantly reduce the run time to $O(m^2 k)$ by precomputing the solutions of smaller subproblems first; these solutions can be reused when the algorithm solves a larger subproblem. The details of the dynamic programming technique for partitioning the attribute domain were presented in [10].

Our algorithm uses a slightly different criterion: the candidate check cost instead of the number of false hits. The candidate check cost includes both true hits and false hits. One can use either criterion and obtain the same partition, since the number of true hits remains constant for
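A dynamic program of the kind outlined above can be sketched as follows. This is not the algorithm of [10] verbatim; it is a minimal illustration, assuming sorted attribute values at positions 0..m-1 with per-value record counts `p` and query counts `q`, equality queries only, and the candidate check cost of an interval taken as (queries in the interval) times (records in the interval). The function name and return format are hypothetical.

```python
def optimal_partition(p, q, k):
    """Split positions 0..m-1 into k contiguous intervals of minimal cost.

    Returns (cost, splits), where each split is the first position of a
    new interval. Runs in O(m^2 k) time; requires 1 <= k <= len(p).
    """
    m = len(p)
    # Prefix sums make the cost of any interval an O(1) lookup.
    P, Q = [0], [0]
    for a, b in zip(p, q):
        P.append(P[-1] + a)
        Q.append(Q[-1] + b)

    def cost(i, j):
        # Candidate check cost of the interval covering positions i..j-1.
        return (P[j] - P[i]) * (Q[j] - Q[i])

    inf = float("inf")
    # best[parts][j]: minimal cost of covering positions 0..j-1 with `parts` intervals.
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for parts in range(1, k + 1):
        for j in range(parts, m + 1):
            for i in range(parts - 1, j):
                c = best[parts - 1][i] + cost(i, j)
                if c < best[parts][j]:
                    best[parts][j] = c
                    back[parts][j] = i

    # Walk the back pointers to recover the k-1 split positions.
    splits, j = [], m
    for parts in range(k, 1, -1):
        j = back[parts][j]
        splits.append(j)
    return best[k][m], splits[::-1]

p = [5, 0, 2, 7]
q = [1, 3, 0, 2]
cost2, splits2 = optimal_partition(p, q, 2)
cost3, splits3 = optimal_partition(p, q, 3)
true_hits = sum(a * b for a, b in zip(p, q))
```

For any fixed partition, the candidate check cost computed this way equals the false-hit count plus the constant `true_hits`, so minimizing one also minimizes the other.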
a particular query. The candidate check cost more closely resembles the actual performance, since it is proportional to the number of accessed records.

Theoretically, the cardinality of a continuous attribute is infinite. But since these attributes are typically stored as fixed-length variables, the cardinality of the stored data is very large but finite. Even with dynamic programming, it is inefficient to consider every unique value in the function findoptimalsplit(). We notice that any attribute value x with $p_x = q_x = 0$ can be removed from the set S, because the candidate check cost depends only on $p_x$ and $q_x$. With this reduction, finding the optimal partition can be significantly faster than with the original algorithm in [10] if D and Q do not spread uniformly over the attribute domain, which is quite common for both continuous and large discrete attributes.

We can further reduce the size of S and improve the speed of the algorithm by removing any attribute value x with $q_x = 0$ from S. We will later prove that there exists an optimal partition that does not use such a value as a split point. However, we require two split point candidates for each value x with $q_x \ne 0$. This is because the value x can be assigned either to the interval on the left side of x (the left interval has a closed end) or to the interval on the right side of x (the right interval has a closed start). Fig. 2 illustrates this concept and the markers $x^-$ and $x^+$. The set S can now be expressed as $S = \{x^-, x^+ \mid q_x \ne 0\}$. As discussed above, continuous attributes must be discretized before they are stored in the system. Hence, in actual implementations, we can use x to represent $x^-$ and the smallest value that is greater than x to represent $x^+$. If two query values x, y are adjacent and $x < y$, then $x^+$ and $y^-$ can be combined and represented by y.

We now give the proof that the set $S = \{x^-, x^+ \mid q_x \ne 0\}$ is sufficient for the algorithm to find an optimal partition.
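The construction of the reduced candidate set can be sketched as follows, assuming for illustration an integer-valued (discretized) domain in which the smallest value greater than x is x + 1. A candidate c is read as "the right-hand interval starts at c", so c = x plays the role of $x^-$ and c = x + 1 the role of $x^+$. The function name and this representation are assumptions for the example.

```python
def split_candidates(query_values, x_min, x_max):
    """Build S = {x-, x+ | q_x != 0} for an integer domain [x_min, x_max]."""
    candidates = set()
    for x in set(query_values):
        candidates.add(x)       # x-: x becomes the closed start of the right interval
        candidates.add(x + 1)   # x+: x becomes the closed end of the left interval
    # Drop candidates that would leave an empty interval at either end.
    return sorted(c for c in candidates if x_min < c <= x_max)

queries = [42, 42, 43, 90]
S = split_candidates(queries, 0, 100)
```

Here the adjacent query values 42 and 43 share one candidate, as described above, and the set holds five candidates instead of the one hundred possible split positions of the full domain.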
Figure 2: Two split points for x, $x^-$ and $x^+$. A split at $x^-$ yields the intervals [a, x) and [x, b]; a split at $x^+$ yields [a, x] and (x, b].

Lemma 5.1 Given a data set D, a query training set Q and the set of split point candidates $S = \{x^-, x^+ \mid q_x \ne 0\}$, it is possible to find a partition I that has the minimum candidate check cost with respect to D and Q.

Proof sketch: Suppose that there is a partition I that has a split point $s \notin S$. We show that we can move the split point from s to a nearby point $s' \in S$ without additional candidate check cost. Let $s_l, s_r \in S$ be the nearest points on the left and right side of s. Since s is a split point, we can define $I_l$ and $I_r$ to be the intervals on the left and right side of s, and $p_{I_l}$, $q_{I_l}$ to be the numbers of data records and queries that fall in interval $I_l$ (similar definitions apply to $I_r$). The candidate check cost of $I_l$ and $I_r$ together is

$$p_{I_l} q_{I_l} + p_{I_r} q_{I_r}. \quad (5)$$

If $q_{I_l} > q_{I_r}$, the split point can be moved to $s_l^+$. Let $I'_l$ and $I'_r$ be the new intervals on the left and right side of $s_l^+$. The candidate check cost of $I'_l$ and $I'_r$ together is

$$p_{I'_l} q_{I'_l} + p_{I'_r} q_{I'_r}. \quad (6)$$

Since there is no query point between $s_l^+$ and s, $q_{I'_l} = q_{I_l}$ and $q_{I'_r} = q_{I_r}$. But there might be some c data records in the interval $(s_l, s)$; therefore $p_{I'_l} + c = p_{I_l}$ and $p_{I'_r} - c = p_{I_r}$ for some $c \ge 0$. (6) can be rewritten as

$$(p_{I_l} - c)\, q_{I_l} + (p_{I_r} + c)\, q_{I_r}, \quad (7)$$
which is not greater than (5). Therefore the split point can be moved from s to $s_l^+$ without any increase in candidate check cost. A similar argument applies in the case $q_{I_l} < q_{I_r}$: the split point can be moved from s to $s_r^-$ without additional candidate check cost. If $q_{I_l} = q_{I_r}$, the candidate check cost is the same for the split points s, $s_l^+$ and $s_r^-$.

The candidate check cost for equality queries $C_i^E$ associated with the interval of bitmap $B_i$ is $C_i^E = q_{I_i} p_{I_i}$, and the overall candidate check cost for equality queries is the sum of $C_i^E$ over all intervals.

Up to this point, the queries in the training set have been equality queries. We now extend the algorithm to range queries. For a range query $q = [s, e]$, any data values that fall in intervals intersecting $[s, e]$ can match the query and are subject to the candidate check. However, if the query range fully covers an interval, all data records in the bitmap associated with that interval are true hits. Hence the candidate check is not required for fully covered intervals. The candidate check cost for range queries $C_i^R$ of interval $I_i$ is then

$$C_i^R = w_i\, p_{I_i},$$

where $w_i$ is the number of range queries whose range intersects but does not contain interval $I_i$.

Notice that the candidate check cost changes only when the split points are at certain values: the start and end points of the query ranges. If $[s, e]$ is a query range, $s^-$ and $e^+$ are added to the set S of split point candidates. $s^+$ needs to be added to S only if s is an end point of another query range or if $p_s \ne 0$ (there is at least one data record with value s). Placing a split point at $s^+$ never gives a better candidate check cost than placing it at $s^-$ with respect to this particular range query, since the bitmap with the interval ending at s ($\{\ldots, s]$) needs to be scanned. The same argument applies when adding $e^-$ to S.

6 Conclusions

In this paper, we presented two techniques for finding the partition for a bitmap index.
The first is based on the assumption that the query and data distributions are the same. Basically, the attribute domain is divided so that each interval has an equal density of data records. We proved that the candidate check cost of the bitmap index with equal-density intervals is minimal. In addition, we showed that, for a uniform query distribution, the candidate check cost of the equal-density method is the same as that of the previously proposed equal-interval method. The second partitioning technique is based on an existing dynamic programming technique. We extended the technique so that it can be used for continuous attributes and accepts range queries, which are common in OLAP applications. We also reduced the size of the split point candidate set without compromising optimality, which leads to a faster execution time of the algorithm.

It would be interesting to see how much speedup the bitmap indices generated by these partitioning techniques can achieve. In future work, we plan to conduct experiments comparing the query time of bitmaps generated by our methods with that of the conventional method. We would also like to extend this work to more general multi-dimensional range queries.

References

[1] S. Amer-Yahia and T. Johnson. Optimizing queries on compressed bitmaps. In Proc. 26th Int. Conf. Very Large Data Bases (VLDB).
[2] G. Antoshenkov. Byte-aligned bitmap compression. Technical report, Oracle Corp.
[3] R. Bayer and E. McCreight. Organization and maintenance of large ordered indices. Acta Informatica, 1(3).
[4] C.-Y. Chan and Y. Ioannidis. Bitmap index design and evaluation. In Proc. ACM SIGMOD Int. Conf. on Management of Data.
[5] C.-Y. Chan and Y. Ioannidis. An efficient bitmap encoding scheme for selection queries. In Proc. ACM SIGMOD Int. Conf. on Management of Data.
[6] D. Comer. The ubiquitous B-tree. ACM Computing Surveys, 11(2).
[7] H. Edelstein. Faster data warehouses. Information Week, pages 77-88, December.
[8] J.-L. Gailly and M. Adler. Zlib home page.
[9] T. Johnson. Performance measurements of compressed bitmap indices. In Proc. 25th Int. Conf. Very Large Data Bases (VLDB).
[10] N. Koudas. Space efficient bitmap indexing. In Conf. Information and Knowledge Management.
[11] P. O'Neil. Model 204 architecture and performance. In Int. Workshop on High Performance Transactions Systems, pages 40-59.
[12] P. O'Neil and D. Quass. Improved query performance with variant indexes. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 38-49.
[13] K. Stockinger. Design and implementation of bitmap indices for scientific data. In Int. Database Engineering and Application Sympos., pages 47-57.
[14] K. Wu, E. Otoo, and A. Shoshani. Compressing bitmap indexes for faster search operations. In Proc. 14th Int. Conf. Scientific and Statistical Database Management.
[15] M.-C. Wu and A. Buchmann. Encoded bitmap indexing for data warehouses. In Proc. 14th IEEE Int. Conf. Data Engineering.
More informationSorting Improves Bitmap Indexes
Joint work (presented at BDA 08 and DOLAP 08) with Daniel Lemire and Kamel Aouiche, UQAM. December 4, 2008 Database Indexes Databases use precomputed indexes (auxiliary data structures) to speed processing.
More informationProcessing of Very Large Data
Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first
More informationIntroduction to Spatial Database Systems
Introduction to Spatial Database Systems by Cyrus Shahabi from Ralf Hart Hartmut Guting s VLDB Journal v3, n4, October 1994 Data Structures & Algorithms 1. Implementation of spatial algebra in an integrated
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationParallel, In Situ Indexing for Data-intensive Computing. Introduction
FastQuery - LDAV /24/ Parallel, In Situ Indexing for Data-intensive Computing October 24, 2 Jinoh Kim, Hasan Abbasi, Luis Chacon, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani,
More informationModule 4: Tree-Structured Indexing
Module 4: Tree-Structured Indexing Module Outline 4.1 B + trees 4.2 Structure of B + trees 4.3 Operations on B + trees 4.4 Extensions 4.5 Generalized Access Path 4.6 ORACLE Clusters Web Forms Transaction
More informationEnhancing Bitmap Indices
Enhancing Bitmap Indices Guadalupe Canahuate The Ohio State University Advisor: Hakan Ferhatosmanoglu Introduction n Bitmap indices: Widely used in data warehouses and read-only domains Implemented in
More informationDATA WAREHOUSING II. CS121: Relational Databases Fall 2017 Lecture 23
DATA WAREHOUSING II CS121: Relational Databases Fall 2017 Lecture 23 Last Time: Data Warehousing 2 Last time introduced the topic of decision support systems (DSS) and data warehousing Very large DBs used
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationModule 9: Selectivity Estimation
Module 9: Selectivity Estimation Module Outline 9.1 Query Cost and Selectivity Estimation 9.2 Database profiles 9.3 Sampling 9.4 Statistics maintained by commercial DBMS Web Forms Transaction Manager Lock
More informationBinary Encoded Attribute-Pairing Technique for Database Compression
Binary Encoded Attribute-Pairing Technique for Database Compression Akanksha Baid and Swetha Krishnan Computer Sciences Department University of Wisconsin, Madison baid,swetha@cs.wisc.edu Abstract Data
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:
More informationChapter 5. Indexing for DWH
Chapter 5. Indexing for DWH D1 Facts D2 Prof. Bayer, DWH, Ch.5, SS 2000 1 dimension Time with composite key K1 according to hierarchy key K1 = (year int, month int, day int) dimension Region with composite
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationSA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases
SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan Institute of Artificial Intelligence, Zhejiang University Hangzhou, 310027,
More informationLecture 3 February 9, 2010
6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees
More informationChapter 12: Indexing and Hashing. Basic Concepts
Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition
More informationData Mining. Part 2. Data Understanding and Preparation. 2.4 Data Transformation. Spring Instructor: Dr. Masoud Yaghini. Data Transformation
Data Mining Part 2. Data Understanding and Preparation 2.4 Spring 2010 Instructor: Dr. Masoud Yaghini Outline Introduction Normalization Attribute Construction Aggregation Attribute Subset Selection Discretization
More informationApproximate Encoding for Direct Access and Query Processing over Compressed Bitmaps
Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps Tan Apaydin The Ohio State University apaydin@cse.ohio-state.edu Hakan Ferhatosmanoglu The Ohio State University hakan@cse.ohio-state.edu
More informationSummary. 4. Indexes. 4.0 Indexes. 4.1 Tree Based Indexes. 4.0 Indexes. 19-Nov-10. Last week: This week:
Summary Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Last week: Logical Model: Cubes,
More informationMulti-resolution Bitmap Indexes for Scientific Data
Multi-resolution Bitmap Indexes for Scientific Data RISHI RAKESH SINHA and MARIANNE WINSLETT University of Illinois at Urbana-Champaign The unique characteristics of scientific data and queries cause traditional
More informationIndexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel
Indexing Week 14, Spring 2005 Edited by M. Naci Akkøk, 5.3.2004, 3.3.2005 Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel Overview Conventional indexes B-trees Hashing schemes
More informationData Warehousing & Data Mining
Data Warehousing & Data Mining Wolf-Tilo Balke Kinda El Maarry Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de Summary Last week: Logical Model: Cubes,
More informationCMSC 754 Computational Geometry 1
CMSC 754 Computational Geometry 1 David M. Mount Department of Computer Science University of Maryland Fall 2005 1 Copyright, David M. Mount, 2005, Dept. of Computer Science, University of Maryland, College
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationBuilding Intelligent Learning Database Systems
Building Intelligent Learning Database Systems 1. Intelligent Learning Database Systems: A Definition (Wu 1995, Wu 2000) 2. Induction: Mining Knowledge from Data Decision tree construction (ID3 and C4.5)
More informationHash-Based Indexing 165
Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19
More informationDepartment of Industrial Engineering. Sharif University of Technology. Operational and enterprises systems. Exciting directions in systems
Department of Industrial Engineering Sharif University of Technology Session# 9 Contents: The role of managers in Information Technology (IT) Organizational Issues Information Technology Operational and
More informationParameterized graph separation problems
Parameterized graph separation problems Dániel Marx Department of Computer Science and Information Theory, Budapest University of Technology and Economics Budapest, H-1521, Hungary, dmarx@cs.bme.hu Abstract.
More informationChapter 12: Indexing and Hashing
Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture II: Indexing Part I of this course Indexing 3 Database File Organization and Indexing Remember: Database tables
More informationInternational Journal of Modern Trends in Engineering and Research. A Survey on Iceberg Query Evaluation strategies
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 A Survey on Iceberg Query Evaluation strategies Kale Sarika Prakash 1, P.M.J.Prathap
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More information2. (a) Briefly discuss the forms of Data preprocessing with neat diagram. (b) Explain about concept hierarchy generation for categorical data.
Code No: M0502/R05 Set No. 1 1. (a) Explain data mining as a step in the process of knowledge discovery. (b) Differentiate operational database systems and data warehousing. [8+8] 2. (a) Briefly discuss
More informationMulti-Stack Boundary Labeling Problems
Multi-Stack Boundary Labeling Problems Michael A. Bekos 1, Michael Kaufmann 2, Katerina Potika 1, and Antonios Symvonis 1 1 National Technical University of Athens, School of Applied Mathematical & Physical
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationCS614 - Data Warehousing - Midterm Papers Solved MCQ(S) (1 TO 22 Lectures)
CS614- Data Warehousing Solved MCQ(S) From Midterm Papers (1 TO 22 Lectures) BY Arslan Arshad Nov 21,2016 BS110401050 BS110401050@vu.edu.pk Arslan.arshad01@gmail.com AKMP01 CS614 - Data Warehousing - Midterm
More informationPathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data
PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationCompressing Bitmap Indexes for Faster Search Operations
LBNL-49627 Compressing Bitmap Indexes for Faster Search Operations Kesheng Wu, Ekow J. Otoo and Arie Shoshani Lawrence Berkeley National Laboratory Berkeley, CA 94720, USA Email: fkwu, ejotoo, ashoshanig@lbl.gov
More informationData Warehousing (Special Indexing Techniques)
Data Warehousing (Special Indexing Techniques) Naveed Iqbal, Assistant Professor NUCES, Islamabad Campus (Lecture Slides Weeks # 13&14) Special Index Structures Inverted index Bitmap index Cluster index
More informationData Access Paths for Frequent Itemsets Discovery
Data Access Paths for Frequent Itemsets Discovery Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science {marekw, mzakrz}@cs.put.poznan.pl Abstract. A number
More informationTo prove something about all Boolean expressions, we will need the following induction principle: Axiom 7.1 (Induction over Boolean expressions):
CS 70 Discrete Mathematics for CS Fall 2003 Wagner Lecture 7 This lecture returns to the topic of propositional logic. Whereas in Lecture 1 we studied this topic as a way of understanding proper reasoning
More informationBitmap Index Design and Evaluation
Bitmap Index Design and Evaluation Chee-Yong Chan Department of Computer Sciences University of Wisconsin-Madison cychan@cs.wisc.edu Yannis E. Ioannidis y Department of Computer Sciences University of
More informationCSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores
CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 8 - Data Warehousing and Column Stores Announcements Shumo office hours change See website for details HW2 due next Thurs
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationBenchmarking a B-tree compression method
Benchmarking a B-tree compression method Filip Křižka, Michal Krátký, and Radim Bača Department of Computer Science, Technical University of Ostrava, Czech Republic {filip.krizka,michal.kratky,radim.baca}@vsb.cz
More informationA Real Time GIS Approximation Approach for Multiphase Spatial Query Processing Using Hierarchical-Partitioned-Indexing Technique
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2017 IJSRCSEIT Volume 2 Issue 6 ISSN : 2456-3307 A Real Time GIS Approximation Approach for Multiphase
More informationSpecial Issue of IJCIM Proceedings of the
Special Issue of IJCIM Proceedings of the Eigh th-.. '' '. Jnte ' '... : matio'....' ' nal'.. '.. -... ;p ~--.' :'.... :... ej.lci! -1'--: "'..f(~aa, D-.,.,a...l~ OR elattmng tot.~av-~e-ijajil:u. ~~ Pta~.,
More informationINTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY
[Agrawal, 2(4): April, 2013] ISSN: 2277-9655 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY An Horizontal Aggregation Approach for Preparation of Data Sets in Data Mining Mayur
More informationMax-Count Aggregation Estimation for Moving Points
Max-Count Aggregation Estimation for Moving Points Yi Chen Peter Revesz Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA Abstract Many interesting problems
More informationCS122 Lecture 15 Winter Term,
CS122 Lecture 15 Winter Term, 2014-2015 2 Index Op)miza)ons So far, only discussed implementing relational algebra operations to directly access heap Biles Indexes present an alternate access path for
More informationCore Membership Computation for Succinct Representations of Coalitional Games
Core Membership Computation for Succinct Representations of Coalitional Games Xi Alice Gao May 11, 2009 Abstract In this paper, I compare and contrast two formal results on the computational complexity
More information16 Greedy Algorithms
16 Greedy Algorithms Optimization algorithms typically go through a sequence of steps, with a set of choices at each For many optimization problems, using dynamic programming to determine the best choices
More informationComputer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14
Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 14 Scan Converting Lines, Circles and Ellipses Hello everybody, welcome again
More informationSummary of Last Chapter. Course Content. Chapter 3 Objectives. Chapter 3: Data Preprocessing. Dr. Osmar R. Zaïane. University of Alberta 4
Principles of Knowledge Discovery in Data Fall 2004 Chapter 3: Data Preprocessing Dr. Osmar R. Zaïane University of Alberta Summary of Last Chapter What is a data warehouse and what is it for? What is
More information3 No-Wait Job Shops with Variable Processing Times
3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select
More informationAll About Bitmap Indexes... And Sorting Them
http://www.daniel-lemire.com/ Joint work (presented at BDA 08 and DOLAP 08) with Owen Kaser (UNB) and Kamel Aouiche (post-doc). February 12, 2009 Database Indexes Databases use precomputed indexes (auxiliary
More informationATYPICAL RELATIONAL QUERY OPTIMIZER
14 ATYPICAL RELATIONAL QUERY OPTIMIZER Life is what happens while you re busy making other plans. John Lennon In this chapter, we present a typical relational query optimizer in detail. We begin by discussing
More information2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006
2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,
More informationData Warehousing Lecture 8. Toon Calders
Data Warehousing Lecture 8 Toon Calders toon.calders@ulb.ac.be 1 Summary How is the data stored? Relational database (ROLAP) Specialized structures (MOLAP) How can we speed up computation? Materialized
More informationInterleaving Schemes on Circulant Graphs with Two Offsets
Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationAn Eternal Domination Problem in Grids
Theory and Applications of Graphs Volume Issue 1 Article 2 2017 An Eternal Domination Problem in Grids William Klostermeyer University of North Florida, klostermeyer@hotmail.com Margaret-Ellen Messinger
More informationUsing Bitmap Indexing Technology for Combined Numerical and Text Queries
Using Bitmap Indexing Technology for Combined Numerical and Text Queries Kurt Stockinger John Cieslewicz Kesheng Wu Doron Rotem Arie Shoshani Abstract In this paper, we describe a strategy of using compressed
More informationLeveraging Set Relations in Exact Set Similarity Join
Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,
More informationCS 664 Segmentation. Daniel Huttenlocher
CS 664 Segmentation Daniel Huttenlocher Grouping Perceptual Organization Structural relationships between tokens Parallelism, symmetry, alignment Similarity of token properties Often strong psychophysical
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationDATA WAREHOUING UNIT I
BHARATHIDASAN ENGINEERING COLLEGE NATTRAMAPALLI DEPARTMENT OF COMPUTER SCIENCE SUB CODE & NAME: IT6702/DWDM DEPT: IT Staff Name : N.RAMESH DATA WAREHOUING UNIT I 1. Define data warehouse? NOV/DEC 2009
More informationCSC Advanced Scientific Computing, Fall Numpy
CSC 223 - Advanced Scientific Computing, Fall 2017 Numpy Numpy Numpy (Numerical Python) provides an interface, called an array, to operate on dense data buffers. Numpy arrays are at the core of most Python
More informationQuery Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems
Query Processing with Indexes CPS 216 Advanced Database Systems Announcements (February 24) 2 More reading assignment for next week Buffer management (due next Wednesday) Homework #2 due next Thursday
More informationCarnegie Mellon Univ. Dept. of Computer Science /615 DB Applications. Data mining - detailed outline. Problem
Faloutsos & Pavlo 15415/615 Carnegie Mellon Univ. Dept. of Computer Science 15415/615 DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26) Data mining detailed outline Problem
More informationTreewidth and graph minors
Treewidth and graph minors Lectures 9 and 10, December 29, 2011, January 5, 2012 We shall touch upon the theory of Graph Minors by Robertson and Seymour. This theory gives a very general condition under
More informationSpatial Index Keyword Search in Multi- Dimensional Database
Spatial Index Keyword Search in Multi- Dimensional Database Sushma Ahirrao M. E Student, Department of Computer Engineering, GHRIEM, Jalgaon, India ABSTRACT: Nearest neighbor search in multimedia databases
More informationCompression of the Stream Array Data Structure
Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)
More informationFile Structures and Indexing
File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures
More informationData Analytics and Boolean Algebras
Data Analytics and Boolean Algebras Hans van Thiel November 28, 2012 c Muitovar 2012 KvK Amsterdam 34350608 Passeerdersstraat 76 1016 XZ Amsterdam The Netherlands T: + 31 20 6247137 E: hthiel@muitovar.com
More informationData Warehousing and Decision Support. Introduction. Three Complementary Trends. [R&G] Chapter 23, Part A
Data Warehousing and Decision Support [R&G] Chapter 23, Part A CS 432 1 Introduction Increasingly, organizations are analyzing current and historical data to identify useful patterns and support business
More informationTELCOM2125: Network Science and Analysis
School of Information Sciences University of Pittsburgh TELCOM2125: Network Science and Analysis Konstantinos Pelechrinis Spring 2015 2 Part 4: Dividing Networks into Clusters The problem l Graph partitioning
More informationStriped Grid Files: An Alternative for Highdimensional
Striped Grid Files: An Alternative for Highdimensional Indexing Thanet Praneenararat 1, Vorapong Suppakitpaisarn 2, Sunchai Pitakchonlasap 1, and Jaruloj Chongstitvatana 1 Department of Mathematics 1,
More informationKathleen Durant PhD Northeastern University CS Indexes
Kathleen Durant PhD Northeastern University CS 3200 Indexes Outline for the day Index definition Types of indexes B+ trees ISAM Hash index Choosing indexed fields Indexes in InnoDB 2 Indexes A typical
More informationHICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14
HICAMP Bitmap A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 Database Indexing Databases use precomputed indexes
More information1 Computer arithmetic with unsigned integers
1 Computer arithmetic with unsigned integers All numbers are w-bit unsigned integers unless otherwise noted. A w-bit unsigned integer x can be written out in binary as x x x w 2...x 2 x 1 x 0, where x
More information