
Query Classification in Multidatabase Systems

Banchong Harangsri    John Shepherd    Anne Ngu
School of Computer Science and Engineering, The University of New South Wales, Sydney 2052, AUSTRALIA.
Proceedings of the 7th Australasian Database Conference, Melbourne, Australia.

Abstract

Query optimisation is a significant unsolved problem in the development of multidatabase systems. The main reason for this is that the query cost functions of the component database systems may not be known to the global query optimiser. In this paper, we describe a method, based on a classical clustering algorithm, for classifying queries, which allows us to derive accurate approximations of these query cost functions. The experimental results show that the cost functions derived by the clustering algorithm yield a lower average error than those produced by a manual classification.

Keywords: Cost function derivation, Classification, Query optimisation, Multidatabase systems

1 Introduction

Query optimisation in multidatabase systems is fundamentally different from distributed query optimisation, for three major reasons [5]: site autonomy, system heterogeneity and semantic heterogeneity. Site autonomy means that the essential information for optimisation, namely cost functions and database statistics, may not be available to the global query optimiser to assist in choosing query execution plans. Clearly, before effective query optimisation is possible in such a system, some means must be found of estimating query costs in the component (or local) database systems.

Du et al. [2] were the first to address this problem. They identified three types of component database systems: proprietary databases, for which cost functions and database statistics are known; conforming databases, which can provide database statistics but not cost functions; and non-conforming databases, for which neither cost functions nor database statistics are available. Du et al.'s approach to this problem was to derive the coefficients (parameters) of the cost functions by using a synthetic database. The main limitations of this approach were:
- the derivation can be done only with conforming databases;
- the derivation process requires us to know a priori the access methods employed by the component databases;
- the synthetic relations used by the calibrating process must always have a field (attribute) whose values are normally distributed.

Recently, Zhu and Larson [8] proposed the use of a query sampling method to derive the cost parameters of a local database. Their method has three steps:
1. develop a manual classification of queries based on their access semantics, and derive a cost function for each class;
2. sample queries from each class and run them on a real (not synthetic) local database to observe the running times of the queries;
3. use multiple linear regression to derive the parameters of the cost function for the local database.

The manual classification proposed by Zhu and Larson [8], which we call the ZL classification, basically gives three classes of select or join queries: clustered index, non-clustered index, and non-index. The main problem with this method is that when it is used for non-conforming databases, the manual classification cannot produce the three classes (since we know nothing about the underlying access methods). Thus all queries are placed in a single class, with a single cost function which has a relatively high average error of cost estimation.
While query classification is clearly important in deriving accurate cost functions, a significant question is: "How many classes do we need to get accurate cost functions?". The higher the number of query classes we have, the more likely it is that the average error over all classes will be reduced.

There are reasons, however, that can prevent us from having the maximum number of classes. First, the more classes we have, the more sampling we are required to do. In non-conforming databases, it is reasonable to classify queries into two main classes of select queries, i.e., the equi and non-equi select queries. The maximum number of query classes would then be 2a, where a is the number of all attributes in the local database (the value 2a will be clarified again in section 4.3). Suppose the database we are considering has 10 relations, each of which has 10 attributes, so that the total number of attributes is 100. Thus the maximum number of classes would be 2 × 100 = 200. Each class of select queries requires at least 40 queries [8] to make the cost function of the queries in the class accurate enough to use on-line. Let us relax the assumption of 2a classes to only half of that; that is, we ignore the a classes of equi queries. The total number of queries for which we need to perform sampling is then 100 × 40 = 4000 queries. According to the experiments we have done with database sizes of 10,000 up to 25,000 tuples and 5-10 attributes per relation, only a limited number of non-equi select queries can be run per hour, so the sampling for the local database would take days. The problem becomes considerably worse when we want to perform the sampling over non-equi join queries and the maximum number of classes of join queries is required! Thus the sampling process may form a significant part of the total expense of running the database system, if a large number of classes are involved. Second, for dynamic local databases, we periodically need to perform a new sampling to update the cost function coefficients. Last, the fact that, in general, several query classes will have similar characteristics in their running time¹ means that grouping them together into the same class would not reduce the accuracy of the cost functions.

In this paper, we suggest that the number of classes k can be based on the number of queries q that we are willing to run in the sample. Whenever the required number of query classes is less than the maximum, the query classification problem can be formulated as a clustering problem in a large search space. Here, we propose to use a hierarchical clustering algorithm (HCA) [4] to perform the classification. Note that our method does not require any a priori knowledge about the local database system, apart from knowing the relational schema, which makes it more widely applicable than the ZL method (which requires us to know the access methods used in the local database).

¹ In this paper, we use the elapsed running time of queries as a cost metric, the same as in [8, 2].

The rest of the paper is organised as follows. Section 2 describes the model of queries that we use in optimisation and the cost functions on which the global query optimiser is based. In section 3, we examine query classification (1) where we can use a priori knowledge of the local database to classify queries into top-level classes, and (2) where we have no such knowledge but can still classify the queries further. Section 4 describes how to perform query sampling. The algorithm HCA is explained in section 5 and its experimental results are shown in section 6. An example of how to apply HCA to a local database is given in the appendix. In the last section, we present our conclusions and give some issues for future research.
2 Query Optimisation and Cost Function

In a multidatabase environment, each local database system may use a different kind of data model but, in this paper, the relational data model is assumed at the global level. That is, each local database is connected to the global multidatabase agent via an interface that provides a relational appearance, even if the participating database is non-relational [8].

In this paper, we adopt the standard treatment of queries that is used in most of the query optimisation work in the literature. A query is regarded as a sequence of select (σ), project (π) and join (⋈) operations. The cost of a query is the sum of the costs of those component operations, sequenced in a particular order. The projection operation is usually grouped together with a select or join operation, so its cost is computed in conjunction with the cost of the select or join operation.

One of the aims of this work is to produce cost functions which can be exploited by a global query optimiser to answer select-project-join queries in conjunctive normal form (CNF). (CNF is the most commonly used query form in the optimisation literature, basically because its search space is smaller than that of its counterpart, disjunctive normal form.) The predicates of a CNF query are ANDed together to form the whole condition of the query, and each predicate is of the form R_i.a_j θ const or R_i.a_j θ R_s.b_t, where R_i.a_j is attribute j of relation i, const is a constant value in the domain of attribute j, and θ ∈ {=, ≠, >, ≥, <, ≤}. Fundamentally, a predicate is either a select or a join operation. Our proposed method may be used to derive query cost functions for either of the two main classes of queries (i.e. select and join queries). However, in this paper, we study the application of our method only to the classification of select queries (for the rest of the paper, we use the word "query" to refer to "select query"). Select queries are of the form π_L(σ_F(R_i)), as in [8], where L is a list of projected attributes of relation R_i and F is a predicate of the form R_i.a_j θ const. Although simple, select queries of this form are sufficient to compute the cost of any complex CNF select query on a single relation.

Under our scheme, select queries are classified into subclasses, where each subclass has its own cost function, which is found by a least squared error (LSE) method. Each select query has two independent variables which affect the running time of the query (the dependent variable): the number of tuples of the input relation, x1, and the number of tuples of the output relation, x2. Basically, x2 is unknown at runtime, but it can be estimated from the selectivity of the select query, i.e., x2 = selectivity × x1. Therefore, what we try to do is to derive one cost function t̂ = f(x1, x2) for the queries in each subclass, where t̂ is the estimate of the real running time t of the query. To account for variations in system load, we ran each query three times and used the average running time as the value for t.

3 Query Classification

We propose a query classification scheme that can be used with local databases which:
1. can provide some a priori knowledge;
2. cannot provide any knowledge.

For the former, we can use knowledge gained from the applications at hand, such as the database schema, key information, query type information (for instance, point query, multipoint query, range query, prefix match query [6]), etc., to roughly classify all given queries into top-level classes. For example, the ZL manual classification method shown in Figure 1 can be considered a knowledge-based approach: queries which use any kind of clustered index are in the first class, queries which use any kind of non-clustered index are in the second class, and the last class contains the queries which fall outside the first two.

Figure 1: ZL manual classification. The query set Q is split into Q1 (clustered index queries), Q2 (non-clustered index queries) and Q3 (non-index queries).

The second kind of classification assumes that no such knowledge is available to assist in classifying the queries of a higher-level class into subclasses. For example, starting from the top-level classes (Q1, Q2 and Q3), the queries in each class can be classified further into subclasses in order to obtain more accurate cost functions. Recall that the more classes we have, the more accurate the cost functions we obtain. The algorithm HCA described in section 5 is used to carry out this kind of classification. Basically, this classification can be useful in two situations:
1. it can be used to enhance the a priori knowledge classification;
2. it can be used to derive further subclasses from a higher-level single class when the local database system is non-conforming, i.e., when the system cannot reveal any useful information to help classify queries into the top-level classes.

For the former situation, consider the ZL manual classification, which is knowledge-based: for each query class at the top level, namely Q1, Q2 or Q3, all queries (for example, all clustered index queries) are kept in a single class whose one cost function yields a high average error compared with the lower average error produced by multiple classes over the same queries. The second situation, where classification without knowledge can help, is particularly useful for any non-conforming database system.
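To make the contrast between the knowledge-based and the knowledge-free starting points concrete, the following is a minimal sketch of how the top-level classes could be formed; the schema encoding, the function names and the example relations are our own illustrative assumptions, not part of the paper.

    # Illustrative sketch: forming top-level query classes with and without
    # a priori knowledge about the local database.  The schema encoding and
    # names below are hypothetical.

    def knowledge_based_classes(schema):
        """ZL-style top-level split: Q1 = clustered index, Q2 = non-clustered
        index, Q3 = non-index attributes (cf. Figure 1)."""
        q1, q2, q3 = [], [], []
        for rel, attrs in schema.items():
            for attr, index_kind in attrs.items():
                target = {"clustered": q1, "non-clustered": q2}.get(index_kind, q3)
                target.append((rel, attr))
        return q1, q2, q3

    def no_knowledge_class(schema):
        """Non-conforming database: no index information is available, so all
        attributes (and hence all queries on them) fall into a single class Q."""
        return [(rel, attr) for rel, attrs in schema.items() for attr in attrs]

    # Example schema: attribute -> kind of index ("clustered", "non-clustered", None).
    schema = {
        "R1": {"a1": "non-clustered", "a2": "non-clustered", "a3": "clustered", "a4": None},
        "R2": {"b1": "clustered", "b2": None},
    }
    print(knowledge_based_classes(schema))   # Q1, Q2, Q3 as in Figure 1
    print(no_knowledge_class(schema))        # the single class Q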
We start off from a single query class Q which contains all the queries from Q1, Q2 and Q3 (in contrast to the knowledge-based classification, which starts off from a certain number of classes, 3 in Figure 1). In both situations (starting either from Q or from Q1, Q2 and Q3), and based on the number of queries given, the HCA algorithm works out query subclasses whose cost functions have a low average error.

4 Query Sampling

The query sampling we use here is simple random sampling [7], the same as in [8]. For the purpose of describing a number of parameters, we explain the sampling method in terms of the ZL knowledge-based classification. Although the sampling method is presented here for the knowledge-based classification, it can be applied straightforwardly to non-conforming database systems, where no knowledge about the local database is available.

4.1 Sampling Method

Sampling is best explained by Figure 2. The local database in the figure consists of 7 relations. In query set Q1 (Figure 2(a)), the attributes R1.a3, R2.b1, and so on are all clustered index attributes.

Figure 2: Query sampling. (a) Sampling Q1 (clustered index queries): each clustered index attribute (R1.a3, R2.b1, R3.h1, R4.d1, R5.e4, R6.f1, R7.g3) gives one initial query class of equality queries, Q11, Q12, ..., Q17 (e.g. Q11 = {R1.a3 = c1}). (b) Sampling Q2 (non-clustered index queries): each non-clustered index attribute (R1.a1, R1.a2, R3.h2, R3.h3, R4.d2, R5.e3, R6.f4, R6.f5, R7.g2) gives one initial query class, Q21, Q22, ..., Q29. (c) Sampling Q3 (non-index queries): R1 is used as a representative relation; its index attributes contribute only the operators <, > and !=, while its non-index attributes also contribute =, giving the initial query classes Q31, Q32, ..., Q39.

In Q2 (Figure 2(b)), R1.a1, R1.a2, R3.h2 and so on are non-clustered index attributes. In Q3, shown in Figure 2(c), we use R1 as a representative of the remaining relations. Note that R1.a1, R1.a2 and R1.a3 are index attributes, and therefore when drawing up the queries in Q31, Q32, ..., Q39 we consider only the operators {<, >, !=}³, since queries that use the "=" operator on these attributes already appear in the sets Q11, Q21 and Q22.

³ ≤ and ≥ are treated similarly to < and >, respectively.

Given the number of queries to be sampled, we sample queries from the entire query population such that each query is chosen randomly with equal probability. To clarify the sampling method, consider query set Q1. The average number of queries q̄ in each set Q11, Q12, ..., Q17 is computed by:

    q̄ = q / K    (1)

where q is the total number of queries in Q1 to be sampled and K (= 7 in Figure 2(a)) is the maximum number of classes (see the next section). More details about the average number of queries q̄ when q < K, q = K and q > K are given in reference [8].

4.2 Maximum Number of Classes (K)

Recall that Q (= Q1 ∪ Q2 ∪ Q3) is the entire set of queries to be sampled. The maximum number of classes for query set Q is:

    K = 4a    (2)

where a is the total number of attributes over all relations in the local database and the constant factor 4 is due to the four different relational operators {=, <, >, !=}. Now let us consider the maximum number of classes for each individual set Q1, Q2 and Q3 (see Figure 2). The maximum number of classes for Q1 and Q2 is, respectively, the number of clustered and of non-clustered index attributes in the database. Suppose K1 is the number of clustered indexes and K2 is the number of non-clustered indexes. For Q3, the maximum number of classes K3 is 4a − (K1 + K2). Therefore, K1 + K2 + K3 = 4a.

4.3 Preliminary Clustering

Since K is a vital factor in controlling the time and the search space used by the algorithm HCA in searching for the best clustering of query classes, and K is generally large, we propose here a preliminary clustering method to reduce the value of K and thus reduce both time and search space. Figure 3 helps to clarify the method. The basic idea is to cluster "similar" relational operators together into the same query class. For example, in the figure, one may want to cluster the equi select queries together to form one class and the non-equi queries to form another, on the justification that equi select queries should have similar running-time characteristics, and so should the non-equi queries. Note that even though the time and search space can be reduced by clustering some relational operators, the maximum number of classes is still large; namely, the total number of classes for query set Q after the preliminary clustering is 2a. Recall that a is the total number of attributes in the local database; since the preliminary clustering of queries in Figure 3 is based on 2 groups of relational operators, namely the equi (=) and non-equi (<, >, !=) operators, the total number of classes is equal to 2a.

Figure 3: Preliminary clustering of "similar" relational operators. For a relation R with attributes R.m1, R.m2, ..., R.mn, the equality queries on each attribute (e.g. R.m1 = c1) form one class and the non-equality queries on that attribute (R.m1 {<, >, !=} c2) form another, giving the classes Q1, Q2, ..., Q8 shown in the figure.

4.4 Number of Classes Required (k)

The number of queries q that users wish to sample indicates how many classes k we need. Some classes of queries are expensive (the non-equi ones), and they make up around 3/4 of the maximum number of classes (4a). Therefore, we can afford only a certain number of classes, less than the maximum. To make each cost function accurate enough, we require at least w queries.
Here, w is greater than or equal to 40 queries for select queries, as proposed in [8]. Therefore, the number of classes we need is:

    k = ⌊q / w⌋    (3)

where ⌊·⌋ denotes the largest integer less than or equal to q/w.
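The counting arguments of this section (equations (1)-(3)) can be summarised in a small sketch; the schema encoding is a hypothetical convenience, and only the constant w = 40 [8] and the 10 × 10 example schema of the introduction come from the paper.

    # Illustrative sketch of the class counting and sampling arithmetic of
    # section 4 (equations (1)-(3)).

    def max_classes(schema, operators=4):
        # Equation (2): K = 4a, one class per attribute and relational
        # operator in {=, <, >, !=}.
        a = sum(len(attrs) for attrs in schema.values())
        return operators * a

    def preliminary_classes(schema):
        # Section 4.3: clustering equi vs. non-equi operators leaves 2a classes.
        return max_classes(schema, operators=2)

    def classes_required(q, w=40):
        # Equation (3): k = floor(q / w), with at least w = 40 sample queries
        # per class for select queries [8].
        return q // w

    def queries_per_class(q, K):
        # Equation (1): average number of sample queries per initial class.
        return q / K

    # Example: 10 relations with 10 attributes each, i.e. a = 100 attributes, as
    # in the introduction; sampling the a = 100 non-equi classes alone at 40
    # queries per class would already need 100 * 40 = 4000 queries.
    schema = {f"R{i}": [f"a{j}" for j in range(1, 11)] for i in range(1, 11)}
    print(max_classes(schema))            # 400 = 4a
    print(preliminary_classes(schema))    # 200 = 2a
    print(classes_required(q=80))         # 2 classes for a budget of 80 queries
    print(queries_per_class(q=80, K=8))   # 10.0 queries per class, as in the appendix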

5 Hierarchical Clustering Algorithm

Hierarchical clustering has been applied successfully in several applications. It may yield suboptimal solutions, but its great advantage is that it has polynomial running time. In any hierarchical clustering algorithm, one is required to define a matrix of similarity values [4, 1]. Based on these values, the two clusters of "entities" which have the highest similarity are grouped together into a new cluster. The semantics of similarity are problem-dependent: it could be Euclidean distance, correlation, and so on [1]. In our case, it is the average of the root mean squared errors (RMS) of the cost functions:

    average RMS = ( Σ_{i=1}^{c} RMS_i · n_i ) / ( Σ_{i=1}^{c} n_i )    (4)

where n_i is the number of queries in each class, c is the number of classes, and each RMS_i is defined as:

    RMS_i = sqrt( ( Σ_{j=1}^{n_i} (t_j − t̂_j)² ) / n_i )    (5)

where t_j is the real observed running time of query j and t̂_j is the running time predicted by the cost function.

The HCA algorithm is given in Figure 4. The initial query classes in O are all the query subclasses, such as Q11, Q12, ..., Q17 for query set Q1 (see Figure 2(a)) and Q21, Q22, ..., Q29 for query set Q2 (see Figure 2(b)). The algorithm starts with each initial query class placed in its own cluster and, in each iteration, combines two existing clusters into a new cluster. That is, the algorithm starts with |O| clusters, then |O| − 1, |O| − 2, ..., until the number of clusters is equal to the desired number of query classes k. Table 1 shows how the matrix M is updated from 5 × 5 to 4 × 4. Note that M is symmetric.

Figure 4: Algorithm HCA(k)
    let k be the number of query classes required
    let O be the set of initial query classes to be clustered
    let C_i, C_c, C_ij denote clusters of initial query classes
    let M(C_i, C_j) be the matrix of average RMS errors, of size n × n, 1 ≤ n ≤ |O|
    place each initial query class in O in its own cluster C_i, where i = 1..|O|
    num_clus ← |O|
    for each pair of clusters C_i, C_j do
        compute the average error RMS of M(C_i, C_j)
    endfor
    while num_clus > k do
        choose the two clusters C_i, C_j with the least average RMS in matrix M
        update matrix M by grouping C_i and C_j into a new cluster C_ij
        for each cluster C_c such that c ≠ ij do
            compute the average error RMS of M(C_c, C_ij)
        endfor
        num_clus ← num_clus − 1
    endwhile

Table 1: Grouping C2 and C5. (a) Before the grouping, M is a 5 × 5 symmetric matrix over the clusters C1, C2, C3, C4 and C5; (b) after grouping C2 and C5 into the new cluster C25, M is a 4 × 4 matrix over C1, C25, C3 and C4.

6 Experimental Results

The main aim of the experiments is to see how well the HCA algorithm performs in reducing the average error of the query cost functions. To do this, we compare the average error obtained with the manual query classification [8] against the average error obtained with the query classification produced by HCA. In each experiment, as the number of queries fed into algorithm HCA increases in multiples of 40 (namely 40, 80, 120, and so on), the number of classes increments by 1. Table 2 shows the three different database configurations. Each relation has 5-10 attributes; the numbers of tuples per relation are given in Table 2. The number of queries in each experiment varies from 1200 to 2000 for each individual class of clustered index, non-clustered index and non-index queries. We used around 30% of the queries as the sample set to derive the cost functions and the remainder as the test set for measuring the average error yielded by the cost functions. The results for the three database configurations are shown in Figures 5, 6 and 7, respectively. The results show a tendency for the average error to decrease as the number of query classes increases. To illustrate, let us describe the graphs in Figure 5 as an example.
In graph 5(a), the maximum number of classes is 7, as labelled at the rightmost point of the X-axis. This number stems from the total number of clustered index attributes, which is the sum of the values in the K1 row of Table 2(a). For graph 5(b), the maximum number of classes is 10, which is the total number of non-clustered indexes in the K2 row of Table 2(a). As for graph 5(c), we show only part of the maximum number of classes of non-index attributes, i.e., 10 out of 307 = 4 × 81 − (7 + 10). The solid line in, for example, graph 5(a) shows the average errors when there is only a single class, compared with the errors of the dashed line with multiple classes (2-7 in Figure 5(a)). In most of the cases with different numbers of query classes, the HCA algorithm clearly reduces the average error and thus provides better cost estimates than a single class does. Out of 81 cases, the classifications yielded by HCA give lower average errors in 71 cases and worse errors in only 7 cases. The reason for the 7 cases with worse errors could be that the number of queries is initially not sufficient; once it reaches a sufficient number, the average errors produced by multiple classes again become lower than those produced by a single class (see Figure 6(c), for example).

Table 2: Different database configurations. For each relation of Database 1 (R1-R10), Database 2 (R1-R11) and Database 3 (R1-R12), the table gives the number of tuples, K1 = the number of clustered index attributes, K2 = the number of non-clustered index attributes, and the number of other (non-index) attributes.

Figure 5: Database 1: average errors for (a) the clustered index class, (b) the non-clustered index class and (c) the non-index class; the "ZL" curves use a single class (manual classification) and the "HCA" curves use the multiple classes produced by HCA.

Figure 6: Database 2: average errors for (a) the clustered index class, (b) the non-clustered index class and (c) the non-index class.

Figure 7: Database 3: average errors for (a) the clustered index class, (b) the non-clustered index class and (c) the non-index class.

7 Conclusion and Future Research

This paper addressed the derivation of query cost functions for multidatabase systems. The contributions of the paper are the following:
- We propose the use of a hierarchical clustering algorithm to perform query classification, which achieves better performance in reducing the average error of query cost estimation than the manual classification.
- We propose a query classification which can be used with both conforming and non-conforming database systems. Especially for non-conforming systems, query classification has not been tackled successfully before.

There are several issues of interest that we plan to investigate further:
- Extend the current method to use non-linear regression techniques, to compare with the linear regression technique we currently use. The reason is that there are cases where the distribution of attribute values may not be uniform and therefore the running times of the queries in a class may not be linear in x1 and x2; non-linear regression techniques could then be better at finding best-fit cost functions.
- Compare the HCA algorithm with other classification algorithms, such as the partitioning algorithm in [3] or the algorithms used in machine learning.
- Investigate how many sampled queries are "sufficient" for each query class.

- Investigate how the cost functions of the multiple query classes derived by HCA affect the choice of query execution plans, as compared to the three cost functions of the individual classes (clustered index, non-clustered index and non-index) derived by the manual classification.
- Investigate the use of other cost metrics instead of just the elapsed running time.

The HCA algorithm runs in polynomial time to produce the cost functions of each query class, and this is an advantage when we want to combine the algorithm with a non-linear regression technique, which is likely to be slower than the linear regression technique in finding best-fit cost functions.

Acknowledgements

We would like to thank Christopher R. Birchenhall from the University of Manchester, UK, for his superb state-of-the-art C++ matclass package, which he has made publicly available together with his excellent manual. His maths class library contains several useful linear LSE functions which helped our project, such as SVD, QR and LU decompositions.

References

[1] M.R. Anderberg. Cluster Analysis for Applications. Academic Press.
[2] W. Du, R. Krishnamurthy and M.C. Shan. Query Optimization in Heterogeneous DBMS. In Proceedings of the 18th VLDB Conference, pages 277-291.
[3] J.A. Hartigan. Clustering Algorithms. John Wiley & Sons.
[4] S.C. Johnson. Hierarchical clustering schemes. Psychometrika, Volume 32, Number 3, pages 241-254, September.
[5] H. Lu, B.C. Ooi and C.H. Goh. Multidatabase Query Optimization: Issues and Solutions. In Proceedings of the Third International Workshop on Research Issues in Data Engineering: Interoperability in Multidatabase Systems, pages 137-143.
[6] D.E. Shasha. Database Tuning: A Principled Approach, Chapter 3, pages 53-88. Prentice Hall, Englewood Cliffs, New Jersey.
[7] S.K. Thompson. Sampling: Basic and Advanced Sampling Methods. John Wiley & Sons, Inc.
[8] Q. Zhu and P.A. Larson. A Query Sampling Method for Estimating Local Cost Parameters in a Multidatabase System. In Proceedings of the International Conference on Data Engineering, pages 144-153.

Table 3: Queries, the numbers of input and output tuples, and running times. For each class C1-C8 the table lists the sampled queries (column 2), the number of tuples of the input relation x1 (column 3), the number of tuples of the output relation x2 (column 4) and the running time t (column 5). For example, class C1 contains the queries "where r1.a5 = 39138", "where r1.a5 = 38464" and "where r1.a5 = 38828"; class C2 contains "where r5.e7 = 13006", "where r5.e7 = 26025" and "where r5.e7 = 32182"; and similarly for C3 (r7.g3), C4 (r8.h2), C5 (r9.i3), C6 (r10.j9), C7 (r11.k5) and C8 (r12.l3).

Appendix: Example

This appendix demonstrates how to apply the HCA algorithm to obtain a solution of query classes with a low average error. In database configuration 3 (see Table 2(c)), there are 8 clustered indexes, namely r1.a5, r5.e7, r7.g3, r8.h2, r9.i3, r10.j9, r11.k5 and r12.l3. Each clustered index forms one initial query class; that is, queries which use index r1.a5 are in query class C1, queries which use index r5.e7 are in C2, and so on. Table 3 shows the queries⁴, the number of tuples of the input relation x1, the number of tuples of the output relation x2 and their running times, in columns 2, 3, 4 and 5, respectively. Note that we show only 3 queries (out of 10) for each query class. The total number of queries we were willing to sample in this experiment is 80; divided by 8 (the number of clustered indexes), this gives 10 queries for each class. Initially, HCA placed the 8 initial query classes in their own clusters, as shown in the first column of Table 4.
The second, third and fourth columns in the table are the regression coefficients of the equation t̂ = f(x1, x2) = β0 + β1·x1 + β2·x2.

⁴ Recall that select queries are of the form π_L(σ_F(R_i)). To keep Table 3 concise, we omit the lists of projected attributes of the queries.
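As an aside, the following is a minimal sketch of how such coefficients, and the per-cluster RMS of equation (5), can be computed. The authors used the linear LSE routines of the matclass C++ library (see the acknowledgements); the numpy-based version and the numbers below are purely our own illustration.

    # Least-squares fit of a per-cluster cost function
    #     t_hat = f(x1, x2) = b0 + b1*x1 + b2*x2
    # and its RMS error (equation (5)).  Illustrative only: the paper used the
    # matclass C++ LSE routines, and the observations below are invented.
    import numpy as np

    def fit_cost_function(x1, x2, t):
        # Returns (b0, b1, b2) minimising the sum of squared errors.
        X = np.column_stack([np.ones_like(x1), x1, x2])
        coeffs, *_ = np.linalg.lstsq(X, t, rcond=None)
        return coeffs

    def rms_error(coeffs, x1, x2, t):
        # Equation (5): root mean squared error of the fitted cost function.
        t_hat = coeffs[0] + coeffs[1] * x1 + coeffs[2] * x2
        return float(np.sqrt(np.mean((t - t_hat) ** 2)))

    # Hypothetical observations for one cluster: input tuples, output tuples, seconds.
    x1 = np.array([38000.0, 39000.0, 38500.0, 40000.0, 37500.0])
    x2 = np.array([120.0, 140.0, 135.0, 150.0, 110.0])
    t = np.array([1.9, 2.1, 2.0, 2.2, 1.8])
    b = fit_cost_function(x1, x2, t)
    print(b, rms_error(b, x1, x2, t))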

Table 4: The initial clusters ({C1}, {C2}, ..., {C8}) and their regression coefficients β0, β1 and β2.

By "learning" from the numbers of input and output tuples, x1 and x2, of the queries and their running times t, regression formulas (cost functions) are found by the LSE method for each individual cluster. These formulas can then be employed on-line to estimate the running times of unseen queries.

In the first iteration of HCA, the query classes C3 and C8 were merged into the same cluster, as shown in Table 5. The reason C3 and C8 were merged is that, compared with the other possible mergings (such as C1 with C2, C1 with C3, and so on), the merging of C3 and C8 gave the least average error RMS. In addition, HCA recomputed the coefficients of the new cluster comprising C3 and C8.

Table 5: The clusters after the first iteration ({C1}, {C2}, {C4}, {C5}, {C6}, {C7}, {C3, C8}) and their regression coefficients.

Due to the limited length of the paper, we omit the outputs of the second to fifth iterations and show only the last (sixth) iteration in Table 6, which yields the final solution produced by HCA. Recall that HCA stops when the number of clusters (num_clus) is less than or equal to the desired number of clusters, here 2 (calculated by k = ⌊q/w⌋ = ⌊80/40⌋ = 2).

Table 6: The final clusters ({C1} and {C2, C3, C4, C5, C6, C7, C8}) and their regression coefficients.
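To round off the appendix, here is a compact, runnable sketch of the HCA procedure of Figure 4, using the weighted-average RMS of equations (4) and (5) as the merge criterion. It is our own illustration rather than the authors' implementation: the pairwise matrix M of Figure 4 is replaced by brute-force recomputation for brevity, the least-squares fit uses numpy, and all query data are invented.

    # Illustrative sketch of algorithm HCA (Figure 4): agglomerative clustering
    # of initial query classes, merging at each step the pair whose grouping
    # gives the lowest weighted-average RMS (equations (4) and (5)).
    import numpy as np
    from itertools import combinations

    def fit_and_rms(queries):
        # Fit t_hat = b0 + b1*x1 + b2*x2 by least squares; return RMS (eq. (5)).
        x1, x2, t = (np.array([q[k] for q in queries], dtype=float) for k in (0, 1, 2))
        X = np.column_stack([np.ones_like(x1), x1, x2])
        coeffs, *_ = np.linalg.lstsq(X, t, rcond=None)
        return float(np.sqrt(np.mean((t - X @ coeffs) ** 2)))

    def average_rms(clusters):
        # Equation (4): per-cluster RMS weighted by the number of queries.
        sizes = [len(c) for c in clusters]
        return sum(fit_and_rms(c) * n for c, n in zip(clusters, sizes)) / sum(sizes)

    def hca(initial_classes, k):
        # Merge clusters until only k remain, greedily minimising the average RMS.
        clusters = [list(c) for c in initial_classes]
        while len(clusters) > k:
            best = None
            for i, j in combinations(range(len(clusters)), 2):
                trial = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
                trial.append(clusters[i] + clusters[j])
                err = average_rms(trial)
                if best is None or err < best[0]:
                    best = (err, i, j)
            _, i, j = best
            clusters = ([c for idx, c in enumerate(clusters) if idx not in (i, j)]
                        + [clusters[i] + clusters[j]])
        return clusters

    # Each query is a tuple (x1, x2, t); the numbers are invented.  With k = 2,
    # the pair of clusters whose grouping gives the lowest average RMS is merged.
    C1 = [(1000, 10, 5.0), (1200, 15, 5.4), (900, 8, 4.8), (1100, 12, 5.2)]
    C2 = [(5000, 200, 2.2), (5200, 210, 2.3), (4800, 190, 2.1), (5100, 205, 2.25)]
    C3 = [(5050, 198, 2.18), (4950, 202, 2.22), (5150, 207, 2.3), (4900, 195, 2.15)]
    print(hca([C1, C2, C3], k=2))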


More information

Data integration supports seamless access to autonomous, heterogeneous information

Data integration supports seamless access to autonomous, heterogeneous information Using Constraints to Describe Source Contents in Data Integration Systems Chen Li, University of California, Irvine Data integration supports seamless access to autonomous, heterogeneous information sources

More information

Learning in Medical Image Databases. Cristian Sminchisescu. Department of Computer Science. Rutgers University, NJ

Learning in Medical Image Databases. Cristian Sminchisescu. Department of Computer Science. Rutgers University, NJ Learning in Medical Image Databases Cristian Sminchisescu Department of Computer Science Rutgers University, NJ 08854 email: crismin@paul.rutgers.edu December, 998 Abstract In this paper we present several

More information

Evaluation of Power Consumption of Modified Bubble, Quick and Radix Sort, Algorithm on the Dual Processor

Evaluation of Power Consumption of Modified Bubble, Quick and Radix Sort, Algorithm on the Dual Processor Evaluation of Power Consumption of Modified Bubble, Quick and, Algorithm on the Dual Processor Ahmed M. Aliyu *1 Dr. P. B. Zirra *2 1 Post Graduate Student *1,2, Computer Science Department, Adamawa State

More information

MOTION. Feature Matching/Tracking. Control Signal Generation REFERENCE IMAGE

MOTION. Feature Matching/Tracking. Control Signal Generation REFERENCE IMAGE Head-Eye Coordination: A Closed-Form Solution M. Xie School of Mechanical & Production Engineering Nanyang Technological University, Singapore 639798 Email: mmxie@ntuix.ntu.ac.sg ABSTRACT In this paper,

More information

Lecture 22 - Oblivious Transfer (OT) and Private Information Retrieval (PIR)

Lecture 22 - Oblivious Transfer (OT) and Private Information Retrieval (PIR) Lecture 22 - Oblivious Transfer (OT) and Private Information Retrieval (PIR) Boaz Barak December 8, 2005 Oblivious Transfer We are thinking of the following situation: we have a server and a client (or

More information

Revisiting Pipelined Parallelism in Multi-Join Query Processing

Revisiting Pipelined Parallelism in Multi-Join Query Processing Revisiting Pipelined Parallelism in Multi-Join Query Processing Bin Liu Elke A. Rundensteiner Department of Computer Science, Worcester Polytechnic Institute Worcester, MA 01609-2280 (binliu rundenst)@cs.wpi.edu

More information

On Color Image Quantization by the K-Means Algorithm

On Color Image Quantization by the K-Means Algorithm On Color Image Quantization by the K-Means Algorithm Henryk Palus Institute of Automatic Control, Silesian University of Technology, Akademicka 16, 44-100 GLIWICE Poland, hpalus@polsl.gliwice.pl Abstract.

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA

DOWNLOAD PDF BIG IDEAS MATH VERTICAL SHRINK OF A PARABOLA Chapter 1 : BioMath: Transformation of Graphs Use the results in part (a) to identify the vertex of the parabola. c. Find a vertical line on your graph paper so that when you fold the paper, the left portion

More information

Alternating Projections

Alternating Projections Alternating Projections Stephen Boyd and Jon Dattorro EE392o, Stanford University Autumn, 2003 1 Alternating projection algorithm Alternating projections is a very simple algorithm for computing a point

More information

Wireless Sensor Networks Localization Methods: Multidimensional Scaling vs. Semidefinite Programming Approach

Wireless Sensor Networks Localization Methods: Multidimensional Scaling vs. Semidefinite Programming Approach Wireless Sensor Networks Localization Methods: Multidimensional Scaling vs. Semidefinite Programming Approach Biljana Stojkoska, Ilinka Ivanoska, Danco Davcev, 1 Faculty of Electrical Engineering and Information

More information

Symbolic Evaluation of Sums for Parallelising Compilers

Symbolic Evaluation of Sums for Parallelising Compilers Symbolic Evaluation of Sums for Parallelising Compilers Rizos Sakellariou Department of Computer Science University of Manchester Oxford Road Manchester M13 9PL United Kingdom e-mail: rizos@csmanacuk Keywords:

More information

Online Feedback for Nested Aggregate Queries with. Dept. of Computer Science, National University of Singapore. important.

Online Feedback for Nested Aggregate Queries with. Dept. of Computer Science, National University of Singapore. important. Online Feedback for Nested Aggregate Queries with Multi-Threading y Kian-Lee Tan Cheng Hian Goh z Beng Chin Ooi Dept. of Computer Science, National University of Singapore Abstract In this paper, we study

More information

Skill. Robot/ Controller

Skill. Robot/ Controller Skill Acquisition from Human Demonstration Using a Hidden Markov Model G. E. Hovland, P. Sikka and B. J. McCarragher Department of Engineering Faculty of Engineering and Information Technology The Australian

More information

Object classes. recall (%)

Object classes. recall (%) Using Genetic Algorithms to Improve the Accuracy of Object Detection Victor Ciesielski and Mengjie Zhang Department of Computer Science, Royal Melbourne Institute of Technology GPO Box 2476V, Melbourne

More information

Finding a winning strategy in variations of Kayles

Finding a winning strategy in variations of Kayles Finding a winning strategy in variations of Kayles Simon Prins ICA-3582809 Utrecht University, The Netherlands July 15, 2015 Abstract Kayles is a two player game played on a graph. The game can be dened

More information

8. Relational Calculus (Part II)

8. Relational Calculus (Part II) 8. Relational Calculus (Part II) Relational Calculus, as defined in the previous chapter, provides the theoretical foundations for the design of practical data sub-languages (DSL). In this chapter, we

More information

Artificial Neural Networks Lecture Notes Part 5. Stephen Lucci, PhD. Part 5

Artificial Neural Networks Lecture Notes Part 5. Stephen Lucci, PhD. Part 5 Artificial Neural Networks Lecture Notes Part 5 About this file: If you have trouble reading the contents of this file, or in case of transcription errors, email gi0062@bcmail.brooklyn.cuny.edu Acknowledgments:

More information

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for

Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc. Abstract. Direct Volume Rendering (DVR) is a powerful technique for Comparison of Two Image-Space Subdivision Algorithms for Direct Volume Rendering on Distributed-Memory Multicomputers Egemen Tanin, Tahsin M. Kurc, Cevdet Aykanat, Bulent Ozguc Dept. of Computer Eng. and

More information

V Conclusions. V.1 Related work

V Conclusions. V.1 Related work V Conclusions V.1 Related work Even though MapReduce appears to be constructed specifically for performing group-by aggregations, there are also many interesting research work being done on studying critical

More information

Join algorithm costs revisited

Join algorithm costs revisited The VLDB Journal (1996) 5: 64 84 The VLDB Journal c Springer-Verlag 1996 Join algorithm costs revisited Evan P. Harris, Kotagiri Ramamohanarao Department of Computer Science, The University of Melbourne,

More information

The performance of xed block size fractal coding schemes for this model were investigated by calculating the distortion for each member of an ensemble

The performance of xed block size fractal coding schemes for this model were investigated by calculating the distortion for each member of an ensemble Fractal Coding Performance for First Order Gauss-Markov Models B E Wohlberg and G de Jager Digital Image Processing Laboratory, Electrical Engineering Department, University of Cape Town, Private Bag,

More information

An Introduction to Growth Curve Analysis using Structural Equation Modeling

An Introduction to Growth Curve Analysis using Structural Equation Modeling An Introduction to Growth Curve Analysis using Structural Equation Modeling James Jaccard New York University 1 Overview Will introduce the basics of growth curve analysis (GCA) and the fundamental questions

More information