Selective-NRA Algorithms for Top-k Queries


Jing Yuan, Guang-Zhong Sun, Ye Tian, Guoliang Chen, and Zhi Liu

MOE-MS Key Laboratory of Multimedia Computing and Communication, Department of Computer Science and Technology, University of Science and Technology of China, Hefei, P.R. China

Q. Li et al. (Eds.): APWeb/WAIM 2009, LNCS 5446. © Springer-Verlag Berlin Heidelberg 2009

Abstract. Efficient processing of top-k queries has become a classical research area because of its many applications. Fagin et al. proposed the middleware cost as the cost measure for a top-k query algorithm. For databases in which random access is impossible, Fagin et al. proposed the NRA (No Random Access) algorithm. In this paper, we provide some key observations about NRA. Based on them, we propose a new algorithm called Selective-NRA (SNRA), which is designed to minimize the useless accesses of a top-k query. We prove, however, that SNRA is not instance optimal in Fagin's sense, and we therefore also propose an instance optimal algorithm, Hybrid-SNRA, based on SNRA. We conducted extensive experiments on both synthetic and real-world data. The experiments show that SNRA (Hybrid-SNRA) has a lower access cost than NRA. For some instances, SNRA performed 50% fewer accesses than NRA.

1 Introduction

Assume there is a huge number of objects and every object has m attributes. For each attribute the object has a score, these scores are combined into a total score by an aggregation function, and we want to know which k objects have the largest total scores. This scenario is generalized as top-k queries. Top-k query processing has become a classical research area since it has many applications, such as information retrieval [5],[7], multimedia databases [2],[9], and data mining [6].

In top-k queries, an object's score can be obtained by random access or by sorted access. Under random access we obtain the score of a chosen object for a given attribute in one step. Under sorted access we proceed through an attribute list sequentially from the top of the list; i.e., if some object is the lth largest object in the ith list, we must perform l sorted accesses to the ith list to obtain the object's score. As stated in [8], if there are s sorted accesses and r random accesses, the total access cost is $s C_S + r C_R$, where $C_S$ is the cost of a single sorted access and $C_R$ is the cost of a single random access.

In some cases, random access is forbidden or restricted [1]. For example, a typical web search engine has no way to return the score of a document of our choice under a query. Under the assumption that random access is not supported by the database, the authors of [1] proposed the No Random Access (NRA) algorithm.

In this paper we first present some observations about the NRA algorithm and propose the Selective-NRA (SNRA) algorithm, which performs significantly better than NRA in terms of access cost and bookkeeping; second, we turn SNRA into an instance optimal algorithm, which we call Hybrid-SNRA; third, we carry out extensive experiments to compare SNRA and Hybrid-SNRA with NRA on both synthetic and real-world data sets. The results demonstrate that our algorithms have a lower access cost than NRA.

The rest of this paper is organized as follows. In Section 2, we define the problem and review related work. In Section 3, we introduce the SNRA and Hybrid-SNRA algorithms. In Section 4, we show the experimental results. Finally, in Section 5, we conclude the paper.

2 Problem Definition and Related Work

Our model can be described as follows: there are m lists and n objects, and the aggregation function t has m variables. For a given object R and list i, R has a score $x_i$ with $0 \le x_i \le 1$. R has a total score of $t(x_1, x_2, \ldots, x_m)$. We denote the lists by $L_1, L_2, \ldots, L_m$; they are sorted lists and cannot be accessed randomly. We refer to $L_i$ as list i. Each entry of $L_i$ is $(R, x_i)$, where $x_i$ is the ith field of R. We assume every object has an exact value in every list, so the length of each list is n. Since we consider only sorted access, the access cost of an algorithm is $s C_S$ if s sorted accesses are performed.

Several algorithms satisfy the assumption of no random access. The two best known are Güntzer et al.'s Stream-Combine algorithm [2] and Fagin et al.'s No Random Access algorithm [1].
As mentioned in [3], the Stream-Combine algorithm considers only upper bounds on the overall grades of objects, and cannot say that an object is in the top k unless that object has been seen in every sorted list. In this sense, NRA is better than Stream-Combine. In [1], the authors proved that algorithm NRA is instance optimal with optimality ratio m, and that no deterministic algorithm has a lower optimality ratio. (Instance optimality and the optimality ratio are defined in [1].) In [10], the authors studied the NRA algorithm and proposed an algorithm called LARA. In terms of run time, LARA is significantly faster than NRA. In terms of access cost, however, the advantage of LARA is sometimes marginal. Theobald et al. presented several probabilistic algorithms [4], which are also variants of the NRA algorithm.

The basic idea of NRA is to bound an object's exact value from above (best value $B^{(d)}(R)$) and from below (worst value $W^{(d)}(R)$). The details of algorithm NRA are given below. Unless specified otherwise, we use the same notation as the NRA algorithm (such as $W^{(d)}(R)$, $T_k^{(d)}$, $M_k^{(d)}$).

Algorithm NRA (by Fagin et al.) [1]

Do sorted access in parallel to $L_1, L_2, \ldots, L_m$. At each depth d (when d objects have been accessed under sorted access in each list) do:

1. Maintain the bottom values $\underline{x}_1^{(d)}, \underline{x}_2^{(d)}, \ldots, \underline{x}_m^{(d)}$ encountered in each list.
2. For every object R with discovered fields $S = S^{(d)}(R) = \{i_1, i_2, \ldots, i_l\} \subseteq \{1, \ldots, m\}$ with values $x_{i_1}, x_{i_2}, \ldots, x_{i_l}$, compute
   $W^{(d)}(R) = W_S(R) = t(w_1, w_2, \ldots, w_m)$, where $w_i = x_i$ if $i \in S$ and $w_i = 0$ otherwise, and
   $B^{(d)}(R) = B_S(R) = t(b_1, b_2, \ldots, b_m)$, where $b_i = x_i$ if $i \in S$ and $b_i = \underline{x}_i^{(d)}$ otherwise.
3. Let $T_k^{(d)}$, the current top-k list, contain the k objects with the largest $W^{(d)}$ values seen so far (and their grades); if two objects have the same $W^{(d)}$ value, then ties are broken using the $B^{(d)}$ values, such that the object with the highest $B^{(d)}$ value wins (and arbitrarily among objects that tie for the highest $B^{(d)}$ value). Let $M_k^{(d)}$ be the kth largest $W^{(d)}$ value in $T_k^{(d)}$.
4. Call an object R viable if $B^{(d)}(R) > M_k^{(d)}$. Halt when (a) at least k distinct objects have been seen (so that in particular $T_k^{(d)}$ contains k objects) and (b) there are no viable objects left outside $T_k^{(d)}$, that is, when $B^{(d)}(R) \le M_k^{(d)}$ for all $R \notin T_k^{(d)}$. Return the objects in $T_k^{(d)}$.

3 Selective-NRA Algorithms

In this section we propose our Selective-NRA algorithms. We first give an equivalent form of algorithm NRA's stopping rule and introduce some lemmas and observations that motivate algorithm SNRA; second, we present algorithm SNRA and prove its correctness; finally, we propose an instance optimal algorithm based on SNRA.

3.1 Observations of NRA Algorithm

Definition 1. Call an object R the best competitor if R has the largest best value ($B^{(d)}(R)$) among all viable¹ objects that are not in the current top-k list. If there is more than one such object, choose the one seen earliest (accessed earliest under sorted access) as the best competitor.
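The bounds of algorithm NRA and the best competitor of Definition 1 can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes summation as the aggregation function, uses 0-based list indices, and the names `worst_value`, `best_value` and `best_competitor` as well as the sample state are hypothetical.

```python
from typing import Dict, List, Optional

Fields = Dict[int, float]  # discovered fields of one object: list index -> score


def worst_value(known: Fields) -> float:
    """W(R) under summation: undiscovered fields count as 0."""
    return sum(known.values())


def best_value(known: Fields, bottom: List[float]) -> float:
    """B(R) under summation: undiscovered field i counts as the bottom value of list i."""
    return sum(known.get(i, bottom[i]) for i in range(len(bottom)))


def best_competitor(seen: Dict[str, Fields], top: List[str],
                    bottom: List[float]) -> Optional[str]:
    """Definition 1: the viable object outside the top list with the largest best value."""
    threshold = min(worst_value(seen[name]) for name in top)  # M_k
    winner, winner_best = None, threshold
    for name, known in seen.items():  # dict order = order in which objects were first seen
        if name in top:
            continue
        b = best_value(known, bottom)
        if b > winner_best:  # strict: enforces viability and keeps the earliest on ties
            winner, winner_best = name, b
    return winner


# Illustrative state (hypothetical values): m = 3, k = 1, one entry read from each list.
seen = {"R1": {0: 0.9, 2: 0.6}, "R2": {1: 0.9}}
bottom = [0.9, 0.9, 0.6]
print(best_competitor(seen, top=["R1"], bottom=bottom))
```

In this state R1 is the top 1 object and the sketch selects R2 as the best competitor; `best_competitor` returns `None` when no object outside the top list is viable, which corresponds to stopping rule (b).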
Stopping rule (b) of NRA is: there are no viable objects left outside $T_k^{(d)}$, that is, $B^{(d)}(R) \le M_k^{(d)}$ for all $R \notin T_k^{(d)}$. This is equivalent to: the best competitor's best value is at most $M_k^{(d)}$.

Two lemmas motivate algorithm SNRA. Throughout this paper, we assume $m \ge 2$.

¹ Viable objects and best values are defined in the previous section.

Lemma 1. If at depth d object R is the best competitor and algorithm NRA does not halt at depth d, then R has at least one missing (undiscovered) field.

Proof. Assume that at depth d we know all fields of R. Then $W^{(d)}(R) = B^{(d)}(R)$. Since R has the largest best value, the best value of every viable object outside $T_k^{(d)}$ is at most $B^{(d)}(R)$. Since algorithm NRA does not halt at depth d, there is some object $R' \in T_k^{(d)}$ such that $B^{(d)}(R) > W^{(d)}(R')$. It follows that R should be in $T_k^{(d)}$, because $B^{(d)}(R) = W^{(d)}(R) > W^{(d)}(R')$. This is a contradiction, so the conclusion follows, as desired.

Lemma 2. If at depth d object R is the best competitor and algorithm NRA does not halt at depth d, then $B^{(d)}(R)$ will decrease at depth d + 1 only if we perform sorted access on a missing field of R.

Proof. If we perform sorted access on the ith list and i is not a missing field of R, the best value of R will not decrease, because we compute the best value of R by substituting R's real value for each discovered field and the bottom value for each missing field. So R's best value will decrease only if we perform sorted access on one of R's missing fields. (R's best value may not decrease if the value at the next level equals the value at this level, so this is a necessary but not a sufficient condition.)

Another Observation: The best competitor changes as the algorithm runs. However, in our experiments we found that the last best competitor (the last one before algorithm NRA terminates) occupies the position of best competitor for a long time. We speculate that the results are similar for most common data sets. Table 1 gives an experimental result for a synthetic uniform data set. We set n = 100000, m = 2, 4, ..., 12, and k = 20. The aggregation function is summation.

Suppose that the last best competitor has only one missing field while algorithm NRA is running. In this case, NRA still performs sorted access on all of the best competitor's known fields.
This incurs many useless sorted accesses if the last best competitor persists for a long time before the top-k objects are obtained. These observations motivate the Selective-NRA algorithm.

Table 1. Accessed depth for last best competitor vs. total accessed depth (columns: m; depth of the last best competitor; total depth of NRA)

3.2 Selective-NRA Algorithm (SNRA)

We present our algorithm in pseudo-code form. As stated in Section 3.1, algorithm NRA performs some sorted accesses which will definitely not reduce the best competitor's best value. Our approach avoids these sorted accesses.

Algorithm SNRA-Initialize
1: bottom[j] := 1.0, j = 1, 2, ..., m
2: top := [dummy_1, dummy_2, ..., dummy_k]; for each dummy_i: best := worst := 0, missingfield := ∅
3: for each R_i, i = 1, 2, ..., n do
4:   R_i.best := m
5:   R_i.worst := 0
6:   R_i.missingfield := [1, 2, ..., m]
7: end for
8: candidates := ∅
9: bestcompetitor := dummy with missingfield = [1, 2, ..., m]
10: best := m
11: min := dummy_1 with min := 0

Algorithm Selective-NRA
1: Call Algorithm SNRA-Initialize
2: while (min < best) ∨ (fewer than k objects have been accessed) do
3:   for each j ∈ bestcompetitor.missingfield do
4:     sorted access L_j, obtain (p, x_j)
5:     if (p has not been seen before) then
6:       candidates := candidates ∪ {p}
7:     end if
8:     bottom[j] := x_j   // used for updating an object's best value
9:     p.missingfield := p.missingfield − {j}
10:    update p.worst
11:    min := min{d.worst | d ∈ top}   // our min is $M_k^{(d)}$
12:    min := argmin_d {d.worst | d ∈ top}
13:    if (p.worst > min) ∧ (p ∈ candidates) then
14:      candidates := candidates − {p}
15:      candidates := candidates ∪ {min}
16:      top := top − {min}
17:      top := top ∪ {p}
18:    end if
19:  end for
20:  for each R ∈ (candidates ∪ top) do
21:    update R.best
22:    if (R.best ≤ min) then
23:      candidates := candidates − {R}
24:    end if
25:  end for
26:  bestcompetitor := argmax_R {R.best | R ∈ candidates}
27:  best := max_R {R.best | R ∈ candidates}
28: end while

To prove the correctness of the SNRA algorithm, we first need a lemma. This lemma shows that the SNRA algorithm does not enter an infinite loop.

Lemma 3. Assume algorithm SNRA has performed sorted access to depth $d_j$ in $L_j$ (j = 1, 2, ..., m) and the stopping rule has not been satisfied. Then algorithm SNRA will proceed to access the next object in at least one list.

Proof. Let d be the array $[d_1, d_2, \ldots, d_m]$; in the rest of the paper, "depth d" means depth $d_j$ in $L_j$. We only need to prove that at depth d the best competitor has at least one missing field. The remaining part of the proof is the same as that of Lemma 1.

Theorem 1. If the aggregation function is monotone, then algorithm SNRA correctly finds the top-k objects.

Proof. According to Lemma 3, if algorithm SNRA does not halt, it proceeds to access the next level in at least one list until the stopping rule is satisfied. Assume that algorithm SNRA halts after $d_j$ sorted accesses to $L_j$ (j = 1, 2, ..., m) and that the objects output by algorithm SNRA are $R_1, R_2, \ldots, R_k$. Let R be an object not among $R_1, R_2, \ldots, R_k$. We must show that $t(R) \le t(R_i)$ for each i = 1, 2, ..., k. Let $d = [d_1, d_2, \ldots, d_m]$. Since algorithm SNRA halts at depth d, the best competitor's best value is at most $M_k^{(d)}$, so $B^{(d)}(R) \le B^{(d)}(\text{best competitor}) \le M_k^{(d)}$, and $t(R) \le B^{(d)}(R)$. Also, for each of the k objects $R_i$ we have $M_k^{(d)} \le W^{(d)}(R_i) \le t(R_i)$. Combining these inequalities, we have $t(R) \le B^{(d)}(R) \le M_k^{(d)} \le W^{(d)}(R_i) \le t(R_i)$ for each i, as desired.

Besides accessing fewer objects, our algorithm requires less bookkeeping than algorithm NRA. At step 2, algorithm NRA updates the worst and best values of all seen objects. Our algorithm updates an object's worst value only when that object is accessed under sorted access. This is sufficient because the worst values of the other objects do not change while they are not accessed.

We now give an example to show how algorithm SNRA works.

Example 1. Assume m = 3, n = 5, k = 1, the aggregation function is summation, and the lists shown in Table 2 can only be accessed by sorted access.
Tables 3-6 show how algorithm SNRA performs sorted accesses at each step on this database. At step (a) SNRA performs sorted access on all 3 lists. After that, object $R_1$ becomes the top 1 object and $R_2$ becomes the best competitor. At step (b) we perform sorted access only on $L_1$ and $L_3$, since these are $R_2$'s missing fields ($L_2$ is its only known field). Now $R_1$ is the best competitor and $R_2$ is the top 1 object, with worst value 1.8. At step (c) $L_2$ is accessed. $R_1$ becomes the top 1 object again and $R_2$ becomes the best competitor. At this time, $R_2$'s missing field is $L_3$, so at step (d) we perform sorted access on $L_3$. Now the best competitor is $R_4$, whose best value is not more than the top 1 object's worst value, and the algorithm terminates. Table 7 shows how the top 1 object and the best competitor change at each step. On this database, algorithm NRA performs sorted access to depth 3 of each list, 9 sorted accesses in total, while the sorted access cost of SNRA is 7.
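Example 1 can be replayed with a small simulation. The sketch below is an illustration under stated assumptions rather than the paper's pseudo-code: it assumes summation as the aggregation function and k ≤ n, recomputes all bounds after every round (so it does not show the bookkeeping savings of SNRA), treats unseen objects as having best value equal to the sum of the bottom values, and uses the hypothetical names `parse`, `run` and `TABLE_2`. Scores are stored as exact fractions so that ties between bounds are decided exactly.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

SortedList = List[Tuple[str, Fraction]]


def parse(rows: List[Tuple[str, str]]) -> SortedList:
    """Exact fractions avoid floating-point noise when bounds tie."""
    return [(name, Fraction(score)) for name, score in rows]


# Table 2, one list per attribute, sorted by descending score.
TABLE_2 = [
    parse([("R1", "0.9"), ("R2", "0.9"), ("R3", "0.6"), ("R4", "0.5"), ("R5", "0.2")]),
    parse([("R2", "0.9"), ("R1", "0.8"), ("R3", "0.7"), ("R4", "0.6"), ("R5", "0.2")]),
    parse([("R1", "0.6"), ("R4", "0.6"), ("R3", "0.4"), ("R2", "0.3"), ("R5", "0.3")]),
]


def run(lists: List[SortedList], k: int, selective: bool) -> Tuple[List[str], int]:
    """Return (top-k names, number of sorted accesses); selective=False acts like NRA."""
    m, n = len(lists), len(lists[0])
    depth = [0] * m
    bottom = [Fraction(1)] * m
    seen: Dict[str, Dict[int, Fraction]] = {}  # insertion order = order first seen
    accesses = 0
    targets = list(range(m))  # the first round reads every list
    while True:
        for j in targets:
            if depth[j] == n:
                continue  # list exhausted
            name, score = lists[j][depth[j]]
            depth[j] += 1
            accesses += 1
            bottom[j] = score
            seen.setdefault(name, {})[j] = score
        # Recompute worst and best values of every seen object.
        worst = {r: sum(f.values()) for r, f in seen.items()}
        best = {r: worst[r] + sum(bottom[j] for j in range(m) if j not in f)
                for r, f in seen.items()}
        top = sorted(seen, key=lambda r: (worst[r], best[r]), reverse=True)[:k]
        competitor = None
        if len(seen) >= k:
            threshold = worst[top[-1]]  # M_k: k-th largest worst value
            competitor_best = threshold
            for r in seen:  # strict ">" keeps the earliest-seen object on ties
                if r not in top and best[r] > competitor_best:
                    competitor, competitor_best = r, best[r]
            unseen_viable = len(seen) < n and sum(bottom) > threshold
            if competitor is None and not unseen_viable:
                return top, accesses
        if selective and competitor is not None:
            targets = [j for j in range(m) if j not in seen[competitor]]
        else:
            targets = list(range(m))


print(run(TABLE_2, k=1, selective=True))   # selective rounds, as in SNRA
print(run(TABLE_2, k=1, selective=False))  # full rounds, as in NRA
```

On the lists of Table 2 the selective run reports R1 after 7 sorted accesses and the full-round run reports R1 after 9, which matches the costs stated in Example 1.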

Table 2. Sorted lists

| L1 | L2 | L3 |
| (R1, 0.9) | (R2, 0.9) | (R1, 0.6) |
| (R2, 0.9) | (R1, 0.8) | (R4, 0.6) |
| (R3, 0.6) | (R3, 0.7) | (R3, 0.4) |
| (R4, 0.5) | (R4, 0.6) | (R2, 0.3) |
| (R5, 0.2) | (R5, 0.2) | (R5, 0.3) |

Tables 3-6. SNRA(a)-(d): the lists of Table 2, accessed to the following depths after each step

| Table | Step | Depth in L1 | Depth in L2 | Depth in L3 |
| 3 | (a) | 1 | 1 | 1 |
| 4 | (b) | 2 | 1 | 2 |
| 5 | (c) | 2 | 2 | 2 |
| 6 | (d) | 2 | 2 | 3 |

Table 7. Each step of SNRA

| Step | Top 1 object | Worst | Best competitor object | Best |
| (a) | R1 | 1.5 | R2 | 2.4 |
| (b) | R2 | 1.8 | R1 | 2.4 |
| (c) | R1 | 2.3 | R2 | 2.4 |
| (d) | R1 | 2.3 | R4 | 2.3 |

3.3 Turning SNRA into an Instance Optimal Algorithm

Fagin et al. defined instance optimality for a top-k query algorithm and proved the instance optimality of algorithm NRA. Unfortunately, our algorithm SNRA is not instance optimal in their sense. In this section we first give an example demonstrating that SNRA is not instance optimal, and then modify algorithm SNRA to turn it into an instance optimal algorithm.

Example 2. Assume m = 2, k = 1, and the aggregation function is summation. Table 8 shows a database over which algorithm NRA performs only 6 sorted accesses (to depth 3) and outputs $R_1$ as the top 1 object, while algorithm SNRA accesses the whole list $L_1$ and performs n + 3 sorted accesses in total.
Table 8. An example showing that SNRA is not instance optimal

| L1 | L2 |
| (R1, 1.0) | (R2, 0.9) |
| (R3, 0.5) | (R3, 0.39) |
| (R4, 0.45) | (R1, 0.38) |
| (R5, 0.32) | (R5, 0.32) |
| ... | ... |
| (Rn, 0.12) | (Rn, 0.12) |
| (R2, 0.11) | (R4, 0.11) |

(After depth [1, 1], $R_2$ is the best competitor, so we perform sorted access to depth [2, 1]. $R_2$ is still the best competitor, so we access to depth [3, 1]. At this depth $R_3$ is the best competitor, so we access to depth [3, 2]. $R_2$ becomes the best competitor again, since $R_3$'s exact value is less than $R_2$'s worst value, and $R_2$ remains the best competitor until the end of $L_1$, up to depth [n − 1, 2]. After depth [n, 2], $R_2$ becomes the top 1 object and $R_1$ becomes the best competitor. We then access $L_2$; at depth [n, 3] $R_1$ becomes the top 1 object and the algorithm terminates.)

Since n can be arbitrarily large, algorithm SNRA is not instance optimal. The reason is that SNRA performs sorted access on some of the lists instead of all of them. Because of this selection, SNRA performs fewer sorted accesses than NRA on most databases, but may miss important information on particular databases such as the one in Example 2. Nevertheless, we can turn SNRA into an instance optimal algorithm with a small modification. We call the modified algorithm Hybrid-SNRA; it is given below.

Algorithm Hybrid-SNRA
1: Call Algorithm SNRA-Initialize
2: step := 0
3: while (min < best) ∨ (fewer than k objects have been accessed) do
4:   step++
5:   if (step mod p = 0) then
6:     field := [1, 2, ..., m] − {the fields which have been accessed to the bottom}
7:   else
8:     field := bestcompetitor.missingfield
9:   end if
10:  for each j ∈ field do
11:    ...   // the same as SNRA from line 4 to line 18
12:  end for
13:  ...   // the same as SNRA from line 20 to line 27
14: end while

The sorted access cost of Hybrid-SNRA is at most p times that of algorithm NRA, where p is a constant: every pth step reads the next entry of every unexhausted list, like one round of NRA, so after pd steps Hybrid-SNRA has seen at least everything NRA has seen at depth d, and each step performs at most m sorted accesses. Hence algorithm Hybrid-SNRA is instance optimal. Since the optimality ratio of algorithm NRA is m, the optimality ratio of algorithm Hybrid-SNRA is pm under some natural assumptions [1]. (In fact, if p = 1, Hybrid-SNRA is equivalent to NRA, and if p = ∞, Hybrid-SNRA becomes SNRA.)

4 Experiment Results

Our algorithms were implemented in C++. We performed our experiments on an AMD 1.9 GHz PC with 2 GB of memory. In our experiments we used summation as the aggregation function, since it is the most common one. For Hybrid-SNRA we set p = 11 by default. We used both synthetic and real-world data to evaluate the SNRA, Hybrid-SNRA and NRA algorithms.

Fig. 1. Access cost over uniform database (k = 20; x-axis: m; y-axis: number of sorted accesses; curves: NRA, SNRA, HSNRA)

Fig. 2. Access cost over normal database (k = 20; x-axis: m; y-axis: number of sorted accesses; curves: NRA, SNRA, HSNRA)

4.1 Evaluation for Synthetic Data

We conducted experiments on three synthetic data sets with different distributions: uniform, normal (Gaussian) and exponential. For a fixed number of objects n, we set m = 2, 4, ..., 12 and k = 20. Figures 1-3 show that on the uniform, normal and exponential databases both SNRA and Hybrid-SNRA perform fewer sorted accesses than algorithm NRA, and that SNRA performs the fewest sorted accesses of the three algorithms. When m is small the difference is modest, but as m becomes larger, SNRA outperforms NRA by a growing margin in terms of sorted access cost.

4.2 Evaluation for Real-World Data

In addition to synthetic data, we carried out experiments on three different real-world data sets, all downloaded from the UCI KDD Archive.

The first real-world data set (CE) is the IPUMS Census Database. This data set contains unweighted PUMS census data from the Los Angeles and Long Beach areas for one census year. It contains 88,443 objects, and we extracted 4 to 10 attributes from it. We normalized this data set with the formula $\frac{x_i - Min}{Max - Min}$, where $x_i$ is an object's value and Min and Max are the smallest and largest values of that attribute.

The next real-world data set is the KDD Cup 1998 data set (cup98). It contains 95,412 different objects. This is the data set used for The Second International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-98, The Fourth International Conference on Knowledge Discovery and Data Mining.
We extracted 10 attributes to perform our experiment. We normalized this data set with the same formula as the CE data.
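The normalization applied to the CE and cup98 data maps each attribute into [0, 1], as required by the model of Section 2. A minimal sketch, assuming that Min and Max are taken per attribute column; the function name `min_max_normalize` and the handling of a constant column are assumptions made for illustration:

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Map every value x of one attribute to (x - Min) / (Max - Min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant attribute carries no ranking information, map it to 0.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```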

Fig. 3. Access cost over exponential database (k = 20; x-axis: m; y-axis: number of sorted accesses; curves: NRA, SNRA, HSNRA)

Fig. 4. Access cost over IPUMS Census database (k = 20, n = 88443; x-axis: m; y-axis: number of sorted accesses; curves: NRA, SNRA, HSNRA)

The last data set, the El Nino data set, contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles. We removed from the original data set those objects that had missing fields. We normalized the data set with the same formula as the CE data. The remaining data contains 93,935 objects. We chose 7 attributes to test our algorithms.

Figure 4 shows the experimental result of SNRA, Hybrid-SNRA and NRA over the IPUMS Census data set. We measured how many sorted accesses SNRA, Hybrid-SNRA and NRA perform for m = 4, 5, ..., 10. The result demonstrates that our algorithms perform better in terms of access cost, although the size of the advantage depends on the specific database. This figure also shows that Hybrid-SNRA performs slightly more sorted accesses than SNRA under our default setting (p = 11).

Figure 5 shows the experimental result of SNRA, Hybrid-SNRA and NRA over the cup98 data set. Here too SNRA performs markedly fewer sorted accesses. Furthermore, as m grows, the advantage of SNRA becomes more pronounced.

Figure 6 shows the experimental result of SNRA, Hybrid-SNRA and NRA over the El Nino data set. On this data set SNRA and Hybrid-SNRA also perform very well. Because of the selection strategy, our algorithm saves 50% of NRA's sorted accesses.

4.3 Summary

Our experiments illustrate that algorithm SNRA performs fewer sorted accesses than algorithm NRA, on both synthetic and real-world data.
Algorithm Hybrid-SNRA performs slightly more sorted accesses than algorithm SNRA. As m becomes larger, the reduction in sorted accesses relative to NRA becomes more significant. (When k changes, we obtain similar results, which are omitted due to space limitations.)
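The small extra cost of Hybrid-SNRA comes from its periodic full round. The list-selection rule of lines 4-9 of Algorithm Hybrid-SNRA can be sketched as follows; this is an illustrative sketch, and the function name `lists_to_access`, the 0-based list indices and the argument layout are assumptions, not part of the paper:

```python
import math
from typing import List, Set


def lists_to_access(step: int, p: float, depth: List[int], n: int,
                    competitor_missing: Set[int]) -> List[int]:
    """Choose the lists read in one step of Hybrid-SNRA (step counts from 1)."""
    if step % p == 0:
        # Every p-th step behaves like an NRA round: all lists not yet read to the bottom.
        return [j for j in range(len(depth)) if depth[j] < n]
    # Otherwise behave like SNRA: only the best competitor's missing fields.
    return sorted(competitor_missing)


depth = [2, 2, 3]  # hypothetical current depth in each of m = 3 lists of length n = 5
print(lists_to_access(11, 11, depth, 5, {2}))       # full round
print(lists_to_access(5, 11, depth, 5, {2}))        # selective round
print(lists_to_access(7, math.inf, depth, 5, {2}))  # p = infinity: always selective
```

With p = 1 every step is a full round, which corresponds to NRA, and with p = ∞ no step is, which corresponds to SNRA, in line with the remark at the end of Section 3.3.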

Fig. 5. Access cost over KDD Cup98 database (k = 20, n = 95412; x-axis: m; y-axis: number of sorted accesses; curves: NRA, SNRA, HSNRA)

Fig. 6. Access cost over El Nino database (k = 20, n = 93935; x-axis: m; y-axis: number of sorted accesses; curves: NRA, SNRA, HSNRA)

The reason why our algorithm performs fewer sorted accesses is that it selects some of the lists and performs only useful sorted accesses, instead of accessing all the lists.

5 Conclusion and Future Work

In this paper, we analyzed algorithm NRA and gave some observations about it. We proposed a new algorithm, Selective-NRA, and turned it into an instance optimal algorithm, Hybrid-SNRA. Extensive experimental results on both synthetic and real-world data show that our algorithms SNRA and Hybrid-SNRA perform significantly better than NRA in terms of sorted access cost. Another interesting result of our experiments is that algorithm NRA and algorithm Hybrid-SNRA are instance optimal, yet on common data sets they perform more sorted accesses than SNRA, which is not instance optimal. This is an issue for our further research. In the future, we will also consider the run time cost of SNRA compared with algorithm NRA, and design techniques to lower the run time cost of SNRA.

Acknowledgements. This work is supported by the National Science Foundation of China under two grants. This work is also supported by the Science Research Fund of MOE-Microsoft Key Laboratory of Multimedia Computing and Communication.

References

1. Fagin, R., Lotem, A., Naor, M.: Optimal Aggregation Algorithms for Middleware. In: Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (2001)
2. Güntzer, U., Balke, W.-T., Kießling, W.: Towards Efficient Multi-Feature Queries in Heterogeneous Environments. In: Proceedings of the IEEE International Conference on Information Technology: Coding and Computing (2001)

3. Fagin, R.: Combining Fuzzy Information: an Overview. SIGMOD Record 31(2) (2002)
4. Theobald, M., Weikum, G., Schenkel, R.: Top-k Query Evaluation with Probabilistic Guarantees. In: Proceedings of the 30th International Conference on Very Large Data Bases (2004)
5. Salton, G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading (1989)
6. Getoor, L., Diehl, C.P.: Link Mining: a Survey. SIGKDD Explorations 7(2), 3-12 (2005)
7. Long, X., Suel, T.: Three-Level Caching for Efficient Query Processing in Large Web Search Engines. In: Proceedings of the 14th International Conference on World Wide Web (2005)
8. Fagin, R.: Combining Fuzzy Information from Multiple Systems. J. Comput. Syst. Sci. 58(1) (1999)
9. Nepal, S., Ramakrishna, M.V.: Query Processing Issues in Image (Multimedia) Databases. In: Proceedings of the 15th International Conference on Data Engineering (1999)
10. Mamoulis, N., Yiu, M.L., Cheng, K.H., Cheung, D.W.: Efficient Top-k Aggregation of Ranked Inputs. ACM Transactions on Database Systems (TODS) 32(3), 19 (2007)


More information

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases

SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases SA-IFIM: Incrementally Mining Frequent Itemsets in Update Distorted Databases Jinlong Wang, Congfu Xu, Hongwei Dan, and Yunhe Pan Institute of Artificial Intelligence, Zhejiang University Hangzhou, 310027,

More information

Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach

Optimizing Access Cost for Top-k Queries over Web Sources: A Unified Cost-based Approach UIUC Technical Report UIUCDCS-R-03-2324, UILU-ENG-03-1711. March 03 (Revised March 04) Optimizing Access Cost for Top-k Queries over Web Sources A Unified Cost-based Approach Seung-won Hwang and Kevin

More information

Linear-time approximation algorithms for minimum subset sum and subset sum

Linear-time approximation algorithms for minimum subset sum and subset sum Linear-time approximation algorithms for minimum subset sum and subset sum Liliana Grigoriu 1 1 : Faultät für Wirtschaftswissenschaften, Wirtschaftsinformati und Wirtschaftsrecht Universität Siegen, Kohlbettstr.

More information

arxiv: v1 [cs.ma] 8 May 2018

arxiv: v1 [cs.ma] 8 May 2018 Ordinal Approximation for Social Choice, Matching, and Facility Location Problems given Candidate Positions Elliot Anshelevich and Wennan Zhu arxiv:1805.03103v1 [cs.ma] 8 May 2018 May 9, 2018 Abstract

More information

Appropriate Item Partition for Improving the Mining Performance

Appropriate Item Partition for Improving the Mining Performance Appropriate Item Partition for Improving the Mining Performance Tzung-Pei Hong 1,2, Jheng-Nan Huang 1, Kawuu W. Lin 3 and Wen-Yang Lin 1 1 Department of Computer Science and Information Engineering National

More information

On the Security of Stream Cipher CryptMT v3

On the Security of Stream Cipher CryptMT v3 On the Security of Stream Cipher CryptMT v3 Haina Zhang 1, and Xiaoyun Wang 1,2 1 Key Laboratory of Cryptologic Technology and Information Security, Ministry of Education, Shandong University, Jinan 250100,

More information

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM

EFFICIENT ATTRIBUTE REDUCTION ALGORITHM EFFICIENT ATTRIBUTE REDUCTION ALGORITHM Zhongzhi Shi, Shaohui Liu, Zheng Zheng Institute Of Computing Technology,Chinese Academy of Sciences, Beijing, China Abstract: Key words: Efficiency of algorithms

More information

A Distribution-Sensitive Dictionary with Low Space Overhead

A Distribution-Sensitive Dictionary with Low Space Overhead A Distribution-Sensitive Dictionary with Low Space Overhead Prosenjit Bose, John Howat, and Pat Morin School of Computer Science, Carleton University 1125 Colonel By Dr., Ottawa, Ontario, CANADA, K1S 5B6

More information

On Multiple Query Optimization in Data Mining

On Multiple Query Optimization in Data Mining On Multiple Query Optimization in Data Mining Marek Wojciechowski, Maciej Zakrzewicz Poznan University of Technology Institute of Computing Science ul. Piotrowo 3a, 60-965 Poznan, Poland {marek,mzakrz}@cs.put.poznan.pl

More information

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14

Introduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,

More information

An Efficient Clustering Method for k-anonymization

An Efficient Clustering Method for k-anonymization An Efficient Clustering Method for -Anonymization Jun-Lin Lin Department of Information Management Yuan Ze University Chung-Li, Taiwan jun@saturn.yzu.edu.tw Meng-Cheng Wei Department of Information Management

More information

NDoT: Nearest Neighbor Distance Based Outlier Detection Technique

NDoT: Nearest Neighbor Distance Based Outlier Detection Technique NDoT: Nearest Neighbor Distance Based Outlier Detection Technique Neminath Hubballi 1, Bidyut Kr. Patra 2, and Sukumar Nandi 1 1 Department of Computer Science & Engineering, Indian Institute of Technology

More information

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T.

Formal Model. Figure 1: The target concept T is a subset of the concept S = [0, 1]. The search agent needs to search S for a point in T. Although this paper analyzes shaping with respect to its benefits on search problems, the reader should recognize that shaping is often intimately related to reinforcement learning. The objective in reinforcement

More information

Best Keyword Cover Search

Best Keyword Cover Search Vennapusa Mahesh Kumar Reddy Dept of CSE, Benaiah Institute of Technology and Science. Best Keyword Cover Search Sudhakar Babu Pendhurthi Assistant Professor, Benaiah Institute of Technology and Science.

More information

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach +

Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Selecting Topics for Web Resource Discovery: Efficiency Issues in a Database Approach + Abdullah Al-Hamdani, Gultekin Ozsoyoglu Electrical Engineering and Computer Science Dept, Case Western Reserve University,

More information

Comparison of of parallel and random approach to

Comparison of of parallel and random approach to Comparison of of parallel and random approach to acandidate candidatelist listininthe themultifeature multifeaturequerying Peter Gurský Peter Gurský Institute of Computer Science, Faculty of Science Institute

More information

Clustering-Based Distributed Precomputation for Quality-of-Service Routing*

Clustering-Based Distributed Precomputation for Quality-of-Service Routing* Clustering-Based Distributed Precomputation for Quality-of-Service Routing* Yong Cui and Jianping Wu Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 cy@csnet1.cs.tsinghua.edu.cn,

More information

Incrementally mining high utility patterns based on pre-large concept

Incrementally mining high utility patterns based on pre-large concept Appl Intell (2014) 40:343 357 DOI 10.1007/s10489-013-0467-z Incrementally mining high utility patterns based on pre-large concept Chun-Wei Lin Tzung-Pei Hong Guo-Cheng Lan Jia-Wei Wong Wen-Yang Lin Published

More information

JOB SHOP SCHEDULING WITH UNIT LENGTH TASKS

JOB SHOP SCHEDULING WITH UNIT LENGTH TASKS JOB SHOP SCHEDULING WITH UNIT LENGTH TASKS MEIKE AKVELD AND RAPHAEL BERNHARD Abstract. In this paper, we consider a class of scheduling problems that are among the fundamental optimization problems in

More information

Subspace Discovery for Promotion: A Cell Clustering Approach

Subspace Discovery for Promotion: A Cell Clustering Approach Subspace Discovery for Promotion: A Cell Clustering Approach Tianyi Wu and Jiawei Han University of Illinois at Urbana-Champaign, USA {twu5,hanj}@illinois.edu Abstract. The promotion analysis problem has

More information

DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY

DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY DIRA : A FRAMEWORK OF DATA INTEGRATION USING DATA QUALITY Reham I. Abdel Monem 1, Ali H. El-Bastawissy 2 and Mohamed M. Elwakil 3 1 Information Systems Department, Faculty of computers and information,

More information

3 No-Wait Job Shops with Variable Processing Times

3 No-Wait Job Shops with Variable Processing Times 3 No-Wait Job Shops with Variable Processing Times In this chapter we assume that, on top of the classical no-wait job shop setting, we are given a set of processing times for each operation. We may select

More information

Joint Entity Resolution

Joint Entity Resolution Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Parameterized graph separation problems

Parameterized graph separation problems Parameterized graph separation problems Dániel Marx Department of Computer Science and Information Theory, Budapest University of Technology and Economics Budapest, H-1521, Hungary, dmarx@cs.bme.hu Abstract.

More information

On top-k search with no random access using small memory

On top-k search with no random access using small memory On top-k search with no random access using small memory Peter Gurský and Peter Vojtáš 2 University of P.J.Šafárik, Košice, Slovakia 2 Charles University, Prague, Czech Republic peter.gursky@upjs.sk,peter.vojtas@mff.cuni.cz

More information

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values

Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values Introducing Partial Matching Approach in Association Rules for Better Treatment of Missing Values SHARIQ BASHIR, SAAD RAZZAQ, UMER MAQBOOL, SONYA TAHIR, A. RAUF BAIG Department of Computer Science (Machine

More information

. A quick enumeration leads to five possible upper bounds and we are interested in the smallest of them: h(x 1, x 2, x 3) min{x 1

. A quick enumeration leads to five possible upper bounds and we are interested in the smallest of them: h(x 1, x 2, x 3) min{x 1 large-scale search engines [14]. These intersection lists, however, take up additional space dictating a cost-benefit trade-off, and careful strategies have been proposed to select the pairs of terms for

More information

IO-Top-k at TREC 2006: Terabyte Track

IO-Top-k at TREC 2006: Terabyte Track IO-Top-k at TREC 2006: Terabyte Track Holger Bast Debapriyo Majumdar Ralf Schenkel Martin Theobald Gerhard Weikum Max-Planck-Institut für Informatik, Saarbrücken, Germany {bast,deb,schenkel,mtb,weikum}@mpi-inf.mpg.de

More information

Competitive Analysis of On-line Algorithms for On-demand Data Broadcast Scheduling

Competitive Analysis of On-line Algorithms for On-demand Data Broadcast Scheduling Competitive Analysis of On-line Algorithms for On-demand Data Broadcast Scheduling Weizhen Mao Department of Computer Science The College of William and Mary Williamsburg, VA 23187-8795 USA wm@cs.wm.edu

More information

Algorithms on Minimizing the Maximum Sensor Movement for Barrier Coverage of a Linear Domain

Algorithms on Minimizing the Maximum Sensor Movement for Barrier Coverage of a Linear Domain Algorithms on Minimizing the Maximum Sensor Movement for Barrier Coverage of a Linear Domain Danny Z. Chen 1, Yan Gu 2, Jian Li 3, and Haitao Wang 1 1 Department of Computer Science and Engineering University

More information

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007

Topic: Local Search: Max-Cut, Facility Location Date: 2/13/2007 CS880: Approximations Algorithms Scribe: Chi Man Liu Lecturer: Shuchi Chawla Topic: Local Search: Max-Cut, Facility Location Date: 2/3/2007 In previous lectures we saw how dynamic programming could be

More information

Study on Personalized Recommendation Model of Internet Advertisement

Study on Personalized Recommendation Model of Internet Advertisement Study on Personalized Recommendation Model of Internet Advertisement Ning Zhou, Yongyue Chen and Huiping Zhang Center for Studies of Information Resources, Wuhan University, Wuhan 430072 chenyongyue@hotmail.com

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel Yahoo! Research New York, NY 10018 goel@yahoo-inc.com John Langford Yahoo! Research New York, NY 10018 jl@yahoo-inc.com Alex Strehl Yahoo! Research New York,

More information

CS264: Homework #1. Due by midnight on Thursday, January 19, 2017

CS264: Homework #1. Due by midnight on Thursday, January 19, 2017 CS264: Homework #1 Due by midnight on Thursday, January 19, 2017 Instructions: (1) Form a group of 1-3 students. You should turn in only one write-up for your entire group. See the course site for submission

More information

A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets

A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets A Two-Phase Algorithm for Fast Discovery of High Utility temsets Ying Liu, Wei-keng Liao, and Alok Choudhary Electrical and Computer Engineering Department, Northwestern University, Evanston, L, USA 60208

More information

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm

Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Ann. Data. Sci. (2015) 2(3):293 300 DOI 10.1007/s40745-015-0060-x Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm Li-min Du 1,2 Yang Xu 1 Hua Zhu 1 Received: 30 November

More information

Merging Frequent Summaries

Merging Frequent Summaries Merging Frequent Summaries M. Cafaro, M. Pulimeno University of Salento, Italy {massimo.cafaro, marco.pulimeno}@unisalento.it Abstract. Recently, an algorithm for merging counter-based data summaries which

More information

1 Counting triangles and cliques

1 Counting triangles and cliques ITCSC-INC Winter School 2015 26 January 2014 notes by Andrej Bogdanov Today we will talk about randomness and some of the surprising roles it plays in the theory of computing and in coding theory. Let

More information

Randomized Algorithms 2017A - Lecture 10 Metric Embeddings into Random Trees

Randomized Algorithms 2017A - Lecture 10 Metric Embeddings into Random Trees Randomized Algorithms 2017A - Lecture 10 Metric Embeddings into Random Trees Lior Kamma 1 Introduction Embeddings and Distortion An embedding of a metric space (X, d X ) into a metric space (Y, d Y ) is

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

A 2k-Kernelization Algorithm for Vertex Cover Based on Crown Decomposition

A 2k-Kernelization Algorithm for Vertex Cover Based on Crown Decomposition A 2k-Kernelization Algorithm for Vertex Cover Based on Crown Decomposition Wenjun Li a, Binhai Zhu b, a Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha

More information

Probabilistic Graph Summarization

Probabilistic Graph Summarization Probabilistic Graph Summarization Nasrin Hassanlou, Maryam Shoaran, and Alex Thomo University of Victoria, Victoria, Canada {hassanlou,maryam,thomo}@cs.uvic.ca 1 Abstract We study group-summarization of

More information

8 SortinginLinearTime

8 SortinginLinearTime 8 SortinginLinearTime We have now introduced several algorithms that can sort n numbers in O(n lg n) time. Merge sort and heapsort achieve this upper bound in the worst case; quicksort achieves it on average.

More information

II (Sorting and) Order Statistics

II (Sorting and) Order Statistics II (Sorting and) Order Statistics Heapsort Quicksort Sorting in Linear Time Medians and Order Statistics 8 Sorting in Linear Time The sorting algorithms introduced thus far are comparison sorts Any comparison

More information

On the Max Coloring Problem

On the Max Coloring Problem On the Max Coloring Problem Leah Epstein Asaf Levin May 22, 2010 Abstract We consider max coloring on hereditary graph classes. The problem is defined as follows. Given a graph G = (V, E) and positive

More information

Sharp lower bound for the total number of matchings of graphs with given number of cut edges

Sharp lower bound for the total number of matchings of graphs with given number of cut edges South Asian Journal of Mathematics 2014, Vol. 4 ( 2 ) : 107 118 www.sajm-online.com ISSN 2251-1512 RESEARCH ARTICLE Sharp lower bound for the total number of matchings of graphs with given number of cut

More information

Clustering. (Part 2)

Clustering. (Part 2) Clustering (Part 2) 1 k-means clustering 2 General Observations on k-means clustering In essence, k-means clustering aims at minimizing cluster variance. It is typically used in Euclidean spaces and works

More information

Effective Pattern Similarity Match for Multidimensional Sequence Data Sets

Effective Pattern Similarity Match for Multidimensional Sequence Data Sets Effective Pattern Similarity Match for Multidimensional Sequence Data Sets Seo-Lyong Lee, * and Deo-Hwan Kim 2, ** School of Industrial and Information Engineering, Hanu University of Foreign Studies,

More information

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA

UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA UAPRIORI: AN ALGORITHM FOR FINDING SEQUENTIAL PATTERNS IN PROBABILISTIC DATA METANAT HOOSHSADAT, SAMANEH BAYAT, PARISA NAEIMI, MAHDIEH S. MIRIAN, OSMAR R. ZAÏANE Computing Science Department, University

More information

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments

Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing Environments Send Orders for Reprints to reprints@benthamscience.ae 368 The Open Automation and Control Systems Journal, 2014, 6, 368-373 Open Access Apriori Algorithm Research Based on Map-Reduce in Cloud Computing

More information

On Using Machine Learning for Logic BIST

On Using Machine Learning for Logic BIST On Using Machine Learning for Logic BIST Christophe FAGOT Patrick GIRARD Christian LANDRAULT Laboratoire d Informatique de Robotique et de Microélectronique de Montpellier, UMR 5506 UNIVERSITE MONTPELLIER

More information

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006

2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics

Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Designing Views to Answer Queries under Set, Bag,and BagSet Semantics Rada Chirkova Department of Computer Science, North Carolina State University Raleigh, NC 27695-7535 chirkova@csc.ncsu.edu Foto Afrati

More information

Online algorithms for clustering problems

Online algorithms for clustering problems University of Szeged Department of Computer Algorithms and Artificial Intelligence Online algorithms for clustering problems Summary of the Ph.D. thesis by Gabriella Divéki Supervisor Dr. Csanád Imreh

More information

Mining High Average-Utility Itemsets

Mining High Average-Utility Itemsets Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 Mining High Itemsets Tzung-Pei Hong Dept of Computer Science and Information Engineering

More information

Efficient SQL-Querying Method for Data Mining in Large Data Bases

Efficient SQL-Querying Method for Data Mining in Large Data Bases Efficient SQL-Querying Method for Data Mining in Large Data Bases Nguyen Hung Son Institute of Mathematics Warsaw University Banacha 2, 02095, Warsaw, Poland Abstract Data mining can be understood as a

More information

Privacy Breaches in Privacy-Preserving Data Mining

Privacy Breaches in Privacy-Preserving Data Mining 1 Privacy Breaches in Privacy-Preserving Data Mining Johannes Gehrke Department of Computer Science Cornell University Joint work with Sasha Evfimievski (Cornell), Ramakrishnan Srikant (IBM), and Rakesh

More information

Stability of Networks and Protocols in the Adversarial Queueing. Model for Packet Routing. Ashish Goel. December 1, Abstract

Stability of Networks and Protocols in the Adversarial Queueing. Model for Packet Routing. Ashish Goel. December 1, Abstract Stability of Networks and Protocols in the Adversarial Queueing Model for Packet Routing Ashish Goel University of Southern California December 1, 2000 Abstract The adversarial queueing theory model for

More information

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Semantics of Ranking Queries for Probabilistic Data and Expected Ranks Graham Cormode AT&T Labs Research Florham Park, NJ, USA Feifei Li Computer Science Department FSU, Tallahassee, FL, USA Ke Yi Computer

More information

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix

Privacy-Preserving of Check-in Services in MSNS Based on a Bit Matrix BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 15, No 2 Sofia 2015 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2015-0032 Privacy-Preserving of Check-in

More information

Worst-case running time for RANDOMIZED-SELECT

Worst-case running time for RANDOMIZED-SELECT Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case

More information

Parallel Query Processing and Edge Ranking of Graphs

Parallel Query Processing and Edge Ranking of Graphs Parallel Query Processing and Edge Ranking of Graphs Dariusz Dereniowski, Marek Kubale Department of Algorithms and System Modeling, Gdańsk University of Technology, Poland, {deren,kubale}@eti.pg.gda.pl

More information

Collaborative Rough Clustering

Collaborative Rough Clustering Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical

More information

AN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE

AN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE AN OPTIMIZATION GENETIC ALGORITHM FOR IMAGE DATABASES IN AGRICULTURE Changwu Zhu 1, Guanxiang Yan 2, Zhi Liu 3, Li Gao 1,* 1 Department of Computer Science, Hua Zhong Normal University, Wuhan 430079, China

More information

A Polygon Rendering Method with Precomputed Information

A Polygon Rendering Method with Precomputed Information A Polygon Rendering Method with Precomputed Information Seunghyun Park #1, Byoung-Woo Oh #2 # Department of Computer Engineering, Kumoh National Institute of Technology, Korea 1 seunghyunpark12@gmail.com

More information

Distributed minimum spanning tree problem

Distributed minimum spanning tree problem Distributed minimum spanning tree problem Juho-Kustaa Kangas 24th November 2012 Abstract Given a connected weighted undirected graph, the minimum spanning tree problem asks for a spanning subtree with

More information

An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance

An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance An Efficient Algorithm for Computing Non-overlapping Inversion and Transposition Distance Toan Thang Ta, Cheng-Yao Lin and Chin Lung Lu Department of Computer Science National Tsing Hua University, Hsinchu

More information

Welfare Navigation Using Genetic Algorithm

Welfare Navigation Using Genetic Algorithm Welfare Navigation Using Genetic Algorithm David Erukhimovich and Yoel Zeldes Hebrew University of Jerusalem AI course final project Abstract Using standard navigation algorithms and applications (such

More information

HISTORICAL BACKGROUND

HISTORICAL BACKGROUND VALID-TIME INDEXING Mirella M. Moro Universidade Federal do Rio Grande do Sul Porto Alegre, RS, Brazil http://www.inf.ufrgs.br/~mirella/ Vassilis J. Tsotras University of California, Riverside Riverside,

More information

A TIGHT BOUND ON THE LENGTH OF ODD CYCLES IN THE INCOMPATIBILITY GRAPH OF A NON-C1P MATRIX

A TIGHT BOUND ON THE LENGTH OF ODD CYCLES IN THE INCOMPATIBILITY GRAPH OF A NON-C1P MATRIX A TIGHT BOUND ON THE LENGTH OF ODD CYCLES IN THE INCOMPATIBILITY GRAPH OF A NON-C1P MATRIX MEHRNOUSH MALEKESMAEILI, CEDRIC CHAUVE, AND TAMON STEPHEN Abstract. A binary matrix has the consecutive ones property

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

Lecture 2 The k-means clustering problem

Lecture 2 The k-means clustering problem CSE 29: Unsupervised learning Spring 2008 Lecture 2 The -means clustering problem 2. The -means cost function Last time we saw the -center problem, in which the input is a set S of data points and the

More information

Location Privacy Protection for Preventing Replay Attack under Road-Network Constraints

Location Privacy Protection for Preventing Replay Attack under Road-Network Constraints Location Privacy Protection for Preventing Replay Attack under Road-Network Constraints Lan Sun, Ying-jie Wu, Zhao Luo, Yi-lei Wang College of Mathematics and Computer Science Fuzhou University Fuzhou,

More information

Online k-taxi Problem

Online k-taxi Problem Distributed Computing Online k-taxi Problem Theoretical Patrick Stäuble patricst@ethz.ch Distributed Computing Group Computer Engineering and Networks Laboratory ETH Zürich Supervisors: Georg Bachmeier,

More information

Exact and Approximate Generic Multi-criteria Top-k Query Processing

Exact and Approximate Generic Multi-criteria Top-k Query Processing Exact and Approximate Generic Multi-criteria Top-k Query Processing Mehdi Badr, Dan Vodislav To cite this version: Mehdi Badr, Dan Vodislav. Exact and Approximate Generic Multi-criteria Top-k Query Processing.

More information

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include Outline Computer Science 331 Correctness of Algorithms Mike Jacobson Department of Computer Science University of Calgary Lectures #2-4 1 What is a? Applications 2 Recursive Algorithms 3 Final Notes Additional

More information