Efficient Top-K Problem Solvings for More Users in Tree-Oriented Data Structures


Matúš Ondreička, Faculty of Mathematics and Physics, Department of Software Engineering, Charles University in Prague, Czech Republic
Jaroslav Pokorný, Faculty of Mathematics and Physics, Department of Software Engineering, Charles University in Prague, Czech Republic

Abstract
This paper focuses on efficiently searching for the best K objects over several attributes according to user preferences. User preferences are modelled locally with fuzzy functions and globally with an aggregation function. To support local preferences, we use a B+-tree for sorting objects according to a fuzzy function. We deal with the usage of TA-algorithm, which uses B+-trees, and MD-algorithm, which is based on the multidimensional B-tree. In this paper we develop a new algorithm, MXT-algorithm, which is based on the integration of MD-algorithm with several instances of TA-algorithm. We also develop a new tree-oriented data structure based on B+-trees, the multidimensional B-tree with lists, in which MXT-algorithm can effectively find the best K objects according to user preferences. Finally, we show that, according to the type of the object attribute domains, it is possible to choose the best data structure for object storage and also the top-k algorithm for efficient top-k problem solving.

1 Introduction
Nowadays, users of various systems search for objects such as flats, cars, or holiday stays. These objects have several attributes. Each user looks for objects with different values of these attributes [12][14]. In general, a user looks for the few objects that are most convenient for him/her. Sometimes a user looks for only one best object; for example, he/she can buy only one flat. In this paper we assume a set of objects of the same type, which is stored in one data structure. This data structure is common for all users.
The best K objects from a set of objects X with several attributes can be found only if it is possible to decide which objects are better or worse for a user. For this purpose we can use a ranking function. Moreover, every user rates objects by his/her own preferences.

1.1 Related work
The problem of searching for the best K objects according to the values of several attributes at the same time is known as the top-k problem [12][13]. In the last few years, top-k problem solving has been studied in various domains such as relational databases [1], XML [10], multimedia search [12], the Web [14], and distributed systems [11]. In this paper we focus on the family of Fagin's algorithms [13], which has been widely studied for efficiently computing top-k queries. These algorithms assume that the set of objects is stored in lists and that the ranking functions are monotone. We say that f is monotone if $f(x_1, x_2, \dots, x_m) \le f(x'_1, \dots, x'_m)$ whenever $x_i \le x'_i$ for every i. However, ranking functions are not necessarily monotone. Top-k problem solving with arbitrary ranking functions was presented in [6] and [7]. These two approaches use an analytic expression of the ranking function and tree-oriented data structures. The OPT* algorithm [7] indexes all attributes by B+-trees, and the authors of [6] use indexing by B+-trees, too. In this paper user preferences are modelled locally with fuzzy functions and globally with an aggregation function [3][8], i.e. we use arbitrary ranking functions. Moreover, in the context of local preferences, we focus on nominal attributes and ordinal attributes. To apply local preferences, we describe the usage of a B+-tree for sorting objects according to a fuzzy function [2][3][5]. We focus on searching for the best K objects without accessing all the objects.
Therefore we deal with methods and data structures for effective top-k problem solving via Fagin's TA-algorithm [13] and also MD-algorithm [2], which is based on the multidimensional B-tree (MDB-tree) [15].

1.2 Main contribution
In this paper TA-algorithm and MD-algorithm use data structures based on B+-trees. These tree-oriented data structures are independent of user preferences. Moreover, it is possible to update these data structures easily and quickly. We developed a new top-k algorithm, MXT-algorithm, and a new data structure based on B+-trees (MDB-tree with lists), in which MXT-algorithm can effectively find the best K objects according to user preferences without accessing all the objects. We compare MXT-algorithm with the results of TA-algorithm and MD-algorithm and show that MXT-algorithm achieves the best results in some cases. Moreover, we show that, according to the type of the object attributes, it is possible to choose the best data structure for object storage and also the top-k algorithm for efficient top-k problem solving.

1.3 Paper organization
The paper is organized as follows. Section 2 describes the top-k problem and user preferences. Section 3 explains the principles of TA-algorithm and MD-algorithm. Section 4 describes the application of local user preferences in these algorithms. In Section 5, we describe our new MXT-algorithm and the new data structure that MXT-algorithm uses. Section 6 presents the results of the tests of TA-algorithm, MD-algorithm, and MXT-algorithm, and their comparison for various data sets and various user preferences. Finally, Section 7 concludes and provides some suggestions for future research.

2 Top-K problem
The top-k problem is the search for the best K objects. In this article, we suppose a set of objects X with m attributes $A_1, \dots, A_m$. Every object $x \in X$ has m values $a_1^x, \dots, a_m^x$ of these attributes.

2.1 Rating function
It is most suitable to use a rating function (ranking function), which assigns a rating to each object $x \in X$. In this paper, we suppose a function R with m variables $a_1, \dots, a_m$ specified by the scheme $R(a_1, \dots, a_m) : [0,1]^m \to [0,1]$.
We denote the rating of an object $x \in X$ as a function $R(x) = R(a_1^x, \dots, a_m^x)$ of one variable. R(x) maps every object $x \in X$ into the interval [0,1] according to its m attribute values. For the worst object $x \in X$, R(x) = 0 holds, and for the best one, R(x) = 1 holds. According to R it is possible to sort the objects from X in descending order and determine the best K objects. In this work we suppose that if several objects have the same rating as the K-th best object, a random one of them is chosen.

2.2 User preferences
In this paper, we consider a solution of the top-k problem for several users with various user preferences. Every user chooses his/her user preferences, which determine the suitability of an object $x \in X$ depending on its m attribute values. In this work, we differentiate between local preferences and global preferences.

2.2.1 Local preferences
Local preferences reflect how an object is preferred according to only one attribute. We express the local preference for the i-th attribute $A_i$ as a fuzzy function $f_i$. The fuzzy function $f_i$ is understood as a mapping $f_i : dom(A_i) \to [0,1]$, which maps every value of the actual domain of attribute $A_i$ into the interval [0,1]. Local preferences of a user U for the attributes $A_1, \dots, A_m$ are represented by user fuzzy functions denoted $f_1^U(x), \dots, f_m^U(x)$, respectively. A user fuzzy function $f_i^U(x) : a_i^x \to [0,1]$, where i = 1,...,m, maps every object $x \in X$ into the interval [0,1] according to the value $a_i^x$ of its i-th attribute. In general, we differentiate two possible attribute types.
Nominal attributes. A nominal attribute has a finite range of possible values, usually strings. For a nominal attribute, the user has to set a rating of each attribute value. For example, the brand or kind of a product is a nominal attribute.
Ordinal attributes. An ordinal attribute has some natural value ordering other than lexical ordering. Typical examples are integer numbers. The domain of an ordinal attribute is a subset of a continuous interval.
In this case, it is possible to use a continuous function as the user fuzzy function. For example, a price is usually an ordinal attribute.

2.2.2 Global preferences
Global preferences express how the user U prefers objects from X according to all attributes $A_1, \dots, A_m$. We introduced local preferences, where user U rates every single attribute $A_i$ by a fuzzy function $f_i^U$. The global preference of user U defines mutual relations between the attributes $A_1, \dots, A_m$. We consider an aggregation function @ with m variables $p_1, \dots, p_m$ specified as $@(p_1, \dots, p_m) : [0,1]^m \to [0,1]$. For the user U with his/her user fuzzy functions $f_1^U, \dots, f_m^U$, a user rating function $R^U$ originates by substituting $p_i = f_i^U(x)$. Then $R^U(x) = @(f_1^U(x), \dots, f_m^U(x))$ for every $x \in X$. With $R^U(x)$ it is possible to evaluate the global rating of each object $x \in X$ and to find the top-k objects for user U. With the aggregation function, a user U can define the mutual relations of the attributes. In practical applications, the user's influence on the aggregate function can be implemented by a weighted average, where the weights $w_1, \dots, w_m$ of the attributes $A_1, \dots, A_m$ determine how much the user prefers each attribute, i.e.

$R^U(x) = \frac{w_1 f_1^U(x) + \dots + w_m f_m^U(x)}{w_1 + \dots + w_m}$.

When the user does not care about the i-th attribute $A_i$, he/she can set $w_i = 0$ in the aggregate function.

3 Top-K algorithms
We call algorithms that solve the top-k problem top-k algorithms. The simplest way to find the best K objects is to read all objects $x \in X$ and to calculate the rating of every object x. Then the K objects with the highest rating are chosen. In this case, all objects $x \in X$ have to be accessed. In this section, we show two top-k algorithms, Fagin's TA-algorithm and MD-algorithm, which solve the top-k problem without searching all objects. These algorithms can find the best K objects according to the aggregate function @.

3.1 Fagin's TA-algorithm
Fagin et al. describe in [13] the top-k algorithm TA (threshold algorithm). This algorithm assumes that the objects are stored in m lists $L_1, \dots, L_m$. Each i-th list $L_i$ contains pairs $(x, a_i^x)$ for all objects $x \in X$ and is sorted in descending order according to the values of the i-th attribute. The aggregate @ must be monotone according to the ordering in the lists, e.g. a weighted average. TA-algorithm searches the lists sequentially and obtains pairs $(x, a_i^x)$.
For every object x that is detected for the first time in an obtained pair $(x, a_i^x)$, TA-algorithm obtains the missing attribute values of x by direct access to the other lists and calculates the rating of object x, which we denote @(x). TA-algorithm uses the temporary list $T_K$, in which it keeps the current best K objects ordered according to @(x). The rating of the K-th best object in $T_K$ is denoted $M_K$. TA-algorithm uses a threshold $Th = @(a_1^{last}, \dots, a_m^{last})$, where $a_1^{last}, \dots, a_m^{last}$ are the last values of the attributes seen in the lists $L_1, \dots, L_m$ during the sequential access. When $Th \le M_K$, TA-algorithm is able to stop and return $T_K$, which contains the best K objects according to @. TA-algorithm can finish before it comes to the end of all the lists [13]. It means that not all the objects need to be accessed.

Figure 1. Set of six objects with values of three attributes stored in sorted lists.

The following pseudo-code describes TA-algorithm. The procedure getnextpair obtains the next pair $(x, a_i^x)$ from one of the lists $L_1, \dots, L_m$ sequentially [13][9].

Input: Lists L_1,...,L_m, int K;
Output: List T_K;
var List T_K;
begin
  while(|T_K| < K or Th > M_K)do
    (x, a_i^x) = getnextpair(L_1,...,L_m);
    a_i^last = a_i^x;
    Th = @(a_1^last,...,a_m^last);
    if(x ∉ T_K)then
      get the missing attribute values of the object x;
      if(|T_K| < K)then
        insert x to the list T_K at the right place according to @(x);
      else if(@(x) > M_K)then
      begin
        delete K-th object from the list T_K;
        insert x to the list T_K according to @(x);
      end;
  endwhile;
  return T_K;
end.

Example 1. Figure 1 contains six objects with values of three attributes stored in sorted lists $L_1, L_2, L_3$. If TA-algorithm is searching for the best three objects according to the aggregate $@(x) = a_1^x + a_2^x + a_3^x$, then TA-algorithm gets only three pairs $(x, a_i^x)$ from each of the lists. At this moment $Th = @(a_1^{last}, \dots, a_m^{last}) = 1.8$, $T_K$ includes three objects with $@(x_1) = 2.4$, $@(x_3) = 2.2$, $@(x_4) = 2.0$, respectively, and $M_K = @(x_4) = 2.0$ holds.
Then $Th \le M_K$ holds, so TA-algorithm is able to stop and need not read the rest of the lists.

3.2 MD-algorithm based on MDB-tree
Now we describe MD-algorithm [2], which efficiently solves the top-k problem using the multidimensional B-tree (MDB-tree) [15]. MDB-tree allows indexing a set of objects X by attributes $A_1, \dots, A_m$, m > 1, in one data structure. In this case, MDB-tree has m levels and the values of one attribute are stored in one level. We use a variant of MDB-tree whose nodes are B+-trees. The i-th level of MDB-tree is composed of B+-trees containing key values from $dom(A_i)$. To explain MD-algorithm, we introduce the pointer of a key k, the identifier of a B+-tree, and the best rating of a B+-tree [2].

The pointer of the key k in a B+-tree in the i-th level of MDB-tree is denoted $\rho(k_i)$. If i < m, then $\rho(k_i)$ refers to a B+-tree in the (i+1)-th level of MDB-tree. If i = m, i.e. the B+-tree is in the last level of MDB-tree, then $\rho(k_i)$ refers to an object array, where objects with the same values of all the m attributes are stored.

For explicit identification of a B+-tree in MDB-tree, we use a sequence of keys, called the tree identifier here. The tree identifier of a B+-tree in the i-th level is $(k_1, \dots, k_{i-1})$. The B+-tree in the first level of MDB-tree has the tree identifier ( ). In Figure 2, $(k_1, k_2) = (1.0, 0.0)$ is the identifier of the B+-tree at the third level that contains keys 0.0 and 0.7 and refers to objects $x_F$, $x_G$, $x_H$.

Figure 2. Set of eleven objects with values of three attributes stored in MDB-tree.

In MDB-tree we use the best rating B(S) of a B+-tree S. For every B+-tree S in MDB-tree there is a uniquely defined subset $X^S$ of X, which we call the set of available objects from S. For example, in Figure 2, the set $X^S$ of S with identifier (0.0) contains objects $x_A$, $x_B$, $x_C$, $x_D$.
By the best rating B(S) of a B+-tree S with identifier $(k_1, \dots, k_{i-1})$ in the i-th level of MDB-tree we understand the maximal possible rating of a not yet known object x from the set of available objects from S. Analogously to TA-algorithm, we assume that the aggregate @ is nondecreasing in all its variables. Then B(S) is calculated as $@(a_1^x, \dots, a_m^x)$, where the first i-1 attribute values of the object x are $k_1, \dots, k_{i-1}$ and the values of the other attributes are 1 (the maximum of the interval [0,1]), i.e. $B(S) = @(k_1, \dots, k_{i-1}, 1, \dots, 1)$.

MD-algorithm is based on a recursive procedure findtopk, which searches MDB-tree in depth-first search. Analogously to TA-algorithm, MD-algorithm uses the temporary list $T_K$, in which it keeps the current best K objects ordered according to @(x). The rating of the K-th best object in $T_K$ is denoted $M_K$. With the recursive procedure findtopk, MD-algorithm can find the best K objects in MDB-tree according to a monotone aggregate @ without reading all the objects. The next statement specifies this fact more precisely.

Statement 1. [2] Let the key $k_i$ from the B+-tree S with the identifier $(k_1, \dots, k_{i-1})$ be a key obtained, one by one in descending order, by a run of the procedure findtopk. The pointer $\rho(k_i)$ refers to a B+-tree P in the next level or to an object array P, the best rating of P is $B(P) = @(k_1, \dots, k_i, 1, \dots, 1)$, and the aggregate @ is monotone. If $B(P) \le M_K$ holds, no object $x \in X^P$ can get into $T_K$. Moreover, it is not necessary to obtain the next key $k_i^{next}$ from the B+-tree S, which refers to $P^{next}$, because $k_i^{next} \le k_i$ means that $B(P^{next}) \le B(P) \le M_K$. The procedure findtopk can stop in the B+-tree S, because no further object $x \in X^S$ can get into $T_K$.

The following pseudo-code describes MD-algorithm.
Input: MDBtree MDB-tree, int K;
Output: List T_K;
var List T_K;
begin
  findtopk(MDB-tree, ( ), K);
  return T_K;
end.

procedure findtopk(MDBtree MDB-tree, TreeId (k_1,...,k_{i-1}), int K);
  while(exists next key in B+-tree (k_1,...,k_{i-1}))do
    k_i = getnextkey(MDB-tree, (k_1,...,k_{i-1}));
    {ρ(k_i) refers to B+-tree P or to object array P}
    if(|T_K| = K and B(P) ≤ M_K)then return; {Statement 1.}
    if(P is B+-tree)then
      findtopk(MDB-tree, (k_1,...,k_i), K);
    if(P is the object array)then
      while(there is the next object x in P)do
        if(|T_K| < K)then
          insert object x to T_K according to @(x);
        else if(@(x) > M_K)then
        begin
          delete K-th object from the list T_K;
          insert object x to the list T_K according to @(x);
        end;
      endwhile;
  endwhile;
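The recursion and the pruning rule of Statement 1 can be tried out with a small Python sketch. It is not the authors' implementation: nested dictionaries stand in for the B+-trees of an MDB-tree, the function name md_top_k and the sample data are hypothetical (only the keys and the objects W and M loosely follow the two-level walk of Example 2), and the aggregate is assumed to be a plain sum.

```python
import heapq

def md_top_k(tree, m, k, agg):
    """Depth-first top-k search over a nested-dict stand-in for an MDB-tree."""
    top = []        # min-heap of (rating, name); top[0][0] plays the role of M_K
    visited = []    # names of the objects that were actually read

    def best_rating(prefix):
        # B(P): known keys, then 1.0 for every attribute that is still unknown
        return agg(list(prefix) + [1.0] * (m - len(prefix)))

    def find(node, prefix):
        for key in sorted(node, reverse=True):        # keys in descending order
            path = prefix + (key,)
            if len(top) == k and best_rating(path) <= top[0][0]:
                return                                # Statement 1: skip the rest
            child = node[key]
            if isinstance(child, dict):               # B+-tree in the next level
                find(child, path)
            else:                                     # object array
                for name in child:
                    visited.append(name)
                    rating = agg(path)                # all values equal the keys
                    if len(top) < k:
                        heapq.heappush(top, (rating, name))
                    elif rating > top[0][0]:
                        heapq.heapreplace(top, (rating, name))

    find(tree, ())
    return sorted(top, reverse=True), visited

# Hypothetical two-level data; objects V, N and P are invented for illustration.
tree = {
    1.0: {0.5: ["W"], 0.4: ["V"]},
    0.8: {0.8: ["M"], 0.0: ["N"]},
    0.6: {0.3: ["P"]},
}
best, visited = md_top_k(tree, m=2, k=1, agg=sum)
```

With these inputs only W and M are read; the subtrees under keys 0.4, 0.0 and 0.6 are pruned because their best ratings do not exceed the current M_K.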

The procedure getnextkey used by findtopk is defined as follows.

procedure getnextkey(MDBtree MDB-tree, TreeId (k_1,...,k_{i-1}));
  choose the next key k_i with next highest value of A_i in B+-tree of MDB-tree with identifier (k_1,...,k_{i-1});
  return k_i;

Figure 3. MD-algorithm is searching the best object in MDB-tree according to aggregate function.

Example 2. In Figure 3, fourteen objects with values of three attributes are stored in MDB-tree with three levels. MD-algorithm is searching for the best object according to the aggregate $@(x) = a_1^x + a_2^x$. MD-algorithm starts in B+-tree ( ) and obtains key 1.0, which refers to B+-tree (1.0). There MD-algorithm obtains key 0.5, which refers to an object array, where it obtains object W with @(W) = 1.5. MD-algorithm inserts object W into the temporary list $T_K$, because $T_K$ is empty. Then MD-algorithm obtains the next key 0.4 in B+-tree (1.0). This key refers to an object array whose best rating is smaller than the rating of the K-th best object in $T_K$. MD-algorithm can stop in B+-tree (1.0) and continues in B+-tree ( ). It obtains the next key 0.8, which refers to B+-tree (0.8). In this B+-tree MD-algorithm obtains key 0.8, which refers to an object array, where object M with @(M) = 1.6 is obtained. MD-algorithm deletes object W from the list $T_K$ and inserts object M into the list $T_K$. Then MD-algorithm obtains the next key 0.0 in B+-tree (0.8). This key refers to an object array whose best rating is smaller than the rating of the K-th best object in $T_K$. MD-algorithm can stop in B+-tree (0.8) and continues in B+-tree ( ) in the first level of MDB-tree. MD-algorithm obtains the next key 0.6, which refers to B+-tree (0.6). The best rating of this B+-tree is 1.6.
This is not greater than $M_K = 1.6$, so MD-algorithm can stop. The best object according to the aggregate function is M. It means that MD-algorithm does not search all the objects in MDB-tree.

4 Application of local preferences
This section discusses support for local preferences (see Section 2.2.1) in TA-algorithm and MD-algorithm. To apply local user preferences, we use a B+-tree. This structure is common for all users and independent of user preferences [2].

4.1 Usage of B+-tree
In a B+-tree the keys are sorted in ascending order. Since the leaf nodes of the B+-tree are linked in both directions, it is possible to traverse the B+-tree through the leaf level and to get all the keys. Therefore, it is possible to obtain objects from the B+-tree in descending order according to the course of the user fuzzy function $f^U$ [2][3][5]. When the user fuzzy function $f^U$ is monotone on its domain, the following holds.
Let $f^U$ be nondecreasing. We have to traverse the leaf level of the B+-tree from right to left. It is possible to get the pairs $(x, f^U(x))$ in descending order according to the user preference $f^U$, because $a^x \le a^y \Rightarrow f^U(x) \le f^U(y)$ holds.
Let $f^U$ be nonincreasing. We have to traverse the leaf level of the B+-tree from left to right. It is possible to get the pairs $(x, f^U(x))$ in descending order according to the user preference $f^U$, because $a^x \le a^y \Rightarrow f^U(x) \ge f^U(y)$ holds.
In general, the user fuzzy function $f^U$ might not be monotone on its domain. In this case, the domain can be divided into continuous intervals such that $f^U$ is monotone on each of them. Then the leaf level of the B+-tree is divided into parts according to these intervals. From these parts, objects can be obtained concurrently in descending order of $f^U$, in the same way as for nondecreasing and nonincreasing fuzzy functions [2].

Example 3. Figure 4 shows a fuzzy function $f^U$ and a B+-tree. The domain of $f^U$ is divided into monotone intervals $w_1, \dots, w_5$. Objects are obtained from the B+-tree concurrently according to these intervals. Finally, we get objects $x_K, x_M, x_N, x_G, x_H, x_F, x_T, x_S, x_E, x_C, x_Q, x_U, x_R, x_D, x_Y$ in descending order according to $f^U$.
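The concurrent traversal of the monotone parts can be illustrated by a short Python sketch. It assumes that a sorted list of keys stands in for the linked leaf level of a B+-tree and that the caller supplies the points at which the fuzzy function changes direction; the function name descending_by_preference, the triangular preference and the key values are hypothetical.

```python
import bisect
import heapq

def descending_by_preference(keys, f, breakpoints):
    """Yield (key, f(key)) in descending order of f from the sorted list keys.

    breakpoints split the domain into intervals on which f is monotone.
    """
    # Cut the sorted keys (the leaf level) into one index range per interval.
    cuts = [0] + [bisect.bisect_left(keys, b) for b in breakpoints] + [len(keys)]
    heap = []
    for lo, hi in zip(cuts, cuts[1:]):
        if lo >= hi:
            continue
        if f(keys[lo]) <= f(keys[hi - 1]):
            start, step, stop = hi - 1, -1, lo - 1    # nondecreasing: right to left
        else:
            start, step, stop = lo, 1, hi             # nonincreasing: left to right
        heapq.heappush(heap, (-f(keys[start]), start, step, stop))
    while heap:                                       # merge the cursors
        neg, pos, step, stop = heapq.heappop(heap)
        yield keys[pos], -neg
        nxt = pos + step
        if nxt != stop:
            heapq.heappush(heap, (-f(keys[nxt]), nxt, step, stop))

# Hypothetical preference: values close to 50 are best.
def prefer_50(a):
    return max(0.0, 1.0 - abs(a - 50) / 50)

keys = [10, 20, 40, 50, 70, 90, 100]
pairs = list(descending_by_preference(keys, prefer_50, breakpoints=[50]))
ratings = [r for _, r in pairs]
```

Each monotone part needs only one cursor, so the next best key is found by comparing as many candidates as there are intervals, without sorting the whole leaf level for every user.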
4.2 Application in TA-algorithm
The original TA-algorithm (see Section 3.1) offers the possibility to rate objects with the aggregate @ and to find the best K objects for the user U only according to his/her global preference. To support local preferences, every i-th list $L_i$ must contain pairs $(x, f_i^U(x))$ in descending order according to the user fuzzy function $f_i^U(x)$.
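A compact Python sketch of the threshold algorithm over such preference-sorted lists is given here for illustration. It makes several assumptions that are not taken from the paper: the lists are built by sorting in memory, a dictionary provides the direct access, the lists are read round-robin with the threshold checked once per round, and the flat data and the two fuzzy functions are invented.

```python
import heapq

def ta_top_k(objects, fuzzy, agg, k):
    """Threshold algorithm over m lists sorted by user fuzzy values.

    objects: dict name -> tuple of m attribute values
    fuzzy:   list of m functions; fuzzy[i] maps a value of A_i into [0, 1]
    agg:     monotone aggregate over a list of m local ratings
    """
    m = len(fuzzy)
    # L_i holds pairs (f_i(x), x) in descending order of the fuzzy value.
    lists = [sorted(((fuzzy[i](v[i]), x) for x, v in objects.items()), reverse=True)
             for i in range(m)]
    # Direct access: all local ratings of an object by its name.
    local = {x: [fuzzy[i](v[i]) for i in range(m)] for x, v in objects.items()}
    top, seen = [], set()        # top is a min-heap, so top[0][0] is M_K
    last = [1.0] * m             # last value seen in each list
    sequential = 0               # number of pairs read sequentially
    for depth in range(len(objects)):
        for i in range(m):       # one sequential step in every list
            value, x = lists[i][depth]
            sequential += 1
            last[i] = value
            if x not in seen:
                seen.add(x)
                rating = agg(local[x])
                if len(top) < k:
                    heapq.heappush(top, (rating, x))
                elif rating > top[0][0]:
                    heapq.heapreplace(top, (rating, x))
        if len(top) == k and agg(last) <= top[0][0]:
            break                # Th <= M_K: no unseen object can enter T_K
    return sorted(top, reverse=True), sequential

# Hypothetical flats with (price, area); cheaper and larger flats are preferred.
flats = {"f1": (6000, 90), "f2": (8000, 80), "f3": (10000, 95),
         "f4": (16000, 40), "f5": (18000, 30), "f6": (14000, 50)}
fuzzy = [lambda p: 1 - p / 20000, lambda a: a / 100]
best, sequential = ta_top_k(flats, fuzzy, sum, k=2)
```

In this run the two best flats are found after three rounds, i.e. after reading half of the pairs stored in the two lists.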

To provide such lists, TA-algorithm uses a set of m B+-trees $B_1, \dots, B_m$ as the lists $L_1, \dots, L_m$. In the B+-tree $B_i$, all objects are indexed by the values of the i-th attribute $A_i$. TA-algorithm can search the B+-trees $B_1, \dots, B_m$ sequentially. Pairs $(x, f_i^U(x))$ can be obtained one by one from the B+-tree $B_i$. TA-algorithm also uses direct access to the lists $L_1, \dots, L_m$, where for an object x it is needed to obtain its unknown value $a_i^x$ from $L_i$. Because a B+-tree does not support this operation, direct access can be realized, for example, by an associative array.

Figure 4. An example of objects obtained from the B+-tree concurrently according to a fuzzy function.

4.3 Application in MD-algorithm
Because MDB-tree is composed of B+-trees, it is possible to apply the local user preferences directly in MD-algorithm when obtaining keys from every B+-tree. The following procedure getnextkey changes MD-algorithm.

procedure getnextkey(MDBtree MDB-tree, TreeId (k_1,...,k_{i-1}), FuzzyFunction f_i^U);
  choose the next key k_i with next highest value of f_i^U(k_i) in B+-tree of MDB-tree with identifier (k_1,...,k_{i-1});
  return k_i;

5 MXT-algorithm
In this section, we describe a new top-k algorithm, which is based on the integration of MD-algorithm and Fagin's TA-algorithm. This new algorithm can also find the best K objects according to the aggregate @ without searching all objects.

5.1 Usage of TA- and MD-algorithm
In general, during the computations of TA-algorithm and MD-algorithm the number of accessed objects is less than the number of all objects. The number of accessed objects depends on several factors. MD-algorithm has the best results when the objects stored in MDB-tree have a uniform distribution [2]. When the attributes of objects have actual domains of different sizes, the order of the attributes in the levels of MDB-tree is very important for the efficiency of MD-algorithm.
For MD-algorithm it is better to build MDB-tree with small-sized domains in its higher levels and attributes with big-sized domains in its lower levels. When most of the attributes have big-sized actual domains, MD-algorithm is not a suitable solution of the top-k problem. In this case, TA-algorithm is more suitable.

Example 4. We had data about flats for rent in Prague at our disposal. These flats have four attributes important for users: District, Type, Area, and Price. These attributes have the following domain sizes: |dom(District)| = 10, |dom(Type)| = 10, |dom(Area)| = 229, |dom(Price)| = 411. When a user prefers the attributes District and Type, it is better to store the flats in MDB-tree and to use MD-algorithm. On the other hand, when a user prefers the attributes Area and Price, it is better to use TA-algorithm and to store the flats in Fagin's lists.

In general, an attribute with a small-sized domain is nominal and an attribute with a big-sized domain is ordinal (see Section 2). This is valid also in Example 4. The attributes District and Type are nominal attributes; Area and Price are ordinal attributes.

5.2 Integration of TA- and MD-algorithm
For a set of objects with several nominal attributes and several ordinal attributes, we developed a new top-k algorithm, MXT-algorithm, which is based on the integration of MD-algorithm and Fagin's TA-algorithm. MXT-algorithm uses a new data structure, MDB-tree with lists, which is composed of MDB-tree and Fagin's sorted lists.

We suppose a set of objects X with m attributes $A_1, \dots, A_m$. Attributes $A_1, \dots, A_n$ are nominal attributes and $A_{n+1}, \dots, A_m$ are ordinal attributes. Attributes $A_1, \dots, A_n$ are stored in MDB-tree with n levels. Instead of the following m - n levels of MDB-tree, groups of m - n Fagin's sorted lists are used. These lists contain pairs $(x, a_i^x)$ with the values of attributes $A_{n+1}, \dots, A_m$. MDB-tree with lists is shown in Figure 5. Two nominal attributes are stored as MDB-tree and two ordinal attributes are stored as groups of Fagin's lists.

Figure 5. MDB-tree with lists, in which a set of objects with two nominal attributes and two ordinal attributes is stored.

MXT-algorithm also uses the temporary list $T_K$, in which it keeps the current best K objects ordered according to @(x). The rating of the K-th best object in $T_K$ is denoted $M_K$. MXT-algorithm is developed on the basis of MD-algorithm. Values of the first n attributes $A_1, \dots, A_n$ are searched in the same way as during the computation of MD-algorithm. Analogously to MD-algorithm, Statement 1 holds for MXT-algorithm (see Section 3.2).

In every B+-tree in the n-th level of MDB-tree, there are keys with pointers that refer to groups of m - n Fagin's sorted lists. In each of these groups a new instance of TA-algorithm is run. Each instance of TA-algorithm uses a local threshold $Th_{local}$. It is not needed to obtain the best K objects from each group of Fagin's lists. It is sufficient to compare $Th_{local}$ with $M_K$ of the temporary list $T_K$, because MXT-algorithm is not searching for the best K objects in a group of m - n Fagin's sorted lists, but for the best K objects throughout the whole data structure. Analogously to TA-algorithm, when $Th_{local} \le M_K$ holds in an instance of TA-algorithm, this instance is able to stop. Then the computation of MXT-algorithm continues as in MD-algorithm.
The efficiency of MXT-algorithm is based on the idea that, during its computation, it is not needed to obtain the best K objects from each group of Fagin's lists, i.e. only objects with $@(x) > M_K$ are of interest. In Figure 5, MXT-algorithm is searching for the best few objects. The part of the data structure under the dotted line is the part that MXT-algorithm does not access during its computation.

The following pseudo-code describes MXT-algorithm. Procedure getnextkey is the same as in MD-algorithm (see Section 3.2). Procedure getnextpair obtains the next pair $(x, a_i^x)$ from one of the lists $L_1, \dots, L_{m-n}$ sequentially as in TA-algorithm (see Section 3.1).

Input: MDBtree MDB-tree, int K;
Output: List T_K;
var List T_K; {temporary list of objects}
begin
  findtopk(MDB-tree, ( ), K);
  return T_K;
end.

procedure findtopk(MDBtree MDB-tree, TreeId (k_1,...,k_{i-1}), int K);
  while(exists next key in B+-tree (k_1,...,k_{i-1}))do
    k_i = getnextkey(MDB-tree, (k_1,...,k_{i-1}));
    {ρ(k_i) refers to B+-tree P or to group of lists P}
    if(|T_K| = K and B(P) ≤ M_K)then return; {Statement 1.}
    if(P is B+-tree)then
      findtopk(MDB-tree, (k_1,...,k_i), K);
    if(P is group of lists)then
      while(|T_K| < K or Th_local > M_K)do
        (x, a_i^x) = getnextpair(L_1,...,L_m);
        a_i^last = a_i^x;
        Th_local = @(a_1^last,...,a_m^last);
        if(x ∉ T_K)then
          get the missing attribute values of the x;
          if(|T_K| < K)then
            insert object x to the list T_K on the right place according to @(x);
          else if(@(x) > M_K)then
          begin
            delete K-th object from the list T_K;
            insert object x to list T_K according to @(x);
          end;
      endwhile;
  endwhile;

5.3 Application of Local Preferences
Analogously to MD-algorithm, it is possible to apply the local user preferences for attributes $A_1, \dots, A_n$. Procedure getnextkey changes MXT-algorithm in the same way as MD-algorithm (see Section 4.3). For attributes $A_{n+1}, \dots, A_m$, it is also possible to apply the local user preferences.
Analogously to TA-algorithm, we use a group of m - n B+-trees $B_1, \dots, B_{m-n}$ as the group of m - n Fagin's lists $L_1, \dots, L_{m-n}$ (see Section 4.2).
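How the MD-style descent and the local TA instances of Section 5.2 interact can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: nested dictionaries replace the MDB-tree levels, every group of lists is built by sorting a small dictionary, all keys and values are taken to be local ratings in [0, 1] already, the local threshold is checked once per round, and the names mxt_top_k, offer, local_ta and the sample data are hypothetical.

```python
import heapq

def mxt_top_k(tree, n, m, k, agg):
    """MD-style descent over n nominal levels with local TA runs in the leaves.

    tree: nested dicts, n levels deep; each leaf is a group
          {name: tuple of (m - n) ordinal values}.
    """
    top = []          # min-heap of (rating, name); top[0][0] is the global M_K
    accessed = []     # objects whose rating was computed

    def offer(rating, name):
        if len(top) < k:
            heapq.heappush(top, (rating, name))
        elif rating > top[0][0]:
            heapq.heapreplace(top, (rating, name))

    def local_ta(group, prefix):
        # One sorted list per ordinal attribute of this group.
        lists = [sorted(((v[j], x) for x, v in group.items()), reverse=True)
                 for j in range(m - n)]
        last, seen = [1.0] * (m - n), set()
        for depth in range(len(group)):
            for j in range(m - n):
                last[j], x = lists[j][depth]
                if x not in seen:
                    seen.add(x)
                    accessed.append(x)
                    offer(agg(list(prefix) + list(group[x])), x)
            # Th_local is compared with the global M_K, not a per-group top-k.
            if len(top) == k and agg(list(prefix) + last) <= top[0][0]:
                return

    def find(node, prefix):
        for key in sorted(node, reverse=True):
            path = prefix + (key,)
            bound = agg(list(path) + [1.0] * (m - len(path)))   # B(P)
            if len(top) == k and bound <= top[0][0]:
                return                                          # Statement 1
            if len(path) < n:
                find(node[key], path)
            else:
                local_ta(node[key], path)

    find(tree, ())
    return sorted(top, reverse=True), accessed

# Hypothetical data: one nominal level (n = 1) and two ordinal attributes.
tree = {
    1.0: {"a": (0.9, 0.8), "b": (0.2, 0.3), "c": (0.1, 0.1)},
    0.5: {"d": (1.0, 1.0), "e": (0.4, 0.2)},
    0.1: {"f": (0.9, 0.9)},
}
best, accessed = mxt_top_k(tree, n=1, m=3, k=2, agg=sum)
```

In this run objects c and e are never rated, because the local thresholds of their groups drop to M_K first, and the whole group under key 0.1 is skipped by the bound of Statement 1.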

Moreover, after these modifications we obtain a tree-oriented data structure that is composed only of B+-trees. In other words, it is an MDB-tree with n levels, where the leaf nodes of the B+-trees in the n-th level refer to groups of m - n B+-trees.

6 Experiments
We implemented and tested the described top-k algorithms. The implementations of TA-algorithm, MD-algorithm and MXT-algorithm were developed in Java with tree-oriented data structures created in memory. What mattered to us was the number of accesses into these data structures during the computation of the top-k algorithms. During the tests, we used user fuzzy functions with a linear course as user local preferences and the arithmetic average as the user global preference. Objects from X with their m attribute values were stored in the data structures considered. We count obtaining one attribute value of one object as one access into the data structures. In this way we can simulate access to external memories.

6.1 Distribution of Attribute Values
At first, we tested two sets of objects with 5 attributes, with normal and uniform distributions of attribute values. We used TA-algorithm, MD-algorithm and three variants of MXT-algorithm, i.e. MXT 3, MXT 2 and MXT 1. For example, MXT 3 uses the first 3 attributes as nominal attributes, stored as MDB-tree with 3 levels, and the other 2 attributes are stored as groups of 2 Fagin's sorted lists. Figure 6 and Figure 7 show the results of this test. The best results were achieved with MXT 3 and MD-algorithm for the set of objects with the uniform distribution of attribute values. The test for the set of objects with the normal distribution of attribute values has shown that the new MXT-algorithm can in some cases also achieve the worst results.

Figure 7. Uniform distribution of attributes values.

6.2 Flats for Rent
We tested the sets of flats for rent in Prague (see Section 5.1, Example 4).
There were two nominal attributes with a small domain size and two ordinal attributes with a big domain size. We used TA-algorithm, MD-algorithm and the most suitable variant of MXT-algorithm. Figure 8 shows the results of this test. The best result was achieved with MXT-algorithm. This test showed that MXT-algorithm is the most efficient solution of the top-k problem in this case. In practice, most objects that users search for have several nominal attributes and several ordinal attributes. In this case, it is suitable to use MXT-algorithm.

Figure 6. Normal distribution of attributes values.
Figure 8. Finding the best flats in Prague.

6.3 Various user preferences
In general, it is problematic to test the efficiency of top-k algorithms in dependence on user preferences. For various

settings of the user preferences and for various distributions of attribute values, top-k algorithms achieve different results. In this subsection we focus on the global user preference expressed by a weighted average (see Section 2.2.2). We used a set of objects with two nominal attributes and three ordinal attributes. The distribution of attribute values was uniform. Figure 9 shows the results of the test where the weight of each attribute was the same. In this test, the worst results were achieved with TA-algorithm. For choosing the best objects, MXT-algorithm and MD-algorithm needed more than 10 times fewer accesses than TA-algorithm. MXT-algorithm and MD-algorithm achieved nearly the same results. This shows that MXT-algorithm and MD-algorithm are the most efficient solutions for a set of objects with several nominal attributes that are preferred by users. Figure 10 shows the results of the test where the weights of the nominal attributes were equal to 0. In this test, the result of searching for the best K objects is independent of the values of the nominal attributes. TA-algorithm is not disadvantaged and achieves good results. Finally, Figure 11 shows the test where MXT-algorithm achieves the best results in the number of accesses. There the weight of the first ordinal attribute was equal to 0. MD-algorithm achieved worse results than MXT-algorithm because of the order of the attributes in the levels of MDB-tree. The weight of the attribute in the third level was 0, so MD-algorithm was often searching B+-trees in this level without any increase of the best rating B(S) (see Section 3.2). These and other tests that we carried out have shown that MXT-algorithm is an efficient solution of the top-k problem in some cases. On the other hand, TA-algorithm and MD-algorithm achieved the best results in some other cases.

Figure 9. Weights of all the attributes were the same.
Figure 10. Weights of nominal attributes were equal to 0.
Figure 11.
Weight of the first ordinal attribute was 0.

7 Conclusion

We developed a new algorithm, MXT-algorithm, which can efficiently find the best K objects according to user preferences without accessing all the objects. We implemented the top-k algorithms TA-algorithm, MD-algorithm and MXT-algorithm with support for user preferences. MXT-algorithm is based on the integration of TA-algorithm and MD-algorithm. The results show that MXT-algorithm is comparable with the other top-k algorithms.

According to the types of object attributes, it is possible to store these objects in an MDB-tree, in Fagin's lists or in an MDB-tree with lists. Each of the implemented top-k algorithms searches in a different data structure. According to the properties of the set of objects, we can decide in which of the data structures the objects should be stored so that they are searched by the most efficient top-k algorithm. The process of choosing the best data structure can be automated by analyzing attribute domains. In this sense, information about the types and sizes of attribute domains is important.

MXT-algorithm is the most efficient solution of the top-k problem in some cases. It is especially efficient for a set of objects that has several nominal attributes with small-sized domains and several ordinal attributes with big-sized domains. Moreover, MXT-algorithm can find the best K objects in each of these tree-oriented data structures. In this sense, TA-algorithm and MD-algorithm are extreme cases of MXT-algorithm.

Because of the way MXT-algorithm is constructed, it may be interesting to let some of its instances of TA-algorithm compute continuously. It may also be interesting to develop a variant of MXT-algorithm that uses parallel computing.

In this work, we used a model of preferences based on local and global user preferences. In future work, we can use user preferences based on different models. For example, when a dependence exists between the values of several attributes, a user has to set his/her preference for these attributes together. In this case, we would need to develop modifications of the top-k algorithms. This can be a new direction of future research.

A further motivation for future research is to find applications of the developed algorithms in different contexts. For example, in some cases it is necessary to find the K objects most similar to a given query object. Similarities between objects are most often computed as aggregated similarities of their attribute values. In [4], multi-dimensional indexing of non-metric spaces is described, together with a top-k algorithm that performs much better than the family of Fagin's algorithms. Some attribute values can be stored on remote servers. In this case, some attribute values from web-accessible external sources might not be available at the same time. In [14], the authors study how to process top-k queries efficiently in this setting. Implementing our top-k algorithms in an environment with several information resources could be the next direction of our research.
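
The preference model used in this work (local preferences for individual attributes combined by a global weighted average, see Section 2.2.2) can be illustrated by a minimal Python sketch. It is not the implementation used in our tests: the function names `weighted_average` and `naive_top_k`, the sample flats and the local preference functions are hypothetical and serve only as an illustration. The sketch rates every object, which is exactly the full scan that TA-algorithm, MD-algorithm and MXT-algorithm are designed to avoid.

```python
from heapq import nlargest


def weighted_average(scores, weights):
    """Global preference: weighted average of local scores from [0, 1]."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * s for w, s in zip(weights, scores)) / total


def naive_top_k(objects, local_prefs, weights, k):
    """Baseline: rate every object and return the K best (no index is used)."""
    def rating(obj):
        scores = [pref(obj[attr]) for attr, pref in local_prefs]
        return weighted_average(scores, weights)
    return nlargest(k, objects, key=rating)


# Hypothetical data: one nominal attribute (district), two ordinal ones.
flats = [
    {"id": 1, "district": "Prague 1", "price": 5.0, "area": 60},
    {"id": 2, "district": "Prague 6", "price": 3.0, "area": 80},
    {"id": 3, "district": "Prague 1", "price": 6.0, "area": 50},
    {"id": 4, "district": "Prague 9", "price": 2.0, "area": 40},
]

# Hypothetical local preferences, each mapping a value into [0, 1].
local_prefs = [
    ("district", lambda d: {"Prague 1": 1.0, "Prague 6": 0.5}.get(d, 0.0)),
    ("price", lambda p: max(0.0, 1.0 - p / 10.0)),   # cheaper is better
    ("area", lambda a: min(1.0, a / 100.0)),         # larger is better
]

equal = naive_top_k(flats, local_prefs, (1, 1, 1), 2)
no_nominal = naive_top_k(flats, local_prefs, (0, 1, 1), 2)
print([f["id"] for f in equal])       # → [1, 2]
print([f["id"] for f in no_nominal])  # → [2, 4]
```

With the weight of the nominal attribute set to 0, the rating no longer depends on the district at all, which corresponds to the setting of Figure 10.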
8 Acknowledgments

This research is supported by the Grant Agency of Charles University (GAUK), grant number 9209 (204-10/259011), Charles University in Prague, Czech Republic.

References

[1] Ilyas, I. F., Beskales, G., Soliman, M. A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 4 (Oct. 2008).
[2] Ondreička, M., Pokorný, J.: Extending Fagin's algorithm for more users based on multidimensional B-tree. In: Proc. of ADBIS 2008, P. Atzeni, A. Caplinskas, and H. Jaakkola (Eds.), LNCS 5207, Springer-Verlag Berlin Heidelberg, 2008.
[3] Gurský, P., Vaneková, V., Pribolová, J.: Fuzzy User Preference Model for Top-k Search. In: Proceedings of IEEE World Congress on Computational Intelligence (WCCI), Hong Kong, FS0377.
[4] Deshpande, P. M., P, D., Kummamuru, K.: Efficient online top-k retrieval with arbitrary similarity measures. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (Nantes, France, March 25-29, 2008). EDBT '08. ACM, New York, NY.
[5] Eckhardt, A., Pokorný, J., Vojtáš, P.: A system recommending top-k objects for multiple users preference. In: Proc. of 2007 IEEE International Conference on Fuzzy Systems, July 24-26, 2007, London, England.
[6] Xin, D., Han, J., Chang, K. C.: Progressive and selective merge: computing top-k with ad-hoc ranking functions. In: Proc. of the 2007 ACM SIGMOD International Conference on Management of Data (Beijing, China, June 11-14, 2007). SIGMOD '07. ACM, New York, NY.
[7] Zhang, Z., Hwang, S., Chang, K. C., Wang, M., Lang, C. A., Chang, Y.: Boolean + ranking: querying a database by k-constrained optimization. In: Proc. of the ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, June 27-29, 2006.
[8] Vojtáš, P.: Fuzzy logic aggregation for semantic web search for the best (top-k) answer. Capturing Intelligence, Chapter 17, Volume 1, 2006.
[9] Gurský, P., Lencses, R., Vojtáš, P.: Algorithms for user dependent integration of ranked distributed information. In: Proceedings of TED Conference on e-government (TCGOV 2005).
[10] Marian, A., Amer-Yahia, S., Koudas, N., Srivastava, D.: Adaptive Processing of Top-k Queries in XML. In: Proc. of the 21st International Conference on Data Engineering, April 05-08, 2005, ICDE. IEEE Computer Society, Washington, DC.
[11] Michel, S., Triantafillou, P., Weikum, G.: KLEE: a framework for distributed top-k query algorithms. In: Proc. of the 31st International Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 02, 2005). VLDB Endowment.
[12] Chaudhuri, S., Gravano, L., Marian, A.: Optimizing Top-k Selection Queries over Multimedia Repositories. IEEE Trans. on Knowledge and Data Engineering, August 2004 (Vol. 16, No. 8).
[13] Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66, 2003.
[14] Bruno, N., Gravano, L., Marian, A.: Evaluating top-k queries over web-accessible databases. In: Proc. of ICDE, 2002.
[15] Scheuerman, P., Ouksel, M.: Multidimensional B-trees for associative searching in database systems. Information Systems, Vol. 34, No. 2, 1982.


More information

Parallel Implementation of Interval Analysis for Equations Solving

Parallel Implementation of Interval Analysis for Equations Solving Parallel Implementation of Interval Analysis for Equations Solving Yves Papegay, David Daney, and Jean-Pierre Merlet INRIA Sophia Antipolis COPRIN Team, 2004 route des Lucioles, F-06902 Sophia Antipolis,

More information

Web Applications Usability Testing With Task Model Skeletons

Web Applications Usability Testing With Task Model Skeletons Web Applications Usability Testing With Task Model Skeletons Ivo Maly, Zdenek Mikovec, Czech Technical University in Prague, Faculty of Electrical Engineering, Karlovo namesti 13, 121 35 Prague, Czech

More information

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams

Mining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06

More information

Cardinality Estimation: An Experimental Survey

Cardinality Estimation: An Experimental Survey : An Experimental Survey and Felix Naumann VLDB 2018 Estimation and Approximation Session Rio de Janeiro-Brazil 29 th August 2018 Information System Group Hasso Plattner Institut University of Potsdam

More information

FlexBench: A Flexible XML Query Benchmark

FlexBench: A Flexible XML Query Benchmark FlexBench: A Flexible XML Query Benchmark Maroš Vranec Irena Mlýnková Department of Software Engineering Faculty of Mathematics and Physics Charles University Prague, Czech Republic maros.vranec@gmail.com

More information

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm

An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm An Experimental Investigation into the Rank Function of the Heterogeneous Earliest Finish Time Scheduling Algorithm Henan Zhao and Rizos Sakellariou Department of Computer Science, University of Manchester,

More information

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data

PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data PathStack : A Holistic Path Join Algorithm for Path Query with Not-predicates on XML Data Enhua Jiao, Tok Wang Ling, Chee-Yong Chan School of Computing, National University of Singapore {jiaoenhu,lingtw,chancy}@comp.nus.edu.sg

More information

A Framework for Clustering Massive Text and Categorical Data Streams

A Framework for Clustering Massive Text and Categorical Data Streams A Framework for Clustering Massive Text and Categorical Data Streams Charu C. Aggarwal IBM T. J. Watson Research Center charu@us.ibm.com Philip S. Yu IBM T. J.Watson Research Center psyu@us.ibm.com Abstract

More information

Using semantic links to support top-k join queries in peer-to-peer networks

Using semantic links to support top-k join queries in peer-to-peer networks CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Published online 19 December 2006 in Wiley InterScience (www.interscience.wiley.com)..1145 Using semantic links to support top-k join queries in peer-to-peer

More information

Edit Distance between XML and Probabilistic XML Documents

Edit Distance between XML and Probabilistic XML Documents Edit Distance between XML and Probabilistic XML Documents Ruiming Tang 1,HuayuWu 1, Sadegh Nobari 1, and Stéphane Bressan 2 1 School of Computing, National University of Singapore {tangruiming,wuhuayu,snobari}@comp.nus.edu.sg

More information

SQL-to-MapReduce Translation for Efficient OLAP Query Processing

SQL-to-MapReduce Translation for Efficient OLAP Query Processing , pp.61-70 http://dx.doi.org/10.14257/ijdta.2017.10.6.05 SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce Hyeon Gyu Kim Department of Computer Engineering, Sahmyook University,

More information

General properties of staircase and convex dual feasible functions

General properties of staircase and convex dual feasible functions General properties of staircase and convex dual feasible functions JÜRGEN RIETZ, CLÁUDIO ALVES, J. M. VALÉRIO de CARVALHO Centro de Investigação Algoritmi da Universidade do Minho, Escola de Engenharia

More information

Clustering-Based Distributed Precomputation for Quality-of-Service Routing*

Clustering-Based Distributed Precomputation for Quality-of-Service Routing* Clustering-Based Distributed Precomputation for Quality-of-Service Routing* Yong Cui and Jianping Wu Department of Computer Science, Tsinghua University, Beijing, P.R.China, 100084 cy@csnet1.cs.tsinghua.edu.cn,

More information

Association Rule Mining. Introduction 46. Study core 46

Association Rule Mining. Introduction 46. Study core 46 Learning Unit 7 Association Rule Mining Introduction 46 Study core 46 1 Association Rule Mining: Motivation and Main Concepts 46 2 Apriori Algorithm 47 3 FP-Growth Algorithm 47 4 Assignment Bundle: Frequent

More information

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis

Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis Sensors & Transducers 2014 by IFSA Publishing, S. L. http://www.sensorsportal.com Improvement of SURF Feature Image Registration Algorithm Based on Cluster Analysis 1 Xulin LONG, 1,* Qiang CHEN, 2 Xiaoya

More information

Image Classification Using Wavelet Coefficients in Low-pass Bands

Image Classification Using Wavelet Coefficients in Low-pass Bands Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA, August -7, 007 Image Classification Using Wavelet Coefficients in Low-pass Bands Weibao Zou, Member, IEEE, and Yan

More information

Online algorithms for clustering problems

Online algorithms for clustering problems University of Szeged Department of Computer Algorithms and Artificial Intelligence Online algorithms for clustering problems Summary of the Ph.D. thesis by Gabriella Divéki Supervisor Dr. Csanád Imreh

More information

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today

XML Query Processing. Announcements (March 31) Overview. CPS 216 Advanced Database Systems. Course project milestone 2 due today XML Query Processing CPS 216 Advanced Database Systems Announcements (March 31) 2 Course project milestone 2 due today Hardcopy in class or otherwise email please I will be out of town next week No class

More information

Sharing Several Secrets based on Lagrange s Interpolation formula and Cipher Feedback Mode

Sharing Several Secrets based on Lagrange s Interpolation formula and Cipher Feedback Mode Int. J. Nonlinear Anal. Appl. 5 (2014) No. 2, 60-66 ISSN: 2008-6822 (electronic) http://www.ijnaa.semnan.ac.ir Sharing Several Secrets based on Lagrange s Interpolation formula and Cipher Feedback Mode

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel Yahoo! Research New York, NY 10018 goel@yahoo-inc.com John Langford Yahoo! Research New York, NY 10018 jl@yahoo-inc.com Alex Strehl Yahoo! Research New York,

More information

Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems

Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems 4 The Open Cybernetics and Systemics Journal, 008,, 4-9 Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems K. Kato *, M. Sakawa and H. Katagiri Department of Artificial

More information

Bichromatic Line Segment Intersection Counting in O(n log n) Time

Bichromatic Line Segment Intersection Counting in O(n log n) Time Bichromatic Line Segment Intersection Counting in O(n log n) Time Timothy M. Chan Bryan T. Wilkinson Abstract We give an algorithm for bichromatic line segment intersection counting that runs in O(n log

More information