Efficient Top-K Problem Solvings for More Users in Tree-Oriented Data Structures
Matúš Ondreička
Faculty of Mathematics and Physics, Department of Software Engineering
Charles University in Prague, Czech Republic

Jaroslav Pokorný
Faculty of Mathematics and Physics, Department of Software Engineering
Charles University in Prague, Czech Republic

Abstract

This paper focuses on efficiently searching for the best K objects over several attributes according to user preferences. User preferences are modelled locally with fuzzy functions and globally with an aggregation function. Because of local preferences, we use a B+-tree for sorting objects according to a fuzzy function. We deal with the usage of TA-algorithm, which uses B+-trees, and MD-algorithm, which is based on the multidimensional B-tree. In this paper we develop a new algorithm, MXT-algorithm, which is based on the integration of MD-algorithm with several instances of TA-algorithm. We also develop a new tree-oriented data structure based on B+-trees, the multidimensional B-tree with lists, in which MXT-algorithm can effectively find the best K objects according to user preferences. Finally, we show that, according to the type of object attribute domains, it is possible to choose the best data structure for object storage and also the top-k algorithm for efficient top-k problem solving.

1 Introduction

Nowadays, users of various systems are trying to find various objects, such as flats, cars, holiday stays, etc. These objects have several attributes, and each user looks for objects with different values of these attributes [12][14]. In general, a user looks for a few objects that are most convenient for him/her. Sometimes a user looks for only one best object; for example, he/she can buy only one flat. In this paper we assume a set of objects of the same type, which is stored in one data structure. This data structure is common for all users.
It is possible to find the best K objects from the set of objects $X$ with several attributes only if it is possible to decide which objects are better or worse for a user. For this purpose we can use a ranking function. Moreover, every user rates objects according to his/her own preferences.

1.1 Related work

The problem of searching for the best K objects according to the values of different attributes at the same time is referred to as the top-k problem [12][13]. In the last few years, research on top-k problem solving has been in progress in various domains such as relational databases [1], XML [10], multimedia search [12], the Web [14], and distributed systems [11].

In this paper we focus on the family of Fagin's algorithms [13], which has been widely studied for efficiently computing top-k queries. These algorithms assume that the set of objects is stored in lists and that the ranking functions are monotone. We say that a function $f$ is monotone if $f(x_1, x_2, \dots, x_m) \le f(x'_1, \dots, x'_m)$ whenever $x_i \le x'_i$ for every $i$.

However, ranking functions are not necessarily monotone. Top-k problem solutions using arbitrary ranking functions were presented in [6] and [7]. These two approaches use an analytic expression of the ranking function and tree-oriented data structures. The OPT* algorithm [7] indexes all attributes by B+-trees, and the authors of [6] use indexation by B+-trees, too.

In this paper user preferences are modelled locally with fuzzy functions and globally with an aggregation function [3][8], i.e. we are using arbitrary ranking functions. Moreover, in the context of local preferences, we focus on nominal attributes and ordinal attributes. To apply local preferences, we describe the usage of a B+-tree for sorting objects according to a fuzzy function [2][3][5]. We focus on searching for the best K objects without accessing all the objects.
Therefore we deal with methods and data structures for effective top-k problem solving via Fagin's TA-algorithm [13] and also MD-algorithm [2], which is based on the multidimensional B-tree (MDB-tree) [15].

1.2 Main contribution

In this paper TA-algorithm and MD-algorithm use data structures based on B+-trees. These tree-oriented data structures are independent of user preferences. Moreover, it is possible to update these data structures easily and quickly.

We developed a new top-k algorithm, MXT-algorithm, and a new data structure based on B+-trees (MDB-tree with lists), in which MXT-algorithm can effectively find the best K objects according to user preferences without accessing all the objects. We present a comparison of MXT-algorithm with the results of TA-algorithm and MD-algorithm, and we show that MXT-algorithm achieves the best results in some cases. Moreover, we show that, according to the type of object attributes, it is possible to choose the best data structure for object storage and also the top-k algorithm for efficient top-k problem solving.

1.3 Paper organization

The paper is organized as follows. Section 2 describes the top-k problem and user preferences. Section 3 explains the principles of TA-algorithm and MD-algorithm. Section 4 describes the application of local user preferences in these algorithms. In Section 5, we describe our new MXT-algorithm and the new data structure that MXT-algorithm uses. Section 6 presents the results of the tests of TA-algorithm, MD-algorithm and MXT-algorithm, and their comparison for various data sets and various user preferences. Finally, Section 7 concludes and provides some suggestions for future research.

2 Top-K problem

The top-k problem is the search for the best K objects. In this article, we suppose a set of objects $X$ with $m$ attributes $A_1,\dots,A_m$. Every object $x \in X$ has $m$ values $a_1^x,\dots,a_m^x$ of these attributes.

2.1 Rating function

It is most suitable to use a rating function (ranking function), which assigns a rating to each object $x \in X$. In this paper, we suppose a function $R$ with $m$ variables $a_1,\dots,a_m$ specified by the scheme $R(a_1,\dots,a_m) : [0,1]^m \to [0,1]$.
We denote the rating of an object $x \in X$ as a function $R(x) = R(a_1^x,\dots,a_m^x)$ of one variable. $R(x)$ maps every object $x \in X$, according to its $m$ attribute values, into the interval $[0,1]$. For the worst object $x \in X$, $R(x) = 0$ holds, and for the best one, $R(x) = 1$ holds. According to $R$ it is possible to sort the objects from $X$ in descending order and determine the best K objects. In this work we suppose that if there are several objects with the same rating as the K-th best object, a random one of them is chosen.

2.2 User preferences

In this paper, we consider a solution of the top-k problem for several users with various user preferences. Every user chooses his/her user preferences, which determine the suitability of an object $x \in X$ depending on its $m$ attribute values. In this work, we differentiate between local preferences and global preferences.

2.2.1 Local preferences

Local preferences reflect how an object is preferred according to only one attribute. We express the local preference for the $i$-th attribute $A_i$ as a fuzzy function $f_i$. The fuzzy function $f_i$ is understood as a mapping $f_i : dom(A_i) \to [0,1]$, which maps every value of the actual domain of attribute $A_i$ into the interval $[0,1]$.

Local preferences of a user $U$ for the attributes $A_1,\dots,A_m$ are represented by user fuzzy functions denoted as $f_1^U(x),\dots,f_m^U(x)$, respectively. A user fuzzy function $f_i^U(x) : a_i^x \to [0,1]$, where $i = 1,\dots,m$, maps every object $x \in X$, according to the value of its $i$-th attribute $a_i^x$, into the interval $[0,1]$. In general, we differentiate two possible attribute types.

Nominal attributes. A nominal attribute has a finite range of possible values, usually strings. For a nominal attribute, the user has to set a rating for each attribute value. For example, the brand or kind of a product is a nominal attribute.

Ordinal attributes. An ordinal attribute has some natural ordering of values, other than lexical ordering. Typical examples are integer numbers. The domain of an ordinal attribute is a subset of a continuous interval.
In this case, it is possible to use a continuous function as the user fuzzy function. For example, a price is usually an ordinal attribute.

2.2.2 Global preferences

Global preferences express how the user $U$ prefers objects from $X$ according to all attributes $A_1,\dots,A_m$. We introduced local preferences, where user $U$ rates every single attribute $A_i$ by a fuzzy function $f_i^U$. The global preference of user $U$ then defines the mutual relations between the attributes $A_1,\dots,A_m$.

We consider an aggregation function, which we denote $@$, with $m$ variables $p_1,\dots,p_m$ specified as $@(p_1,\dots,p_m) : [0,1]^m \to [0,1]$. For the user $U$ with his/her user fuzzy functions $f_1^U,\dots,f_m^U$, a user rating function $R^U$ originates
by means of substitution of $p_i = f_i^U(x)$. Then $R^U(x) = @(f_1^U(x),\dots,f_m^U(x))$ for every $x \in X$. With $R^U(x)$ it is possible to evaluate the global rating of each object $x \in X$ and to find the best K objects for user $U$.

With the aggregation function, a user $U$ can define the mutual relations of the attributes. In practical applications, user influence on the aggregate function can be implemented with a weighted average, where the weights $w_1,\dots,w_m$ of the single attributes $A_1,\dots,A_m$ determine how much the user prefers each attribute, i.e.

$$R^U(x) = \frac{w_1 f_1^U(x) + \dots + w_m f_m^U(x)}{w_1 + \dots + w_m}.$$

When the user does not care about the $i$-th attribute $A_i$, he/she can set $w_i = 0$ in the aggregate function.

3 Top-K algorithms

We denote algorithms that solve the top-k problem as top-k algorithms. The easiest way to find the best K objects is to read all objects $x \in X$, calculate the rating of every object $x$, and then choose the K objects with the highest rating. In this case, all objects $x \in X$ have to be accessed. In this section, we show two top-k algorithms, Fagin's TA-algorithm and MD-algorithm, which solve the top-k problem without searching all objects. These algorithms can find the best K objects according to the aggregate function $@$.

3.1 Fagin's TA-algorithm

Fagin et al. describe in [13] the top-k algorithm TA (threshold algorithm). This algorithm assumes that the objects are stored in $m$ lists $L_1,\dots,L_m$. Each $i$-th list $L_i$ contains pairs $(x, a_i^x)$ for all objects $x \in X$ and is sorted in descending order according to the values of the $i$-th attribute. The aggregate function $@$ must be monotone with respect to the ordering in the lists; the weighted average is an example.

TA-algorithm searches the lists sequentially and obtains pairs $(x, a_i^x)$.
For every object $x$ that is detected for the first time in an obtained pair $(x, a_i^x)$, TA-algorithm obtains the missing attribute values of $x$ by direct access to the other lists and calculates the rating of object $x$, which we denote $@(x)$.

TA-algorithm uses the temporary list $T_K$, in which it keeps the current best K objects ordered according to $@(x)$. The rating of the K-th best object in $T_K$ is denoted $M_K$. TA-algorithm uses a threshold $Th = @(a_1^{last},\dots,a_m^{last})$, where $a_1^{last},\dots,a_m^{last}$ are the last seen values of the attributes in the lists $L_1,\dots,L_m$ in the sequential access. When $Th \le M_K$, TA-algorithm is able to stop and return $T_K$, which contains the best K objects according to $@$. TA-algorithm can finish before it comes to the end of all the lists [13]. It means that not all the objects need to be accessed.

Figure 1. Set of six objects with values of three attributes stored in sorted lists.

The following pseudo-code describes TA-algorithm. The procedure getnextpair obtains the next pair $(x, a_i^x)$ from one of the lists $L_1,\dots,L_m$ sequentially [13][9].

    Input: Lists L_1,...,L_m, int K;
    Output: List T_K;
    var List T_K;
    begin
        while (|T_K| < K or Th > M_K) do
            (x, a_i^x) = getnextpair(L_1,...,L_m);
            a_i^last = a_i^x;
            Th = @(a_1^last,...,a_m^last);
            if (x ∉ T_K) then
                get the missing attribute values of the object x;
                if (|T_K| < K) then
                    insert x to the list T_K according to @(x);
                else if (@(x) > M_K) then
                begin
                    delete K-th object from the list T_K;
                    insert x to the list T_K according to @(x);
                end;
        endwhile;
        return T_K;
    end.

Example 1. Figure 1 contains six objects with values of three attributes stored in sorted lists $L_1, L_2, L_3$. If TA-algorithm is searching for the best three objects according to the aggregate $@(x) = a_1^x + a_2^x + a_3^x$, then TA-algorithm gets only three pairs $(x, a_i^x)$ from each of the lists. At this moment $Th = @(a_1^{last},\dots,a_m^{last}) = 1.8$, and $T_K$ includes three objects with $@(x_1) = 2.4$, $@(x_3) = 2.2$, $@(x_4) = 2.0$, respectively, so $M_K = @(x_4) = 2.0$ holds.
Then $Th \le M_K$ holds and TA-algorithm is able to stop without reading the rest of the lists.

3.2 MD-algorithm based on MDB-tree

Now we describe MD-algorithm [2], which efficiently solves the top-k problem using the multidimensional B-tree (MDB-tree) [15]. MDB-tree allows indexing a set of objects $X$ by attributes $A_1,\dots,A_m$, $m > 1$, in one data structure. In this case, MDB-tree has $m$ levels and the values of one attribute are stored in one level. We use a variant of MDB-tree whose nodes are B+-trees. The $i$-th level of MDB-tree is composed of B+-trees containing key values from $dom(A_i)$.

Figure 2. Set of eleven objects with values of three attributes stored in MDB-tree.

For the explanation of MD-algorithm, we introduce the pointer of the key $k$, the identifier of a B+-tree and the best rating of a B+-tree [2].

The pointer of the key $k_i$ in a B+-tree in the $i$-th level of MDB-tree is denoted by $\rho(k_i)$. If $i < m$, then $\rho(k_i)$ refers to a B+-tree in the $(i+1)$-th level of MDB-tree. If $i = m$, i.e. the B+-tree is in the last level of MDB-tree, then $\rho(k_i)$ refers to an object array, where objects with the same values of all the $m$ attributes are stored.

For explicit identification of a B+-tree in MDB-tree, we use the sequence of keys called the tree identifier. The tree identifier of a B+-tree in the $i$-th level is $(k_1,\dots,k_{i-1})$. The B+-tree in the first level of MDB-tree has the tree identifier $( )$. In Figure 2, $(k_1, k_2) = (1.0, 0.0)$ is the identifier of the B+-tree at the third level, which contains keys 0.0 and 0.7 and refers to objects $x_F, x_G, x_H$.

In MDB-tree we use the best rating $B(S)$ of a B+-tree $S$. For every B+-tree $S$ in MDB-tree there is a uniquely defined subset $X^S$ of $X$, which we call the set of available objects from $S$. For example, in Figure 2, the set $X^S$ of the B+-tree $S$ with identifier $(0.0)$ contains objects $x_A, x_B, x_C, x_D$.

By the best rating $B(S)$ of a B+-tree $S$ with identifier $(k_1,\dots,k_{i-1})$ in the $i$-th level of MDB-tree we understand the maximal possible rating of a not yet known object $x$ from the set of available objects from $S$. Analogously to TA-algorithm, we assume that the aggregate $@$ is nondecreasing in all its variables. Then $B(S)$ is calculated as $@(a_1^x,\dots,a_m^x)$, where the first $i-1$ attribute values of the object $x$ are $k_1,\dots,k_{i-1}$ and the values of the other attributes are 1 (the maximum of the interval $[0,1]$), i.e. $B(S) = @(k_1,\dots,k_{i-1},1,\dots,1)$.

MD-algorithm is based on a recursive procedure findtopk, which searches MDB-tree by depth-first search. Analogously to TA-algorithm, MD-algorithm uses the temporary list $T_K$, in which it keeps the current best K objects ordered according to $@(x)$. The rating of the K-th best object in $T_K$ is denoted $M_K$. MD-algorithm can find the best K objects in MDB-tree with the recursive procedure findtopk according to a monotone aggregate $@$ and without getting all the objects. The next statement specifies this fact more precisely.

Statement 1. [2] Let the key $k_i$ from the B+-tree $S$ with the identifier $(k_1,\dots,k_{i-1})$ be a key obtained, one by one in descending order, by a run of procedure findtopk. The pointer $\rho(k_i)$ refers to a B+-tree $P$ in the next level or to an object array $P$, the best rating of $P$ is $B(P) = @(k_1,\dots,k_i,1,\dots,1)$ and the aggregate $@$ is monotone. If $B(P) \le M_K$ holds, then no object $x \in X^P$ can get into $T_K$. Moreover, it is not necessary to obtain the next key $k_i^{next}$ from B+-tree $S$, which refers to $P^{next}$, because $k_i^{next} \le k_i$, which means that $B(P^{next}) \le B(P) \le M_K$. The procedure findtopk can stop in B+-tree $S$, because no further object $x \in X^S$ can get into $T_K$.

The following pseudo-code describes MD-algorithm.

    Input: MDBtree MDB-tree, int K;
    Output: List T_K;
    var List T_K;
    begin
        findtopk(MDB-tree, ( ), K);
        return T_K;
    end.

    procedure findtopk(MDBtree MDB-tree, TreeId (k_1,...,k_{i−1}), int K);
        while (exists next key in B+-tree (k_1,...,k_{i−1})) do
            k_i = getnextkey(MDB-tree, (k_1,...,k_{i−1}));
            {ρ(k_i) refers to B+-tree P or to object array P}
            if (|T_K| = K and B(P) ≤ M_K) then return; {Statement 1.}
            if (P is B+-tree) then
                findtopk(MDB-tree, (k_1,...,k_i), K);
            if (P is the object array) then
                while (there is the next object x in P) do
                    if (|T_K| < K) then
                        insert object x to T_K according to @(x);
                    else if (@(x) > M_K) then
                    begin
                        delete K-th object from the list T_K;
                        insert object x to the list T_K according to @(x);
                    end;
                endwhile;
        endwhile;
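
The recursive procedure above can be sketched in Java as follows. This is only an illustrative sketch under several assumptions that are not part of the paper: the MDB-tree is stood in for by nested `TreeMap`s (one map per B+-tree, iterated in descending key order), objects are plain string identifiers, and the names `MdSketch`, `Rated`, `insert`, `offer` and `visitedObjects` are hypothetical. The pruning test corresponds to Statement 1.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

final class MdSketch {
    /** Entry of the temporary list T_K (hypothetical helper type). */
    static final class Rated {
        final String id;
        final double rating;
        Rated(String id, double rating) { this.id = id; this.rating = rating; }
    }

    final int m;                                  // number of attributes = number of levels
    final int k;                                  // K, number of objects wanted (K >= 1)
    final ToDoubleFunction<double[]> aggregate;   // monotone aggregate @
    final List<Rated> topK = new ArrayList<>();   // T_K, sorted by rating, descending
    int visitedObjects = 0;                       // objects actually read from object arrays

    MdSketch(int m, int k, ToDoubleFunction<double[]> aggregate) {
        this.m = m; this.k = k; this.aggregate = aggregate;
    }

    /** Stores one object in nested TreeMaps that stand in for the MDB-tree. */
    @SuppressWarnings("unchecked")
    static void insert(TreeMap<Double, Object> root, String id, double... values) {
        TreeMap<Double, Object> level = root;
        for (int i = 0; i < values.length - 1; i++) {
            level = (TreeMap<Double, Object>)
                level.computeIfAbsent(values[i], v -> new TreeMap<Double, Object>());
        }
        ((List<String>) level.computeIfAbsent(values[values.length - 1],
            v -> new ArrayList<String>())).add(id);
    }

    /** Rating M_K of the K-th best object; only called when T_K is full. */
    double mK() { return topK.get(topK.size() - 1).rating; }

    /** Inserts r into T_K if it belongs to the current best K objects. */
    void offer(Rated r) {
        if (topK.size() == k) {
            if (r.rating <= mK()) return;
            topK.remove(k - 1);
        }
        int pos = 0;
        while (pos < topK.size() && topK.get(pos).rating >= r.rating) pos++;
        topK.add(pos, r);
    }

    /** Depth-first search; prefix[0..level-1] holds the tree identifier. */
    @SuppressWarnings("unchecked")
    void findTopK(TreeMap<Double, Object> tree, double[] prefix, int level) {
        for (Map.Entry<Double, Object> e : tree.descendingMap().entrySet()) {
            prefix[level] = e.getKey();
            // Best rating B(P): known keys k_1..k_i, all remaining attributes set to 1.
            double[] best = prefix.clone();
            Arrays.fill(best, level + 1, m, 1.0);
            if (topK.size() == k && aggregate.applyAsDouble(best) <= mK()) return; // Statement 1
            if (level < m - 1) {
                findTopK((TreeMap<Double, Object>) e.getValue(), prefix, level + 1);
            } else {
                for (String id : (List<String>) e.getValue()) {
                    visitedObjects++;
                    offer(new Rated(id, aggregate.applyAsDouble(prefix)));
                }
            }
        }
    }
}
```

A run starts with `new MdSketch(m, K, aggregate).findTopK(root, new double[m], 0)`. The nested maps keep the sketch short; they do not model B+-tree pages or linked leaves, so the counter only approximates the number of visited objects, not the number of accesses measured in Section 6.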

    procedure getnextkey(MDBtree MDB-tree, TreeId (k_1,...,k_{i−1}));
        choose the next key k_i with the next highest value of A_i in the B+-tree of MDB-tree with identifier (k_1,...,k_{i−1});
        return k_i;

Figure 3. MD-algorithm is searching for the best object in MDB-tree according to the aggregate function.

Example 2. In Figure 3, fourteen objects with values of three attributes are stored in MDB-tree with three levels. MD-algorithm is searching for the best object according to the aggregate $@(x) = a_1^x + a_2^x$. MD-algorithm starts in B+-tree $( )$ and obtains key 1.0, which refers to B+-tree $(1.0)$. There MD-algorithm obtains key 0.5, which refers to an object array, where it obtains object W with $@(W) = 1.5$. MD-algorithm inserts object W into the temporary list $T_K$, because $T_K$ is empty. Then MD-algorithm obtains the next key 0.4 in B+-tree $(1.0)$. This key refers to an object array whose best rating is smaller than the rating of the K-th best object in $T_K$. MD-algorithm can stop in B+-tree $(1.0)$ and continues in B+-tree $( )$. It obtains the next key 0.8, which refers to B+-tree $(0.8)$. In this B+-tree MD-algorithm obtains key 0.8, which refers to an object array, where object M with $@(M) = 1.6$ is obtained. MD-algorithm deletes object W from the list $T_K$ and inserts object M into it. Then MD-algorithm obtains the next key 0.0 in B+-tree $(0.8)$. This key refers to an object array whose best rating is smaller than the rating of the K-th best object in $T_K$. MD-algorithm can stop in B+-tree $(0.8)$ and continues in B+-tree $( )$ in the first level of MDB-tree. MD-algorithm obtains the next key 0.6, which refers to B+-tree $(0.6)$. The best rating of this B+-tree is 1.6. It does not exceed $M_K = 1.6$, so MD-algorithm can stop. The best object according to the aggregate function is M. It means that MD-algorithm does not search all the objects in MDB-tree.

4 Application of local preferences

This section discusses the support of local preferences (see Section 2.2.1) in TA-algorithm and MD-algorithm. For the application of local user preferences, we use the B+-tree. Moreover, this structure is common for all users and independent of user preferences [2].

4.1 Usage of B+-tree

In a B+-tree the keys are sorted in ascending order. Since the leaf nodes of the B+-tree are linked in both directions, it is possible to traverse the B+-tree through the leaf level and to get all the keys. Therefore, it is possible to obtain objects from the B+-tree in descending order according to the course of the user fuzzy function $f^U$ [2][3][5]. When the user fuzzy function $f^U$ is monotone on its domain, the following holds.

Let $f^U$ be nondecreasing. We have to traverse the leaf level of the B+-tree from right to left. It is possible to get the pairs $(x, f^U(x))$ in descending order according to the user preference $f^U$, because $a^x \ge a^y \Rightarrow f^U(x) \ge f^U(y)$ holds.

Let $f^U$ be nonincreasing. We have to traverse the leaf level of the B+-tree from left to right. It is possible to get the pairs $(x, f^U(x))$ in descending order according to the user preference $f^U$, because $a^x \le a^y \Rightarrow f^U(x) \ge f^U(y)$ holds.

In general, the user fuzzy function $f^U$ might not be monotone on its domain. In this case, the domain can be divided into continuous intervals such that $f^U$ is monotone on each of them. The leaf level of the B+-tree is then divided into parts according to these intervals. From these parts, objects can be obtained concurrently in descending order of $f^U$, in the same way as for nondecreasing and nonincreasing fuzzy functions [2].

Example 3. Figure 4 shows a fuzzy function $f^U$ and a B+-tree. The domain of $f^U$ is divided into monotone intervals $w_1,\dots,w_5$. Objects are obtained from the B+-tree concurrently according to these intervals. Finally, we get objects $x_K, x_M, x_N, x_G, x_H, x_F, x_T, x_S, x_E, x_C, x_Q, x_U, x_R, x_D, x_Y$ in descending order according to $f^U$.
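
The concurrent traversal of the monotone intervals can be sketched in Java as a merge of several cursors. This is a minimal sketch under assumptions not stated in the paper: the linked leaf level of the B+-tree is replaced by a sorted array of keys, the interval boundaries are passed in explicitly, and the names `FuzzyOrderSketch`, `Cursor` and `descendingByPreference` are hypothetical. Each cursor starts at the end of its interval where the fuzzy function is highest, and a priority queue always yields the cursor whose current key has the highest preference.

```java
import java.util.*;
import java.util.function.DoubleUnaryOperator;

final class FuzzyOrderSketch {
    /** Cursor over one monotone interval; walks from the best key towards the worst. */
    private static final class Cursor {
        int pos;          // index of the current key
        final int end;    // exclusive stop index
        final int step;   // -1: right to left, +1: left to right
        Cursor(int pos, int end, int step) { this.pos = pos; this.end = end; this.step = step; }
    }

    /**
     * keys: ascending keys, standing in for the leaf level of a B+-tree.
     * breakpoints: ascending interval boundaries; f is assumed monotone on each interval.
     * Returns the keys in descending order of f.
     */
    static List<Double> descendingByPreference(double[] keys, double[] breakpoints,
                                               DoubleUnaryOperator f) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
            (a, b) -> Double.compare(f.applyAsDouble(keys[b.pos]), f.applyAsDouble(keys[a.pos])));
        int lo = 0;
        for (int j = 0; j <= breakpoints.length; j++) {
            int hi = lo;
            // Keys below the j-th breakpoint form the j-th interval; the last interval takes the rest.
            while (hi < keys.length && (j == breakpoints.length || keys[hi] < breakpoints[j])) hi++;
            if (hi > lo) {
                boolean nondecreasing = f.applyAsDouble(keys[lo]) <= f.applyAsDouble(keys[hi - 1]);
                // Nondecreasing part: start at the right end; nonincreasing part: start at the left end.
                queue.add(nondecreasing ? new Cursor(hi - 1, lo - 1, -1) : new Cursor(lo, hi, +1));
            }
            lo = hi;
        }
        List<Double> result = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();      // cursor with the highest current preference
            result.add(keys[c.pos]);
            c.pos += c.step;
            if (c.pos != c.end) queue.add(c);
        }
        return result;
    }
}
```

For instance, with a triangular preference that peaks at 0.5 and a single boundary at 0.5, the left part is walked from right to left and the right part from left to right. In a real B+-tree the cursors would follow the leaf links instead of array indices, so keys can be produced one by one without reading the whole leaf level.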

4.2 Application in TA-algorithm

The original TA-algorithm (see Section 3.1) offers the possibility to rate objects with the aggregate $@$ and to find the best K objects for the user $U$ only according to his/her global preference. To support local preferences, it is necessary that every $i$-th list $L_i$ contains pairs $(x, f_i^U(x))$ in descending order according to the user fuzzy function $f_i^U(x)$.
Therefore, TA-algorithm uses a set of $m$ B+-trees $B_1,\dots,B_m$ as the lists $L_1,\dots,L_m$. In B+-tree $B_i$, all objects are indexed by the values of the $i$-th attribute $A_i$. TA-algorithm can search the B+-trees $B_1,\dots,B_m$ sequentially. Pairs $(x, f_i^U(x))$ can be obtained one by one from B+-tree $B_i$.

TA-algorithm also uses direct access to the lists $L_1,\dots,L_m$, where for an object $x$ it is necessary to obtain its unknown value $a_i^x$ from $L_i$. Because a B+-tree does not support this operation, direct access can be realized, for example, with an associative array.

Figure 4. An example of objects obtained from the B+-tree concurrently according to a fuzzy function.

4.3 Application in MD-algorithm

Because MDB-tree is composed of B+-trees, it is possible to apply the local user preferences directly in MD-algorithm when obtaining keys from every B+-tree. The following procedure getnextkey changes MD-algorithm accordingly.

    procedure getnextkey(MDBtree MDB-tree, TreeId (k_1,...,k_{i−1}), FuzzyFunction f_i^U);
        choose the next key k_i with the next highest value of f_i^U(k_i) in the B+-tree of MDB-tree with identifier (k_1,...,k_{i−1});
        return k_i;

5 MXT-algorithm

In this section, we describe a new top-k algorithm, which is based on the integration of MD-algorithm and Fagin's TA-algorithm. This new algorithm can also find the best K objects according to the aggregate $@$ without searching all objects.

5.1 Usage of TA- and MD-algorithm

In general, during the computations of TA-algorithm and MD-algorithm the number of accessed objects is smaller than the number of all objects. The number of accessed objects depends on several factors. MD-algorithm has the best results when the objects stored in MDB-tree have a uniform distribution [2]. When the attributes of the objects have actual domains of different sizes, the order of attributes in the levels of MDB-tree is very important for the efficiency of MD-algorithm.
For MD-algorithm it is better to build MDB-tree with small-sized domains in its higher levels and attributes with big-sized domains in its lower levels. When most of the attributes have big-sized actual domains, MD-algorithm is not a suitable solution of the top-k problem. In this case, TA-algorithm is more suitable.

Example 4. We had data about flats for rent in Prague at our disposal. These flats have four attributes that are important for users: District, Type, Area, and Price. These attributes have the following domain sizes: $|dom(District)| = 10$, $|dom(Type)| = 10$, $|dom(Area)| = 229$, $|dom(Price)| = 411$. When a user prefers attributes District and Type, it is better to store the flats in MDB-tree and to use MD-algorithm. On the other hand, when a user prefers attributes Area and Price, it is better to use TA-algorithm and to store the flats in Fagin's lists.

In general, an attribute with a small-sized domain is nominal and an attribute with a big-sized domain is ordinal (see Section 2). This is also valid in Example 4: attributes District and Type are nominal, Area and Price are ordinal.

5.2 Integration of TA- and MD-algorithm

For a set of objects with several nominal attributes and several ordinal attributes, we developed a new top-k algorithm, MXT-algorithm, which is based on the integration of MD-algorithm and Fagin's TA-algorithm. MXT-algorithm uses a new data structure, MDB-tree with lists, which is composed of MDB-tree and Fagin's sorted lists.

Figure 5. MDB-tree with lists, in which a set of objects with two nominal attributes and two ordinal attributes is stored.

We suppose a set of objects $X$ with $m$ attributes $A_1,\dots,A_m$. Attributes $A_1,\dots,A_n$ are nominal attributes and $A_{n+1},\dots,A_m$ are ordinal attributes. Attributes $A_1,\dots,A_n$ are stored in MDB-tree with $n$ levels. Instead of the following $m-n$ levels of MDB-tree, groups of $m-n$ Fagin's sorted lists are used. These lists contain pairs $(x, a_i^x)$ with values of attributes $A_{n+1},\dots,A_m$. MDB-tree with lists is shown in Figure 5: two nominal attributes are stored as MDB-tree and two ordinal attributes are stored as groups of Fagin's lists.

MXT-algorithm also uses the temporary list $T_K$, in which it keeps the current best K objects ordered according to $@(x)$. The rating of the K-th best object in $T_K$ is denoted $M_K$.

MXT-algorithm is developed on the basis of MD-algorithm. Values of the first $n$ attributes $A_1,\dots,A_n$ are searched in the same way as during the computation of MD-algorithm. Analogously to MD-algorithm, Statement 1 holds for MXT-algorithm (see Section 3.2).

In every B+-tree in the $n$-th level of MDB-tree, there are keys with pointers that refer to groups of $m-n$ Fagin's sorted lists. In each of these groups a new instance of TA-algorithm is run. Each instance of TA-algorithm uses a local threshold $Th_{local}$.

It is not necessary to obtain the best K objects from each group of Fagin's lists. It is sufficient to compare $Th_{local}$ with $M_K$ of the temporary list $T_K$, because MXT-algorithm is not searching for the best K objects in a group of $m-n$ Fagin's sorted lists, but for the best K objects throughout the whole data structure. Analogously to TA-algorithm, when $Th_{local} \le M_K$ holds in an instance of TA-algorithm, this instance is able to stop. Then the computation of MXT-algorithm continues as in MD-algorithm.
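
The behaviour of one TA instance inside a group of lists can be sketched in Java as follows. This is a hedged sketch, not the authors' implementation: the names `MxtGroupSketch`, `TopK` and `runGroup` are hypothetical, a group is given as a map from object identifier to its ordinal attribute values (the map also plays the role of the direct access), the sorted lists are built on the fly, and the local threshold is assumed to combine the nominal keys of the group with the last seen ordinal values. The important point is that the stop test compares the local threshold with $M_K$ of the shared list $T_K$.

```java
import java.util.*;
import java.util.function.ToDoubleFunction;

final class MxtGroupSketch {
    /** Shared temporary list T_K, kept sorted by rating in descending order. */
    static final class TopK {
        final int k;
        final List<String> ids = new ArrayList<>();
        final List<Double> ratings = new ArrayList<>();
        TopK(int k) { this.k = k; }
        boolean isFull() { return ids.size() == k; }
        double mK() { return ratings.get(ratings.size() - 1); }   // rating of the K-th best object
        void offer(String id, double rating) {
            if (isFull()) {
                if (rating <= mK()) return;
                ids.remove(k - 1);
                ratings.remove(k - 1);
            }
            int pos = 0;
            while (pos < ratings.size() && ratings.get(pos) >= rating) pos++;
            ids.add(pos, id);
            ratings.add(pos, rating);
        }
    }

    /**
     * Runs one TA instance over a non-empty group of sorted lists.
     * nominalKeys: keys k_1..k_n that lead to this group in the MDB-tree part.
     * group: object id -> values of the ordinal attributes A_{n+1}..A_m.
     * Returns the number of sequential accesses to the lists.
     */
    static int runGroup(double[] nominalKeys, Map<String, double[]> group,
                        TopK topK, ToDoubleFunction<double[]> aggregate) {
        int n = nominalKeys.length;
        int lists = group.values().iterator().next().length;
        // Build the sorted lists: ids in descending order of each ordinal attribute.
        List<List<String>> sorted = new ArrayList<>();
        for (int j = 0; j < lists; j++) {
            final int attr = j;
            List<String> ids = new ArrayList<>(group.keySet());
            ids.sort((a, b) -> Double.compare(group.get(b)[attr], group.get(a)[attr]));
            sorted.add(ids);
        }
        // Arguments of the local threshold: nominal keys, then last seen values (initially 1).
        double[] last = new double[n + lists];
        System.arraycopy(nominalKeys, 0, last, 0, n);
        Arrays.fill(last, n, n + lists, 1.0);
        Set<String> seen = new HashSet<>();
        int accesses = 0;
        for (int p = 0; p < lists * group.size(); p++) {
            // Stop as soon as Th_local <= M_K of the shared list T_K.
            if (topK.isFull() && aggregate.applyAsDouble(last) <= topK.mK()) break;
            int j = p % lists;                        // round-robin over the lists
            String id = sorted.get(j).get(p / lists);
            double[] values = group.get(id);          // direct access to the other values
            last[n + j] = values[j];
            accesses++;
            if (seen.add(id)) {
                double[] full = Arrays.copyOf(nominalKeys, n + lists);
                System.arraycopy(values, 0, full, n, lists);
                topK.offer(id, aggregate.applyAsDouble(full));
            }
        }
        return accesses;
    }
}
```

Calling `runGroup` for one group after another with the same `TopK` instance mimics what happens when findtopk reaches the $n$-th level: a group whose local threshold cannot exceed $M_K$ is left after very few accesses, or without any access at all.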

The efficiency of MXT-algorithm is based on the idea that, during its computation, it is not necessary to obtain the best K objects from each group of Fagin's lists, but only objects $x$ with $@(x) > M_K$.

In Figure 5, MXT-algorithm is searching for the best few objects. The part of the data structure under the dotted line is the part that MXT-algorithm does not access during its computation.

The following pseudo-code describes MXT-algorithm. Procedure getnextkey is the same as in MD-algorithm (see Section 3.2). Procedure getnextpair obtains the next pair $(x, a_i^x)$ from one of the lists $L_1,\dots,L_{m-n}$ sequentially, as in TA-algorithm (see Section 3.1).

    Input: MDBtree MDB-tree, int K;
    Output: List T_K;
    var List T_K; {temporary list of objects}
    begin
        findtopk(MDB-tree, ( ), K);
        return T_K;
    end.

    procedure findtopk(MDBtree MDB-tree, TreeId (k_1,...,k_{i−1}), int K);
        while (exists next key in B+-tree (k_1,...,k_{i−1})) do
            k_i = getnextkey(MDB-tree, (k_1,...,k_{i−1}));
            {ρ(k_i) refers to B+-tree P or to group of lists P}
            if (|T_K| = K and B(P) ≤ M_K) then return; {Statement 1.}
            if (P is B+-tree) then
                findtopk(MDB-tree, (k_1,...,k_i), K);
            if (P is group of lists) then
                while (|T_K| < K or Th_local > M_K) do
                    (x, a_i^x) = getnextpair(L_1,...,L_{m−n});
                    a_i^last = a_i^x;
                    Th_local = @(a_1^last,...,a_m^last);
                    if (x ∉ T_K) then
                        get the missing attribute values of the object x;
                        if (|T_K| < K) then
                            insert object x to the list T_K on the right place according to @(x);
                        else if (@(x) > M_K) then
                        begin
                            delete K-th object from the list T_K;
                            insert object x to the list T_K according to @(x);
                        end;
                endwhile;
        endwhile;

5.3 Application of Local Preferences

Analogously to MD-algorithm, it is possible to apply the local user preferences for attributes $A_1,\dots,A_n$. Procedure getnextkey changes MXT-algorithm in the same way as it changes MD-algorithm (see Section 4.3).

For attributes $A_{n+1},\dots,A_m$, it is also possible to apply the local user preferences.
Analogously to TA-algorithm, we use a group of $m-n$ B+-trees $B_1,\dots,B_{m-n}$ as the group of $m-n$ Fagin's lists $L_1,\dots,L_{m-n}$ (see Section 4.2).
Moreover, after these modifications we obtain a tree-oriented data structure that is composed only of B+-trees. In other words, it is an MDB-tree with $n$ levels, where the leaf nodes of the B+-trees in the $n$-th level refer to a group of $m-n$ B+-trees.

6 Experiments

We implemented and tested the described top-k algorithms. The implementations of TA-algorithm, MD-algorithm and MXT-algorithm have been developed in Java with tree-oriented data structures created in memory. What was important for us was the number of accesses to these data structures during the computation of the top-k algorithms.

During the tests, we used user fuzzy functions with a linear course as local user preferences and the arithmetic average as the global user preference. Objects from $X$ with their $m$ attribute values were stored in the considered data structures. We count obtaining one attribute value of one object as one access to the data structures. In this way we can simulate access to external memory.

6.1 Distribution of Attribute Values

At first, we tested two sets of objects with 5 attributes, one with normal and one with uniform distribution of attribute values. We used TA-algorithm, MD-algorithm and three variants of MXT-algorithm, i.e. MXT 3, MXT 2 and MXT 1. For example, MXT 3 uses the first 3 attributes as nominal attributes, which are stored as MDB-tree with 3 levels, and the other 2 attributes are stored as groups of 2 Fagin's sorted lists.

Figure 6 and Figure 7 show the results of this test. The best results have been achieved with MXT 3 and MD-algorithm for the set of objects with the uniform distribution of attribute values. The test for the set of objects with normal distribution of attribute values has shown that the new MXT-algorithm can in some cases also achieve the worst results.

Figure 7. Uniform distribution of attribute values.

6.2 Flats for Rent

We tested the sets of flats for rent in Prague (see Section 5.1, Example 4).
There were two nominal attributes with a small domain size and two ordinal attributes with a big domain size. We used TA-algorithm, MD-algorithm and the most suitable variant of MXT-algorithm. Figure 8 shows the results of this test.

Figure 8. Finding the best flats in Prague.

The best result has been achieved with MXT-algorithm. This test has shown that MXT-algorithm is the most efficient solution of the top-k problem in this case. In practice, most objects that users search for have several nominal attributes and several ordinal attributes. In this case, it is suitable to use MXT-algorithm.

Figure 6. Normal distribution of attribute values.

6.3 Various user preferences

In general, it is problematic to test the efficiency of top-k algorithms in dependence on user preferences. For various
settings of the user preferences and for various distributions of attribute values, top-k algorithms achieve different results. In this subsection we focus on the global user preference expressed by a weighted average (see Section 2.2.2). We used a set of objects with two nominal attributes and three ordinal attributes. The distribution of attribute values was uniform.

Figure 9 shows the results of the test where the weight of each attribute was the same. In this test, the worst results have been achieved with TA-algorithm. For choosing the best objects, MXT-algorithm and MD-algorithm needed more than 10 times fewer accesses than TA-algorithm. MXT-algorithm and MD-algorithm achieved nearly the same results. This shows that using MXT-algorithm or MD-algorithm is the most efficient solution for a set of objects with several nominal attributes that are preferred by users.

Figure 10 shows the results of the test where the weights of the nominal attributes were equal to 0. In this test, the result of searching for the best K objects is independent of the values of the nominal attributes. TA-algorithm is not disadvantaged and achieves good results.

Finally, Figure 11 shows the test where MXT-algorithm achieves the best results in the number of accesses. There the weight of the first ordinal attribute was equal to 0. MD-algorithm achieved worse results than MXT-algorithm because of the order of attributes in the levels of MDB-tree. The weight of the attribute in the third level was 0, and MD-algorithm was often searching B+-trees in this level without any increase of the best rating $B(S)$ (see Section 3.2).

These and other tests that we carried out have shown that MXT-algorithm is an efficient solution of the top-k problem in some cases. On the other hand, TA-algorithm and MD-algorithm achieved the best results in some other cases.

Figure 9. Weights of all the attributes were the same. MXT-algorithm and MD-algorithm achieved nearly the same results.

Figure 10. Weights of nominal attributes were equal to 0.

Figure 11.
Weight of first ordinal attribute was 0. 7 Conclusion We developed a new MXT-algorithm, which can efficiently find the best K objects by user preferences without accessing all the objects. We implemented top-k algorithms TA-algorithm, MD-algorithm and MXT-algorithm with support of user preferences. MXT-algorithm is based on integration TA-algorithm and MD-algorithm. Results of MXT-algorithm have shown that it is comparable with results obtained by other top-k algorithms. According to the types of object attributes, it is possible to store these objects in MDB-tree, in Fagin s lists or in MDB-tree with lists. Each one of the implemented top-k algorithms searches in a different data structure. According to the properties of set of objects we can decide in which of the data structures objects should be stored, in order for the objects to be searched by the most efficient top-k algorithm. The process of choosing the best data structure can be automated according to analyzing attribute domains. In this sense, information about the types and sizes of attribute do-
10 mains are important. MXT-algorithm is the most efficient solution of top-k problem in some cases. Especially, it is efficient for a set of objects, which has several nominal attributes with smallsized domains and several ordinal attributes with big-sized domains. Moreover, MXT-algorithm can find the best K objects in each of these tree-oriented data structures. In this sense, TA-algorithm and MD-algorithm are extreme cases of the MXT-algorithm. Because of MXT-algorithm construction, it can be also interesting that some of instances of TA-algorithm should be computing continuously. It can be also interesting to develop MXT-algorithm with usage of parallel computing. In this work, we used a model of preferences based on local and global user preferences. In future work, we can use user preferences based on different models. For example, when a dependence exists between values of more attributes, a user has to set his/her preference together for these attributes. In this case, we should evolve some modifications of top-k algorithms. It can be a new direction of future research. Motivation of future research can be also to find application of developed algorithms in diferent contexts. For example, in some cases it is needed to find K objects most similar to a given query object. Similarities between objects are most often computed as aggregated similarities of their attribute values. In [4] is described multi-dimensional indexing of non-metric spaces and top-k algorithm, which performs much better than the family of Fagin s algorithms. Some attribute values can be stored on remote servers. In this case, some of attribute values from web-accessible external sources might not be available in the same time. In [14] authors study how to process top-k queries efficiently in this setting. Implementation of our top-k algorithms in environment of more information resources could be a next direction of our research. 
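To make the central idea concrete, namely finding the best K objects under a weighted-average preference without accessing all the objects, the following is a minimal sketch of the textbook threshold algorithm of [13]. It is not the implementation evaluated in this paper: it assumes that the local preferences have already been applied (each object carries local scores in [0, 1]), it replaces the B+-tree scans with plain sorted Python lists, and the names `weighted_average`, `threshold_top_k` and `flats` are hypothetical. At least one weight is assumed to be positive.

```python
import heapq


def weighted_average(scores, weights):
    """Global preference: weighted average of local scores (assumes sum(weights) > 0)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def threshold_top_k(objects, weights, k):
    """Textbook threshold algorithm over one sorted list per attribute.

    objects: dict mapping object id -> tuple of local scores in [0, 1]
    Returns (list of (rating, id) sorted by rating descending, sorted accesses).
    """
    m = len(weights)
    # One list per attribute, ordered by local score descending.
    # In this sketch a sorted list stands in for a scan of a B+-tree.
    lists = [sorted(objects, key=lambda o: objects[o][i], reverse=True)
             for i in range(m)]
    seen = set()
    top = []        # min-heap of (rating, id) holding at most k entries
    accesses = 0    # number of sorted accesses performed
    for depth in range(len(objects)):
        last = []   # local scores read at this depth, one per list
        for i in range(m):
            obj = lists[i][depth]
            accesses += 1
            last.append(objects[obj][i])
            if obj not in seen:
                seen.add(obj)
                # "Random access": fetch all scores of obj and rate it.
                rating = weighted_average(objects[obj], weights)
                if len(top) < k:
                    heapq.heappush(top, (rating, obj))
                elif rating > top[0][0]:
                    heapq.heapreplace(top, (rating, obj))
        # No unseen object can be rated higher than the threshold.
        threshold = weighted_average(last, weights)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True), accesses


# Hypothetical data: five objects with two attributes each.
flats = {
    "a": (0.95, 0.8),
    "b": (0.7, 0.9),
    "c": (0.3, 0.2),
    "d": (0.2, 0.4),
    "e": (0.1, 0.1),
}
best, accesses = threshold_top_k(flats, weights=(1, 1), k=1)
print([obj for _, obj in best], accesses)
```

On this toy data the search stops after 4 of the 10 possible sorted accesses, because the rating of the best object seen so far already reaches the threshold. Setting a weight to 0 removes that attribute from both the rating and the threshold, which corresponds to the zero-weight settings tested in Section 6.3.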
8 Acknowledgments

This research is supported by Grant Agency of Charles University (GAUK), grant number 9209 (204-10/259011), Charles University in Prague, Czech Republic.

References

[1] Ilyas, I. F., Beskales, G., Soliman, M. A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 4 (Oct. 2008).
[2] Ondreička, M., Pokorný, J.: Extending Fagin's algorithm for more users based on multidimensional B-tree. In: Proc. of ADBIS 2008, P. Atzeni, A. Caplinskas, and H. Jaakkola (Eds.), LNCS 5207, Springer-Verlag Berlin Heidelberg, 2008, pp.
[3] Gurský, P., Vaneková, V., Pribolová, J.: Fuzzy User Preference Model for Top-k Search. In: Proceedings of IEEE World Congress on Computational Intelligence (WCCI), Hong Kong, FS0377.
[4] Deshpande, P. M., P, D., Kummamuru, K.: Efficient online top-k retrieval with arbitrary similarity measures. In: Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (Nantes, France, March 25-29, 2008). EDBT '08, vol. ACM, New York, NY, pp.
[5] Eckhardt, A., Pokorný, J., Vojtáš, P.: A system recommending top-k objects for multiple users preference. In: Proc. of 2007 IEEE International Conference on Fuzzy Systems, July 24-26, 2007, London, England, pp.
[6] Xin, D., Han, J., Chang, K. C.: Progressive and selective merge: computing top-k with ad-hoc ranking functions. In: Proc. of the 2007 ACM SIGMOD International Conference on Management of Data (Beijing, China, June 11-14, 2007). SIGMOD '07. ACM, New York, NY, pp.
[7] Zhang, Z., Hwang, S., Chang, K. C., Wang, M., Lang, C. A., Chang, Y.: Boolean + ranking: querying a database by k-constrained optimization. In: Proc. ACM SIGMOD International Conference on Management of Data, Chicago, IL, USA, June 27-29, 2006, pp.
[8] Vojtáš, P.: Fuzzy logic aggregation for semantic web search for the best (top-k) answer. Capturing Intelligence, Chapter 17, Volume 1, 2006, pp.
[9] Gurský, P., Lencses, R., Vojtáš, P.: Algorithms for user dependent integration of ranked distributed information. In: Proceedings of TED Conference on e-government (TCGOV 2005), pp.
[10] Marian, A., Amer-Yahia, S., Koudas, N., Srivastava, D.: Adaptive Processing of Top-k Queries in XML. In: Proc. of the 21st International Conference on Data Engineering, April 05-08, 2005, ICDE. IEEE Computer Society, Washington, DC, pp.
[11] Michel, S., Triantafillou, P., Weikum, G.: KLEE: a framework for distributed top-k query algorithms. In: Proc. of the 31st International Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 02, 2005). VLDB Endowment, pp.
[12] Chaudhuri, S., Gravano, L., Marian, A.: Optimizing Top-k Selection Queries over Multimedia Repositories. IEEE Trans. on Knowledge and Data Engineering, August 2004 (Vol. 16, No. 8), pp.
[13] Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66, 2003, pp.
[14] Bruno, N., Gravano, L., Marian, A.: Evaluating top-k queries over web-accessible databases. In: Proc. of ICDE, 2002, pp.
[15] Scheuerman, P., Ouksel, M.: Multidimensional B-trees for associative searching in database systems. Information Systems, Vol. 34, No. 2, 1982, pp.
Int. J. Nonlinear Anal. Appl. 5 (2014) No. 2, 60-66 ISSN: 2008-6822 (electronic) http://www.ijnaa.semnan.ac.ir Sharing Several Secrets based on Lagrange s Interpolation formula and Cipher Feedback Mode
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel Yahoo! Research New York, NY 10018 goel@yahoo-inc.com John Langford Yahoo! Research New York, NY 10018 jl@yahoo-inc.com Alex Strehl Yahoo! Research New York,
More informationRevision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems
4 The Open Cybernetics and Systemics Journal, 008,, 4-9 Revision of a Floating-Point Genetic Algorithm GENOCOP V for Nonlinear Programming Problems K. Kato *, M. Sakawa and H. Katagiri Department of Artificial
More informationBichromatic Line Segment Intersection Counting in O(n log n) Time
Bichromatic Line Segment Intersection Counting in O(n log n) Time Timothy M. Chan Bryan T. Wilkinson Abstract We give an algorithm for bichromatic line segment intersection counting that runs in O(n log
More information