Information Retrieval Rank aggregation. Luca Bondi

Size: px

Start display at page:

Download "Information Retrieval Rank aggregation. Luca Bondi"

Jordan Chambers
5 years ago
Views:

1 Rank aggregation Luca Bondi

2 Motivations 2 Metasearch For a given query, combine the results from different search engines Combining ranking functions Text, links, anchor text, page title, etc. Comparing search engine qualities Multi-criteria queries (price, availability, quality of service) e.g., travel, restaurant Similarity search

3 Framework 3 Input: n candidates m voters Preferential voting Each voter gives a (partial) list of the candidates in order of preference Rank Voter 1 Voter n 2 Goal: produce a good consensus ordering of all n candidates

4 Borda s proposal (1770) 4 Election by order of merit First place 1 point Second place 2 points Candidate s score: sum of points Borda winner: lowest scoring candidate

5 Condorcet s proposal (1785) 5 Condorcet s winner: A candidate who defeats every other candidate in pair-wise majority rule election

6 Condorcet!= Borda 6 m = 10, n = A A A A A A B B B B B B B B B B C C C C C C C C C C A A A A Borda scores: A: = 18 B: = 16 Borda winner C: = 26 Condorcet s criterion: A beats both B and C in pair-wire majority A is the Condorcet winner A C B

7 Condorcet s paradox A B C B C A C A B A B C Condorcet s winner may not exist! Choose Condorcet s winner; if none, choose Borda winner Choose candidate with highest outdegree indegree in the majority voting graph

8 Kemeny s proposal 8 Axiomatic approach Define a distance between two preference orderings Obtain ordering that is least distant from individual orderings How to measure the distance between two preference orderings?

9 Metrics on permutations 9 Domain {1,2,, n} A ranking function r(i) < r(j) means that the ranking function r ranks i above j Most common rank distance functions Kendall-tau distance Spearman s footrule distance

10 Kendall-tau distance 10 Two ranking functions r, s K(r, s): number of pairs (i, j) such that r ranks (i, j) in one order and s ranks them in the opposite order Bubble-sort distance K is a metric Example AB: disagreement AC: agreement AD: disagreement BC: agreement BD: agreement CD: disagreement Number of disagreements: 3 (AB, AD, CD) K r, s = 3 r A B C D s B D A C

11 Spearman s footrule distance 11 > F r, s = < r i s i F is a metric (L A norm)?@a Example shift(a) = 2 shift(b) = 1 shift(c) = 1 shift(d) = 2 F r, s = 6 r A B C D s B D A C

12 Metrics on permutations 12 Many of the other metrics are computationally expensive (some NPhard, some not known to be polynomial-time computable, etc.) Also these two are perhaps the most natural for many applications Relationship between Kendall-tau distance and Spearman s footrule distance Diaconis-Graham inequality K r, s F r, s 2K r, s

13 Optimal aggregation 13 Given a metric d(r?,r E ) and input permutations r A,, r F Find permutation r such that r = arg min N F < d(r?, r)?@a Kemeny (Kendall) optimal aggregation: d = K NP-hard Spearman footrule optimal aggregation: d = F Can be computed in polynomial time

14 Footrule optimal aggregation 14 Optimal aggregation computed via minimum cost perfect matching Elements in the optimal rank 1 1 Positions in source rankings p F q < r p q?@a n n

15 Footrule optimal aggregation: example 15 Define all the ranking permutations For each candidate permutation Calculate the Spearman s distance between the candidate permutation and each of the input rankings Sum the distances A B C B D D C A B D C A Select as optimal the permutation with the smallest total distance to the input rankings

16 16 Footrule aggregation: Median rank approximation Given input permutations r A,, r F μ R i = median{r A i,, r F i } Order μ R to obtain a new permutation μ Example μ R A = median 1,3,4 = 3 A B C μ R B = median 2,1,3 = 2 B D D μ R C = median 3,4,1 = 3 C A B μ R D = median 4,2,2 = 2 Different possible aggregated rankings D C A μ = B, D, A, C; μ = D, B, A, C; μ = B, D, C, A; μ = D, B, C, A;

17 17 Footrule aggregation: Median rank approximation If the median ranks of the candidates are unique (i.e., form a permutation), then this permutation is a footrule optimal aggregation not the case of the previous example Even when the ranks are not unique, median rank aggregation is an approximation to Spearman footrule optimal aggregation

18 Top-k rank aggregation 18 Data attributes may be non-numeric Even if they are numeric, differences within certain ranges may not matter to the user Rankings may have ties between elements (partial rankings) Traditionally, two ways of accessing data: Random access: given an element, retrieve its score (position in the ranked list or other associated value) Sorted (sequential) access: access, one by one, the next element (together with its score) in a ranked list, starting from top

19 Top-k rank aggregation 19 Main interest in the top k elements of the aggregation Need for algorithms that quickly obtain the top results without having to read each ranking in its entirety Several algorithms developed in the literature to minimize the accesses when determining the top k elements

20 Combining opaque rankings 20 Techniques using only the position of the elements in the ranking (no other associated score) We review MedRank, An algorithm for rank aggregation based on the notion of median Input: m rankings of n elements Output: the top k elements in the aggregated ranking 1. Use sequential accesses in each ranking, one element at a time, until there are k elements that occur in more than m/2 rankings 2. These are the top k elements

21 MedRank: example 21 Hotels by price Ibis Etap Novotel Mercure Hilton Sheraton Crillon Hotels by rating Crillon Novotel Sheraton Hilton Ibis Ritz Lutetia Top 3 hotels Novotel Hilton Ibis Strategy: Make one sequential access at a time in each ranking Look for hotels that appear in both rankings NB: price and rating are opaque, only the position matters

22 Combining ranking with an objective function 22 Several studies consider rankings where the objects, besides the position, also include a score given in the [0, 1] interval Score gives the degree of matching between the object and the given criterion Example query: names of albums by the Beatles whose cover color is red Being an album by the Beatles has a crisp truth value (0 or 1) Redness of the cover has a fuzzy truth value in [0, 1]

23 Fagin s algorithm for monotone queries 23 Input: a monotone query combining rankings r A,, r F Output: the top k <object, score> pairs 1. Extract the same number of objects by sequential accesses in each ranking until there are at least k objects that match the query 2. For each extracted object, compute its overall score by making random accesses wherever needed 3. Among these, output the k objects with the best overall score Complexity is sub-linear in the number n of objects Proportional to the square root of n when combining two rankings

24 24 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Query: hotels with best price and rating Strategy: Make one sequential access at a time in each ranking Look for hotels that appear in both rankings NB: price and rating are used to compute the overall score

25 25 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score 1. Extract the same number of objects by sequential accesses in each ranking until there are at least k objects that match the query

26 26 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score 1. Extract the same number of objects by sequential accesses in each ranking until there are at least k objects that match the query

27 27 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score 1. Extract the same number of objects by sequential accesses in each ranking until there are at least k objects that match the query

28 28 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score 1. Extract the same number of objects by sequential accesses in each ranking until there are at least k objects that match the query

29 29 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score 1. Extract the same number of objects by sequential accesses in each ranking until there are at least k objects that match the query

30 30 Fagin s algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 Ibis: \.^_`\.a _ = 0.81; Crillon: 0.825; Etap: 0.805; Novotel:0.875; Sheraton: 0.8; Mercure: 0.725; Hilton: For each extracted object, compute its overall score by making random accesses wherever needed 3. Among these, output the k objects with the best overall score

31 31 Fagin s threshold algorithm for monotone queries Input: a monotone query combining rankings r A,, r F Output: the top k <object, score> pairs 1. Init: R = empty set, th = + 2. For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score, add the object to R 3. Let th = threshold value obtained aggregating the last score observed in each ranking 4. If more than k objects have been evaluated, and the minimum of the score of the documents in R is greater than or equal to th, return R 5. Redo step 2

32 32 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score th = + 1. Init: R = empty set, th = +

33 33 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Ibis.81 th = + 2. For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

34 34 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Crillon.825 Ibis.81 th = + 2. For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

35 35 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Crillon.825 Ibis.81 th = \.^_`\.^ _ = Let th = threshold value obtained aggregating the last score observed in each ranking 4. If more than k objects have been evaluated, and the minimum of the score of the documents in R is greater than or equal to th, return R 5. Redo step 2

36 36 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Crillon.825 Ibis.81 Etap.805 th = For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

37 37 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

38 38 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = \.^A`\.^ _ = Let th = threshold value obtained aggregating the last score observed in each ranking 4. If more than k objects have been evaluated, and the minimum of the score of the documents in R is greater than or equal to th, return R 5. Redo step 2

39 39 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

40 40 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

41 41 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = \.fg`\.f _ = Let th = threshold value obtained aggregating the last score observed in each ranking 4. If more than k objects have been evaluated, and the minimum of the score of the documents in R is greater than or equal to th, return R 5. Redo step 2

42 42 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

43 43 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = For i = 1:m a) Extract the next object from r? by sequential accesses b) For each extracted object, compute its overall score by making random accesses wherever needed c) If R contains less than k objects or the minimum score of the objects in R is smaller then the new object score add the object to R

44 44 Fagin s threshold algorithm for monotone queries: example Hotels Cheapness Hotels Rating Ibis.92 Crillon.9 Etap.91 Novotel.9 Novotel.85 Sheraton.8 Mercure.85 Hilton.7 Hilton.825 Ibis.7 Sheraton.8 Etap.7 Crillon.75 Mercure.6 Top 3 Score Novotel.875 Crillon.825 Ibis.81 th = \.fg`\.a _ = Let th = threshold value obtained aggregating the last score observed in each ranking 4. If more than k objects have been evaluated, and the minimum of the score of the documents in R is greater than or equal to th, return R 5. Redo step 2

Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande University of Maryland, College Park, USA

Consensus Answers for Queries over Probabilistic Databases Jian Li and Amol Deshpande University of Maryland, College Park, USA Probabilistic Databases Motivation: Increasing amounts of uncertain data