modern database systems lecture 5 : top-k retrieval
1 modern database systems, lecture 5: top-k retrieval. Aristides Gionis, Michael Mathioudakis. spring 2016
2 announcements: problem session on Monday, March 7, 2-4pm, at T2 (solutions of the problems in homework 1); homework 2 will be out on Monday, Feb 29
3 today's lecture: Ronald Fagin, Amnon Lotem, Moni Naor. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66 (2003) (JCSS 2003). Received 6 September 2001; revised 1 April 2002. An extended abstract appeared in Proceedings of the 20th ACM Symposium on Principles of Database Systems, 2001 (PODS 2001).
Affiliations: Ronald Fagin, IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA (fagin@almaden.ibm.com, corresponding author); Amnon Lotem, Department of Computer Science, University of Maryland-College Park, College Park, MD 20742, USA (lotem@cs.umd.edu); Moni Naor, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel (naor@wisdom.weizmann.ac.il; the work of this author was performed while he was a Visiting Scientist at the IBM Almaden Research Center).
Abstract: Assume that each object in a database has m grades, or scores, one for each of m attributes. For example, an object can have a color grade, that tells how red it is, and a shape grade, that tells how round it is. For each attribute, there is a sorted list, which lists each object and its grade under that attribute, sorted by grade (highest grade first). Each object is assigned an overall grade, that is obtained by combining the attribute grades using a fixed monotone aggregation function, or combining rule, such as min or average. To determine the top k objects, that is, k objects with the highest overall grades, the naive algorithm must access every object in the database, to find its grade under each attribute. Fagin has given an algorithm ("Fagin's Algorithm", or FA) that is much more efficient. For some monotone aggregation functions, FA is optimal with high probability in the worst case. We analyze an elegant and remarkably simple algorithm ("the threshold algorithm", or TA) that is optimal in a much stronger sense than FA. We show that TA is essentially optimal, not just for some monotone aggregation functions, but for all of them, and not just in a high-probability worst-case sense, but over every database. Unlike FA, which requires large buffers (whose size may grow unboundedly as the database size grows), TA requires only a small, constant-size buffer. TA allows early stopping, which yields, in a precise sense, an approximate version of the top k answers. We distinguish two types of access: sorted access (where the middleware system obtains the grade of an object in some sorted list by proceeding through the list sequentially from the top), and random access (where the middleware system requests the grade of object in a list, and obtains it in one step). We consider the scenarios where random access is either impossible, or expensive relative to sorted access, and provide algorithms that are essentially optimal for these cases as well. (c) 2003 Elsevier Science (USA). All rights reserved.
4 top-k retrieval: users specify an information need via a query (SQL, mongodb, keyword search, ...); too many data objects satisfy the query; present the top-k objects; this assumes a ranking according to a relevance score. examples: find a flat to rent according to price, location, size, ...; find a flight according to price, departure and arrival time, number of stops, ...
5 top-k retrieval: consider the following scenario: data objects have different attributes; given a query, we can obtain a ranking of the objects according to the different attributes (a black-box subsystem for each attribute); we want to combine (aggregate) the individual rankings into a single ranking; the top-k is obtained from the aggregate ranking. the aggregator is built on top of the subsystems; we cannot modify the black-box subsystems; the subsystems are viewed as middleware
6 middleware aggregation examples example 1: building a meta-search engine
12-17 middleware aggregation examples: example 2: image retrieval with multiple attributes. the query is a photo in flickr; assume that it is geolocated in Helsinki and contains the tag "cinnamon roll". per-attribute result lists are shown for text search, color search, and location search (the result images are not preserved in the transcript)
18 top-k aggregation abstraction: we are given a set of n objects $X_1,\dots,X_n$; each has a set of m attributes $A_1,\dots,A_m$; object i on attribute j has score $r_{ij}$; we typically assume $0 \le r_{ij} \le 1$; the higher the value of $r_{ij}$, the better the object $X_i$ according to attribute $A_j$; object i has overall score $f_i = f(r_{i1},\dots,r_{im})$; retrieve the top-k items according to score $f_i$
19-22 top-k aggregation example: [table of scores of objects $X_1, X_2, X_3, \dots$ on attributes $A_1, A_2, A_3$, with added columns f and rank; the values are not preserved in the transcript] aggregation function: $f = \max\{r_1, r_2, r_3\}$
23 sorted lists: we assume that objects are available in m sorted lists; this models our assumption about middleware subsystems; the j-th list corresponds to attribute $A_j$; the j-th list ranks all objects according to values $r_{ij}$
24 aggregation functions: the score of object i is given by the aggregation function $f_i = f(r_{i1},\dots,r_{im})$; common choices for f: min, average or sum; we typically assume monotonicity: an aggregation function f is monotone if $f(r_1,\dots,r_m) \le f(r'_1,\dots,r'_m)$ whenever $r_j \le r'_j$ for all j
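The monotonicity condition can be probed numerically. The following is a minimal sketch, not part of the lecture: the helper name `is_monotone_on_grid` and the choice of grid are assumptions made for illustration, and passing the check on a finite grid only fails to find a counterexample, it does not prove monotonicity.

```python
from itertools import product
from statistics import mean

# Aggregation functions named on the slide; each maps a score vector to one score.
AGGREGATORS = {"min": min, "average": mean, "sum": sum}


def is_monotone_on_grid(f, m=3, grid=(0.0, 0.5, 1.0)):
    """Brute-force check: f(r) <= f(r2) whenever r_j <= r2_j for every j."""
    points = list(product(grid, repeat=m))
    for r in points:
        for r2 in points:
            dominated = all(a <= b for a, b in zip(r, r2))
            if dominated and f(r) > f(r2):
                return False  # found a counterexample to monotonicity
    return True


if __name__ == "__main__":
    for name, f in AGGREGATORS.items():
        print(name, is_monotone_on_grid(f))
    # A difference of two attributes is an example of a non-monotone function.
    print("difference", is_monotone_on_grid(lambda r: r[0] - r[1]))
```

The difference of two attributes fails the check because raising the second attribute lowers the overall score.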
25 modes of access and cost model: sorted access: can get objects in each list in decreasing order; cost to get the next object in a list: $C_S$. random access: can get the value of a specific object in a list; cost for a random access: $C_R$. middleware cost: the cost for s sorted accesses and r random accesses is $s\,C_S + r\,C_R$
26-28 modes of access and cost model: what are $C_S$ and $C_R$ for the web meta-search engine setting? $C_R = \infty$: no random access (NRA), a special case of the model
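As a small worked illustration of the cost model, the sketch below evaluates the middleware cost for given access counts. The function name `middleware_cost` and the sample unit costs are assumptions for illustration only. An unavailable access mode is modelled here with an infinite unit cost, which needs a guard because `0 * math.inf` is `nan` in Python.

```python
import math


def middleware_cost(s, r, c_s, c_r):
    """Cost of s sorted accesses and r random accesses: s*C_S + r*C_R."""
    # Skip zero-count terms so that an infinite unit cost (access mode not
    # available) does not turn an unused access mode into 0 * inf = nan.
    sorted_part = s * c_s if s else 0.0
    random_part = r * c_r if r else 0.0
    return sorted_part + random_part


if __name__ == "__main__":
    print(middleware_cost(10, 4, c_s=1.0, c_r=5.0))       # both access modes used
    print(middleware_cost(10, 0, c_s=1.0, c_r=math.inf))  # no random access needed
    print(middleware_cost(10, 1, c_s=1.0, c_r=math.inf))  # one forbidden random access
```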
29 example: [sorted lists $R_1$, $R_2$, $R_3$ over objects $X_1,\dots,X_5$; the scores are not preserved in the transcript] compute top-2 for sum aggregation function
30 naive algorithm: for each object i, use the aggregation function to compute the score $f_i$; get the top-k according to all computed scores
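A minimal sketch of the naive algorithm follows. The data set `SCORES` is hypothetical (the scores of the lecture example did not survive in the transcript), and the function name `naive_top_k` is an assumption. Scores are multiples of 1/8 so that sums are exact in floating point.

```python
import heapq

# Hypothetical data: object -> (r_i1, r_i2, r_i3), all scores in [0, 1].
SCORES = {
    "X1": (1.0, 0.25, 0.25),
    "X2": (0.75, 0.75, 0.125),
    "X3": (0.5, 0.625, 0.625),
    "X4": (0.25, 1.0, 1.0),
    "X5": (0.125, 0.125, 0.75),
}


def naive_top_k(scores, f, k):
    """Score every object with f, then keep the k best (object, score) pairs."""
    totals = {obj: f(values) for obj, values in scores.items()}
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    print(naive_top_k(SCORES, sum, 2))
```

Every object is touched on every attribute, which is exactly the cost the later algorithms try to avoid.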
31 naive algorithm questions : do we need to compute the score for every object in the database? can we safely ignore some objects whose scores are lower than what we already have?
32 Fagin's algorithm (FA)
1. perform sorted accesses in all lists in parallel until there are k objects that have been seen in all lists
2. perform random accesses to obtain the scores of all objects seen so far
3. compute the score for all objects seen and find the top-k
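The three steps of FA can be sketched as follows. This is an illustrative sketch under stated assumptions, not the lecture's code: the data set `SCORES` is hypothetical, the names `sorted_lists` and `fagin_top_k` are made up, and random access is simulated by a dictionary lookup.

```python
import heapq

# Hypothetical data: object -> (r_i1, r_i2, r_i3), all scores in [0, 1].
SCORES = {
    "X1": (1.0, 0.25, 0.25),
    "X2": (0.75, 0.75, 0.125),
    "X3": (0.5, 0.625, 0.625),
    "X4": (0.25, 1.0, 1.0),
    "X5": (0.125, 0.125, 0.75),
}


def sorted_lists(scores):
    """One list of (score, object) per attribute, in decreasing score order."""
    m = len(next(iter(scores.values())))
    return [sorted(((vals[j], obj) for obj, vals in scores.items()), reverse=True)
            for j in range(m)]


def fagin_top_k(scores, f, k):
    lists = sorted_lists(scores)
    m, n = len(lists), len(scores)
    seen_in = {}  # object -> set of lists in which it was seen under sorted access
    depth = 0
    # Step 1: sorted access in parallel until k objects were seen in all lists.
    while depth < n and sum(len(s) == m for s in seen_in.values()) < k:
        for j in range(m):
            _, obj = lists[j][depth]
            seen_in.setdefault(obj, set()).add(j)
        depth += 1
    # Step 2: random access for every object seen so far (a dict lookup here).
    totals = {obj: f(scores[obj]) for obj in seen_in}
    # Step 3: rank the seen objects and keep the k best.
    top = heapq.nlargest(k, totals.items(), key=lambda item: item[1])
    return top, depth


if __name__ == "__main__":
    print(fagin_top_k(SCORES, sum, 2))
```

The second return value is the depth reached under sorted access, which makes it easy to compare against other algorithms on the same data.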
33-40 Fagin's algorithm example: [sorted lists $R_1$, $R_2$, $R_3$ over objects $X_1,\dots,X_5$, with entries marked as obtained by sorted access or by random access; the scores are not preserved in the transcript] compute top-2 for sum aggregation function. $X_5$ cannot be in the top-2. why? monotonicity
41 Fagin's algorithm is correct: assume object y was not seen at all; object y has values $y_1,\dots,y_m$; assume object x is one of the k objects seen by FA under sorted access in all lists; object x has values $x_1,\dots,x_m$; for all attributes j it is $y_j \le x_j$; therefore $f_y = f(y_1,\dots,y_m) \le f(x_1,\dots,x_m) = f_x$; for all objects seen the values of all attributes are known; thus, top-k returns the correct results
42 Fagin's algorithm: note that the correctness proof assumes only monotonicity; Fagin's algorithm is correct for any monotone aggregation function
44 can we do better? yes! threshold algorithm also proposed by Fagin
45 the threshold algorithm (TA)
1. do a sorted access in parallel to each of the m sorted lists
2. for each object x seen under sorted access:
   1. retrieve all of its values $x_1,\dots,x_m$ by random access
   2. compute $f_x = f(x_1,\dots,x_m)$
   3. if this is one of the top-k answers so far, remember it
3. for the j-th list, let $\hat{x}_j$ be the value of the last object seen under sorted access
4. define the threshold value to be $\tau = f(\hat{x}_1,\dots,\hat{x}_m)$
5. when k objects have been seen whose score is at least $\tau$, then stop
6. return the top-k answers
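The steps of TA can be sketched as follows, keeping only a buffer of k objects. This is an illustrative sketch under stated assumptions: the data set `SCORES` is hypothetical, the names `sorted_lists` and `threshold_top_k` are made up, random access is simulated by a dictionary lookup, and the database is assumed to be non-empty.

```python
# Hypothetical data: object -> (r_i1, r_i2, r_i3), all scores in [0, 1].
SCORES = {
    "X1": (1.0, 0.25, 0.25),
    "X2": (0.75, 0.75, 0.125),
    "X3": (0.5, 0.625, 0.625),
    "X4": (0.25, 1.0, 1.0),
    "X5": (0.125, 0.125, 0.75),
}


def sorted_lists(scores):
    """One list of (score, object) per attribute, in decreasing score order."""
    m = len(next(iter(scores.values())))
    return [sorted(((vals[j], obj) for obj, vals in scores.items()), reverse=True)
            for j in range(m)]


def threshold_top_k(scores, f, k):
    lists = sorted_lists(scores)
    m, n = len(lists), len(scores)
    top = {}  # bounded buffer: at most k entries, object -> overall score
    for depth in range(n):
        for j in range(m):
            _, obj = lists[j][depth]    # sorted access to list j
            top[obj] = f(scores[obj])   # random accesses (a dict lookup here)
            if len(top) > k:
                del top[min(top, key=top.get)]  # evict the current worst
        # Threshold: aggregate of the last values seen under sorted access.
        tau = f([lists[j][depth][0] for j in range(m)])
        if len(top) == k and min(top.values()) >= tau:
            break  # k objects with score at least tau have been seen
    ranked = sorted(top.items(), key=lambda item: item[1], reverse=True)
    return ranked, depth + 1


if __name__ == "__main__":
    print(threshold_top_k(SCORES, sum, 2))
```

Because the buffer holds only k entries, an object that is seen again in another list is simply rescored; this trades a few repeated random accesses for constant memory.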
46-57 threshold algorithm example: [sorted lists $R_1$, $R_2$, $R_3$ over objects $X_1,\dots,X_5$, with a running threshold column and the current top-k; the scores are not preserved in the transcript] compute top-2 for sum aggregation function
58 threshold algorithm is correct: assume object y was not seen at all; object y has values $y_1,\dots,y_m$; assume object x is one of the top-k objects kept by TA when it stops; object x has values $x_1,\dots,x_m$; for all attributes j it is $y_j \le \hat{x}_j$, because y lies below the last object seen in every list; therefore $f_y = f(y_1,\dots,y_m) \le f(\hat{x}_1,\dots,\hat{x}_m) = \tau \le f(x_1,\dots,x_m) = f_x$, where $\tau$ is the threshold and the last inequality is the stopping rule; for all objects seen the values of all attributes are known; thus, top-k returns the correct results
59 threshold algorithm properties: TA is correct for any monotone aggregation function; TA uses a bounded-size buffer, independent of the size of the database; TA is optimal in a very strong sense: it is as good as any other algorithm on every instance (instance optimal). "any other algorithm" means: except pathological algorithms; "as good" means: within a constant factor; "pathological" means: making wild guesses
60 instance optimality: let $\mathcal{A}$ be a class of algorithms; let $\mathcal{D}$ be a class of legal inputs (datasets); for $A \in \mathcal{A}$ and $D \in \mathcal{D}$ we consider the performance cost $\mathrm{cost}(A, D)$. definition: an algorithm $B \in \mathcal{A}$ is called instance optimal if for any algorithm $A \in \mathcal{A}$ and any dataset $D \in \mathcal{D}$ it is $\mathrm{cost}(B, D) = O(\mathrm{cost}(A, D))$, that is, there are constants c and d such that $\mathrm{cost}(B, D) \le c \cdot \mathrm{cost}(A, D) + d$
61 instance optimality: instance optimality is a very strong notion; we are comparing a deterministic algorithm against all possible nondeterministic algorithms. consider search on a sorted list: binary search is worst-case optimal; however, it is not instance optimal: there is a nondeterministic algorithm that finds an object with one probe, or finds that the object does not exist with two probes; but such a nondeterministic algorithm makes wild guesses
62 instance optimality of TA: assume that the aggregation function f is monotone; let $\mathcal{D}$ be the class of all databases; let $\mathcal{A}$ be the class of all algorithms that correctly find the top k answers for f for every database and that do not make wild guesses; then TA is instance optimal over $\mathcal{A}$ and $\mathcal{D}$
63 instance optimality of TA, proof sketch: let A be any algorithm that runs over a database D s.t. it returns the correct top-k and it does not make wild guesses; let $d = \max_{1 \le j \le m} d_j$ be the maximum depth of A; assume that A sees a distinct objects; then $a \ge d$ and, since A makes no wild guesses, the cost of A is at least $a\,C_S$
64-65 instance optimality of TA, proof sketch: [figure: execution of A on lists $A_1, A_2, \dots, A_m$, reaching depths $d_1, d_2, \dots, d_m$] maximum depth: $d = \max_{1 \le j \le m} d_j$; cost: at least $a\,C_S$, with $a \ge d$; claim: TA reaches maximum depth a + k
66 instance optimality of TA, proof sketch: assume the claim is true (TA reaches maximum depth a + k); the cost of TA is at most $(a+k)\,m\,C_S + (a+k)\,m(m-1)\,C_R$, or $a\,m\,C_S + a\,m(m-1)\,C_R + \big(k\,m\,C_S + k\,m(m-1)\,C_R\big)$; the last term is a constant; the optimality ratio between A and TA is $\dfrac{a\,m\,C_S + a\,m(m-1)\,C_R}{a\,C_S} = m + m(m-1)\,\dfrac{C_R}{C_S}$, that is, a constant ratio. QED modulo claim
67 instance optimality of TA, proof sketch: (main case: we show that TA reaches max depth a) (the bound a + k is shown in corner cases) let Y be the output of A (consisting of top-k objects); let $\hat{x}_1,\dots,\hat{x}_m$ be the values of the objects at the end of each list when A terminates; define $\tau_A = f(\hat{x}_1,\dots,\hat{x}_m)$; an object y is called big if $f_y \ge \tau_A$; all objects $y \in Y$ are big
68-70 instance optimality of TA, proof sketch: [figure: execution of A on database D over lists $A_1, A_2, \dots, A_m$, stopping at depths $d_1, d_2, \dots, d_m$ with last-seen values $\hat{x}_1, \hat{x}_2, \dots, \hat{x}_m$] consider a database $D'$ with a planted object x with values $\hat{x}_1,\dots,\hat{x}_m$; the execution of A on D and $D'$ is identical; by correctness of A we get $f_y \ge f_x = f(\hat{x}_1,\dots,\hat{x}_m)$ for all $y \in Y$
71 instance optimality of TA, proof sketch: when TA reaches depth $d \le a$ it has seen all objects in Y; since all objects in Y are big (their score is at least the threshold), TA will halt. QED
72 restricting sorted access: assume that a subset of the lists is not accessible under sorted access mode; TA can be easily modified to handle such a scenario: define $\tau = f(\hat{x}_1,\dots,\hat{x}_m)$ where $\hat{x}_j = 1$ for all inaccessible lists; all lists that are inaccessible under sorted access are accessed only under random access mode
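The modification can be sketched as a variant of TA in which only some lists are read under sorted access and the remaining ones contribute the maximum score 1 to the threshold. This is an illustrative sketch: the data set `SCORES`, the function names, and the parameter `sorted_ok` (the indices of lists that allow sorted access, assumed non-empty) are all hypothetical.

```python
# Hypothetical data: object -> (r_i1, r_i2, r_i3), all scores in [0, 1].
SCORES = {
    "X1": (1.0, 0.25, 0.25),
    "X2": (0.75, 0.75, 0.125),
    "X3": (0.5, 0.625, 0.625),
    "X4": (0.25, 1.0, 1.0),
    "X5": (0.125, 0.125, 0.75),
}


def sorted_lists(scores):
    """One list of (score, object) per attribute, in decreasing score order."""
    m = len(next(iter(scores.values())))
    return [sorted(((vals[j], obj) for obj, vals in scores.items()), reverse=True)
            for j in range(m)]


def ta_restricted(scores, f, k, sorted_ok):
    lists = sorted_lists(scores)
    m, n = len(lists), len(scores)
    top = {}  # bounded buffer: at most k entries, object -> overall score
    for depth in range(n):
        for j in sorted(sorted_ok):     # sorted access only where allowed
            _, obj = lists[j][depth]
            top[obj] = f(scores[obj])   # random access to every list (dict lookup)
            if len(top) > k:
                del top[min(top, key=top.get)]
        # Inaccessible lists contribute the maximum possible score, 1.
        last = [lists[j][depth][0] if j in sorted_ok else 1.0 for j in range(m)]
        if len(top) == k and min(top.values()) >= f(last):
            break
    ranked = sorted(top.items(), key=lambda item: item[1], reverse=True)
    return ranked, depth + 1


if __name__ == "__main__":
    # No sorted access in the third list (index 2).
    print(ta_restricted(SCORES, sum, 2, sorted_ok={0, 1}))
```

On this hypothetical data the run goes one level deeper than with sorted access to all three lists, since the threshold decreases more slowly.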
73-87 threshold algorithm, no sorted access in $R_3$: [sorted lists $R_1$, $R_2$, $R_3$ over objects $X_1,\dots,X_5$, with a running threshold column and the current top-k; the scores are not preserved in the transcript] compute top-2 for sum aggregation function
88 restricting random access: perform sorted access on all lists in parallel; at depth d, let $\hat{x}_1,\dots,\hat{x}_m$ be the values last seen in each list; for any object x seen in lists $\{1,\dots,j\}$: $\mathrm{best}(x) = f(x_1,\dots,x_j,\hat{x}_{j+1},\dots,\hat{x}_m)$ and $\mathrm{worst}(x) = f(x_1,\dots,x_j,0,\dots,0)$; maintain worst scores: top-k contains the k objects with max worst scores at depth d (break ties using best); $M_k$ = k-th worst score in top-k; object y is viable if $\mathrm{best}(y) > M_k$; stop when at least k distinct objects have been seen and no object outside top-k is viable
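A minimal sketch of this no-random-access procedure follows. The data set `SCORES` and all function names are hypothetical, and the code assumes a non-empty database. Unseen objects are treated as viable while the aggregate of the last-seen values exceeds the k-th worst score, which is how an object not yet encountered is bounded.

```python
# Hypothetical data: object -> (r_i1, r_i2, r_i3), all scores in [0, 1].
SCORES = {
    "X1": (1.0, 0.25, 0.25),
    "X2": (0.75, 0.75, 0.125),
    "X3": (0.5, 0.625, 0.625),
    "X4": (0.25, 1.0, 1.0),
    "X5": (0.125, 0.125, 0.75),
}


def sorted_lists(scores):
    """One list of (score, object) per attribute, in decreasing score order."""
    m = len(next(iter(scores.values())))
    return [sorted(((vals[j], obj) for obj, vals in scores.items()), reverse=True)
            for j in range(m)]


def bounds(f, partial, last, m):
    """(worst, best) score of an object whose values are only partly known."""
    worst = f([partial.get(j, 0.0) for j in range(m)])
    best = f([partial.get(j, last[j]) for j in range(m)])
    return worst, best


def nra_top_k(scores, f, k):
    lists = sorted_lists(scores)
    m, n = len(lists), len(scores)
    known = {}  # object -> {list index: value}, filled by sorted access only
    for depth in range(n):
        for j in range(m):
            value, obj = lists[j][depth]
            known.setdefault(obj, {})[j] = value
        last = [lists[j][depth][0] for j in range(m)]
        b = {obj: bounds(f, known[obj], last, m) for obj in known}
        # Rank by worst score, breaking ties by best score.
        ranked = sorted(known, key=b.get, reverse=True)
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            m_k = b[top[-1]][0]  # k-th worst score in the current top-k
            unseen_viable = len(known) < n and f(last) > m_k
            if not unseen_viable and all(b[o][1] <= m_k for o in rest):
                break  # no object outside the top-k is viable
    return top, depth + 1


if __name__ == "__main__":
    print(nra_top_k(SCORES, sum, 2))
```

The function returns object identifiers only: without random access the exact scores of the winners are not necessarily known when the procedure stops.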
89 approximate top-k: finding the top-k objects approximately: for $\epsilon > 0$, an $\epsilon$-approximation of the top k answers is a collection of k objects $x_1,\dots,x_k$ so that for any y not among them, it is $(1+\epsilon)\,f_{x_i} \ge f_y$; TA can be easily modified to an approximation algorithm: simply change the stopping rule into: when k objects have been seen whose score is at least $\tau / (1+\epsilon)$, where $\tau$ is the TA threshold, then stop
90 summary: rank aggregation and top-k algorithms; Fagin's algorithm and threshold algorithm; instance optimality; algorithm variants depending on cost model. next lecture (Michael): big data platforms
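The modified stopping rule can be sketched by relaxing the threshold test of TA. This is an illustrative sketch: the data set `SCORES` and the function names are hypothetical, random access is simulated by a dictionary lookup, and the database is assumed to be non-empty.

```python
# Hypothetical data: object -> (r_i1, r_i2, r_i3), all scores in [0, 1].
SCORES = {
    "X1": (1.0, 0.25, 0.25),
    "X2": (0.75, 0.75, 0.125),
    "X3": (0.5, 0.625, 0.625),
    "X4": (0.25, 1.0, 1.0),
    "X5": (0.125, 0.125, 0.75),
}


def sorted_lists(scores):
    """One list of (score, object) per attribute, in decreasing score order."""
    m = len(next(iter(scores.values())))
    return [sorted(((vals[j], obj) for obj, vals in scores.items()), reverse=True)
            for j in range(m)]


def approximate_top_k(scores, f, k, eps):
    lists = sorted_lists(scores)
    m, n = len(lists), len(scores)
    top = {}  # bounded buffer: at most k entries, object -> overall score
    for depth in range(n):
        for j in range(m):
            _, obj = lists[j][depth]    # sorted access
            top[obj] = f(scores[obj])   # random accesses (a dict lookup here)
            if len(top) > k:
                del top[min(top, key=top.get)]
        tau = f([lists[j][depth][0] for j in range(m)])
        # Relaxed stopping rule: scores only need to reach tau / (1 + eps).
        if len(top) == k and min(top.values()) >= tau / (1 + eps):
            break
    ranked = sorted(top.items(), key=lambda item: item[1], reverse=True)
    return ranked, depth + 1


if __name__ == "__main__":
    print(approximate_top_k(SCORES, sum, 2, eps=0.5))
```

With eps = 0.5 on this hypothetical data the run stops one level earlier than exact TA and returns X2 in place of X3; the guarantee still holds because 1.5 times the score of X2 is at least the score of every object left out.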
More informationLECTURE 18 LECTURE OUTLINE
LECTURE 18 LECTURE OUTLINE Generalized polyhedral approximation methods Combined cutting plane and simplicial decomposition methods Lecture based on the paper D. P. Bertsekas and H. Yu, A Unifying Polyhedral
More informationA NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY
A NOVEL APPROACH ON SPATIAL OBJECTS FOR OPTIMAL ROUTE SEARCH USING BEST KEYWORD COVER QUERY S.Shiva Reddy *1 P.Ajay Kumar *2 *12 Lecterur,Dept of CSE JNTUH-CEH Abstract Optimal route search using spatial
More informationClustering. (Part 2)
Clustering (Part 2) 1 k-means clustering 2 General Observations on k-means clustering In essence, k-means clustering aims at minimizing cluster variance. It is typically used in Euclidean spaces and works
More informationExact and Approximate Generic Multi-criteria Top-k Query Processing
Exact and Approximate Generic Multi-criteria Top-k Query Processing Mehdi Badr, Dan Vodislav To cite this version: Mehdi Badr, Dan Vodislav. Exact and Approximate Generic Multi-criteria Top-k Query Processing.
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14
600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,
More informationTreaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19
CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types
More informationSelective-NRA Algorithms for Top-k Queries
Selective-NRA Algorithms for Top- Queries Jing Yuan, Guang-Zhong Sun, Ye Tian, Guoliang Chen, and Zhi Liu MOE-MS Key Laboratory of Multimedia Computing and Communication, Department of Computer Science
More informationParallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 16 Treaps; Augmented BSTs
Lecture 16 Treaps; Augmented BSTs Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012) Lectured by Margaret Reid-Miller 8 March 2012 Today: - More on Treaps - Ordered Sets and Tables
More informationOptimal algorithms for selecting top-k combinations of attributes: theory and applications
The VLDB Journal DOI 10.1007/s00778-017-0485-2 REGULAR PAPER Optimal algorithms for selecting top-k combinations of attributes: theory and applications Chunbin Lin 1 Jiaheng Lu 2 Zhewei Wei 3 Jianguo Wang
More informationLecture 7 February 26, 2010
6.85: Advanced Data Structures Spring Prof. Andre Schulz Lecture 7 February 6, Scribe: Mark Chen Overview In this lecture, we consider the string matching problem - finding all places in a text where some
More information6.856 Randomized Algorithms
6.856 Randomized Algorithms David Karger Handout #4, September 21, 2002 Homework 1 Solutions Problem 1 MR 1.8. (a) The min-cut algorithm given in class works because at each step it is very unlikely (probability
More informationEfficient Top-k Aggregation of Ranked Inputs
Efficient Top-k Aggregation of Ranked Inputs NIKOS MAMOULIS University of Hong Kong MAN LUNG YIU Aalborg University KIT HUNG CHENG University of Hong Kong and DAVID W. CHEUNG University of Hong Kong A
More informationMulti-objective Query Processing for Database Systems
Multi-objective Query Processing for Database Systems Wolf-Tilo Balke Computer Science Department University of California Berkeley, CA, USA balke@eecs.berkeley.edu Abstract Query processing in database
More informationAtCoder World Tour Finals 2019
AtCoder World Tour Finals 201 writer: rng 58 February 21st, 2018 A: Magic Suppose that the magician moved the treasure in the order y 1 y 2 y K+1. Here y i y i+1 for each i because it doesn t make sense
More informationSome material taken from: Yuri Boykov, Western Ontario
CS664 Lecture #22: Distance transforms, Hausdorff matching, flexible models Some material taken from: Yuri Boykov, Western Ontario Announcements The SIFT demo toolkit is available from http://www.evolution.com/product/oem/d
More informationAdvanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016
Advanced Topics in Information Retrieval Learning to Rank Vinay Setty vsetty@mpi-inf.mpg.de Jannik Strötgen jannik.stroetgen@mpi-inf.mpg.de ATIR July 14, 2016 Before we start oral exams July 28, the full
More informationFeedback Week 4 - Problem Set
4/26/13 Homework Feedback Introduction to Cryptography Feedback Week 4 - Problem Set You submitted this homework on Mon 17 Dec 2012 11:40 PM GMT +0000. You got a score of 10.00 out of 10.00. Question 1
More informationLeveraging Transitive Relations for Crowdsourced Joins*
Leveraging Transitive Relations for Crowdsourced Joins* Jiannan Wang #, Guoliang Li #, Tim Kraska, Michael J. Franklin, Jianhua Feng # # Department of Computer Science, Tsinghua University, Brown University,
More informationCompetitive analysis of aggregate max in windowed streaming. July 9, 2009
Competitive analysis of aggregate max in windowed streaming Elias Koutsoupias University of Athens Luca Becchetti University of Rome July 9, 2009 The streaming model Streaming A stream is a sequence of
More informationInformation Retrieval Rank aggregation. Luca Bondi
Rank aggregation Luca Bondi Motivations 2 Metasearch For a given query, combine the results from different search engines Combining ranking functions Text, links, anchor text, page title, etc. Comparing
More informationNotes slides from before lecture. CSE 21, Winter 2017, Section A00. Lecture 3 Notes. Class URL:
Notes slides from before lecture CSE 21, Winter 2017, Section A00 Lecture 3 Notes Class URL: http://vlsicad.ucsd.edu/courses/cse21-w17/ Notes slides from before lecture Notes January 18 (1) HW2 has been
More informationMidterm 2. Read all of the following information before starting the exam:
Midterm 2 ECE 608 April 7, 2004, 7-9pm Name: Read all of the following information before starting the exam: NOTE: Unanswered questions are worth 30% credit, rounded down. Writing any answer loses this
More informationDistributed Computing over Communication Networks: Leader Election
Distributed Computing over Communication Networks: Leader Election Motivation Reasons for electing a leader? Reasons for not electing a leader? Motivation Reasons for electing a leader? Once elected, coordination
More informationOnline Algorithms. - Lecture 4 -
Online Algorithms - Lecture 4 - Outline Quick recap.. The Cashing Problem Randomization in Online Algorithms Other views to Online Algorithms The Ski-rental problem The Parking Permit Problem 2 The Caching
More informationmodern database systems lecture 4 : information retrieval
modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation
More informationApproximation Algorithms
Approximation Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A 4 credit unit course Part of Theoretical Computer Science courses at the Laboratory of Mathematics There will be 4 hours
More informationA Review to the Approach for Transformation of Data from MySQL to NoSQL
A Review to the Approach for Transformation of Data from MySQL to NoSQL Monika 1 and Ashok 2 1 M. Tech. Scholar, Department of Computer Science and Engineering, BITS College of Engineering, Bhiwani, Haryana
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe
More informationApproximate Linear Programming for Average-Cost Dynamic Programming
Approximate Linear Programming for Average-Cost Dynamic Programming Daniela Pucci de Farias IBM Almaden Research Center 65 Harry Road, San Jose, CA 51 pucci@mitedu Benjamin Van Roy Department of Management
More information1. Introduction. performance of numerical methods. complexity bounds. structural convex optimization. course goals and topics
1. Introduction EE 546, Univ of Washington, Spring 2016 performance of numerical methods complexity bounds structural convex optimization course goals and topics 1 1 Some course info Welcome to EE 546!
More informationCost-aware top-k join algorithms
Cost-aware top-k join algorithms Stefano Ceri Davide Martinenghi Marco Tagliasacchi Dipartimento di Elettronica e Informazione Politecnico di Milano Piazza Leonardo da Vinci, 32 2033 Milano, Italy {ceri,martinen,tagliasa}@elet.polimi.it
More informationAnswering Top K Queries Efficiently with Overlap in Sources and Source Paths
Answering Top K Queries Efficiently with Overlap in Sources and Source Paths Louiqa Raschid University of Maryland louiqa@umiacs.umd.edu María Esther Vidal Universidad Simón Bolívar mvidal@ldc.usb.ve Yao
More informationAnd Now to Something Completely Different: Finding Roots of Real Valued Functions
And Now to Something Completely Different: Finding Roots of Real Valued Functions Four other Oysters followed them, And yet another four; And thick and fast they came at last, And more, and more, and more{
More informationDistributed Algorithms 6.046J, Spring, Nancy Lynch
Distributed Algorithms 6.046J, Spring, 205 Nancy Lynch What are Distributed Algorithms? Algorithms that run on networked processors, or on multiprocessors that share memory. They solve many kinds of problems:
More informationIntroduction to Data Mining
Introduction to Data Mining Lecture #6: Mining Data Streams Seoul National University 1 Outline Overview Sampling From Data Stream Queries Over Sliding Window 2 Data Streams In many data mining situations,
More informationThe Rainbow Connection of a Graph Is (at Most) Reciprocal to Its Minimum Degree
The Rainbow Connection of a Graph Is (at Most) Reciprocal to Its Minimum Degree Michael Krivelevich 1 and Raphael Yuster 2 1 SCHOOL OF MATHEMATICS, TEL AVIV UNIVERSITY TEL AVIV, ISRAEL E-mail: krivelev@post.tau.ac.il
More informationFlexible Coloring. Xiaozhou Li a, Atri Rudra b, Ram Swaminathan a. Abstract
Flexible Coloring Xiaozhou Li a, Atri Rudra b, Ram Swaminathan a a firstname.lastname@hp.com, HP Labs, 1501 Page Mill Road, Palo Alto, CA 94304 b atri@buffalo.edu, Computer Sc. & Engg. dept., SUNY Buffalo,
More information2.993: Principles of Internet Computing Quiz 1. Network
2.993: Principles of Internet Computing Quiz 1 2 3:30 pm, March 18 Spring 1999 Host A Host B Network 1. TCP Flow Control Hosts A, at MIT, and B, at Stanford are communicating to each other via links connected
More informationISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET
zk0 ISSUES IN SPATIAL DATABASES AND GEOGRAPHICAL INFORMATION SYSTEMS (GIS) HANAN SAMET COMPUTER SCIENCE DEPARTMENT AND CENTER FOR AUTOMATION RESEARCH AND INSTITUTE FOR ADVANCED COMPUTER STUDIES UNIVERSITY
More informationCSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)
Name: Email address: Quiz Section: CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators) Instructions: Read the directions for each question carefully before answering. We will
More informationEvaluating Top-N Queries in n-dimensional Normed Spaces
Evaluating Top-N Queries in n-dimensional Normed Spaces Liang Zhu 1, Feifei Liu 1, Weiyi Meng 2, Qin Ma 3, Yu Wang 1, Fang Yuan 4 1 Intelligent Database Laboratory, School of Computer Science and Technology,
More informationPeter Gurský. Institute of Computer Science, Faculty of Science.
Towards TowardsBetter better Semantics semantics in in the the multifeature Multifeature Querying querying Peter Gurský Peter Gurský Institute of Computer Science, Faculty of Science Institute of P.J.Šafárik
More informationWhat is an algorithm?
Reminders CS 142 Lecture 3 Analysis, ADTs & Objects Program 1 was assigned - Due on 1/27 by 11:55pm 2 Abstraction Measuring Algorithm Efficiency When you utilize the mylist.index(item) function you are
More informationInformation Retrieval CSCI
Information Retrieval CSCI 4141-6403 My name is Anwar Alhenshiri My email is: anwar@cs.dal.ca I prefer: aalhenshiri@gmail.com The course website is: http://web.cs.dal.ca/~anwar/ir/main.html 5/6/2012 1
More informationarxiv:cs/ v1 [cs.cc] 28 Apr 2003
ICM 2002 Vol. III 1 3 arxiv:cs/0304039v1 [cs.cc] 28 Apr 2003 Approximation Thresholds for Combinatorial Optimization Problems Uriel Feige Abstract An NP-hard combinatorial optimization problem Π is said
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationLectures 6+7: Zero-Leakage Solutions
Lectures 6+7: Zero-Leakage Solutions Contents 1 Overview 1 2 Oblivious RAM 1 3 Oblivious RAM via FHE 2 4 Oblivious RAM via Symmetric Encryption 4 4.1 Setup........................................ 5 4.2
More informationDe-identifying Facial Images using k-anonymity
De-identifying Facial Images using k-anonymity Ori Brostovski March 2, 2008 Outline Introduction General notions Our Presentation Basic terminology Exploring popular de-identification algorithms Examples
More informationNondeterministic Query Algorithms
Journal of Universal Computer Science, vol. 17, no. 6 (2011), 859-873 submitted: 30/7/10, accepted: 17/2/11, appeared: 28/3/11 J.UCS Nondeterministic Query Algorithms Alina Vasilieva (Faculty of Computing,
More informationMergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015
CS161, Lecture 2 MergeSort, Recurrences, Asymptotic Analysis Scribe: Michael P. Kim Date: April 1, 2015 1 Introduction Today, we will introduce a fundamental algorithm design paradigm, Divide-And-Conquer,
More informationA Mathematical Proof. Zero Knowledge Protocols. Interactive Proof System. Other Kinds of Proofs. When referring to a proof in logic we usually mean:
A Mathematical Proof When referring to a proof in logic we usually mean: 1. A sequence of statements. 2. Based on axioms. Zero Knowledge Protocols 3. Each statement is derived via the derivation rules.
More information