Optimal Tracking of Distributed Heavy Hitters and Quantiles
Qin Zhang, Hong Kong University of Science & Technology. Joint work with Ke Yi. PODS 2009, June 30.
Models and Problems
Distributed streaming model [Babcock, Olston, SIGMOD 2003]
Alice observes A(t) by time t, and Bob observes B(t) by time t; A(t) and B(t) are multisets.
Carole tries to track f(A(t) ∪ B(t)) for all t (an ɛ-approximation is usually allowed).
Goal: find an (approximate) tracking protocol while minimizing communication.
Extension to k sites: site i observes A_i(t), and the coordinator tracks f(A_1(t) ∪ A_2(t) ∪ ... ∪ A_k(t)). Find a protocol that minimizes communication.
Combination of two models
(One-shot) communication model: inputs X and Y; compute f(X ∪ Y); minimize communication.
Streaming model: a stream A(t); compute f(A(t)); minimize space.
Distributed streaming model, a.k.a. continuous communication model: only interested in communication cost.
Applied motivation: distributed monitoring
Large-scale distributed holistic querying/monitoring. Picture from the tutorial "Streaming in a connected world: Querying and tracking distributed data streams" at VLDB 06 and SIGMOD 07.
Three different fs
Heavy hitters: for a multiset A, item i is a heavy hitter if (count of i) > φ|A|, and a non-heavy hitter if (count of i) < (φ − ɛ)|A|. Application: find the IP addresses that account for more than a given percentage of the traffic over a network.
Single quantile: the median, the 1/4-quantile. Application: the median size of packets over a network.
All quantiles: A is a set of distinct elements from a totally ordered universe. The quantile of x is |{y ∈ A : y < x}| / |A|, usually with a ±ɛ error allowed. We need a data structure from which an ɛ-approximate quantile can be extracted for any x.
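The two definitions above can be checked directly on a small in-memory multiset. The following is a minimal sketch for illustration only; the function names `classify_items` and `quantile_of` are hypothetical and do not come from the talk, and the label "either" stands for items in the gap between the two thresholds, where an ɛ-approximate answer may go both ways.

```python
from collections import Counter


def classify_items(multiset, phi, eps):
    """Label each distinct item using the heavy-hitter thresholds.

    heavy:     count > phi * |A|
    non-heavy: count < (phi - eps) * |A|
    either:    anything in between (an approximate tracker may report it or not)
    """
    size = len(multiset)
    labels = {}
    for item, count in Counter(multiset).items():
        if count > phi * size:
            labels[item] = "heavy"
        elif count < (phi - eps) * size:
            labels[item] = "non-heavy"
        else:
            labels[item] = "either"
    return labels


def quantile_of(x, elements):
    """Exact quantile of x: the fraction of elements strictly smaller than x."""
    return sum(1 for y in elements if y < x) / len(elements)
```

An ɛ-approximate structure only has to return a value within ±ɛ of what `quantile_of` computes exactly.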
Previous and our results
Previous results in the distributed streaming model
Frequency moments: F_p = Σ_i (count of i)^p.
Total count F_1: simple; each site sends an update every time its local count has increased by a (1 + ɛ) factor. Complexity: O((k/ɛ) log n), where n is the total number of items received at all sites and ɛ is the relative error.
Distinct count F_0 and second frequency moment F_2 [Cormode, Muthukrishnan, Yi, SODA 08]: F_0 takes O((k/ɛ^2) log n log(n/δ)), and F_2 takes O((k^2/ɛ^2 + k^{3/2}/ɛ^4) log n log(n/δ)).
Entropy of the stream: Shannon entropy and related entropies [Arackaparambil, Brody, Chakrabarti, ICALP 09].
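The simple total-count protocol can be simulated in a few lines. This is a sketch under stated assumptions rather than the authors' implementation: the `Site` class, the `track_total` function, and the choice to model a message as a return value are all illustrative.

```python
class Site:
    """One site; it reports its local count whenever the count has grown
    by more than a (1 + eps) factor since the last report."""

    def __init__(self, eps):
        self.eps = eps
        self.count = 0      # exact local count
        self.reported = 0   # count included in the last message

    def observe(self):
        """Record one arrival; return the new count if a message is sent."""
        self.count += 1
        if self.count > (1 + self.eps) * self.reported:
            self.reported = self.count
            return self.count
        return None


def track_total(arrivals, k, eps):
    """arrivals is a sequence of site indices in [0, k).

    Returns (coordinator's estimate of the total count, messages sent).
    """
    sites = [Site(eps) for _ in range(k)]
    last_report = [0] * k   # coordinator's view of each site
    messages = 0
    for s in arrivals:
        update = sites[s].observe()
        if update is not None:
            last_report[s] = update
            messages += 1
    return sum(last_report), messages
```

Each site's true count stays within a (1 + ɛ) factor of its last report, so the sum kept by the coordinator underestimates the total by at most that factor, while each site sends only a logarithmic number of messages.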
Previous results: Heavy Hitters
Streaming model (space complexity): O(1/ɛ) [Karp, Shenker, Papadimitriou, TODS 03], [Demaine, Munro, Lopez-Ortiz, ESA 02], [Metwally, Agrawal, Abbadi, TODS 06], [Misra, Gries, 1982].
One-shot communication model: O(k/ɛ), easy.
Distributed streaming model: heuristics [e.g. Babcock, Olston, SIGMOD 03].
Previous results: All Quantiles
Streaming model (space complexity): O((1/ɛ) log(ɛn)) [Greenwald, Khanna, SIGMOD 01]; O((1/ɛ) log(1/δ)) with failure probability δ [Cormode, Muthukrishnan, J. Alg 05].
One-shot communication model: O(k/ɛ), easy.
Distributed streaming model: O((k/ɛ^2) log n) [Cormode, Garofalakis, Muthukrishnan, Rastogi, SIGMOD 05].
Our results
Tracking heavy hitters: complexity Θ((k/ɛ) log n).
Tracking one quantile (in particular, the median): complexity Θ((k/ɛ) log n).
Tracking all quantiles: upper bound O((k/ɛ) log^2(1/ɛ) log n).
Technical details
Tracking Heavy Hitters
Divide the tracking period into rounds: a new round starts every time m increases by a factor of 1 + ɛ, where m is the total number of elements inserted.
In each round:
Compute the total count m: O(k) messages. Broadcast m: O(k) messages.
Each site: for each item i, sends a message every time count(i) increases by ɛm/(2k); it also sends a message every time its total local count increases by ɛm/(2k).
Coordinator: terminates the round when O(k) messages have been received.
This tracks the count of each element with error at most ɛm.
Total messages in one round: O(k). Number of rounds: O(log_{1+ɛ} n) = O((log n)/ɛ).
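The round structure above can be sketched as a single-process simulation. This is an illustrative reading of the slide, not the authors' code: the function name `track_heavy_hitters`, the representation of arrivals as (site, item) pairs, the choice to end a round after exactly k total-count signals, and the initial threshold of 0 (so every early arrival is reported) are all assumptions.

```python
from collections import defaultdict


def track_heavy_hitters(arrivals, k, eps):
    """Simulate the round-based protocol on a list of (site, item) arrivals.

    Returns (coordinator's per-item count estimates, messages, rounds).
    """
    local = [defaultdict(int) for _ in range(k)]     # exact per-item counts per site
    reported = [defaultdict(int) for _ in range(k)]  # per-item counts last sent
    local_total = [0] * k        # exact number of arrivals at each site
    signalled_total = [0] * k    # local total at the site's last signal
    estimate = defaultdict(int)  # coordinator's view of each global count
    threshold = 0.0              # eps * m / (2k); m = 0 before the first round
    signals = messages = rounds = 0

    for site, item in arrivals:
        local[site][item] += 1
        local_total[site] += 1

        # Per-item message: count(item) at this site grew by the threshold.
        delta = local[site][item] - reported[site][item]
        if delta >= threshold:
            estimate[item] += delta
            reported[site][item] = local[site][item]
            messages += 1

        # Total-count signal: the site's local total grew by the threshold.
        if local_total[site] - signalled_total[site] >= threshold:
            signalled_total[site] = local_total[site]
            signals += 1
            messages += 1

        # Coordinator: after k signals, close the round and open a new one.
        if signals >= k:
            m = sum(local_total)            # k messages to collect the total
            threshold = eps * m / (2 * k)   # k messages to broadcast m
            signalled_total = list(local_total)
            messages += 2 * k
            signals = 0
            rounds += 1

    return dict(estimate), messages, rounds
```

In this sketch each site withholds less than ɛm/(2k) of any item's count, so the coordinator underestimates each count by less than ɛm/2, which is within the ɛm error stated on the slide.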
Tracking Heavy Hitters: lower bound
Construct an input on which the set of heavy hitters undergoes Ω((log n)/ɛ) updates. In each update, one item changes from non-heavy to heavy. Argue that Ω(k) sites need to be contacted in order to identify this item correctly at the right time.
Tracking the Median
Divide the tracking period into rounds: m (roughly) doubles in each round.
In each round:
Maintain a dynamic set of intervals, each holding between ɛm/8 and ɛm/2 items: O(k/ɛ) messages in each round.
Figure labels: each leaf contains between ɛm/8 and ɛm/2 elements; maintain an approximate count; pivot element.
After Θ(ɛm) items, recalculate the median from the old median by using the dynamic set of intervals: O(k) messages per recalculation, O(1/ɛ) recalculations in each round.
Total messages in one round: O(k/ɛ). Number of rounds: O(log n).
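The per-round and overall costs of the median protocol follow by multiplying out the quantities on the slide. The short derivation below is a reading of those figures, assuming Θ(m) items arrive in a round (because m roughly doubles), which gives Θ(m)/Θ(ɛm) = O(1/ɛ) recalculations.

```latex
\begin{align*}
\text{messages per round}
  &= \underbrace{O(k/\epsilon)}_{\text{maintaining the intervals}}
   + \underbrace{O(1/\epsilon)}_{\text{recalculations}} \cdot
     \underbrace{O(k)}_{\text{messages each}}
   = O(k/\epsilon), \\
\text{total messages}
  &= \underbrace{O(\log n)}_{\text{rounds}} \cdot O(k/\epsilon)
   = O\!\left(\frac{k}{\epsilon}\log n\right).
\end{align*}
```

This matches the upper-bound side of the Θ((k/ɛ) log n) result stated for tracking one quantile.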
Tracking all Quantiles
Data structure: a balanced tree. Each node stores an approximate median, such that either half contains at least 1/4 of the elements, and an approximate count with absolute error < ɛm/log(1/ɛ). Each leaf contains Θ(ɛm) elements.
The structure can be used to extract the rank of any x with absolute error < ɛm, that is, the quantile with error < ɛ.
Divide the tracking period into rounds: m (roughly) doubles in each round.
At the beginning of each round, initialize the structure with O(k/ɛ) communication, then broadcast it.
Each site tracks each interval and sends a message every time Θ(ɛm/(k log(1/ɛ))) new elements arrive. Total: O((k/ɛ) log^2(1/ɛ)).
Maintain the balance of the tree. Total: O((k/ɛ) log(1/ɛ)).
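The per-round total for counting can be reconstructed from the tree shape. The derivation below is a back-of-the-envelope sketch, assuming the tree has height h = Θ(log(1/ɛ)) (balanced, with Θ(1/ɛ) leaves), that Θ(m) elements arrive in a round, and that every new element falls into one interval per level; none of these steps is spelled out on the slide.

```latex
\begin{align*}
h &= \Theta(\log(1/\epsilon)), \\
\text{messages per level}
  &= O\!\left(\frac{m}{\epsilon m / (k \log(1/\epsilon))}\right)
   = O\!\left(\frac{k}{\epsilon}\log\frac{1}{\epsilon}\right), \\
\text{messages per round}
  &= h \cdot O\!\left(\frac{k}{\epsilon}\log\frac{1}{\epsilon}\right)
   = O\!\left(\frac{k}{\epsilon}\log^{2}\frac{1}{\epsilon}\right), \\
\text{rank error}
  &\le O(h) \cdot \frac{\epsilon m}{\log(1/\epsilon)} = O(\epsilon m).
\end{align*}
```

Multiplying the per-round cost by the O(log n) rounds gives the O((k/ɛ) log^2(1/ɛ) log n) upper bound listed under "Our results".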
Remarks and open problems
We focused on communication only, but the algorithms can be implemented with small space: O(1/ɛ) at each site for heavy hitters, and O((1/ɛ) log(ɛn)) at each site for quantiles.
Randomized algorithms? There is an upper bound of O((k + 1/ɛ^2) polylog(n, k, 1/ɛ)).
How about deletions? One site, arbitrary function in Z^d, competitive analysis (Yi and Zhang, SODA 2009).
Other tracking problems.
The End. Thank you. Q and A.
Optimal Sampling from Distributed Streams
Optimal Sampling from Distributed Streams Graham Cormode AT&T Labs-Research Joint work with S. Muthukrishnan (Rutgers) Ke Yi (HKUST) Qin Zhang (HKUST) 1-1 Reservoir sampling [Waterman??; Vitter 85] Maintain
More informationarxiv: v1 [cs.ds] 1 Dec 2008
Optimal Tracking of Distributed Heavy Hitters and Quantiles Ke Yi Qin Zhang arxiv:0812.0209v1 [cs.ds] 1 Dec 2008 Department of Computer Science and Engineering Hong Kong University of Science and Technology
More informationB561 Advanced Database Concepts. 6. Streaming Algorithms. Qin Zhang 1-1
B561 Advanced Database Concepts 6. Streaming Algorithms Qin Zhang 1-1 The model and challenge The data stream model (Alon, Matias and Szegedy 1996) a n a 2 a 1 RAM CPU Why hard? Cannot store everything.
More informationMisra-Gries Summaries
Title: Misra-Gries Summaries Name: Graham Cormode 1 Affil./Addr. Department of Computer Science, University of Warwick, Coventry, UK Keywords: streaming algorithms; frequent items; approximate counting
More informationB561 Advanced Database Concepts Streaming Model. Qin Zhang 1-1
B561 Advanced Database Concepts 2.2. Streaming Model Qin Zhang 1-1 Data Streams Continuous streams of data elements (massive possibly unbounded, rapid, time-varying) Some examples: 1. network monitoring
More informationSummarizing and mining inverse distributions on data streams via dynamic inverse sampling
Summarizing and mining inverse distributions on data streams via dynamic inverse sampling Presented by Graham Cormode cormode@bell-labs.com S. Muthukrishnan muthu@cs.rutgers.edu Irina Rozenbaum rozenbau@paul.rutgers.edu
More informationDynamic Range Majority Data Structures
Dynamic Range Majority Data Structures Amr Elmasry 1, Meng He 2, J. Ian Munro 3, and Patrick K. Nicholson 3 1 Department of Computer Science, University of Copenhagen, Denmark 2 Faculty of Computer Science,
More informationEfficient Approximation of Correlated Sums on Data Streams
Efficient Approximation of Correlated Sums on Data Streams Rohit Ananthakrishna Cornell University rohit@cs.cornell.edu Flip Korn AT&T Labs Research flip@research.att.com Abhinandan Das Cornell University
More informationDistinct random sampling from a distributed stream
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2013 Distinct random sampling from a distributed stream Yung-Yu Chung Iowa State University Follow this and additional
More informationOne-Pass Streaming Algorithms
One-Pass Streaming Algorithms Theory and Practice Complaints and Grievances about theory in practice Disclaimer Experiences with Gigascope. A practitioner s perspective. Will be using my own implementations,
More informationCourse : Data mining
Course : Data mining Lecture : Mining data streams Aristides Gionis Department of Computer Science Aalto University visiting in Sapienza University of Rome fall 2016 reading assignment LRU book: chapter
More informationSimply Top Talkers Jeroen Massar, Andreas Kind and Marc Ph. Stoecklin
IBM Research - Zurich Simply Top Talkers Jeroen Massar, Andreas Kind and Marc Ph. Stoecklin 2009 IBM Corporation Motivation and Outline Need to understand and correctly handle dominant aspects within the
More informationRobust Lower Bounds for Communication and Stream Computation
Robust Lower Bounds for Communication and Stream Computation! Amit Chakrabarti! Dartmouth College! Graham Cormode! AT&T Labs! Andrew McGregor! UC San Diego Communication Complexity Communication Complexity
More informationDynamic Indexability and the Optimality of B-trees and Hash Tables
Dynamic Indexability and the Optimality of B-trees and Hash Tables Ke Yi Hong Kong University of Science and Technology Dynamic Indexability and Lower Bounds for Dynamic One-Dimensional Range Query Indexes,
More informationMulti-Dimensional Online Tracking
Multi-Dimensional Online Tracking Ke Yi Qin Zhang Department of Computer Science and Engineering Hong Kong University of Science and Technology Hong Kong, China {yike,qinzhang}@cse.ust.hk Abstract We propose
More informationUnivMon: Software-defined Monitoring with Universal Sketch
UnivMon: Software-defined Monitoring with Universal Sketch Zaoxing (Alan) Liu Joint work with Antonis Manousis (CMU), Greg Vorsanger(JHU), Vyas Sekar (CMU), and Vladimir Braverman(JHU) Network Management:
More informationMethods for Finding Frequent Items in Data Streams
The VLDB Journal manuscript No. (will be inserted by the editor) Methods for Finding Frequent Items in Data Streams Graham Cormode Marios Hadjieleftheriou Received: date / Accepted: date Abstract The frequent
More informationSublinear Models for Streaming and/or Distributed Data
Sublinear Models for Streaming and/or Distributed Data Qin Zhang Guest lecture in B649 Feb. 3, 2015 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index
More informationMerging Frequent Summaries
Merging Frequent Summaries M. Cafaro, M. Pulimeno University of Salento, Italy {massimo.cafaro, marco.pulimeno}@unisalento.it Abstract. Recently, an algorithm for merging counter-based data summaries which
More informationEfficient Strategies for Continuous Distributed Tracking Tasks
Efficient Strategies for Continuous Distributed Tracking Tasks Graham Cormode Bell Labs, Lucent Technologies cormode@bell-labs.com Minos Garofalakis Bell Labs, Lucent Technologies minos@bell-labs.com Abstract
More informationOn the Cell Probe Complexity of Dynamic Membership or
On the Cell Probe Complexity of Dynamic Membership or Can We Batch Up Updates in External Memory? Ke Yi and Qin Zhang Hong Kong University of Science & Technology SODA 2010 Jan. 17, 2010 1-1 The power
More informationTracking Frequent Items Dynamically: What s Hot and What s Not To appear in PODS 2003
Tracking Frequent Items Dynamically: What s Hot and What s Not To appear in PODS 2003 Graham Cormode graham@dimacs.rutgers.edu dimacs.rutgers.edu/~graham S. Muthukrishnan muthu@cs.rutgers.edu Everyday
More informationFinding Frequent Items in Data Streams
Finding Frequent Items in Data Streams Graham Cormode Marios Hadjieleftheriou AT&T Labs Research, Florham Park, NJ {graham,marioh}@research.att.com ABSTRACT The frequent items problem is to process a stream
More informationSketching Asynchronous Streams Over a Sliding Window
Sketching Asynchronous Streams Over a Sliding Window Srikanta Tirthapura (Iowa State University) Bojian Xu (Iowa State University) Costas Busch (Rensselaer Polytechnic Institute) 1/32 Data Stream Processing
More informationUniversity of Waterloo Waterloo, Ontario, Canada N2L 3G1 Cambridge, MA, USA Introduction. Abstract
Finding Frequent Items in Sliding Windows with Multinomially-Distributed Item Frequencies (University of Waterloo Technical Report CS-2004-06, January 2004) Lukasz Golab, David DeHaan, Alejandro López-Ortiz
More informationStreaming Algorithms for Matching Size in Sparse Graphs
Streaming Algorithms for Matching Size in Sparse Graphs Graham Cormode g.cormode@warwick.ac.uk Joint work with S. Muthukrishnan (Rutgers), Morteza Monemizadeh (Rutgers Amazon) Hossein Jowhari (Warwick
More informationCAM Conscious Integrated Answering of Frequent Elements and Top-k Queries over Data Streams
CAM Conscious Integrated Answering of Frequent Elements and Top-k Queries over Data Streams ABSTRACT Sudipto Das Divyakant Agrawal Amr El Abbadi Department of Computer Science University of California,
More informationAlgorithms for data streams
Intro Puzzles Sampling Sketches Graphs Lower bounds Summing up Thanks! Algorithms for data streams Irene Finocchi finocchi@di.uniroma1.it http://www.dsi.uniroma1.it/ finocchi/ May 9, 2012 1 / 99 Irene
More informationAn improved data stream summary: the count-min sketch and its applications
Journal of Algorithms 55 (2005) 58 75 www.elsevier.com/locate/jalgor An improved data stream summary: the count-min sketch and its applications Graham Cormode a,,1, S. Muthukrishnan b,2 a Center for Discrete
More informationAn Efficient Transformation for Klee s Measure Problem in the Streaming Model
An Efficient Transformation for Klee s Measure Problem in the Streaming Model Gokarna Sharma, Costas Busch, Ramachandran Vaidyanathan, Suresh Rai, and Jerry Trahan Louisiana State University Outline of
More informationSketching Probabilistic Data Streams
Sketching Probabilistic Data Streams Graham Cormode AT&T Labs - Research graham@research.att.com Minos Garofalakis Yahoo! Research minos@acm.org Challenge of Uncertain Data Many applications generate data
More informationClustering from Data Streams
Clustering from Data Streams João Gama LIAAD-INESC Porto, University of Porto, Portugal jgama@fep.up.pt 1 Introduction 2 Clustering Micro Clustering 3 Clustering Time Series Growing the Structure Adapting
More informationComputing Frequent Elements Using Gossip
Computing Frequent Elements Using Gossip Bibudh Lahiri and Srikanta Tirthapura Iowa State University, Ames, IA, 50011, USA {bibudh,snt}@iastate.edu Abstract. We present algorithms for identifying frequently
More informationDuplicate Detection in Click Streams
Duplicate Detection in Click Streams Ahmed Metwally Divyakant Agrawal Amr El Abbadi Department of Computer Science University of California, Santa Barbara {metwally, agrawal, amr}@cs.ucsb.edu Abstract
More informationTracking Most Frequent Items Dynamically
What s Hot and What s Not: Tracking Most Frequent Items Dynamically GRAHAM CORMODE DIMACS Center, Rutgers University and S. MUTHUKRISHNAN Department of Computer Science, Rutgers University Most database
More informationEfficiently Summarizing Data Streams over Sliding Windows
Efficiently Summarizing Data Streams over Sliding Windows Nicoló Rivetti, Yann Busnel, Achour Mostefaoui To cite this version: Nicoló Rivetti, Yann Busnel, Achour Mostefaoui. Efficiently Summarizing Data
More informationHeavy-Hitter Detection Entirely in the Data Plane
Heavy-Hitter Detection Entirely in the Data Plane VIBHAALAKSHMI SIVARAMAN SRINIVAS NARAYANA, ORI ROTTENSTREICH, MUTHU MUTHUKRSISHNAN, JENNIFER REXFORD 1 Heavy Hitter Flows Flows above a certain threshold
More informationFinding the Frequent Items in Streams of Data By Graham Cormode and Marios Hadjieleftheriou
Finding the Frequent Items in Streams of Data By Graham Cormode and Marios Hadjieleftheriou doi:1.1145/1562764.1562789 Abstract The frequent items problem is to process a stream otems and find all those
More informationStreaming algorithms for embedding and computing edit distance
Streaming algorithms for embedding and computing edit distance in the low distance regime Diptarka Chakraborty joint work with Elazar Goldenberg and Michal Koucký 7 November, 2015 Outline 1 Introduction
More informationSketching and streaming algorithms for processing massive data
Sketching and streaming algorithms for processing massive data The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Nelson,
More informationEfficiently Summarizing Distributed Data Streams over Sliding Windows
Efficiently Summarizing Distributed Data Streams over Sliding Windows Nicolò Rivetti, Yann Busnel, Achour Mostefaoui To cite this version: Nicolò Rivetti, Yann Busnel, Achour Mostefaoui. Efficiently Summarizing
More informationMining Data Streams. Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction. Summarization Methods. Clustering Data Streams
Mining Data Streams Outline [Garofalakis, Gehrke & Rastogi 2002] Introduction Summarization Methods Clustering Data Streams Data Stream Classification Temporal Models CMPT 843, SFU, Martin Ester, 1-06
More informationEfficient Quantile Retrieval on Multi-Dimensional Data
Efficient Quantile Retrieval on Multi-Dimensional Data Man Lung Yiu 1, Nikos Mamoulis 1, and Yufei Tao 2 1 Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong, {mlyiu2,nikos}@cs.hku.hk
More informationMergeable Summaries. Pankaj K. Agarwal. Zengfeng Huang. Graham Cormode. Ke Yi. Jeff M. Phillips. Zhewei Wei ABSTRACT
Mergeable Summaries Pankaj K. Agarwal Duke University Jeff M. Phillips University of Utah Graham Cormode AT&T Labs Research Zhewei Wei HKUST Zengfeng Huang HKUST Ke Yi HKUST ABSTRACT We study the mergeability
More informationQuerying Distributed Data Streams (Invited Keynote Talk)
Querying Distributed Data Streams (Invited Keynote Talk) Minos Garofalakis School of Electronic and Computer Engineering Technical University of Crete minos@softnet.tuc.gr Abstract. Effective Big Data
More informationSublinear Algorithms for Big Data Analysis
Sublinear Algorithms for Big Data Analysis Michael Kapralov Theory of Computation Lab 4 EPFL 7 September 2017 The age of big data: massive amounts of data collected in various areas of science and technology
More informationSet-Difference Range Queries
CCCG 2013, Waterloo, Ontario, August 8 10, 2013 Set-Difference Range Queries David Eppstein Michael T. Goodrich Joseph A. Simons Abstract We introduce the problem of performing set-difference range queries,
More informationFaster, Space-Efficient Selection Algorithms in Read-Only Memory for Integers
Faster, Space-Efficient Selection Algorithms in Read-Only Memory for Integers Timothy M. Chan 1, J. Ian Munro 1, and Venkatesh Raman 2 1 Cheriton School of Computer Science, University of Waterloo, Waterloo,
More informationSummarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling
Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling Graham Cormode Bell Laboratories cormode@lucent.com S. Muthukrishnan Rutgers University muthu@cs.rutgers.edu Irina
More informationSummarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling
Summarizing and Mining Inverse Distributions on Data Streams via Dynamic Inverse Sampling Graham Cormode Bell Laboratories, Murray Hill, NJ cormode@lucent.com S. Muthukrishnan Rutgers University, Piscataway,
More informationQuerying Distributed Data Streams
Querying Distributed Data Streams Minos Garofalakis Technical University of Crete Software Technology and Network Applications Lab LIFT Cast: Antonios Deligiannakis, Vasilis Samoladas, Odysseas Papapetrou,
More informationNo Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams
No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams Jin ~i', David ~aier', Kristin Tuftel, Vassilis papadimosl, Peter A. Tucke? 1 Portland State University 2~hitworth
More informationLecture 5: Data Streaming Algorithms
Great Ideas in Theoretical Computer Science Summer 2013 Lecture 5: Data Streaming Algorithms Lecturer: Kurt Mehlhorn & He Sun In the data stream scenario, the input arrive rapidly in an arbitrary order,
More informationCS168: The Modern Algorithmic Toolbox Lecture #2: Approximate Heavy Hitters and the Count-Min Sketch
CS168: The Modern Algorithmic Toolbox Lecture #2: Approximate Heavy Hitters and the Count-Min Sketch Tim Roughgarden & Gregory Valiant March 30, 2016 1 The Heavy Hitters Problem 1.1 Finding the Majority
More informationOptimisation While Streaming
Optimisation While Streaming Amit Chakrabarti Dartmouth College Joint work with S. Kale, A. Wirth DIMACS Workshop on Big Data Through the Lens of Sublinear Algorithms, Aug 2015 Combinatorial Optimisation
More informationFast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors
Fast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors Naga K. Govindaraju Nikunj Raghuvanshi Dinesh Manocha University of North Carolina at Chapel Hill {naga,nikunj,dm}@cs.unc.edu
More informationarxiv: v2 [cs.ds] 30 Sep 2016
Synergistic Sorting, MultiSelection and Deferred Data Structures on MultiSets Jérémy Barbay 1, Carlos Ochoa 1, and Srinivasa Rao Satti 2 1 Departamento de Ciencias de la Computación, Universidad de Chile,
More informationMining data streams. Irene Finocchi. finocchi/ Intro Puzzles
Intro Puzzles Mining data streams Irene Finocchi finocchi@di.uniroma1.it http://www.dsi.uniroma1.it/ finocchi/ 1 / 33 Irene Finocchi Mining data streams Intro Puzzles Stream sources Data stream model Storing
More informationFinding Frequent Items in Time Decayed Data Streams
Finding Frequent Items in Time Decayed Data Streams Shanshan Wu 1, Huaizhong Lin 1(B), Leong Hou U 2, Yunjun Gao 1, and Dongming Lu 1 1 College of Computer Science and Technology, Zhejiang University,
More informationData Streaming Algorithms for Geometric Problems
Data Streaming Algorithms for Geometric roblems R.Sharathkumar Duke University 1 Introduction A data stream is an ordered sequence of points that can be read only once or a small number of times. Formally,
More informationAnnotations For Sparse Data Streams
Annotations For Sparse Data Streams Justin Thaler, Simons Institute, UC Berkeley Joint Work with: Amit Chakrabarti, Dartmouth Graham Cormode, University of Warwick Navin Goyal, Microsoft Research India
More informationFast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors
Fast and Approximate Stream Mining of Quantiles and Frequencies Using Graphics Processors Naga K. Govindaraju Nikunj Raghuvanshi Dinesh Manocha University of North Carolina at Chapel Hill {naga,nikunj,dm}@cs.unc.edu
More informationMining for Patterns and Anomalies in Data Streams. Sampath Kannan University of Pennsylvania
Mining for Patterns and Anomalies in Data Streams Sampath Kannan University of Pennsylvania The Problem Data sizes too large to fit in primary memory Devices with small memory Access times to secondary
More informationParameterized Approximation Schemes Using Graph Widths. Michael Lampis Research Institute for Mathematical Sciences Kyoto University
Parameterized Approximation Schemes Using Graph Widths Michael Lampis Research Institute for Mathematical Sciences Kyoto University May 13, 2014 Overview Topic of this talk: Randomized Parameterized Approximation
More informationApproximate Continuous Querying over Distributed Streams
Approximate Continuous Querying over Distributed Streams GRAHAM CORMODE AT&T Labs Research and MINOS GAROFALAKIS Yahoo! Research & University of California, Berkeley While traditional database systems
More informationWhat s Hot and What s Not: Tracking Most Frequent Items Dynamically
What s Hot and What s Not: Tracking Most Frequent Items Dynamically Graham Cormode Center for Discrete Mathematics and Computer Science (DIMACS) Rutgers University, Piscataway NJ graham@dimacs.rutgers.edu
More informationSensors. Outline. Sensor networks. Tributaries and Deltas: Efficient and Robust Aggregation in Sensor Network Streams
ributaries and s: Efficient and Robust Aggregation in Sensor Network Streams Acknowledgements for slides: Amit Manjhi, SIGMOD 5 Amit Manjhi, Suman Nath, Phillip B. Gibbons Carnegie Mellon University Intel
More informationData Streams Algorithms
Data Streams Algorithms Phillip B. Gibbons Intel Research Pittsburgh Guest Lecture in 15-853 Algorithms in the Real World Phil Gibbons, 15-853, 12/4/06 # 1 Outline Data Streams in the Real World Formal
More informationAn Efficient Transformation for Klee s Measure Problem in the Streaming Model Abstract Given a stream of rectangles over a discrete space, we consider the problem of computing the total number of distinct
More informationarxiv: v1 [cs.db] 30 Jun 2012
Sketch-based Querying of Distributed Sliding-Window Data Streams Odysseas Papapetrou, Minos Garofalakis, Antonios Deligiannakis Technical University of Crete {papapetrou,minos,adeli}@softnet.tuc.gr arxiv:1207.0139v1
Lecture Online Algorithms and the k-server problem June 14, 2011
Approximation Algorithms Workshop June 13-17, 2011, Princeton Lecture Online Algorithms and the k-server problem June 14, 2011 Joseph (Seffi) Naor Scribe: Mohammad Moharrami 1 Overview In this lecture,
Parameterized Approximation Schemes Using Graph Widths. Michael Lampis Research Institute for Mathematical Sciences Kyoto University
Parameterized Approximation Schemes Using Graph Widths Michael Lampis Research Institute for Mathematical Sciences Kyoto University July 11th, 2014 Visit Kyoto for ICALP 15! Parameterized Approximation
arXiv:1705.07001v2 [cs.DS] 22 May 2017
A High-Performance Algorithm for Identifying Frequent Items in Data Streams Daniel Anderson Pryce Bevin Kevin Lang Edo Liberty Lee Rhodes Justin Thaler arxiv:1705.07001v2 [cs.ds] 22 May 2017 Abstract Estimating
Reporting Consecutive Substring Occurrences Under Bounded Gap Constraints
Reporting Consecutive Substring Occurrences Under Bounded Gap Constraints Gonzalo Navarro University of Chile, Chile gnavarro@dcc.uchile.cl Sharma V. Thankachan Georgia Institute of Technology, USA sthankachan@gatech.edu
Buffered Count-Min Sketch on SSD: Theory and Experiments
Buffered Count-Min Sketch on SSD: Theory and Experiments Mayank Goswami, Dzejla Medjedovic, Emina Mekic, and Prashant Pandey ESA 2018, Helsinki, Finland The heavy hitters problem (HH(k)) Given stream of
Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams
Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams Sofya Vorotnikova University of Massachusetts Amherst Joint work with Rajesh Chitnis, Graham
Topics in Data Stream Sampling and Insider Threat Detection
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2016 Topics in Data Stream Sampling and Insider Threat Detection Yung-Yu Chung Iowa State University Follow this
A Simple Augmentation Method for Matchings with Applications to Streaming Algorithms
A Simple Augmentation Method for Matchings with Applications to Streaming Algorithms MFCS 2018 Christian Konrad 27.08.2018 Streaming Algorithms sequential access random access Streaming (1996 -) Objective:
Deterministic Communication
University of California, Los Angeles CS 289A Communication Complexity Instructor: Alexander Sherstov Scribe: Rebecca Hicks Date: January 9, 2012 LECTURE 1 Deterministic Communication This lecture is a
Approximating the Caro-Wei Bound for Independent Sets in Graph Streams
Approximating the Caro-Wei Bound for Independent Sets in Graph Streams Graham Cormode, Jacques Dark, and Christian Konrad Department of Computer Science, Centre for Discrete Mathematics and its Applications
Answering Aggregation Queries on Hierarchical Web Sites Using Adaptive Sampling (Technical Report, UCI ICS, August, 2005)
Answering Aggregation Queries on Hierarchical Web Sites Using Adaptive Sampling (Technical Report, UCI ICS, August, 2005) Foto N. Afrati Computer Science Division NTUA, Athens, Greece afrati@softlab.ece.ntua.gr
Improved Algorithm for Weighted Flow Time
Joint work with Yossi Azar Definition Previous Work and Our Results Overview Single machine receives jobs over time. Each job has an arrival time, a weight and processing time (volume). Allow preemption.
Stream Sequential Pattern Mining with Precise Error Bounds
Stream Sequential Pattern Mining with Precise Error Bounds Luiz F. Mendes,2 Bolin Ding Jiawei Han University of Illinois at Urbana-Champaign 2 Google Inc. lmendes@google.com {bding3, hanj}@uiuc.edu Abstract
Frequent Itemsets in Streams
Frequent Itemsets in Streams Eiríkur Fannar Torfason teirikur@student.ethz.ch 31.5.2013 1 Introduction This report is written for a seminar at ETH Zürich, titled Algorithms for Database Systems. This year
Linear-Space Data Structures for Range Minority Query in Arrays
Linear-Space Data Structures for Range Minority Query in Arrays Timothy M. Chan 1, Stephane Durocher 2, Matthew Skala 2, and Bryan T. Wilkinson 1 1 Cheriton School of Computer Science, University of Waterloo,
Question 7.11 Show how heapsort processes the input:
Question 7.11 Show how heapsort processes the input: 142, 543, 123, 65, 453, 879, 572, 434, 111, 242, 811, 102. Solution. Step 1 Build the heap. 1.1 Place all the data into a complete binary tree in the
Checking and Spot-Checking the Correctness of Priority Queues
Checking and Spot-Checking the Correctness of Priority Queues Matthew Chu 1, Sampath Kannan 1, and Andrew McGregor 2 1 Dept. of Computer and Information Science, University of Pennsylvania {mdchu,kannan}@cis.upenn.edu
Mining top-k frequent itemsets from data streams
Data Min Knowl Disc (2006) 13:193–217 DOI 10.1007/s10618-006-0042-x Mining top-k frequent itemsets from data streams Raymond Chi-Wing Wong Ada Wai-Chee Fu Received: 29 April 2005 / Accepted: 1 February
Processing Data-Stream Join Aggregates Using Skimmed Sketches
Processing Data-Stream Join Aggregates Using Skimmed Sketches Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi Bell Laboratories, Lucent Technologies, Murray Hill NJ, USA. {sganguly,minos,rastogi}@research.bell-labs.com
Department of Computer Science
Yale University Department of Computer Science Pass-Efficient Algorithms for Facility Location Kevin L. Chang Department of Computer Science Yale University kchang@cs.yale.edu YALEU/DCS/TR-1337 Supported
Online Algorithms. Lecture 11
Online Algorithms Lecture 11 Today DC on trees DC on arbitrary metrics DC on circle Scheduling K-server on trees Theorem The DC Algorithm is k-competitive for the k server problem on arbitrary tree metrics.
Implementing Hierarchical Heavy Hitters in RapidMiner: Solutions and Open Questions
Implementing Hierarchical Heavy Hitters in RapidMiner: Solutions and Open Questions Marco Stolpe and Peter Fricke firstname.surname@tu-dortmund.de Technical University of Dortmund, Artificial Intelligence
Approximate Quantile Computation over Sensor Networks
Approximate Quantile Computation over Sensor Networks Xifeng Yan, Jiong Yang, Wei Wang, Jiawei Han University of Illinois at Urbana-Champaign, {xyan, jioyang, hanj}@cs.uiuc.edu University of North Carolina
Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window
Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window
Mining Approximate Frequent Closed Flows over Packet Streams
Mining Approximate Frequent Closed Flows over Packet Streams Imen Brahmi 1, Sadok Ben Yahia 1, and Pascal Poncelet 2 1 Faculty of Sciences of Tunis, Tunisia imen.brahmi@gmail.com, sadok.benyahia@fst.rnu.tn
Cardinality Estimation: An Experimental Survey
: An Experimental Survey and Felix Naumann VLDB 2018 Estimation and Approximation Session Rio de Janeiro-Brazil 29 th August 2018 Information System Group Hasso Plattner Institut University of Potsdam
Distributed Data Streaming Algorithms for Network Anomaly Detection
Graduate Theses and Dissertations Iowa State University Capstones, Theses and Dissertations 2017 Distributed Data Streaming Algorithms for Network Anomaly Detection Wenji Chen Iowa State University Follow
Join Sizes, Frequency Moments, and Applications
Join Sizes, Frequency Moments, and Applications Graham Cormode 1 and Minos Garofalakis 2 1 University of Warwick, Department of Computer Science, Coventry CV4 7AL, UK. Email: G.Cormode@warwick.ac.uk 2
Online Mining Changes of Items over Continuous Append-only and Dynamic Data Streams
Journal of Universal Computer Science, vol., no. 8 (2005), 4-425 submitted: 0/3/05, accepted: 5/5/05, appeared: 28/8/05 J.UCS Online Mining Changes of Items over Continuous Append-only and Dynamic Data
Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving. Andrew McGregor University of Massachusetts
Latest on Linear Sketches for Large Graphs: Lots of Problems, Little Space, and Loads of Handwaving Andrew McGregor University of Massachusetts Latest on Linear Sketches for Large Graphs: Lots of Problems,