Randomized Composable Core-sets for Distributed Optimization
Vahab Mirrokni

1 Randomized Composable Core-sets for Distributed Optimization. Vahab Mirrokni, Algorithms Research Group, Google Research, New York. Mainly based on joint work with Hossein Bateni, Aditya Bhaskara, Hossein Esfandiari, Silvio Lattanzi, and Morteza Zadimoghaddam.

2 Our team: Google NYC Algorithms Research Teams
Teams: Market Algorithms / Ads Optimization (search & display); Large-Scale Graph Mining / Distributed Optimization; Infrastructure and Large-Scale Optimization. Common expertise: online allocation problems. Tools: e.g., Clustering, Balanced Partitioning.

3 Three most popular techniques applied in our tools:
- Local algorithms: message passing / label propagation / local random walks, e.g., similarity ranking via personalized PageRank (PPR) and connected components. Our connected-components code is … times faster than the state of the art.
- Embedding/hashing/sketching techniques, e.g., linear embedding for balanced graph partitioning to minimize cut: improves the state of the art by 26% and improved flash bandwidth for the search backend by 25%. Paper appeared in WSDM 16.
- Randomized composable core-sets for distributed computation: this talk.

4 Agenda
- Composable core-sets: definitions & applications
  - Applications in distributed & streaming settings
  - Applications: feature selection, diversity in search & recommendation
- Composable core-sets for four problems: a survey
  - Diversity maximization (PODS 14, AAAI 17), clustering (NIPS 14), submodular maximization (STOC 15), and column subset selection (ICML 16)
- Sketching for coverage problems (on arXiv)
  - Sketching technique

5 Composable Core-Sets for Distributed Optimization
[Figure: the input set is split into T1, T2, ..., Tm on Machines 1, ..., m; ALG runs on each machine and outputs selected items S1, S2, ..., Sm; ALG then runs on the selected items to find the final output set.]

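6 Composable Core-sets
Setup: consider partitioning a data set T of elements into m sets (T1, T2, ..., Tm).
Goal: given a set function f, find a subset $S^* \subseteq T$ with $|S^*| = k$ optimizing $f(S^*)$.
Find: small core-sets $S_1 \subseteq T_1, S_2 \subseteq T_2, \ldots, S_m \subseteq T_m$ such that the optimum solution in the union of the core-sets approximates the optimum solution of T. For a maximization problem and an α-approximate composable core-set: $\max_{S \subseteq S_1 \cup \cdots \cup S_m,\ |S| = k} f(S) \ge \frac{1}{\alpha} \max_{S \subseteq T,\ |S| = k} f(S)$.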

7 Application in MapReduce/Distributed Computation
[Figure: the input set is split into T1, ..., Tm on Machines 1, ..., m; ALG runs on each machine to select S1, ..., Sm; ALG then runs on the selected items to find the final output set.]
E.g., two rounds of MapReduce.
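A minimal Python sketch of this two-round pipeline, simulated in a single process. The function name `composable_coreset`, the random assignment of items to machines, and the toy `top_k` routine (which optimizes a simple additive f) are illustrative assumptions, not code from the talk; any selection routine can be passed as `alg`.

```python
import random


def top_k(part, k):
    """Toy ALG: the k largest values, i.e. the optimum for f(S) = sum(S)."""
    return sorted(part, reverse=True)[:k]


def composable_coreset(items, m, k, alg, seed=0):
    """Two rounds: alg on each of m simulated machines, then alg on the union."""
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for x in items:
        parts[rng.randrange(m)].append(x)  # random partitioning into T_1..T_m
    # Round 1: machine i computes a core-set S_i from its part T_i.
    coresets = [alg(part, k) for part in parts]
    # Round 2: a single machine runs alg on the union of S_1..S_m.
    union = [x for s in coresets for x in s]
    return alg(union, k)


print(composable_coreset(list(range(100)), m=4, k=3, alg=top_k))  # [99, 98, 97]
```

For an additive f like this one, each machine's top-k already contains every globally best item that landed on it, so the second round recovers the exact optimum; for diversity or submodular objectives the guarantee is only approximate.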

8 Application in Streaming Computation
Streaming computation: processing a sequence of n data points on the fly with limited storage.
Use a C-composable core-set of size k, for example: chunks of size $\sqrt{nk}$, so the number of chunks is $\sqrt{n/k}$. Compute a core-set of size k for each chunk. Total space: $O(\sqrt{nk})$ (one chunk in memory plus $\sqrt{n/k}$ core-sets of k items each).
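The chunking argument can be checked with a small Python sketch: chunks of about $\sqrt{nk}$ items are compressed to k items each, so at most one chunk plus the stored core-sets are held at once. The function name, the assumption that n is known in advance, and the toy `top_k` routine are all hypothetical.

```python
import math


def top_k(part, k):
    """Toy core-set routine: keep the k largest values."""
    return sorted(part, reverse=True)[:k]


def stream_with_coresets(stream, n, k, alg):
    """One pass over n items in chunks of about sqrt(n*k); returns (answer, peak)."""
    chunk_size = max(1, math.isqrt(n * k))
    buffer, coresets = [], []
    stored = 0  # items kept in the core-sets computed so far
    peak = 0    # most items held in memory at any time during the pass
    for x in stream:
        buffer.append(x)
        peak = max(peak, len(buffer) + stored)
        if len(buffer) == chunk_size:
            core = alg(buffer, k)  # compress the full chunk to at most k items
            coresets.append(core)
            stored += len(core)
            buffer = []
    if buffer:  # last, partially filled chunk
        coresets.append(alg(buffer, k))
    union = [x for core in coresets for x in core]
    return alg(union, k), peak


best, peak = stream_with_coresets(range(10000), n=10000, k=4, alg=top_k)
print(best, peak)  # [9999, 9998, 9997, 9996] 396
```

With n = 10000 and k = 4 the chunk size is 200 and the peak is 396 items, within the $2\sqrt{nk} = 400$ bound.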

9 Overview of recent theoretical results
Need to solve (combinatorial) optimization problems on large data:
1. Diversity maximization, PODS 14 by Indyk, Mahdian, Mahabadi, Mirrokni; for feature selection in AAAI 17 by Abbasi, Ghadiri, Mirrokni, Zadimoghaddam.
2. Capacitated ℓp clustering, NIPS 14 by Bateni, Bhaskara, Lattanzi, Mirrokni.
3. Submodular maximization, STOC 15 by Mirrokni, Zadimoghaddam.
4. Column subset selection (feature selection), ICML 16 by Altschuler et al.
5. Coverage problems: submitted, by Bateni, Esfandiari, Mirrokni.

10 Applications: Diversity & Submodular Maximization
Diverse suggestions: Play apps, campaign keywords, search results, news articles, YouTube videos. Data summarization. Feature selection. Exemplar sampling.

11 Feature selection
We have data points (docs, web pages, etc.) and features (topics, etc.). Goal: pick a small set of representative features.
[Figure: example features: Emotion, Hotel, Finance, Hospital, Money, Business, Weather, Cloud, Laundry, Movie, Gaming, Smartphone, Car, World, Theatre, Software, Laptop, Boat, Home, Camera, Education, Security, Train, Shopping.]

12 Five Problems Considered
General: find a set S of k items and maximize/minimize f(S).
- Diversity maximization: find a set S of k points and maximize the sum of pairwise distances, i.e., $\max \mathrm{diversity}(S) = \sum_{\{u, v\} \subseteq S} \mathrm{dist}(u, v)$.
- Capacitated/balanced clustering: find a set S of k centers and cluster nodes around them while minimizing the sum of distances to S.
- Coverage/submodular maximization: find a set S of k items and maximize a submodular function f(S). Generalizing set cover.
- Column subset selection: given a matrix A, find a set S of k columns to minimize $\|A - \Pi_S A\|_F^2$, where $\Pi_S$ is the orthogonal projection onto the span of the columns A[S].

13 Diversity Maximization Problem
Given: a set of n points in a metric space (X, dist). Find: a set S of k points. Goal: maximize diversity(S), i.e., the sum of pairwise distances of points in S.
Background: max dispersion (Halldórsson et al., Abbassi et al.). Useful for feature selection, diverse candidate selection in search, representative centers, ...

14 Core-sets for Diversity Maximization
Two rounds of MapReduce. [Figure: the input set is split into T1, ..., Tm on Machines 1, ..., m; LocalSearch runs on each machine to select S1, ..., Sm; LocalSearch then runs on the selected items to find the final output set.]
Arbitrary partitioning works. Random partitioning is better.
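As a rough illustration, here is a Python sketch of single-swap local search for the sum-of-pairwise-distances objective, plugged into the two-round scheme. It uses a simplified stopping rule (swap while diversity strictly increases) rather than the approximate-improvement threshold used in the analysis; the function names and the random partitioning step are assumptions.

```python
import itertools
import random


def diversity(points, idx, dist):
    """Sum of pairwise distances among points[i] for i in idx."""
    return sum(dist(points[i], points[j]) for i, j in itertools.combinations(idx, 2))


def local_search_diversity(points, k, dist, min_gain=1e-9):
    """Start from the first k points; swap one chosen point for an unchosen one
    while that raises diversity by more than min_gain."""
    n = len(points)
    chosen = list(range(min(k, n)))
    current = diversity(points, chosen, dist)
    improved = True
    while improved:
        improved = False
        for pos in range(len(chosen)):
            for v in range(n):
                if v in chosen:
                    continue
                candidate = chosen[:pos] + [v] + chosen[pos + 1:]
                value = diversity(points, candidate, dist)
                if value > current + min_gain:
                    chosen, current, improved = candidate, value, True
                    break
            if improved:
                break  # restart the scan from the improved solution
    return [points[i] for i in chosen]


def diversity_coreset(points, m, k, dist, seed=0):
    """Two rounds: LocalSearch on each random part, then LocalSearch on the union."""
    rng = random.Random(seed)
    parts = [[] for _ in range(m)]
    for p in points:
        parts[rng.randrange(m)].append(p)
    union = [p for part in parts for p in local_search_diversity(part, k, dist)]
    return local_search_diversity(union, k, dist)


def line_dist(a, b):
    return abs(a - b)


print(sorted(diversity_coreset(list(range(20)), m=3, k=2, dist=line_dist)))  # [0, 19]
```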

15 Composable Core-set Results for Diversity Maximization
Theorem (Indyk, Mahabadi, Mahdian, M. 14): the local search algorithm computes a constant-factor composable core-set for maximizing the sum of pairwise distances in 2 rounds.
Theorem (Epasto, M., Zadimoghaddam 16): a sampling+greedy algorithm computes a randomized 2-approximate composable small-size core-set for diversity maximization in one round. Randomized: works under random partitioning. Small-size: the size of the core-set is less than k.

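16 Distributed Clustering Problems
Clustering: divide data into groups containing nearby points. Metric space (X, d); for a set S of k centers, let $d(v, S) = \min_{s \in S} d(v, s)$. Minimize:
k-center: $\max_{v \in X} d(v, S)$
k-median: $\sum_{v \in X} d(v, S)$
k-means: $\sum_{v \in X} d(v, S)^2$
α-approximation algorithm: cost at most α · OPT.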

17 Mapping Core-sets for Capacitated Clustering

18 Capacitated ℓp clustering
Problem: given n points in a metric space, find k centers and assign points to centers, respecting capacities, to minimize the ℓp norm of the distance vector. Generalizes balanced k-median, k-means, and k-center. The objective is not minimizing cut size (cf. balanced partitioning in the library).
Theorem: for any p and k < n, distributed balanced clustering with approximation ratio: small constant × best single-machine guarantee; # rounds: 2; memory: $(n/m)^2$ with m machines. Improves [BMVKV 12] and [BEL 13]. (Bateni, Bhaskara, Lattanzi, Mirrokni, NIPS 14)
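A small Python sketch of the objective only, not the distributed algorithm: it checks a capacity and returns the $\ell_p$ norm of the distance vector, so p = 1, p = 2, and p = ∞ give k-median, the square root of k-means, and k-center. The uniform per-center capacity and the function signature are simplifying assumptions.

```python
import math


def lp_cost(points, centers, assignment, capacity, dist, p):
    """l_p norm of the point-to-assigned-center distances, after checking that no
    center receives more than `capacity` points."""
    loads = [0] * len(centers)
    for c in assignment:
        loads[c] += 1
    if max(loads, default=0) > capacity:
        raise ValueError("assignment violates the capacity constraint")
    d = [dist(x, centers[c]) for x, c in zip(points, assignment)]
    if math.isinf(p):
        return max(d)  # k-center
    return sum(v ** p for v in d) ** (1.0 / p)  # p=1: k-median, p=2: sqrt(k-means)


def line_dist(a, b):
    return abs(a - b)


points = [0, 1, 2, 10, 11, 12]
centers = [1, 11]
assignment = [0, 0, 0, 1, 1, 1]
for p in (1, 2, math.inf):
    print(p, lp_cost(points, centers, assignment, capacity=3, dist=line_dist, p=p))
```

Here the distance vector is (1, 0, 1, 1, 0, 1), so the three costs are 4.0, 2.0, and 1.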

19 Empirical study for distributed clustering
Test in terms of scalability and quality of solution. Two base instances & subsamples: US graph (~30M nodes) and World graph (~500M nodes).
[Table: size of the sequential instance and increase in OPT for the US and World instances.]
Quality: pessimistic analysis. Sublinear running-time scaling.

20-22 Submodular maximization
Problem: given k and a submodular function f, find a set S of size k that maximizes f(S). Some applications: data summarization, feature selection, exemplar clustering.
Special case: coverage maximization: given a family of subsets, choose a subfamily of k sets and maximize the cardinality of their union (e.g., cover various topics/meanings, target all kinds of users).
[IMMM 14] Bad news: no deterministic composable core-set with approx …
Randomization is necessary and useful: send each set to a random machine, and build a core-set on each machine by the greedy algorithm.

23 Randomization to the Rescue: Randomized Core-sets
[Figure: the input set is randomly partitioned into T1, ..., Tm on Machines 1, ..., m; GREEDY runs on each machine to select S1, ..., Sm; GREEDY then runs on the selected items to find the final output set.]
Two rounds of MapReduce.
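The randomized core-set scheme for coverage can be sketched in Python as follows: each set goes to a uniformly random machine, GREEDY picks at most k sets per machine, and GREEDY runs again on the union. The set ids, the dictionary representation, and the function names are illustrative assumptions.

```python
import random


def greedy_coverage(sets, k):
    """GREEDY for k-coverage: repeatedly take the set with the largest marginal coverage."""
    remaining = dict(sets)
    chosen, covered = [], set()
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no set adds new elements
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def randomized_coreset_coverage(sets, m, k, seed=0):
    """Send each set to a random machine, run GREEDY per machine, then GREEDY on the union."""
    rng = random.Random(seed)
    parts = [{} for _ in range(m)]
    for sid, elems in sets.items():
        parts[rng.randrange(m)][sid] = elems
    union = {}
    for part in parts:
        for sid in greedy_coverage(part, k):  # core-set of at most k sets
            union[sid] = sets[sid]
    return greedy_coverage(union, k)


sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6, 7, 8}, "d": {1}}
print(randomized_coreset_coverage(sets, m=3, k=2))  # ['c', 'a']
```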

24 Results for Submodular Maximization: MZ (STOC 15)
A class of 0.33-approximate randomized composable core-sets of size k for non-monotone submodular maximization, for example, the greedy algorithm. Hard to go beyond a 1/2 approximation with size k. Impossible to get better than a (1-1/e)-approximate randomized composable core-set of size 4k for monotone f. Results in a 0.54-approximate distributed algorithm in two rounds with linear communication complexity. For small-size composable core-sets of size k' < k: a $\sqrt{k'/k}$-approximate randomized composable core-set.

25 Low-Rank Approximation
Given a (large) matrix $A \in \mathbb{R}^{m \times n}$ and target rank $k \ll m, n$: the optimal solution is the rank-k SVD. Applications: dimensionality reduction, signal denoising, compression, ...

26 Column Subset Selection (CSS)
Columns often have important meaning. CSS: low-rank matrix approximation in the column space of A. [Figure: the m × n matrix A is approximated using A[S], the m × k submatrix formed by k selected columns.]

27-30 DISTGREEDY: GCSS(A, B, k) with L machines
[Figure: the columns of B are distributed among Machines 1, 2, ..., L, whose selections are then combined on a designated machine.]

31 DISTGREEDY for column subset selection
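A hedged NumPy sketch of one simplified DISTGREEDY-style variant. GCSS greedily adds the column of B that most increases $\|\pi_S(A)\|_F^2$, the squared Frobenius norm of A projected onto the span of the chosen columns; the columns of B are assigned to L machines at random, and a designated machine reruns GCSS on the union of their selections. It may omit refinements from the ICML 16 paper, and all function names are assumptions.

```python
import numpy as np


def projection_score(A, C):
    """||pi_C(A)||_F^2: squared Frobenius norm of A projected onto span of C's columns."""
    X, *_ = np.linalg.lstsq(C, A, rcond=None)
    return float(np.linalg.norm(C @ X, "fro") ** 2)


def gcss(A, B, k):
    """GCSS(A, B, k): greedily pick k column indices of B maximizing projection_score."""
    chosen = []
    for _ in range(min(k, B.shape[1])):
        best, best_score = None, -np.inf
        for j in range(B.shape[1]):
            if j in chosen:
                continue
            score = projection_score(A, B[:, chosen + [j]])
            if score > best_score:
                best, best_score = j, score
        chosen.append(best)
    return chosen


def dist_greedy(A, B, k, L, seed=0):
    """Random column partition over L machines, local GCSS, then GCSS on the union."""
    rng = np.random.default_rng(seed)
    owner = rng.integers(0, L, size=B.shape[1])  # machine assigned to each column
    selected = []
    for machine in range(L):
        idx = np.flatnonzero(owner == machine)
        if idx.size:
            selected.extend(int(idx[j]) for j in gcss(A, B[:, idx], k))
    # Designated machine: GCSS restricted to the columns sent by all machines.
    final = gcss(A, B[:, selected], k)
    return [selected[j] for j in final]


A = np.diag([3.0, 2.0, 1.0])
print(gcss(A, A, 2), sorted(dist_greedy(A, A, 2, L=2)))  # [0, 1] [0, 1]
```

Using least squares for the projection keeps the score correct even if the chosen columns are linearly dependent.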

32 Empirical result for column subset selection
[Table: training accuracy on a massive data set (news20.binary, 15k × 100k matrix), with speedup over the 2-phase algorithm in parentheses.]
Interesting experiment: what if we partition more carefully and not randomly? Recent observation: if we treat each machine separately, it does not help much! Random partitioning is good even compared with more careful partitioning.

33-35 Coverage Problems
Problems: given a set system (n sets and m elements): 1. k-coverage: pick k sets to maximize the size of their union. 2. Set cover: cover all elements with the least number of sets. 3. Set cover with outliers: cover (1-λ)m elements with the least number of sets.
Greedy algorithm: pick a subset with the maximum marginal coverage; a (1-1/e)-approximation for k-coverage, an O(log m)-approximation for set cover, ...
Goal: achieve a good, fast approximation with a minimum memory footprint. Streaming: elements arrive one by one, not sets. Distributed: linear communication and memory independent of the size of the ground set.
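For the set-cover-with-outliers variant, the same greedy rule can simply run until enough elements are covered. A minimal Python sketch, assuming elements are numbered 1..m and at most ⌊λm⌋ of them may stay uncovered; with λ = 0 it is plain greedy set cover. The function name is hypothetical.

```python
import math


def greedy_partial_cover(sets, num_elements, lam):
    """Greedy set cover with outliers: add the set with the largest marginal coverage
    until at most floor(lam * num_elements) elements remain uncovered."""
    target = num_elements - math.floor(lam * num_elements)
    remaining = dict(sets)
    chosen, covered = [], set()
    while len(covered) < target:
        best = max(remaining, key=lambda s: len(remaining[s] - covered), default=None)
        if best is None or not remaining[best] - covered:
            raise ValueError("the sets cannot reach the required coverage")
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


sets = {"A": set(range(1, 7)), "B": set(range(5, 10)), "C": {10}, "D": {1, 2}}
print(greedy_partial_cover(sets, 10, lam=0.1))  # ['A', 'B']
print(greedy_partial_cover(sets, 10, lam=0.0))  # ['A', 'B', 'C']
```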

36-38 Submodular Maximization vs. Maximum Coverage
The coverage function is a special case of a submodular function: f(R) = cardinality of the union of a family R of subsets.
So is the problem solved? [Mirrokni, Zadimoghaddam STOC 15]: randomized composable core-sets work. [Mirzasoleiman et al. NIPS 14]: this method works well in practice!
No. This solution has several issues for coverage problems: it requires expensive oracle access to compute the cardinality of the union. Distributed computation: send whole sets around? Streaming: it handles the set-arrival model but not the element-arrival model!

39 Why can't we apply core-sets for submodular functions?
[Figure: the family of subsets is split into subfamilies T1, ..., Tm on Machines 1, ..., m; ALG runs on each machine to select S1, ..., Sm; ALG then runs on the selected items to find the final output set.]
What if the subsets are large? Can we send a sketch of them?

40 Idea: send a sketch for each set (e.g., a sample of its elements)
[Figure: as before, but each machine receives sketches of its subfamily of subsets; ALG runs on each machine and then on the selected items to find the final output set.]
Question: does any approximation-preserving sketch work?

41-42 Approximation-preserving sketching is not sufficient.
Idea: use sketching to define a (1±ε)-approximate oracle for the cardinality-of-union function?
[Bateni, Esfandiari, Mirrokni 16]:
Thm 1: a (1±ε)-approximate sketch of the coverage function may NOT help. Given a (1±ε)-approximate oracle to the coverage function, we may only get an $n^{0.49}$ approximation.
Thm 2: with some tricks, a MinHash-based sketch + proper sampling WORKS: sample elements, not sets (different from the previous core-set idea); correlation between samples (MinHash); cap the degrees of elements in the sketch (reduces the memory footprint).

43 Bipartite Graph Formulation for Coverage Problems
Bipartite graph G(U, V, E): U: sets; V: elements; E: set-membership edges.
Set cover problem: pick the minimum number of sets that cover all elements. Set cover with outliers problem: pick the minimum number of sets that cover a (1 - λ) fraction of the elements. Maximum coverage problem: pick k sets that cover the maximum number of elements.

44 Sketching Technique
Construction (dependent sampling): assign hash values from [0, 1) to elements. Remove any element whose hash value exceeds a sampling threshold. Arbitrarily remove edges so that every element has at most a given maximum degree.
Parameters: [the two sample parameters and their values are not recoverable; one is easy to compute, and the other can be found via a round of MapReduce.]
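A rough Python sketch of this construction, assuming the sets are given as a dictionary from set id to elements. A SHA-256-based hash stands in for the MinHash-style shared hash, and the threshold `p` and degree cap `max_degree` are hypothetical stand-ins for the sample parameters, whose exact values are not reproduced here.

```python
import hashlib


def element_hash(e, salt="sketch"):
    """Map an element to [0, 1). Every set sees the same value for the same element,
    so sampling decisions are correlated across sets (the MinHash-style idea)."""
    digest = hashlib.sha256(f"{salt}:{e}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 bits: strictly below 1


def build_sketch(sets, p, max_degree):
    """sets: dict set_id -> elements (the edges of the bipartite graph).
    Drop elements whose hash exceeds p, then keep at most max_degree edges per element."""
    degree = {}
    sketch = {sid: set() for sid in sets}
    for sid, elems in sets.items():
        for e in elems:
            if element_hash(e) > p:
                continue  # element not sampled: none of its edges survive
            if degree.get(e, 0) >= max_degree:
                continue  # degree cap reached: drop this edge (an arbitrary choice)
            degree[e] = degree.get(e, 0) + 1
            sketch[sid].add(e)
    return sketch


sets = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 4, 5}}
print(build_sketch(sets, p=0.5, max_degree=2))
```

Because the hash depends only on the element, an element is either kept in every set that contains it or dropped from all of them, before the degree cap is applied.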

45 Approach
Build graph → sketch construction → core-set method → final greedy → extract results.
Sketch: a sparse subgraph with sufficient information. For instances with many sets, parallelize using core-sets. Final greedy: any single-machine greedy algorithm.

46 Proof ingredients:
1) Parameters are chosen to produce a small sketch, independent of the size of the ground set: O(#sets). Challenge: how to choose the parameters in distributed or streaming models.
2) Any α-approximation on the sketch is an (α + ε)-approximation for the original instance.

47 Summary of Results for Coverage Functions
Special case of submodular maximization. The problems are NP-hard and APX-hard. The greedy algorithm gives the best guarantees. Good implementations (linear-time): lazy greedy algorithm, lazier-than-lazy algorithm.
GREEDY: 1) start with an empty solution; 2) until done, (a) find the set with the best marginal coverage, and (b) add it to the tentative solution.
Problem: the graph must be stored in RAM.
Our algorithm: memory O(#sets), linear-time, optimal approximation guarantees, works in MapReduce, streaming, etc.
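The lazy greedy algorithm mentioned above can be sketched in Python with a max-heap of possibly stale marginal gains. Because coverage is submodular, a stale gain is an upper bound on the current gain, so a set is only re-evaluated when it reaches the top of the heap. This is a standard textbook sketch, not the talk's implementation; set ids are assumed to be mutually comparable (e.g., strings) so that heap ties can be broken.

```python
import heapq


def lazy_greedy_coverage(sets, k):
    """Lazy greedy for k-coverage with a max-heap of (possibly stale) marginal gains."""
    covered, chosen = set(), []
    heap = [(-len(elems), sid) for sid, elems in sets.items()]  # negate for a max-heap
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        _, sid = heapq.heappop(heap)
        gain = len(sets[sid] - covered)  # fresh marginal coverage
        if not heap or gain >= -heap[0][0]:
            # The fresh gain beats every other (upper-bound) gain: sid is the greedy choice.
            if gain == 0:
                break  # nothing new can be covered
            chosen.append(sid)
            covered |= sets[sid]
        else:
            heapq.heappush(heap, (-gain, sid))  # stale: reinsert with the updated gain
    return chosen


sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6, 7, 8}, "d": {1}}
print(lazy_greedy_coverage(sets, 2))  # ['c', 'a']
```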

48 Bounds for distributed coverage problems
From [BEM 16]: 1) space independent of the size of the sets or the ground set; 2) optimal approximation factor; 3) communication linear in #sets (independent of their size); 4) small #rounds. Previous work: [39] = [CKT 11], [42] = [MZ 15], [19] = [BENW 16], [43] = [MBKK 16].

49 Bounds for streaming coverage problems
From [BEM 16]: 1) space independent of the size of the ground set; 2) optimal approximation factor; 3) edge vs. set arrival. Previous work: [14] = [CW 15], [22] = [DIMV 14], [24] = [ER 14], [31] = [IMV 15], [49] = [SG 09].

50 Empirical Study
Public datasets: social networks, bags of words, contribution graphs; planted instances. Very small sketches (0.01-5%) suffice for obtaining good approximations (95+%). Without core-sets, we can handle XXXB edges or elements in under 1 h.

51 Feature Selection (ongoing)
Goal: pick k representative features. Based on composable core-sets. [Figure: bipartite graph of features and entities; methods compared: k random clusters, best cluster method, set cover (pairs).]
1) Pick features that cover all entities. 2) Pick features that cover many pairs (or triples, etc.) of entities.

52 Summary: Distributed Algorithms for Five Problems
Defined on a metric space, where composable core-sets apply:
1. Diversity maximization, PODS 14 by Indyk, Mahdian, Mahabadi, M.; for feature selection in AAAI 17 by Abbasi, Ghadiri, Mirrokni, Zadimoghaddam.
2. Capacitated ℓp clustering, NIPS 14 by Bateni, Bhaskara, Lattanzi, M.
Beyond metric spaces, only randomized partitioning applies:
3. Submodular maximization, STOC 15 by M., Zadimoghaddam.
4. Feature selection (column subset selection), ICML 16 by Altschuler et al.
Needs adaptive sampling/sketching techniques:
5. Coverage problems, by Bateni, Esfandiari, M.

53 Our team: Google NYC Algorithms Research Team. Recently released external team website: research.google.com/teams/nycalg/

54 THANK YOU

55 Local Search for Diversity Maximization [KDD 13]


More information

Approximation Algorithms

Approximation Algorithms 15-251: Great Ideas in Theoretical Computer Science Spring 2019, Lecture 14 March 5, 2019 Approximation Algorithms 1 2 SAT 3SAT Clique Hamiltonian- Cycle given a Boolean formula F, is it satisfiable? same,

More information

Introduction to Text Mining. Hongning Wang

Introduction to Text Mining. Hongning Wang Introduction to Text Mining Hongning Wang CS@UVa Who Am I? Hongning Wang Assistant professor in CS@UVa since August 2014 Research areas Information retrieval Data mining Machine learning CS@UVa CS6501:

More information

Stanford University CS261: Optimization Handout 1 Luca Trevisan January 4, 2011

Stanford University CS261: Optimization Handout 1 Luca Trevisan January 4, 2011 Stanford University CS261: Optimization Handout 1 Luca Trevisan January 4, 2011 Lecture 1 In which we describe what this course is about and give two simple examples of approximation algorithms 1 Overview

More information

On Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari)

On Modularity Clustering. Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) On Modularity Clustering Presented by: Presented by: Group III (Ying Xuan, Swati Gambhir & Ravi Tiwari) Modularity A quality index for clustering a graph G=(V,E) G=(VE) q( C): EC ( ) EC ( ) + ECC (, ')

More information

Models of distributed computing: port numbering and local algorithms

Models of distributed computing: port numbering and local algorithms Models of distributed computing: port numbering and local algorithms Jukka Suomela Adaptive Computing Group Helsinki Institute for Information Technology HIIT University of Helsinki FMT seminar, 26 February

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Outline. CS38 Introduction to Algorithms. Approximation Algorithms. Optimization Problems. Set Cover. Set cover 5/29/2014. coping with intractibility

Outline. CS38 Introduction to Algorithms. Approximation Algorithms. Optimization Problems. Set Cover. Set cover 5/29/2014. coping with intractibility Outline CS38 Introduction to Algorithms Lecture 18 May 29, 2014 coping with intractibility approximation algorithms set cover TSP center selection randomness in algorithms May 29, 2014 CS38 Lecture 18

More information

Computational and Communication Complexity in Massively Parallel Computation

Computational and Communication Complexity in Massively Parallel Computation Computational and Communication Complexity in Massively Parallel Computation Grigory Yaroslavtsev (Indiana University, Bloomington) http://grigory.us + S space Cluster Computation (a la BSP) Input: size

More information

Hardness of Approximation for the TSP. Michael Lampis LAMSADE Université Paris Dauphine

Hardness of Approximation for the TSP. Michael Lampis LAMSADE Université Paris Dauphine Hardness of Approximation for the TSP Michael Lampis LAMSADE Université Paris Dauphine Sep 2, 2015 Overview Hardness of Approximation What is it? How to do it? (Easy) Examples The PCP Theorem What is it?

More information

CS224W: Analysis of Networks Jure Leskovec, Stanford University

CS224W: Analysis of Networks Jure Leskovec, Stanford University HW2 Q1.1 parts (b) and (c) cancelled. HW3 released. It is long. Start early. CS224W: Analysis of Networks Jure Leskovec, Stanford University http://cs224w.stanford.edu 10/26/17 Jure Leskovec, Stanford

More information

Theorem 2.9: nearest addition algorithm

Theorem 2.9: nearest addition algorithm There are severe limits on our ability to compute near-optimal tours It is NP-complete to decide whether a given undirected =(,)has a Hamiltonian cycle An approximation algorithm for the TSP can be used

More information

IEOR E4008: Computational Discrete Optimization

IEOR E4008: Computational Discrete Optimization Yuri Faenza IEOR Department Jan 23th, 2018 Logistics Instructor: Yuri Faenza Assistant Professor @ IEOR from 2016 Research area: Discrete Optimization Schedule: MW, 10:10-11:25 Room: 303 Mudd Office Hours:

More information

Polynomial-Time Approximation Algorithms

Polynomial-Time Approximation Algorithms 6.854 Advanced Algorithms Lecture 20: 10/27/2006 Lecturer: David Karger Scribes: Matt Doherty, John Nham, Sergiy Sidenko, David Schultz Polynomial-Time Approximation Algorithms NP-hard problems are a vast

More information

Online Ad Allocation: Theory and Practice

Online Ad Allocation: Theory and Practice Online Ad Allocation: Theory and Practice Vahab Mirrokni December 19, 2014 Based on recent papers in collaboration with my colleagues at Google, Columbia Univ., Cornell, MIT, and Stanford. [S. Balseiro,

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu HITS (Hypertext Induced Topic Selection) Is a measure of importance of pages or documents, similar to PageRank

More information

6 Randomized rounding of semidefinite programs

6 Randomized rounding of semidefinite programs 6 Randomized rounding of semidefinite programs We now turn to a new tool which gives substantially improved performance guarantees for some problems We now show how nonlinear programming relaxations can

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS46: Mining Massive Datasets Jure Leskovec, Stanford University http://cs46.stanford.edu /7/ Jure Leskovec, Stanford C46: Mining Massive Datasets Many real-world problems Web Search and Text Mining Billions

More information

Database and Knowledge-Base Systems: Data Mining. Martin Ester

Database and Knowledge-Base Systems: Data Mining. Martin Ester Database and Knowledge-Base Systems: Data Mining Martin Ester Simon Fraser University School of Computing Science Graduate Course Spring 2006 CMPT 843, SFU, Martin Ester, 1-06 1 Introduction [Fayyad, Piatetsky-Shapiro

More information

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos

Maximizing the Spread of Influence through a Social Network. David Kempe, Jon Kleinberg and Eva Tardos Maximizing the Spread of Influence through a Social Network David Kempe, Jon Kleinberg and Eva Tardos Group 9 Lauren Thomas, Ryan Lieblein, Joshua Hammock and Mary Hanvey Introduction In a social network,

More information

Diffusion and Clustering on Large Graphs

Diffusion and Clustering on Large Graphs Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of

More information

Finding Similar Sets. Applications Shingling Minhashing Locality-Sensitive Hashing

Finding Similar Sets. Applications Shingling Minhashing Locality-Sensitive Hashing Finding Similar Sets Applications Shingling Minhashing Locality-Sensitive Hashing Goals Many Web-mining problems can be expressed as finding similar sets:. Pages with similar words, e.g., for classification

More information

Maximum Betweenness Centrality: Approximability and Tractable Cases

Maximum Betweenness Centrality: Approximability and Tractable Cases Maximum Betweenness Centrality: Approximability and Tractable Cases Martin Fink and Joachim Spoerhase Chair of Computer Science I University of Würzburg {martin.a.fink, joachim.spoerhase}@uni-wuerzburg.de

More information

Developing MapReduce Programs

Developing MapReduce Programs Cloud Computing Developing MapReduce Programs Dell Zhang Birkbeck, University of London 2017/18 MapReduce Algorithm Design MapReduce: Recap Programmers must specify two functions: map (k, v) * Takes

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours

More information

1 Case study of SVM (Rob)

1 Case study of SVM (Rob) DRAFT a final version will be posted shortly COS 424: Interacting with Data Lecturer: Rob Schapire and David Blei Lecture # 8 Scribe: Indraneel Mukherjee March 1, 2007 In the previous lecture we saw how

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu [Kumar et al. 99] 2/13/2013 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

More information

CSE 202: Design and Analysis of Algorithms Lecture 4

CSE 202: Design and Analysis of Algorithms Lecture 4 CSE 202: Design and Analysis of Algorithms Lecture 4 Instructor: Kamalika Chaudhuri Announcements HW 1 due in class on Tue Jan 24 Email me your homework partner name, or if you need a partner today Greedy

More information

MaxCover in MapReduce Flavio Chierichetti, Ravi Kumar, Andrew Tomkins

MaxCover in MapReduce Flavio Chierichetti, Ravi Kumar, Andrew Tomkins MaxCover in MapReduce Flavio Chierichetti, Ravi Kumar, Andrew Tomkins Advisor Klaus Berberich Presented By: Isha Khosla Outline Motivation Introduction Classical Approach: Greedy Proposed Algorithm: M

More information

Approximation-stability and proxy objectives

Approximation-stability and proxy objectives Harnessing implicit assumptions in problem formulations: Approximation-stability and proxy objectives Avrim Blum Carnegie Mellon University Based on work joint with Pranjal Awasthi, Nina Balcan, Anupam

More information

Homomorphic Sketches Shrinking Big Data without Sacrificing Structure. Andrew McGregor University of Massachusetts

Homomorphic Sketches Shrinking Big Data without Sacrificing Structure. Andrew McGregor University of Massachusetts Homomorphic Sketches Shrinking Big Data without Sacrificing Structure Andrew McGregor University of Massachusetts 4Mv 2 2 32 3 2 3 2 3 4 M 5 3 5 = v 6 7 4 5 = 4Mv5 = 4Mv5 Sketches: Encode data as vector;

More information

Part I Part II Part III Part IV Part V. Influence Maximization

Part I Part II Part III Part IV Part V. Influence Maximization Part I Part II Part III Part IV Part V Influence Maximization 1 Word-of-mouth (WoM) effect in social networks xphone is good xphone is good xphone is good xphone is good xphone is good xphone is good xphone

More information

Locality- Sensitive Hashing Random Projections for NN Search

Locality- Sensitive Hashing Random Projections for NN Search Case Study 2: Document Retrieval Locality- Sensitive Hashing Random Projections for NN Search Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 18, 2017 Sham Kakade

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

Network Wide Policy Enforcement. Michael K. Reiter (joint work with V. Sekar, R. Krishnaswamy, A. Gupta)

Network Wide Policy Enforcement. Michael K. Reiter (joint work with V. Sekar, R. Krishnaswamy, A. Gupta) Network Wide Policy Enforcement Michael K. Reiter (joint work with V. Sekar, R. Krishnaswamy, A. Gupta) 1 Enforcing Policy in Future Networks MF vision includes enforcement of rich policies in the network

More information

Introduction to Graph Theory

Introduction to Graph Theory Introduction to Graph Theory Tandy Warnow January 20, 2017 Graphs Tandy Warnow Graphs A graph G = (V, E) is an object that contains a vertex set V and an edge set E. We also write V (G) to denote the vertex

More information

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection

CSE 255 Lecture 6. Data Mining and Predictive Analytics. Community Detection CSE 255 Lecture 6 Data Mining and Predictive Analytics Community Detection Dimensionality reduction Goal: take high-dimensional data, and describe it compactly using a small number of dimensions Assumption:

More information

SOLVING LARGE CARPOOLING PROBLEMS USING GRAPH THEORETIC TOOLS

SOLVING LARGE CARPOOLING PROBLEMS USING GRAPH THEORETIC TOOLS July, 2014 1 SOLVING LARGE CARPOOLING PROBLEMS USING GRAPH THEORETIC TOOLS Irith Ben-Arroyo Hartman Datasim project - (joint work with Abed Abu dbai, Elad Cohen, Daniel Keren) University of Haifa, Israel

More information

Predictive Indexing for Fast Search

Predictive Indexing for Fast Search Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive

More information

Proteins, Particles, and Pseudo-Max- Marginals: A Submodular Approach Jason Pacheco Erik Sudderth

Proteins, Particles, and Pseudo-Max- Marginals: A Submodular Approach Jason Pacheco Erik Sudderth Proteins, Particles, and Pseudo-Max- Marginals: A Submodular Approach Jason Pacheco Erik Sudderth Department of Computer Science Brown University, Providence RI Protein Side Chain Prediction Estimate side

More information

arxiv: v2 [cs.ds] 14 Sep 2018

arxiv: v2 [cs.ds] 14 Sep 2018 Massively Parallel Dynamic Programming on Trees MohammadHossein Bateni Soheil Behnezhad Mahsa Derakhshan MohammadTaghi Hajiaghayi Vahab Mirrokni arxiv:1809.03685v2 [cs.ds] 14 Sep 2018 Abstract Dynamic

More information

Community Detection. Community

Community Detection. Community Community Detection Community In social sciences: Community is formed by individuals such that those within a group interact with each other more frequently than with those outside the group a.k.a. group,

More information

On the Max Coloring Problem

On the Max Coloring Problem On the Max Coloring Problem Leah Epstein Asaf Levin May 22, 2010 Abstract We consider max coloring on hereditary graph classes. The problem is defined as follows. Given a graph G = (V, E) and positive

More information

Submodular Utility Maximization for Deadline Constrained Data Collection in Sensor Networks

Submodular Utility Maximization for Deadline Constrained Data Collection in Sensor Networks Submodular Utility Maximization for Deadline Constrained Data Collection in Sensor Networks Zizhan Zheng, Member, IEEE and Ness B. Shroff, Fellow, IEEE Abstract We study the utility maximization problem

More information

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory

CSC2420 Fall 2012: Algorithm Design, Analysis and Theory CSC2420 Fall 2012: Algorithm Design, Analysis and Theory Allan Borodin September 20, 2012 1 / 1 Lecture 2 We continue where we left off last lecture, namely we are considering a PTAS for the the knapsack

More information

Non-exhaustive, Overlapping k-means

Non-exhaustive, Overlapping k-means Non-exhaustive, Overlapping k-means J. J. Whang, I. S. Dhilon, and D. F. Gleich Teresa Lebair University of Maryland, Baltimore County October 29th, 2015 Teresa Lebair UMBC 1/38 Outline Introduction NEO-K-Means

More information

From Routing to Traffic Engineering

From Routing to Traffic Engineering 1 From Routing to Traffic Engineering Robert Soulé Advanced Networking Fall 2016 2 In the beginning B Goal: pair-wise connectivity (get packets from A to B) Approach: configure static rules in routers

More information

CIS 399: Foundations of Data Science

CIS 399: Foundations of Data Science CIS 399: Foundations of Data Science Massively Parallel Algorithms Grigory Yaroslavtsev Warren Center for Network and Data Sciences http://grigory.us Big Data = buzzword Non-experts, media: a lot of spreadsheets,

More information

Adaptive Caching Algorithms with Optimality Guarantees for NDN Networks. Stratis Ioannidis and Edmund Yeh

Adaptive Caching Algorithms with Optimality Guarantees for NDN Networks. Stratis Ioannidis and Edmund Yeh Adaptive Caching Algorithms with Optimality Guarantees for NDN Networks Stratis Ioannidis and Edmund Yeh A Caching Network Nodes in the network store content items (e.g., files, file chunks) 1 A Caching

More information

Nearest Neighbor with KD Trees

Nearest Neighbor with KD Trees Case Study 2: Document Retrieval Finding Similar Documents Using Nearest Neighbors Machine Learning/Statistics for Big Data CSE599C1/STAT592, University of Washington Emily Fox January 22 nd, 2013 1 Nearest

More information

Algorithms for Nearest Neighbors

Algorithms for Nearest Neighbors Algorithms for Nearest Neighbors Classic Ideas, New Ideas Yury Lifshits Steklov Institute of Mathematics at St.Petersburg http://logic.pdmi.ras.ru/~yura University of Toronto, July 2007 1 / 39 Outline

More information

Comp Online Algorithms

Comp Online Algorithms Comp 7720 - Online Algorithms Notes 4: Bin Packing Shahin Kamalli University of Manitoba - Fall 208 December, 208 Introduction Bin packing is one of the fundamental problems in theory of computer science.

More information