Approximation Algorithms for NP-Hard Clustering Problems
1 Approximation Algorithms for NP-Hard Clustering Problems
Ramgopal R. Mettu
Dissertation Advisor: Greg Plaxton
Department of Computer Science, University of Texas at Austin
2 What Is Clustering?
The goal of clustering is to partition n weighted points into a small number of coherent groups. Clustering algorithms can be used to:
- organize (e.g., document collections)
- analyze (e.g., data mining)
- manage (e.g., networks)
3 Measuring Cluster Quality
The cost of a set of cluster centers is the sum, over all points, of the weighted distance from each point to the closest center or median.
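Here is a minimal Python sketch of this cost, cost(C) = Σ_x w(x) · d(x, C). It assumes the points come with a parallel list of weights and that the caller supplies a distance function `d`. The function name `clustering_cost` is ours, not the talk's.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def clustering_cost(points: Sequence[P], weights: Sequence[float],
                    centers: Sequence[P], d: Callable[[P, P], float]) -> float:
    """Sum over all points x of w(x) times the distance from x to its closest center."""
    if not centers:
        raise ValueError("at least one center is required")
    return sum(w * min(d(x, c) for c in centers)
               for x, w in zip(points, weights))


def line_dist(a: float, b: float) -> float:
    """Distance between two points on the real line."""
    return abs(a - b)


# Four points on a line; the last one has weight 2.
print(clustering_cost([0, 1, 10, 11], [1, 1, 1, 2], [0, 10], line_dist))  # 3
```

The weights multiply distances, so a heavy point far from every center dominates the cost.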
4 The Problems We Study
- The facility location problem asks us to identify a set of cluster centers that minimizes the sum of the associated penalties and the cost.
- The k-median problem asks us to identify k cluster centers that minimize cost.
- The online median problem asks us to identify one cluster center at a time while ensuring, at every step, that the centers chosen so far have low cost.
5 Talk Outline
- Introduction
- Summary of Results
- k-median: Successive Sampling Algorithm
- Online Median: Hierarchically Greedy Strategy
- Experimental Work
- Conclusion
6 Assumptions
We assume the input points are drawn from a metric space. A metric distance function is symmetric, nonnegative, and satisfies the triangle inequality. Two well-known metrics are Euclidean distance and shortest-path distance.
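The following Python sketch is for illustration only. It builds a shortest-path metric with the Floyd-Warshall algorithm and then runs a brute-force check of the three properties listed above on a finite distance matrix. The helper names and the tolerance parameter are our own assumptions.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def shortest_path_metric(n: int, edges: Sequence[Tuple[int, int, float]]) -> Matrix:
    """All-pairs shortest-path distances (Floyd-Warshall) for an undirected
    graph on vertices 0..n-1 with nonnegative edge lengths."""
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, length in edges:
        dist[u][v] = dist[v][u] = min(dist[u][v], length)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def is_metric(dist: Matrix, tol: float = 1e-9) -> bool:
    """Check nonnegativity, symmetry, and the triangle inequality."""
    n = len(dist)
    for i, j in itertools.product(range(n), repeat=2):
        if dist[i][j] < 0 or abs(dist[i][j] - dist[j][i]) > tol:
            return False
    return all(dist[i][j] <= dist[i][k] + dist[k][j] + tol
               for i, j, k in itertools.product(range(n), repeat=3))


# The direct edge 0-2 of length 5 is longer than the path 0-1-2 of length 3.
d = shortest_path_metric(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)])
print(d[0][2], is_metric(d))                          # 3.0 True
print(is_metric([[0, 1, 5], [1, 0, 2], [5, 2, 0]]))   # False, since 5 > 1 + 2
```

The raw edge lengths violate the triangle inequality. The shortest-path distances satisfy it by construction, which is why shortest-path distance is a metric.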
7 Hardness Results
For arbitrary (non-metric) distances, it is NP-hard to approximate within a factor of o(log n). For arbitrary metric spaces, it is NP-hard to compute solutions with cost less than a certain constant factor times optimal. For example, [JMS 02] shows that it is NP-hard to obtain a k-median solution with cost below a specific constant factor times optimal.
8 Standard Approaches
The k-means heuristic is widely used due to its simplicity and speed. Given an initial solution, it repeatedly applies an iterative improvement step, and each iteration takes O(nk) time. There are no useful guarantees on solution quality.
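For comparison, here is a minimal Python sketch of the standard Lloyd-style k-means iteration on points in R^d. The talk does not describe its own implementation, so every detail here is an assumption, including keeping an empty cluster's old center. Each iteration evaluates n·k point-to-center distances, which is where the O(nk) cost per step comes from when the dimension is treated as a constant. The sketch requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_step(points: Sequence[Point], centers: Sequence[Point]) -> List[Point]:
    """One improvement step: assign each point to its nearest center, then
    move each center to the centroid of the points assigned to it."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:  # n * k distance evaluations in total
        j = min(range(k), key=lambda c: math.dist(p, centers[c]))
        counts[j] += 1
        for t in range(dim):
            sums[j][t] += p[t]
    # Assumption: an empty cluster keeps its previous center.
    return [tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centers[j])
            for j in range(k)]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           max_iters: int = 100) -> List[Point]:
    """Repeat the improvement step until the centers stop changing."""
    current = [tuple(c) for c in centers]
    for _ in range(max_iters):
        updated = kmeans_step(points, current)
        if updated == current:
            break
        current = updated
    return current


print(kmeans([(0, 0), (0, 1), (10, 0), (10, 1)], [(0, 0), (10, 0)]))
# [(0.0, 0.5), (10.0, 0.5)]
```

Nothing in the loop bounds how far the final centers are from an optimal solution. The result depends entirely on the initial centers, which is why initialization (e.g., with a k-median solution) matters.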
9 Summary of Results
1. A randomized constant-factor approximation algorithm for the k-median problem that runs in Θ(nk) time for log n ≤ k ≤ n/log² n.
2. A Θ(n²)-time constant-factor approximation algorithm for the online median problem.
3. A greedy Θ(n²)-time constant-factor approximation algorithm for the facility location problem.
4. An analysis of approximate metrics that extends our results to more general objective functions.
11 Previous Work
The k-median problem has been studied widely in Operations Research [FM 90]. The first constant-factor approximation algorithm for the k-median problem, based on LP rounding, is due to Charikar et al. [CGTS 99]. The fastest deterministic algorithm runs in O(n²) time [MP 00]; the best approximation constant is due to [AGKMMP 01].
12 Previous Work
The first randomized algorithm is due to [Indyk 99]. His algorithm runs in O(nk polylog(n)) time but produces O(k) medians rather than exactly k. [Thorup 01] gives an algorithm for the graph version of k-median.
13 Uniform Weights k-median Algorithm
Our algorithm works in two phases:
1. Use successive sampling to rapidly identify O(k log(n/k)) points with cost within a constant factor of optimal.
2. Construct a small problem instance from the sampled points and solve it with an existing k-median algorithm.
14 Successive Sampling
Let near(x, Y) denote the half of the points in Y that are nearest to x.
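15 Successive Sampling
    U_0 := U, i := 0
    while |U_i| > 0 do:
        S_i := 3k/2 random samples from U_i
        U_{i+1} := U_i − near(S_i, U_i)
        i := i + 1
    return S := ∪_i S_i
Here near(S_i, U_i) is the half of U_i nearest to the sample set S_i.
[Figure: an example with k = 2, showing the successively removed sets near(S_0, U_0), near(S_1, U_1), and near(S_2, U_2).]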
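Below is a minimal Python sketch of the loop above. It assumes points are indexed 0..n−1 and that the caller supplies a distance function `d`. Three details are our own assumptions rather than the slide's:
- samples are drawn without replacement;
- once |U_i| is at most the sample size, the remaining points are added to S whole;
- the sampled points are always removed from U_i as well, which changes nothing whenever they already lie in near(S_i, U_i).

Each round removes half of U_i, so there are roughly log2(n/k) rounds.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Set

Dist = Callable[[int, int], float]


def near(sample: Sequence[int], U: Sequence[int], d: Dist) -> Set[int]:
    """The ceil(|U|/2) points of U that are nearest to the sample set."""
    ranked = sorted(U, key=lambda y: min(d(y, s) for s in sample))
    return set(ranked[: math.ceil(len(U) / 2)])


def successive_sampling(n: int, k: int, d: Dist,
                        rng: Optional[random.Random] = None) -> List[int]:
    rng = rng or random.Random()
    m = math.ceil(3 * k / 2)          # samples per round (3k/2 on the slide)
    U: List[int] = list(range(n))     # U_0 := U
    S: List[int] = []
    while U:
        if len(U) <= m:               # assumption: take a small remainder whole
            S.extend(U)
            break
        S_i = rng.sample(U, m)        # m distinct random points of U_i
        S.extend(S_i)
        removed = near(S_i, U, d) | set(S_i)
        U = [y for y in U if y not in removed]   # U_{i+1} := U_i - near(S_i, U_i)
    return S


pts = list(range(1000))               # 1000 points on a line
S = successive_sampling(1000, 4, lambda a, b: abs(pts[a] - pts[b]), random.Random(0))
print(len(S))  # 49: eight rounds of 6 samples, plus 1 leftover point
```

In this run, S keeps 49 of the 1000 points, consistent with the O(k log(n/k)) sample size on the slide.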
16 Successive Sampling Bounds
Theorem: With high probability, cost(S) is within a constant factor of the optimal k-median solution cost.
Running Time: For the case of uniform weights, our successive sampling algorithm runs in O(n(k + log n)) time.
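17 Second Phase
Collapse the points onto the sample S (each point's weight moves to its nearest sampled point) and apply a k-median algorithm to the resulting weighted problem instance. The output of the second phase is within a constant factor of optimal [GMMO 00]. It can be computed quickly because the collapsed instance has only |S| = O(k log(n/k)) points.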
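The Python sketch below shows the collapse step under our reading that each point's weight is added to its nearest sampled point. The slide leaves the k-median solver open. Purely to keep the example runnable, a hypothetical exhaustive-search solver stands in for it; that solver is feasible only because the collapsed instance is small. All names here are illustrative.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Dist = Callable[[int, int], float]


def collapse(weights: Sequence[float], samples: Sequence[int], d: Dist) -> Dict[int, float]:
    """Move the weight of every point 0..n-1 onto its nearest sampled point."""
    collapsed = {s: 0.0 for s in samples}
    for x, w in enumerate(weights):
        collapsed[min(samples, key=lambda s: d(x, s))] += w
    return collapsed


def brute_force_k_median(sites: Dict[int, float], k: int, d: Dist) -> Tuple[int, ...]:
    """Hypothetical stand-in for 'an existing k-median algorithm':
    exhaustive search over all k-subsets of the weighted sites."""
    def cost(centers: Tuple[int, ...]) -> float:
        return sum(w * min(d(s, c) for c in centers) for s, w in sites.items())
    return min(combinations(list(sites), min(k, len(sites))), key=cost)


pos = [0, 1, 2, 10, 11, 12]                 # six unit-weight points on a line
dist = lambda a, b: abs(pos[a] - pos[b])
sites = collapse([1] * 6, [1, 4, 5], dist)  # suppose the sample is {1, 4, 5}
print(sites)                                # {1: 3.0, 4: 2.0, 5: 1.0}
print(brute_force_k_median(sites, 2, dist)) # (1, 4)
```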
18 Upper Bounds
Theorem: With high probability, our k-median algorithm produces a solution with cost within a constant factor of optimal.
Running Time: O(n(k + log n)) if we use our online median algorithm for the second phase.
19 Our Arbitrary Weights k-median Algorithm
The uniform weights algorithm can be used as a subroutine:
1. Divide the points into power-of-2 weight classes.
2. Run the uniform weights algorithm on each weight class.
3. Apply the approach of the second phase to obtain k points.
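The slide does not say how the weight classes are indexed. This Python sketch assumes class j holds the weights in [2^j, 2^(j+1)), i.e., j = floor(log2 w). It computes j exactly with `math.frexp` rather than a floating-point logarithm. Within one class, weights differ by less than a factor of 2, which is presumably what lets each class be treated as (approximately) uniform.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def weight_classes(weights: Sequence[float]) -> Dict[int, List[int]]:
    """Group point indices so that class j holds the weights in [2**j, 2**(j+1))."""
    classes: Dict[int, List[int]] = defaultdict(list)
    for i, w in enumerate(weights):
        if w <= 0:
            raise ValueError("weights must be positive")
        _, e = math.frexp(w)   # w = m * 2**e with 0.5 <= m < 1, so floor(log2 w) = e - 1
        classes[e - 1].append(i)
    return dict(classes)


print(weight_classes([1, 3, 2, 0.5, 8, 1.5]))  # {0: [0, 5], 1: [1, 2], -1: [3], 3: [4]}
```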
20 Upper Bounds
Theorem: With high probability, our k-median algorithm produces a solution with cost within a constant factor of optimal.
Running Time: O(n(k + log n) + k² log² n)
21 k-median Lower Bound
For log n ≤ k ≤ n/log² n, our upper bound of O(n(k + log n) + k² log² n) is tight. In this range, k ≥ log n gives n(k + log n) = O(nk), and k ≤ n/log² n gives k² log² n ≤ nk, so the bound is O(nk).
Theorem: Any o(nk)-time randomized constant-factor approximation algorithm for the k-median problem has a negligible success probability. [GMMO 00] gives a deterministic lower bound.
23 The Online Median Problem
What if we wish to compute a set of cluster centers, but we don't know k?
24 The Online Median Problem
The goal of the online median problem is to identify an ordering of the points such that, over all i, the i-median cost of the prefix of length i is minimized. Is there always an ordering of the points such that, for all i, the cost of the prefix of length i is within a constant factor of optimal?
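To make the question concrete, here is a Python sketch that scores a given ordering. For every prefix length i, it divides the prefix's cost by the optimal i-median cost. The optimum is found by exhaustive search, so the sketch only suits tiny inputs. The function names, and the convention that 0/0 counts as ratio 1, are our own assumptions. An ordering answers the slide's question affirmatively when all of its ratios are bounded by a constant.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence

Dist = Callable[[int, int], float]


def cost(d: Dist, w: Sequence[float], centers: Sequence[int]) -> float:
    """Weighted distance from every point to its nearest chosen center."""
    return sum(wt * min(d(y, c) for c in centers) for y, wt in enumerate(w))


def prefix_ratios(d: Dist, w: Sequence[float], order: Sequence[int]) -> List[float]:
    """For each i, cost of the length-i prefix divided by the optimal
    i-median cost (exhaustive search; assumption: 0/0 counts as 1)."""
    ratios = []
    for i in range(1, len(order) + 1):
        got = cost(d, w, order[:i])
        best = min(cost(d, w, c) for c in combinations(range(len(w)), i))
        if best > 0:
            ratios.append(got / best)
        else:
            ratios.append(1.0 if got == 0 else math.inf)
    return ratios


pos = [0, 1, 10]   # three unit-weight points on a line
print(prefix_ratios(lambda a, b: abs(pos[a] - pos[b]), [1, 1, 1], [0, 2, 1]))
# [1.1, 1.0, 1.0]
```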
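25 A Natural Greedy Approach
Idea: find successive points in the ordering greedily, each time appending the point that minimizes the cost of the new prefix.
[Figure: an instance and its optimal solution for k = 3, with distances 1 and 2 marked.]
For k = 3, the optimal solution has cost 0, but greedy has cost 1, so the approximation ratio is 1/0, i.e., unbounded.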
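The slide's figure did not survive transcription. The Python sketch below therefore uses a hypothetical star instance of our own that fails in the same way:
- a hub with weight 0 sits at distance 1 from three leaves of weight 1;
- the leaves are at distance 2 from one another.

Greedy picks the hub first because it is the 1-median. Its 3-point prefix then costs 1, while the three leaves cost 0.

```python
from typing import Callable, List, Sequence

Dist = Callable[[int, int], float]


def cost(d: Dist, w: Sequence[float], centers: Sequence[int]) -> float:
    """Weighted distance from every point to its nearest chosen center."""
    return sum(wt * min(d(y, c) for c in centers) for y, wt in enumerate(w))


def greedy_order(d: Dist, w: Sequence[float]) -> List[int]:
    """Repeatedly append the point that minimizes the cost of the new prefix."""
    order: List[int] = []
    remaining = list(range(len(w)))
    while remaining:
        best = min(remaining, key=lambda p: cost(d, w, order + [p]))
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical star: hub 0 (weight 0) at distance 1 from leaves 1, 2, 3,
# which are at distance 2 from each other (each leaf has weight 1).
def star(a: int, b: int) -> float:
    if a == b:
        return 0.0
    return 1.0 if 0 in (a, b) else 2.0


w = [0, 1, 1, 1]
order = greedy_order(star, w)
print(order)                      # [0, 1, 2, 3]: the hub comes first
print(cost(star, w, order[:3]))   # 1.0, while the 3-median {1, 2, 3} costs 0.0
```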
26 A Hierarchically Greedy Approach
Balance global and local decisions by considering the metric space at varying levels of granularity. Instead of making a single greedy choice, make a sequence of greedy choices that are increasingly local.
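27 Definitions
Let ball(x, r) denote the set {y : d(x, y) ≤ r}. For balls B = ball(x, r) and C = ball(y, r/3), C is a child of B if radius(B) = 3 · radius(C) and d(x, y) ≤ 2r.
[Figure: B = ball(x, r) = {x, a, b, c} and C = ball(y, r/3), with the distances r, 2r, and r/3 marked.]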
28 Definitions
Let value(B) = Σ_{y ∈ B} (r − d(y, center(B))) · w(y), where r = radius(B). Let isolated(x, Y) = ball(x, d(x, Y)/15).
[Figure: a weighted example in which value(B) = r + 5, and isolated(z, {x, a, b, c}) = ball(z, s/15) with s = d(z, {x, a, b, c}).]
29 Hierarchically Greedy Algorithm
Let Z denote the points in the ordering so far.
    B := the maximum-value ball isolated(x, Z) over all x
    while B has more than one child do:
        B := the maximum-value child of B
    return center(B) as the next point in the ordering
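The following Python sketch follows the definitions and loop above literally: ball, value, child (radius divided by 3, center within 2r), and isolated (radius d(x, Z)/15). It is a naive illustration, not the O(n²) implementation from the talk. It relies on the following assumptions of our own:
- distinct points must have positive distance, or the descent may not terminate;
- the x's are restricted to points not yet in Z (points in Z have isolated balls of radius 0 and value 0);
- when Z is empty, every ball gets the point set's diameter as its radius, which makes the first top-level choice a 1-median.

```python
from typing import Callable, List, Sequence

Dist = Callable[[int, int], float]


def value(d: Dist, w: Sequence[float], x: int, r: float) -> float:
    """value(ball(x, r)) = sum over y in the ball of (r - d(x, y)) * w(y)."""
    return sum((r - d(x, y)) * w[y] for y in range(len(w)) if d(x, y) <= r)


def child_centers(d: Dist, n: int, x: int, r: float) -> List[int]:
    """Centers y of the children ball(y, r/3) of ball(x, r): all y with d(x, y) <= 2r."""
    return [y for y in range(n) if d(x, y) <= 2 * r]


def next_point(d: Dist, w: Sequence[float], ordered: Sequence[int]) -> int:
    n, Z = len(w), set(ordered)
    if Z:
        def radius(x: int) -> float:          # radius of isolated(x, Z)
            return min(d(x, z) for z in Z) / 15
    else:
        # Assumption: with Z empty, use the diameter D, so maximizing value
        # maximizes D * W - cost(x), i.e., picks a 1-median.
        diameter = max(d(a, b) for a in range(n) for b in range(n))

        def radius(x: int) -> float:
            return diameter
    x = max((p for p in range(n) if p not in Z), key=lambda p: value(d, w, p, radius(p)))
    r = radius(x)
    kids = child_centers(d, n, x, r)
    while len(kids) > 1:                      # while B has more than one child
        r /= 3
        x = max(kids, key=lambda y: value(d, w, y, r))   # maximum-value child
        kids = child_centers(d, n, x, r)
    return x                                  # center(B)


def hierarchically_greedy_order(d: Dist, w: Sequence[float]) -> List[int]:
    order: List[int] = []
    while len(order) < len(w):
        order.append(next_point(d, w, order))
    return order


# Hypothetical star: hub 0 (weight 0) at distance 1 from leaves 1, 2, 3
# (weight 1 each, pairwise distance 2).
def star(a: int, b: int) -> float:
    if a == b:
        return 0.0
    return 1.0 if 0 in (a, b) else 2.0


print(hierarchically_greedy_order(star, [0, 1, 1, 1]))  # [1, 2, 3, 0]
```

On this instance, the top-level ball is centered at the hub. The descent into smaller children then moves to a leaf, so the first three points are the leaves and the 3-point prefix has cost 0.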
30 Approximation Ratio and Running Time
Theorem: The hierarchically greedy strategy produces an ordering such that every prefix has cost within a constant factor of optimal.
Running Time: Our online median algorithm can be implemented in O(n²) time. This is optimal by the k-median lower bound.
32 Algorithm Implementations
We implemented our uniform-weights k-median and online median algorithms in Java (version 1.3.1). We also implemented the k-means heuristic with a centroid-based initialization procedure. The common data structures took 542 lines of code; k-median took 726, online median took 800, and k-means took 218.
33 Goals of Experiments
Our algorithms produce provably good solutions, are simple, and are asymptotically fast. How do they compare in practice to heuristics in terms of speed and solution quality?
34 Experiments with Gaussians
We generated synthetic inputs consisting of k d-dimensional Gaussians. We tested:
- k-means with centroid-based initialization
- k-means with k-median initialization
We varied n to test scalability, and varied d to test the effect of dimensionality.
35 Scalability Results: Solution Costs
[Figure: solution cost (axis from 0 to 60,000) versus n, for k-means and k-median+k-means.]
36 Scalability Results: Running Times
[Figure: running time (seconds) versus n, for k-means and k-median+k-means.]
37 A Real-World Application
We applied our online median implementation to particle selection in electron microscopy images. Electron microscopy images typically have a low signal-to-noise ratio and are thus hard to interpret by inspection. We compare our results with those of [YB 02].
38 Input to Online Median
A weighted 2D point set is obtained by thresholding the electron microscopy image [YB 02].
[Figure: the source image and the resulting set of 376 weighted points.]
39 Methodology
Our online median algorithm proceeds by choosing one cluster center at a time. We chose the appropriate number of clusters for a given data set interactively. Note that heuristics exist for choosing the number of clusters in a data set.
40 Comparison of Results
[Figure: two panels, "[YB 02]" and "Online Median".]
We obtained comparable results on the other inputs.
42 Directions for Future Work
- Can the approximation constants be improved?
- Can the hierarchically greedy strategy be applied to other location problems (e.g., cooperative caching in a metric space)?
- Is our successive sampling algorithm useful for other problems?
43 Extra Slides
44 With High Probability
Let A be an algorithm that runs in time T(n). A succeeds with high probability if, for any constant c > 0, A can be made to succeed with probability at least 1 − n^(−c) while maintaining a running time of O(T(n)).
45 k-Clustering Lower Bound
For the same objective function, is the problem of just partitioning the points into k sets considerably simpler?
Theorem: Any randomized constant-factor approximation algorithm for the k-clustering problem, even one with only a negligible success probability, requires Ω(nk) time.
46 k-Median Lower Bound
Consider k equidistant groups, each containing n/k unit-weight points. We show that no algorithm (even a randomized one) can distinguish between two groups without examining at least nk distances. Any algorithm that cannot make this distinction cannot be constant-factor approximate.
47 The Approximation Constants
For our online median algorithm, we can show that the approximation constant is around 27. For our k-median algorithm, the approximation factor depends on a number of complicated statistical arguments.
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 15: Microarray clustering http://compbio.pbworks.com/f/wood2.gif Some slides were adapted from Dr. Shaojie Zhang (University of Central Florida) Microarray
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationCoping with NP-Completeness
Coping with NP-Completeness Siddhartha Sen Questions: sssix@cs.princeton.edu Some figures obtained from Introduction to Algorithms, nd ed., by CLRS Coping with intractability Many NPC problems are important
More information1 The range query problem
CS268: Geometric Algorithms Handout #12 Design and Analysis Original Handout #12 Stanford University Thursday, 19 May 1994 Original Lecture #12: Thursday, May 19, 1994 Topics: Range Searching with Partition
More informationLecture 10: Semantic Segmentation and Clustering
Lecture 10: Semantic Segmentation and Clustering Vineet Kosaraju, Davy Ragland, Adrien Truong, Effie Nehoran, Maneekwan Toyungyernsub Department of Computer Science Stanford University Stanford, CA 94305
More informationDesign and Analysis of Algorithms
CSE 101, Winter 2018 Design and Analysis of Algorithms Lecture 9: Minimum Spanning Trees Class URL: http://vlsicad.ucsd.edu/courses/cse101-w18/ Goal: MST cut and cycle properties Prim, Kruskal greedy algorithms
More informationTowards the world s fastest k-means algorithm
Greg Hamerly Associate Professor Computer Science Department Baylor University Joint work with Jonathan Drake May 15, 2014 Objective function and optimization Lloyd s algorithm 1 The k-means clustering
More informationClustering. Unsupervised Learning
Clustering. Unsupervised Learning Maria-Florina Balcan 03/02/2016 Clustering, Informal Goals Goal: Automatically partition unlabeled data into groups of similar datapoints. Question: When and why would
More informationChapter 4: Non-Parametric Techniques
Chapter 4: Non-Parametric Techniques Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Supervised Learning How to fit a density
More information10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2
161 Machine Learning Hierarchical clustering Reading: Bishop: 9-9.2 Second half: Overview Clustering - Hierarchical, semi-supervised learning Graphical models - Bayesian networks, HMMs, Reasoning under
More informationClustering. So far in the course. Clustering. Clustering. Subhransu Maji. CMPSCI 689: Machine Learning. dist(x, y) = x y 2 2
So far in the course Clustering Subhransu Maji : Machine Learning 2 April 2015 7 April 2015 Supervised learning: learning with a teacher You had training data which was (feature, label) pairs and the goal
More informationChapter 4. Greedy Algorithms. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.
Chapter 4 Greedy Algorithms Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. 1 4.5 Minimum Spanning Tree Minimum Spanning Tree Minimum spanning tree. Given a connected
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Shortest Paths Date: 10/13/15
600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Shortest Paths Date: 10/13/15 14.1 Introduction Today we re going to talk about algorithms for computing shortest
More informationThe Touring Polygons Problem (TPP)
The Touring Polygons Problem (TPP) [Dror-Efrat-Lubiw-M]: Given a sequence of k polygons in the plane, a start point s, and a target point, t, we seek a shortest path that starts at s, visits in order each
More information