Similarity Estimation Techniques from Rounding Algorithms. Moses Charikar Princeton University

2 Compact sketches for estimating similarity
Collection of objects, e.g. mathematical representations of documents, images.
Implicit similarity/distance function.
Want to estimate similarity without looking at entire objects.
Compute compact sketches of objects so that similarity/distance can be estimated from them.

3 Similarity Preserving Hashing
Similarity function sim(x,y).
Family of hash functions F with a probability distribution such that
$\Pr_{h \in F}[h(x) = h(y)] = \mathrm{sim}(x,y)$

4 Applications
Compact representation scheme for estimating similarity:
$x \to (h_1(x), h_2(x), \ldots, h_k(x))$
$y \to (h_1(y), h_2(y), \ldots, h_k(y))$
Approximate nearest neighbor search [Indyk,Motwani] [Kushilevitz,Ostrovsky,Rabani]

5 Estimating Set Similarity
[Broder,Manasse,Glassman,Zweig] [Broder,C,Frieze,Mitzenmacher]
Collection of subsets.
$\mathrm{similarity}(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$

6 Minwise Independent Permutations
$S_1 \xrightarrow{\sigma} \min(\sigma(S_1))$
$S_2 \xrightarrow{\sigma} \min(\sigma(S_2))$
$\Pr[\min(\sigma(S_1)) = \min(\sigma(S_2))] = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$
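
The collision probability on this slide can be checked empirically. The following is a minimal sketch, assuming a small explicit universe and fully random permutations (practical systems usually substitute cheaper hash families); the function names `minhash_sketch`, `random_permutations`, and `estimate_similarity` are illustrative and not taken from the talk.

```python
import random

def jaccard(s1, s2):
    """Exact set similarity |S1 ∩ S2| / |S1 ∪ S2|."""
    return len(s1 & s2) / len(s1 | s2)

def random_permutations(universe, k, seed=0):
    """Draw k random permutations of the universe, each stored as element -> rank."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)
        perms.append(dict(zip(universe, ranks)))
    return perms

def minhash_sketch(s, perms):
    """Sketch of set s: its minimum rank under each permutation."""
    return [min(perm[x] for x in s) for perm in perms]

def estimate_similarity(sketch_a, sketch_b):
    """Fraction of sketch coordinates on which the two sets collide."""
    agree = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return agree / len(sketch_a)

# Hypothetical example data: two overlapping integer ranges.
universe = list(range(100))
s1 = set(range(0, 60))
s2 = set(range(30, 90))
perms = random_permutations(universe, k=2000)
est = estimate_similarity(minhash_sketch(s1, perms), minhash_sketch(s2, perms))
exact = jaccard(s1, s2)
```

Here `exact` is 30/90, and `est` should fall close to it because each coordinate collides exactly when the minimum-rank element of the union lies in the intersection.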

7 Related Work
Streaming algorithms: compute f(data) in one pass using small space. Implicitly construct sketch of data seen so far.
Synopsis data structures [Gibbons,Matias]
Compact distance oracles, distance labels.
Hash functions with similar properties: [Linial,Sassoon] [Indyk,Motwani,Raghavan,Vempala] [Feige,Krauthgamer]

8 Results
Necessary conditions for existence of similarity preserving hashing (SPH).
SPH schemes from rounding algorithms:
Hash function for vectors based on random hyperplane rounding.
Hash function for estimating Earth Mover Distance based on rounding schemes for classification with pairwise relationships.

9 Existence of SPH schemes
sim(x,y) admits an SPH scheme if there exists a family of hash functions F such that
$\Pr_{h \in F}[h(x) = h(y)] = \mathrm{sim}(x,y)$

10 Theorem: If sim(x,y) admits an SPH scheme then 1-sim(x,y) satisfies the triangle inequality.
Proof:
$1 - \mathrm{sim}(x,y) = \Pr_{h \in F}[h(x) \neq h(y)]$
$\Delta_h(x,y)$: indicator variable for $h(x) \neq h(y)$
$\Delta_h(x,y) + \Delta_h(y,z) \geq \Delta_h(x,z)$
$1 - \mathrm{sim}(x,y) = E_{h \in F}[\Delta_h(x,y)]$
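
The slide lists the ingredients of the proof; the final step, taking expectations, can be written out as follows. This is a reconstruction under the assumption that $\Delta_h(x,y)$ denotes the indicator that $h(x) \neq h(y)$, combined with linearity of expectation.

```latex
\begin{align*}
1 - \mathrm{sim}(x,z)
  &= E_{h \in F}[\Delta_h(x,z)] \\
  &\leq E_{h \in F}[\Delta_h(x,y) + \Delta_h(y,z)] \\
  &= E_{h \in F}[\Delta_h(x,y)] + E_{h \in F}[\Delta_h(y,z)] \\
  &= (1 - \mathrm{sim}(x,y)) + (1 - \mathrm{sim}(y,z)).
\end{align*}
```

The pointwise inequality holds because $h(x) \neq h(z)$ forces $h(x) \neq h(y)$ or $h(y) \neq h(z)$.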

11 Stronger Condition
Theorem: If sim(x,y) admits an SPH scheme then (1+sim(x,y))/2 has an SPH scheme with hash functions mapping objects to {0,1}.
Theorem: If sim(x,y) admits an SPH scheme then 1-sim(x,y) is isometrically embeddable in the Hamming cube.

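12 Random Hyperplane Rounding based SPH
Collection of vectors, with $\mathrm{sim}(u,v) = 1 - \frac{\theta(u,v)}{\pi}$
Pick a random hyperplane through the origin (normal $r$):
$h_r(u) = \begin{cases} 1 & \text{if } r \cdot u \geq 0 \\ 0 & \text{if } r \cdot u < 0 \end{cases}$
[Goemans,Williamson]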
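
The hyperplane hash can be illustrated with a short simulation. The sketch below assumes normals with independent standard Gaussian coordinates (one standard way to get a uniformly random direction) and uses only the Python standard library; the helper names are illustrative and not from the talk.

```python
import math
import random

def angle_similarity(u, v):
    """Exact similarity 1 - theta(u, v) / pi, where theta is the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp against rounding error
    return 1.0 - math.acos(cos) / math.pi

def random_normals(dim, k, seed=0):
    """Draw k hyperplane normals with i.i.d. standard Gaussian coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

def hyperplane_sketch(u, normals):
    """One bit per hyperplane: 1 if r . u >= 0, else 0."""
    return [1 if sum(r_i * u_i for r_i, u_i in zip(r, u)) >= 0 else 0
            for r in normals]

def estimate_similarity(bits_a, bits_b):
    """Fraction of bit positions on which the two sketches agree."""
    return sum(a == b for a, b in zip(bits_a, bits_b)) / len(bits_a)

# Hypothetical example: two vectors at a 45 degree angle, so sim = 0.75.
u = [1.0, 0.0, 0.0]
v = [1.0, 1.0, 0.0]
normals = random_normals(dim=3, k=4000)
est = estimate_similarity(hyperplane_sketch(u, normals), hyperplane_sketch(v, normals))
exact = angle_similarity(u, v)
```

Each bit agrees with probability $1 - \theta(u,v)/\pi$, so the agreement rate over many hyperplanes estimates the similarity.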

13 Earth Mover Distance (EMD)
[Figure: two distributions P and Q, labeled with EMD(P,Q)]

14 Earth Mover Distance
Set of points $L = \{l_1, l_2, \ldots, l_n\}$
Distance function d(i,j) (assume metric)
Distribution P(L): non-negative weights $(p_1, p_2, \ldots, p_n)$.
Earth Mover Distance (EMD): distance between distributions P and Q.
Proposed as metric in graphics and vision for distance between images. [Rubner,Tomasi,Guibas]

15
$\min \sum_{i,j} f_{i,j} \, d(i,j)$
$\forall i: \ \sum_j f_{i,j} = p_i$
$\forall j: \ \sum_i f_{i,j} = q_j$
$\forall i,j: \ f_{i,j} \geq 0$

16 Relaxation of SPH
Estimate distance measure, not similarity measure in [0,1].
Allow hash functions to map objects to points in metric space and measure $E[d(h(P),h(Q))]$.
(SPH: d(x,y) = 1 if $x \neq y$)
Estimator will approximate EMD.

17 Classification with pairwise relationships [Kleinberg,Tardos]
[Figure: labeling instance showing assignment cost and separation cost, with edge weight $w_e$]

18 Classification with pairwise relationships
Collection of objects V
Labels $L = \{l_1, l_2, \ldots, l_n\}$
Assignment of labels $h : V \to L$
Cost of assigning label to u: c(u,h(u))
Graph of related objects; for edge e=(u,v), cost paid: $w_e \cdot d(h(u),h(v))$
Find assignment of labels to minimize cost.

19 LP Relaxation and Rounding [Kleinberg,Tardos] [Chekuri,Khanna,Naor,Zosin]
[Figure: distributions P and Q]
Separation cost measured by EMD(P,Q)
Rounding algorithm guarantees
$\Pr[h(P) = l_i] = p_i$
$E[d(h(P),h(Q))] \leq O(\log n \log\log n) \cdot \mathrm{EMD}(P,Q)$

20 Rounding details
Probabilistically approximate metric on L by tree metric (HST)
Expected distortion O(log n log log n)
EMD on tree metric has nice form:
T: subtree
P(T): sum of probabilities for leaves in T
$l_T$: length of edge leading up from T
$\mathrm{EMD}(P,Q) = \sum_T l_T \, |P(T) - Q(T)|$
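
The closed form for EMD on a tree metric sums, over every subtree T, the length of the edge above T times the absolute difference in probability mass below it. A minimal sketch follows; the tree encoding (a `children` map plus an `edge_length` map keyed by the subtree's root) and the example tree are assumptions made for illustration, not part of the talk.

```python
def subtree_mass(children, weights, node):
    """Total probability on the leaves below `node` (its own weight if it is a leaf)."""
    kids = children.get(node, [])
    if not kids:
        return weights.get(node, 0.0)
    return sum(subtree_mass(children, weights, child) for child in kids)

def tree_emd(children, edge_length, p, q):
    """Sum over subtrees T of l_T * |P(T) - Q(T)|.

    edge_length maps each non-root node (the root of subtree T) to the
    length of the edge leading up from it.
    """
    total = 0.0
    for node, length in edge_length.items():
        mass_p = subtree_mass(children, p, node)
        mass_q = subtree_mass(children, q, node)
        total += length * abs(mass_p - mass_q)
    return total

# Hypothetical tree: root -> A, B; A -> a1, a2; B -> b1.
children = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}
edge_length = {"A": 2.0, "B": 2.0, "a1": 1.0, "a2": 1.0, "b1": 1.0}
p = {"a1": 1.0}
q = {"a2": 0.5, "b1": 0.5}
emd = tree_emd(children, edge_length, p, q)
```

In this example the value agrees with a direct transport calculation: moving 0.5 from a1 to a2 costs 0.5 × 2, and moving 0.5 from a1 to b1 costs 0.5 × 6, for a total of 4.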

21 Theorem: The rounding scheme gives a hashing scheme such that
$\mathrm{EMD}(P,Q) \leq E[d(h(P),h(Q))] \leq O(\log n \log\log n) \cdot \mathrm{EMD}(P,Q)$
Proof:
$y_{i,j}$: probability that $h(P) = l_i$, $h(Q) = l_j$
The $y_{i,j}$ give a feasible solution to the LP for EMD
Cost of this solution = $E[d(h(P),h(Q))]$
Hence $\mathrm{EMD}(P,Q) \leq E[d(h(P),h(Q))]$

22 SPH for weighted sets
Weighted set: $(p_1, p_2, \ldots, p_n)$, weights in [0,1]
Kleinberg-Tardos rounding scheme for uniform metric can be thought of as a hashing scheme for weighted sets with
$\mathrm{sim}(P,Q) = \frac{\sum_i \min(p_i, q_i)}{\sum_i \max(p_i, q_i)}$
Generalization of minwise independent permutations
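
One way to read the weighted-set scheme as a hash function is sketched below. It is an illustration in the spirit of the rounding scheme, not a transcription of it: a shared random sequence of (index, threshold) pairs is scanned, and a weighted set hashes to the first round whose threshold falls below its weight at the chosen index. The names `threshold_hash` and `max_rounds`, and the use of the round number as the hash value, are assumptions of this sketch.

```python
import random

def weighted_similarity(p, q):
    """Exact similarity: sum_i min(p_i, q_i) / sum_i max(p_i, q_i)."""
    return (sum(min(a, b) for a, b in zip(p, q)) /
            sum(max(a, b) for a, b in zip(p, q)))

def threshold_hash(p, seed, max_rounds=100000):
    """Hash a weighted set (weights in [0,1], at least one positive).

    The seed fixes a shared sequence of (index i, threshold alpha) pairs;
    the hash value is the first round t with alpha < p[i].
    """
    rng = random.Random(seed)
    n = len(p)
    for t in range(max_rounds):
        i = rng.randrange(n)
        alpha = rng.random()
        if alpha < p[i]:
            return t
    raise ValueError("no round captured the set; are all weights zero?")

# Hypothetical example: two weighted sets over four elements.
p = [0.8, 0.2, 0.0, 0.5]
q = [0.4, 0.2, 0.6, 0.5]
trials = 4000
collisions = sum(threshold_hash(p, seed) == threshold_hash(q, seed)
                 for seed in range(trials))
est = collisions / trials
exact = weighted_similarity(p, q)
```

In the first round that captures either set, both are captured with probability $\sum_i \min(p_i, q_i) / \sum_i \max(p_i, q_i)$, which is why the collision rate tracks `exact`. With 0/1 weights the similarity reduces to ordinary set similarity.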

23 Conclusions and Future Work
Interesting connection between rounding procedures for approximation algorithms and hash functions for estimating similarity.
Better estimators for Earth Mover Distance
Ignored variance of estimators: related to dimensionality reduction in $L_1$
Study compact representation schemes in general


More information

B561 Advanced Database Concepts. 6. Streaming Algorithms. Qin Zhang 1-1

B561 Advanced Database Concepts. 6. Streaming Algorithms. Qin Zhang 1-1 B561 Advanced Database Concepts 6. Streaming Algorithms Qin Zhang 1-1 The model and challenge The data stream model (Alon, Matias and Szegedy 1996) a n a 2 a 1 RAM CPU Why hard? Cannot store everything.

More information

Problem Set 3. MATH 778C, Spring 2009, Austin Mohr (with John Boozer) April 15, 2009

Problem Set 3. MATH 778C, Spring 2009, Austin Mohr (with John Boozer) April 15, 2009 Problem Set 3 MATH 778C, Spring 2009, Austin Mohr (with John Boozer) April 15, 2009 1. Show directly that P 1 (s) P 1 (t) for all t s. Proof. Given G, let H s be a subgraph of G on s vertices such that

More information

Fall CS598CC: Approximation Algorithms. Chandra Chekuri

Fall CS598CC: Approximation Algorithms. Chandra Chekuri Fall 2006 CS598CC: Approximation Algorithms Chandra Chekuri Administrivia http://www.cs.uiuc.edu/homes/chekuri/teaching/fall2006/approx.htm Grading: 4 home works (60-70%), 1 take home final (30-40%) Mailing

More information

Introduction to Randomized Algorithms

Introduction to Randomized Algorithms Introduction to Randomized Algorithms Gopinath Mishra Advanced Computing and Microelectronics Unit Indian Statistical Institute Kolkata 700108, India. Organization 1 Introduction 2 Some basic ideas from

More information

Approximation Algorithms: The Primal-Dual Method. My T. Thai

Approximation Algorithms: The Primal-Dual Method. My T. Thai Approximation Algorithms: The Primal-Dual Method My T. Thai 1 Overview of the Primal-Dual Method Consider the following primal program, called P: min st n c j x j j=1 n a ij x j b i j=1 x j 0 Then the

More information

Approximation Algorithms

Approximation Algorithms Approximation Algorithms Group Members: 1. Geng Xue (A0095628R) 2. Cai Jingli (A0095623B) 3. Xing Zhe (A0095644W) 4. Zhu Xiaolu (A0109657W) 5. Wang Zixiao (A0095670X) 6. Jiao Qing (A0095637R) 7. Zhang

More information

Hw 4 Due Feb 22. D(fg) x y z (

Hw 4 Due Feb 22. D(fg) x y z ( Hw 4 Due Feb 22 2.2 Exercise 7,8,10,12,15,18,28,35,36,46 2.3 Exercise 3,11,39,40,47(b) 2.4 Exercise 6,7 Use both the direct method and product rule to calculate where f(x, y, z) = 3x, g(x, y, z) = ( 1

More information

Geometric Computation: Introduction. Piotr Indyk

Geometric Computation: Introduction. Piotr Indyk Geometric Computation: Introduction Piotr Indyk Welcome to 6.850! Overview and goals Course Information Closest pair Signup sheet Geometric Computation Geometric computation occurs everywhere: Robotics:

More information

CSE200: Computability and complexity Interactive proofs

CSE200: Computability and complexity Interactive proofs CSE200: Computability and complexity Interactive proofs Shachar Lovett January 29, 2018 1 What are interactive proofs Think of a prover trying to convince a verifer that a statement is correct. For example,

More information

Approximation Algorithms

Approximation Algorithms Chapter 8 Approximation Algorithms Algorithm Theory WS 2016/17 Fabian Kuhn Approximation Algorithms Optimization appears everywhere in computer science We have seen many examples, e.g.: scheduling jobs

More information

Introduction to optimization

Introduction to optimization Introduction to optimization G. Ferrari Trecate Dipartimento di Ingegneria Industriale e dell Informazione Università degli Studi di Pavia Industrial Automation Ferrari Trecate (DIS) Optimization Industrial

More information

Object Classification Problem

Object Classification Problem HIERARCHICAL OBJECT CATEGORIZATION" Gregory Griffin and Pietro Perona. Learning and Using Taxonomies For Fast Visual Categorization. CVPR 2008 Marcin Marszalek and Cordelia Schmid. Constructing Category

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,

More information

Convex Optimization - Chapter 1-2. Xiangru Lian August 28, 2015

Convex Optimization - Chapter 1-2. Xiangru Lian August 28, 2015 Convex Optimization - Chapter 1-2 Xiangru Lian August 28, 2015 1 Mathematical optimization minimize f 0 (x) s.t. f j (x) 0, j=1,,m, (1) x S x. (x 1,,x n ). optimization variable. f 0. R n R. objective

More information

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau

Phylogenetic Trees Lecture 12. Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau Phylogenetic Trees Lecture 12 Section 7.4, in Durbin et al., 6.5 in Setubal et al. Shlomo Moran, Ilan Gronau. Maximum Parsimony. Last week we presented Fitch algorithm for (unweighted) Maximum Parsimony:

More information

Figure 1: An example of a hypercube 1: Given that the source and destination addresses are n-bit vectors, consider the following simple choice of rout

Figure 1: An example of a hypercube 1: Given that the source and destination addresses are n-bit vectors, consider the following simple choice of rout Tail Inequalities Wafi AlBalawi and Ashraf Osman Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV fwafi,osman@csee.wvu.edug 1 Routing in a Parallel Computer

More information

Bounded, Closed, and Compact Sets

Bounded, Closed, and Compact Sets Bounded, Closed, and Compact Sets Definition Let D be a subset of R n. Then D is said to be bounded if there is a number M > 0 such that x < M for all x D. D is closed if it contains all the boundary points.

More information

Measures of Clustering Quality: A Working Set of Axioms for Clustering

Measures of Clustering Quality: A Working Set of Axioms for Clustering Measures of Clustering Quality: A Working Set of Axioms for Clustering Margareta Ackerman and Shai Ben-David School of Computer Science University of Waterloo, Canada Abstract Aiming towards the development

More information