arxiv: v2 [cs.ds] 24 Mar 2018


Similar Elements and Metric Labeling on Complete Graphs

arXiv:1803.08037v2 [cs.DS] 24 Mar 2018

Pedro F. Felzenszwalb
Brown University
Providence, RI, USA
pff@brown.edu

March 2018

We consider a problem that involves finding similar elements in a collection of sets. The problem is motivated by applications in machine learning and pattern recognition (see, e.g., [3]). Intuitively, we would like to discover something in common among a collection of sets, even when the sets have empty intersection. A solution involves selecting an element from each set such that the selected elements are close to each other under an appropriate metric. We formulate an optimization problem that captures this notion and give an efficient approximation algorithm that finds a solution within a factor of 2 of the optimal solution. The similar elements problem is a special case of the metric labeling problem defined in [2], and we also give an efficient 2-approximation algorithm for the metric labeling problem on complete graphs. Metric labeling on complete graphs generalizes the similar elements problem to include costs for selecting elements in each set. The algorithms described here are similar to the center star method for multiple sequence alignment described in [1]. Beyond producing solutions with good theoretical guarantees, the algorithms described here are also practical. A version of the algorithm for the similar elements problem has been implemented and used to find objects in a collection of photographs [4].

1 Similar Elements

Let $X$ be a (possibly infinite) set and let $d$ be a metric on $X$. Let $S_1,\dots,S_n$ be finite subsets of $X$. The goal of the similar elements problem is to select an element from each set $S_i$ such that the selected elements are close to each other under the metric $d$. One motivation is to discover something in common among the sets $S_1,\dots,S_n$ even when they have empty intersection.

We formalize the problem as the minimization of the sum of pairwise distances among selected elements. Let $x = (x_1,\dots,x_n)$ with $x_i \in S_i$. Define the similar elements objective as

$$c(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} d(x_i,x_j). \tag{1}$$

Let $x^* = \arg\min_x c(x)$ be an optimal solution for the similar elements problem. Optimizing $c(x)$ appears to be difficult, but we can define easier problems if we ignore some of the pairwise distances in the objective. In particular, we define $n$ different star-graph objective functions as follows. For each $1 \le r \le n$, define the objective $c^r(x)$ to account only for the terms $d(x_r,x_j)$ of $c(x)$,

$$c^r(x) = \sum_{j \ne r} d(x_r,x_j). \tag{2}$$

Let $x^r = \arg\min_x c^r(x)$ be an optimal solution for the optimization problem defined by $c^r(x)$. We can compute $x^r$ efficiently using a simple form of dynamic programming. First compute $x^r_r$, and then compute $x^r_j$ for $j \ne r$:

$$x^r_r = \arg\min_{x_r \in S_r} \sum_{j \ne r} \min_{x_j \in S_j} d(x_r,x_j), \tag{3}$$

$$x^r_j = \arg\min_{x_j \in S_j} d(x^r_r,x_j). \tag{4}$$

Each of the $n$ star-graph objective functions leads to a possible solution. We then select from among the $n$ solutions $x^1,\dots,x^n$ as follows:

$$\hat r = \arg\min_{1 \le r \le n} c^r(x^r), \tag{5}$$

$$\hat x = x^{\hat r}. \tag{6}$$

Theorem 1. The algorithm described above finds a 2-approximate solution for the similar elements problem. That is, $c(\hat x) \le 2c(x^*)$.

Proof. First note that $c(x) = \sum_{r=1}^{n} c^r(x)$. The minimum of a set of values is at most the average, and $x^r$ minimizes $c^r(x)$. Therefore

$$\min_{1 \le r \le n} c^r(x^r) \le \frac{1}{n}\sum_{r=1}^{n} c^r(x^r) \le \frac{1}{n}\sum_{r=1}^{n} c^r(x^*) = \frac{1}{n}c(x^*).$$

For any $r$, the triangle inequality gives

$$c(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} d(x_i,x_j) \le \sum_{i=1}^{n}\sum_{j=1}^{n} \big(d(x_i,x_r)+d(x_r,x_j)\big) = 2n\sum_{l=1}^{n} d(x_r,x_l) = 2n\,c^r(x).$$

The middle equality uses the symmetry of $d$, and the last equality uses $d(x_r,x_r) = 0$. Therefore

$$c(\hat x) \le 2n\,c^{\hat r}(\hat x) = 2n\min_{1 \le r \le n} c^r(x^r) \le 2c(x^*).$$

To analyze the running time of the algorithm, we assume the distances $d(p,q)$ between pairs of elements in $S = S_1 \cup \dots \cup S_n$ are either precomputed and given as part of the input, or can each be computed in $O(1)$ time. Let $k = \max_{1 \le i \le n} |S_i|$. The first stage of the algorithm involves $n$ optimization problems that can each be solved in $O(nk^2)$ time. The second stage of the algorithm selects one of the $n$ solutions and takes $O(n^2)$ time.
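The two-stage procedure in (3)-(6) can be illustrated with a short Python sketch. This is not the implementation used in [4]. It is a minimal example that assumes the sets are given as Python lists and the metric as a callable `d(a, b)`. All function names are hypothetical. Each star problem is solved exactly: fix the root element, then let every other set contribute its nearest element to that root. The exhaustive `brute_force` helper enumerates every selection, so it is included only to check the bound of Theorem 1 on tiny inputs.

```python
from itertools import product


def similar_elements_cost(x, d):
    """Objective (1): d(x_i, x_j) summed over all ordered pairs (i, j)."""
    return sum(d(a, b) for a in x for b in x)


def star_cost(x, r, d):
    """Objective (2): c^r(x) = sum over j != r of d(x_r, x_j)."""
    return sum(d(x[r], x[j]) for j in range(len(x)) if j != r)


def star_solution(sets, r, d):
    """Minimize c^r exactly: choose the root by (3), then each x_j by (4)."""
    def root_value(xr):
        # For a fixed root, each other set independently contributes its nearest element.
        return sum(min(d(xr, xj) for xj in sets[j])
                   for j in range(len(sets)) if j != r)

    root = min(sets[r], key=root_value)  # equation (3)
    return [root if j == r else min(sets[j], key=lambda xj: d(root, xj))  # equation (4)
            for j in range(len(sets))]


def approx_similar_elements(sets, d):
    """Solve the n star problems and return the candidate selected by (5)-(6)."""
    candidates = [star_solution(sets, r, d) for r in range(len(sets))]
    r_hat = min(range(len(sets)), key=lambda r: star_cost(candidates[r], r, d))
    return candidates[r_hat]


def brute_force(sets, d):
    """Exhaustive search over S_1 x ... x S_n; only usable on tiny inputs."""
    return min(product(*sets), key=lambda x: similar_elements_cost(x, d))


if __name__ == "__main__":
    sets = [[0.0, 10.0], [1.0, 20.0], [9.0, 11.0]]
    dist = lambda a, b: abs(a - b)  # the usual metric on the real line
    x_hat = approx_similar_elements(sets, dist)
    print(x_hat, similar_elements_cost(x_hat, dist))  # [0.0, 1.0, 9.0] 36.0
```

On this toy instance the sketch happens to reach the exhaustive optimum of 36. In general, only $c(\hat x) \le 2c(x^*)$ is guaranteed.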

Remark 2. If each of the sets $S_1,\dots,S_n$ has size at most $k$, the running time of the approximation algorithm for the similar elements problem is $O(n^2k^2)$. The bottleneck of the algorithm is the evaluation of the minimizations over $x_j \in S_j$ in (3) and (4). This computation is equivalent to a nearest-neighbor computation, where we want to find a point from a set $S \subseteq X$ that is closest to a query point $q \in X$. When the nearest-neighbor computation can be done efficiently (with an appropriate data structure), the running time of the similar elements approximation algorithm can be reduced.

2 Metric Labeling on Complete Graphs

Let $G = (V,E)$ be an undirected simple graph on nodes $V = \{1,\dots,n\}$. Let $L$ be a finite set of labels with $|L| = k$, and let $d$ be a metric on $L$. For $i \in V$, let $m_i$ be a non-negative function mapping labels to real values. The unweighted metric labeling problem on $G$ is to find a labeling $x = (x_1,\dots,x_n) \in L^n$ minimizing

$$c(x) = \sum_{i \in V} m_i(x_i) + \sum_{\{i,j\} \in E} d(x_i,x_j). \tag{7}$$

Let $x^* = \arg\min_x c(x)$. This optimization problem can be solved in polynomial time using dynamic programming if $G$ is a tree. Here we consider the case when $G$ is the complete graph and give an efficient 2-approximation algorithm based on the solution of several metric labeling problems on star graphs.

For each $r \in V$, define a different objective function $c^r(x)$, corresponding to a metric labeling problem on a star graph with vertex set $V$ rooted at $r$:

$$c^r(x) = \sum_{i \in V} \frac{m_i(x_i)}{n} + \sum_{j \in V \setminus \{r\}} \frac{d(x_r,x_j)}{2}. \tag{8}$$

Let $x^r = \arg\min_x c^r(x)$. We can solve this optimization problem in $O(nk^2)$ time using a simple form of dynamic programming. First, compute an optimal label for the root vertex using one step of dynamic programming:

$$x^r_r = \arg\min_{x_r \in L} \left[ \frac{m_r(x_r)}{n} + \sum_{j \in V \setminus \{r\}} \min_{x_j \in L} \left( \frac{m_j(x_j)}{n} + \frac{d(x_r,x_j)}{2} \right) \right]. \tag{9}$$

Then compute $x^r_j$ for $j \in V \setminus \{r\}$:

$$x^r_j = \arg\min_{x_j \in L} \left( \frac{m_j(x_j)}{n} + \frac{d(x^r_r,x_j)}{2} \right). \tag{10}$$

Optimizing each $c^r(x)$ separately leads to $n$ possible solutions $x^1,\dots,x^n$. We select one of them as follows:

$$\hat r = \arg\min_{r \in V} c^r(x^r), \tag{11}$$

$$\hat x = x^{\hat r}. \tag{12}$$
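The star-graph procedure for metric labeling in (8)-(12) can be illustrated with the following Python sketch. It is not code from the paper. It is a minimal example under these assumptions:

- The unary costs are given as a list `m` of dictionaries, with `m[i][label]` standing for $m_i(\text{label})$.
- The labels are given as a list.
- The metric is given as a callable `d`.
- All function names are hypothetical.

The sketch keeps the weights $1/n$ on the unary terms and $1/2$ on the pairwise terms from (8). For a fixed root label, each leaf is minimized independently, which gives the $O(nk^2)$ cost per star problem. `brute_force_labeling` exists only to check the bound of Theorem 3 on tiny inputs.

```python
from itertools import product


def labeling_cost(x, m, d):
    """Objective (7) on the complete graph: unary terms plus d over all pairs i < j."""
    n = len(x)
    unary = sum(m[i][x[i]] for i in range(n))
    pairwise = sum(d(x[i], x[j]) for i in range(n) for j in range(i + 1, n))
    return unary + pairwise


def star_cost(x, r, m, d):
    """Objective (8): sum_i m_i(x_i)/n + sum_{j != r} d(x_r, x_j)/2."""
    n = len(x)
    return (sum(m[i][x[i]] for i in range(n)) / n
            + sum(d(x[r], x[j]) for j in range(n) if j != r) / 2)


def star_solution(labels, r, m, d):
    """Minimize c^r exactly: pick the root label by (9), then each leaf label by (10)."""
    n = len(m)

    def leaf_term(j, xr, xj):
        return m[j][xj] / n + d(xr, xj) / 2

    def root_value(xr):
        # The bracketed quantity minimized in (9).
        return m[r][xr] / n + sum(
            min(leaf_term(j, xr, xj) for xj in labels) for j in range(n) if j != r)

    root = min(labels, key=root_value)
    return [root if j == r else min(labels, key=lambda xj: leaf_term(j, root, xj))
            for j in range(n)]


def approx_metric_labeling(labels, m, d):
    """Solve all n star problems and keep the one with the smallest c^r, as in (11)-(12)."""
    candidates = [star_solution(labels, r, m, d) for r in range(len(m))]
    r_hat = min(range(len(m)), key=lambda r: star_cost(candidates[r], r, m, d))
    return candidates[r_hat]


def brute_force_labeling(labels, m, d):
    """Exhaustive search over L^n; only usable on tiny inputs."""
    best = min(product(labels, repeat=len(m)), key=lambda x: labeling_cost(x, m, d))
    return list(best)


if __name__ == "__main__":
    labels = [0, 1, 2]
    m = [{0: 0.0, 1: 1.0, 2: 3.0}, {0: 3.0, 1: 0.0, 2: 3.0}, {0: 3.0, 1: 1.0, 2: 0.0}]
    dist = lambda a, b: abs(a - b)  # metric on integer labels
    x_hat = approx_metric_labeling(labels, m, dist)
    print(x_hat, labeling_cost(x_hat, m, dist))  # [1, 1, 1] 2.0
```

Setting every $m_i$ to zero and taking $L = X$ with $S_i$-restricted labels recovers the similar elements setting. The sketch does not implement that restriction, so it is shown only for the general labeling form.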

Theorem 3. The algorithm described above finds a 2-approximate solution for the metric labeling problem on a complete graph. That is, $c(\hat x) \le 2c(x^*)$.

Proof. First note that $c(x) = \sum_{r \in V} c^r(x)$. The minimum of a set of values is at most the average, and $x^r$ minimizes $c^r(x)$. Therefore

$$\min_{r \in V} c^r(x^r) \le \frac{1}{n}\sum_{r \in V} c^r(x^r) \le \frac{1}{n}\sum_{r \in V} c^r(x^*) = \frac{1}{n}c(x^*).$$

Since $d$ is a metric and $m_i$ is non-negative,

$$\begin{aligned}
c(x) &= \sum_{i \in V} m_i(x_i) + \sum_{\{i,j\} \in E} d(x_i,x_j) = \sum_{i \in V} m_i(x_i) + \sum_{(i,j) \in V^2} \frac{d(x_i,x_j)}{2} \\
&\le \sum_{i \in V} m_i(x_i) + \sum_{(i,j) \in V^2} \frac{d(x_i,x_r) + d(x_r,x_j)}{2} \\
&= \sum_{i \in V} m_i(x_i) + n \sum_{l \in V \setminus \{r\}} d(x_r,x_l) \\
&\le 2n \left( \sum_{i \in V} \frac{m_i(x_i)}{n} + \sum_{l \in V \setminus \{r\}} \frac{d(x_r,x_l)}{2} \right) = 2n\,c^r(x).
\end{aligned}$$

Therefore

$$c(\hat x) \le 2n\,c^{\hat r}(\hat x) = 2n\min_{r \in V} c^r(x^r) \le 2c(x^*).$$

The first stage of the algorithm involves $n$ optimization problems that can each be solved in $O(nk^2)$ time. The second stage selects one of the $n$ solutions and takes $O(n^2)$ time.

Remark 4. The running time of the approximation algorithm for the metric labeling problem on complete graphs is $O(n^2k^2)$.

Acknowledgments

We thank Caroline Klivans, Sarah Sachs, Anna Grim, Robert Kleinberg and Yang Yuan for helpful discussions about the contents of this report. This material is based upon work supported by the National Science Foundation under Grant No. 1447413.

References

[1] Dan Gusfield. Efficient methods for multiple sequence alignment with guaranteed error bounds. Bulletin of Mathematical Biology, 55(1):141-154, 1993.

[2] Jon Kleinberg and Eva Tardos. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Journal of the ACM, 49(5):616-639, 2002.

[3] Oded Maron and Aparna Lakshmi Ratan. Multiple-instance learning for natural scene classification. In International Conference on Machine Learning, volume 98, pages 341-349, 1998.

[4] Sarah Sachs. Similar-part approximation using invariant feature descriptors. Undergraduate Honors Thesis, Brown University, 2016.