Fast b-matching via Sufficient Selection Belief Propagation

Bert Huang, Computer Science Department, Columbia University, New York, NY 10027, bert@cs.columbia.edu
Tony Jebara, Computer Science Department, Columbia University, New York, NY 10027, jebara@cs.columbia.edu

Appearing in Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors.

Abstract

This article describes scalability enhancements to a previously established belief propagation algorithm that solves bipartite maximum weight b-matching. The previous algorithm required O(|V| + |E|) space and O(|V||E|) time, whereas we apply improvements to reduce the space to O(|V|) and the time to O(|V|^2.5) in the expected case (though worst case time is still O(|V||E|)). The space improvement is most significant in cases where edge weights are determined by a function of node descriptors, such as a distance or kernel function. In practice, we demonstrate maximum weight b-matchings to be solvable on graphs with hundreds of millions of edges in only a few hours of compute time on a modern personal computer without parallelization, whereas neither the memory nor the time requirement of previously known algorithms would have allowed graphs of this scale.

1 INTRODUCTION

The maximum weight perfect b-matching problem is a generalization of maximum weight matching in which the solver is given a weighted graph and a set of target degrees, and must output the maximum weight induced subgraph such that each node has its target number of neighbors. The problem is solvable in O(|V||E|) time with min-cost flow methods (Fremuth-Paeger and Jungnickel, 1999). In problems with dense graphs, the running time for b-matching solvers is O(N^3), where N = |V|. Huang and Jebara (2007) introduced a belief propagation algorithm which has the same asymptotic running time guarantee O(N^3) but is lightweight and has much smaller constant factors on running time than other available solvers. In modern applications, however, the more obstructive bottleneck is the O(N^2) space requirement to store messages from each node to each of its candidate neighbors. While it is possible to wait for time-intensive jobs to run, a task that requires too much storage is further burdened by the need for complicated memory swapping strategies.

This article presents an improved algorithm for weighted b-matching that significantly reduces the memory cost and the running time for solving b-matching. Specifically, in problems where the edge weights are determined by a function of node descriptors, the space requirement is reduced to O(N) and the running time can be reduced to O(N^2.5) in some cases (but no worse than previous algorithms in adversarial cases). Both improvements apply to each iteration of belief propagation, and the resulting algorithm computes the original belief updates exactly, so any previous analysis of the number of iterations necessary for convergence remains intact. The memory bottleneck is reduced by unrolling one level of recursion in the belief updates such that the explicit beliefs never need to be stored, and the running time improvement is achieved by a variant of the algorithm by McAuley and Caetano (2010), in which speedups are available by decomposing a maximization procedure into the maximization of two components.

Related Work.
This article extends the belief propagation b-matching algorithm first introduced by Huang and Jebara (2007), which is proven to converge in O(|V|) iterations with a constant depending on the difference between the maximum weight edge and the minimum weight edge as well as the difference between the maximum weight b-matching and the second best b-matching.

This algorithm was further analyzed by Sanghavi et al. (2007) and Bayati et al. (2007), who showed independently that the algorithm is guaranteed to converge if and only if the linear programming relaxation of the integer program formulation of b-matching is tight. This result confirms the previous theorem that the algorithm converges on bipartite problems and further extends guaranteed convergence to some non-bipartite cases. The 1-matching problem with iid, random weights was further analyzed by Salez and Shah (2009), where the surprising result was proven that the algorithm converges with high probability in O(1) iterations and, thus, costs O(|V|^2) time overall, which is optimal, as it is equivalent to the time needed to read the input edge weights.

In addition to classical optimization tasks, such as discrete resource allocation, weighted b-matching has been shown to be a useful tool for various machine learning tasks, including semi-supervised learning, spectral clustering, graph embedding, and manifold learning (Jebara et al., 2009; Jebara and Shchogolev, 2006; Shaw and Jebara, 2007, 2009). Weighted b-matching solvers can also be used as drivers for a maximum a posteriori estimation procedure for graph structure given edge likelihoods and soft degree priors (Huang and Jebara, 2009). The general formulation allows for concave penalty functions on the degrees of nodes by constructing an augmented graph with auxiliary edges encoding the degree penalties. The augmented graph has at most double the nodes of the original graph, so the asymptotic running time of the algorithm is equivalent to the running time of the b-matching solver.

For graphs restricted to nonnegative integer weights, the bipartite maximum weight 1-matching problem was shown to be solvable in O(|V||E| log(|V|)) time by Gabow and Tarjan (1989). An Õ(|V|^2.376) randomized algorithm which succeeds with high probability was revealed by Sankowski (2009). A (1 - ε) approximation algorithm for nonbipartite maximum weight matching with real weights was given by Duan and Pettie (2010), which runs in O(|E| ε^-2 log^3 |V|) time.

Outline. The remainder of this paper is organized as follows. Section 2 describes the proposed algorithm in detail and provides analysis. Section 3 describes empirical evaluation of the proposed algorithm on synthetic and real data, including comparisons with a state-of-the-art maximum weight matching solver. Finally, Section 4 concludes with a brief discussion.

2 ALGORITHM DESCRIPTION

This section describes the proposed algorithm, which is derived from the previous belief propagation approaches for b-matching and incorporates further improvements to scalability. First, we provide a formal definition of the problem; then we describe the algorithm. Finally, we provide some analysis showing the correctness of the enhanced algorithm as well as the speed and space improvements.

2.1 Dense Maximum Weight b-matching

The bipartite dense maximum weight perfect b-matching problem (abbreviated as b-matching) is, given a dense, bipartite graph in which all pairs of points that cross bipartitions have candidate edges and a target degree for each node, to find the maximum weight induced subgraph such that the nodes in the subgraph have their target degrees. Formally, the solver is given node descriptors {x_1, ..., x_{m+n}} drawn from space Ω, a weight function W : (Ω, Ω) → R, and a set of target degrees {b_1, ..., b_{m+n}}, where each b_i ∈ N. The goal is to output a symmetric, binary adjacency matrix A ∈ B^{(m+n)×(m+n)} whose entries A_ij = 1 for all matched edges (x_i, x_j) and are otherwise zero. The optimization can also be written as
argmax_A  Σ_{i=1}^{m} Σ_{j=m+1}^{m+n} A_ij W(x_i, x_j)
s.t.  Σ_{j=1}^{m+n} A_ij = b_i ∀i,   A_ij = A_ji ∀(i, j).

In particular, we consider the bipartite scenario, where edges may only be matched between nodes {x_1, ..., x_m} and nodes {x_{m+1}, ..., x_{m+n}} but not within each set. This can be implemented, with some abuse of notation, by defining the weight function W to output -∞ for any edges within bipartitions. This same problem can be expressed in many other forms, including graph notations using node and edge sets, but when considering the dense bipartite form of the problem, it is convenient to use matrix notation.
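To make the formulation concrete, the following Python sketch (purely illustrative; the paper's own solver is written in C, and the dense weight matrix here is only for readability) evaluates the objective and checks feasibility of a candidate adjacency matrix. The function names are assumptions for the example; the convention of giving within-bipartition edges weight -∞ is the one stated above.

```python
import numpy as np

def bmatching_objective(A, W):
    # A: (m+n, m+n) symmetric 0/1 adjacency matrix.
    # W: (m+n, m+n) symmetric weight matrix with -inf inside each
    #    bipartition so that those edges can never be selected.
    matched = (A == 1) & np.isfinite(W)
    # Each cross edge appears twice in a symmetric A, hence the 1/2.
    return W[matched].sum() / 2.0

def is_feasible(A, b):
    # Degree constraints (row sums equal the targets) and symmetry.
    return np.array_equal(A, A.T) and np.array_equal(A.sum(axis=1), np.asarray(b))
```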

2.2 Linear Memory b-matching Belief Propagation

In this section, we describe the method to reduce memory usage of b-matching via belief propagation to O(N), where the total number of nodes is N = m + n. First, we review the results from previous work (Bayati et al., 2005; Huang and Jebara, 2007; Sanghavi et al., 2007) defining a simplified update rule for message updates, which allows for the standard O(N^2) space and O(N^2) per-iteration running time. A key component of the simplified belief propagation algorithm is the selection operation, which finds the k'th largest element of a set for some index k. For notational convenience, denote the selection operation over any set S as σ_k(S) = s ∈ S where |{t ∈ S | t ≥ s}| = k.

Belief propagation maintains a belief value for each edge, which, in the dense case, is conveniently represented as a matrix B, where entry B_ij^t is the belief value for the edge between x_i and x_j at iteration t. The simplified update rule for each belief is

B_ij^t = W(x_i, x_j) - σ_{b_j}({B_jk^{t-1} | k ≠ i}).   (1)

In the above equation and for the remainder of this text, indices range from 1 to (m + n), unless otherwise noted, and are omitted for cleanliness.

The key insight for reducing memory usage is that the full beliefs never need to be stored (not even the compressed messages). Instead, by unrolling one level of recursion, all that need to be stored are the selected beliefs, because the selection operation in Equation (1) only weakly depends on index i. That is, the selection operation is over all indices excluding i, which means the selected value will be either the b_j'th or the (b_j + 1)'th greatest element,

σ_{b_j}({B_jk^{t-1} | k ≠ i}) ∈ {σ_{b_j}({B_jk^{t-1} | ∀k}), σ_{b_j+1}({B_jk^{t-1} | ∀k})}.

Thus, once each row of the belief matrix B is updated, these two selected values can be computed and stored, and the rest of the row can be deleted from memory. Any further reference to B is therefore abstract, as it will never be fully stored. Any entry of the belief matrix can be computed in an online manner from the stored selected values. Let α_j be the negation of the b_j'th selection and β_j be that of the (b_j + 1)'th selection. Then the update rules for these parameters are

α_j^t = -σ_{b_j}({B_jk^{t-1} | ∀k}),   β_j^t = -σ_{b_j+1}({B_jk^{t-1} | ∀k}),   (2)

and the resulting belief lookup rule is

B_ij^t = W(x_i, x_j) + α_j^t if A_ji^t ≠ 1, and B_ij^t = W(x_i, x_j) + β_j^t otherwise.

After each iteration, the current estimate of A is

A_ij^t = 1 if B_ij^{t-1} ≥ -α_i^t, and A_ij^t = 0 otherwise,   (3)

which is computed when the α and β values are updated in Equation (2). When this estimate is a valid b-matching, i.e., when the columns of A sum to their target degrees, the algorithm has converged to the solution. The algorithm can be viewed as simply computing each row of the belief matrix and performing the selections on that row, and is summarized in Algorithm 1.

Algorithm 1 Belief Propagation for b-matching. Computes the adjacency matrix of the maximum weight b-matching.
1: α_j^0 ← 0, β_j^0 ← 0, ∀j
2: A^0 ← [0]
3: t ← 1
4: while not converged do
5:   for all j ∈ {1, ..., m + n} do
6:     A_jk^t ← 0, ∀k
7:     α_j^t ← -σ_{b_j}({B_jk^{t-1} | ∀k})   {Algorithm 2}
8:     β_j^t ← -σ_{b_j+1}({B_jk^{t-1} | ∀k})
9:     for all {k | B_jk^{t-1} ≥ -α_j^t} do
10:      A_jk^t ← 1
11:    end for
12:  end for
13:  delete A^{t-1}, α^{t-1} and β^{t-1} from memory
14:  t ← t + 1
15: end while
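The row-wise computation of Algorithm 1 can be sketched compactly. The Python code below is an illustrative reimplementation under the conventions of this section, not the authors' C solver: it stores only the α and β vectors and the current matched-neighbor sets, regenerates each belief row on the fly through the lookup rule, and uses a plain full selection per row (the sufficient selection speedup of Section 2.3 is omitted here).

```python
import numpy as np

def bmatch_bp_linear_memory(W, b, max_iters=1000):
    # W: (N, N) symmetric weight matrix; use -inf for forbidden edges
    #    (the diagonal and entries within each bipartition).
    # b: length-N integer array of target degrees (each b[j] < N).
    N = W.shape[0]
    alpha = np.zeros(N)                  # negated b_j'th selections
    beta = np.zeros(N)                   # negated (b_j+1)'th selections
    matched = [set() for _ in range(N)]  # matched[j] = {k : A_jk = 1}

    for _ in range(max_iters):
        new_alpha, new_beta = np.empty(N), np.empty(N)
        new_matched = []
        for j in range(N):
            # Row j of the (never stored) belief matrix via the lookup
            # rule: B_jk = W(x_j, x_k) + (beta_k if A_kj = 1 else alpha_k).
            uses_beta = np.fromiter((j in matched[k] for k in range(N)),
                                    dtype=bool, count=N)
            row = W[j] + np.where(uses_beta, beta, alpha)
            top = np.argsort(row)[::-1]           # full selection per row
            new_alpha[j] = -row[top[b[j] - 1]]    # b_j'th greatest, negated
            new_beta[j] = -row[top[b[j]]]         # (b_j+1)'th greatest, negated
            new_matched.append(set(top[:b[j]].tolist()))
        alpha, beta, matched = new_alpha, new_beta, new_matched
        # Converged once the estimate is consistent: A_jk = 1 iff A_kj = 1.
        if all(all(j in matched[k] for k in matched[j]) for j in range(N)):
            break
    return matched
```

The O(N log N) argsort per row stands in for the O(N) selection used in the paper; it keeps the sketch short without changing the computed values.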
2.3 Sufficient Selection

This section describes the running time enhancement in the proposed algorithm, which is a variation of the faster belief propagation algorithm proposed by McAuley and Caetano (2010). The enhancement aims to reduce the running time of each iteration by exploiting the nature of the quantities being selected. In particular, the key observation is that each belief is a sum of two quantities: a weight and an α or β value. These quantities can be sorted in advance, outside of the inner (row-wise) loop of the algorithm, and the selection operation can then be performed without searching over the entire row, significantly reducing the amount of work necessary. This is done by testing a stopping criterion that guarantees no further belief lookups are necessary.

Some minor difficulties arise, however, when sorting each component, so the algorithm by McAuley and Caetano (2010) does not directly apply as-is. First, the weights cannot always be fully sorted. In general, storing full order information for each weight between all pairs of nodes requires quadratic space, which is impossible with larger data sets. Thus, the proposed algorithm instead stores a cache of the heaviest weights for each node. In some special cases, such as when the weights are a function of Euclidean distance, data structures such as kd-trees can be used to implicitly store the sorted weights. This construction can provide one possible variant to our main algorithm. Second, the α-β values require careful sorting, because the true belief updates mostly include α^t terms but a few β^t terms. Specifically, the indices that index the greatest b_j elements of the row should use β^t.
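The per-node cache of heaviest weights can be built once at initialization. The sketch below is illustrative (the helper names are hypothetical, and numpy's argpartition is used in place of the Quick Select routine mentioned in Section 2.4); it returns the index cache I described in the next paragraphs.

```python
import numpy as np

def build_weight_cache(X, c, weight_row):
    # X: (N, d) node descriptors; weight_row(x_i, X) -> length-N weights.
    # Returns I of shape (N, c): I[i, k] is the index of the k'th
    # largest weight among node i's candidate neighbors.
    N = X.shape[0]
    I = np.empty((N, c), dtype=np.int64)
    for i in range(N):
        w = weight_row(X[i], X)
        w[i] = -np.inf                    # a node never matches itself
        top = np.argpartition(-w, c)[:c]  # c heaviest weights, unordered
        I[i] = top[np.argsort(-w[top])]   # order those c by weight
    return I

# Example weight function from Section 3: negative Euclidean distance.
neg_dist = lambda x_i, X: -np.linalg.norm(X - x_i, axis=1)
```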

One way to handle this technicality is to first compute the sort-order of the α^t terms and, on each row, correct the ordering using a binary search-like strategy for each index in the selected indices. This method is technically a logarithmic time procedure, but requires some extra indexing logic that creates undesirable constant time penalties. Another approach, which is much simpler to implement and does not require extra indexing logic, is to use the sort-order of the β^t's and adjust the stopping criterion to account for the possibility of unseen α^t values.

Since the weights do not change during belief propagation, at initialization the algorithm computes an index cache I ∈ N^{(m+n)×c} of cache size c, which is a parameter set by the user, where entry I_ik is the index of the k'th largest weight connected to node x_i and, for u = I_ik, W(x_i, x_u) = σ_k({W(x_i, x_j) | ∀j}). At the end of each iteration, the β^t values are similarly sorted and stored in index vector e ∈ N^{m+n}, where, for v = e_k, entry β_v^t = σ_k({β_j^t | ∀j}).

The selection operation from (2) is then computed by checking the beliefs corresponding to the sorted weight and β indices. At each step, maintain a set S of the greatest b_j + 1 beliefs seen so far. These provide tight lower bounds on the true selection values. At each stage of this procedure, the current estimates for the selections behind α_j^t and β_j^t are α̃_j^t = σ_{b_j}(S) and β̃_j^t = min(S). Incrementally scan the beliefs for both index lists (I)_j and e, computing, for incrementing index k, B_{j,I_jk} and B_{j,e_k}. Each of these computed beliefs is compared to the beliefs in set S, and if any member of S is less than the new belief, the new belief replaces the minimum value in S.[1] This maintains S as the set of the greatest b_j + 1 elements seen so far. At each stage, we bound the greatest possible unseen belief by the sum of the least weight seen so far from the sorted weight cache and the least β value seen so far from the β cache. Once this bound is no greater than the estimate β̃_j^t, the algorithm can exit because further comparisons are unnecessary. Algorithm 2 summarizes the sufficient selection procedure.

Algorithm 2 Sufficient Selection. Given the sort-order of the β^t values and the partial sort-order of the weights, selects the b_j'th and (b_j + 1)'th greatest beliefs of row j.
1: k ← 1
2: bound ← ∞
3: S ← ∅
4: α̃_j^t ← -∞
5: β̃_j^t ← -∞
6: while β̃_j^t < bound do
7:   if k ≤ c then
8:     u ← I_jk
9:     if (u is unvisited and B_ju^{t-1} > min(S)) then
10:      S ← (S \ min(S)) ∪ {B_ju^{t-1}}
11:    end if
12:  end if
13:  v ← e_k
14:  if (v is unvisited and B_jv^{t-1} > min(S)) then
15:    S ← (S \ min(S)) ∪ {B_jv^{t-1}}
16:  end if
17:  bound ← W(x_j, x_u) + β_v^{t-1}
18:  α̃_j^t ← σ_{b_j}(S)
19:  β̃_j^t ← σ_{b_j+1}(S)
20:  k ← k + 1
21: end while
22: α_j^t ← -α̃_j^t
23: β_j^t ← -β̃_j^t

[1] A small hash table for the indices will indicate whether an index has been previously visited in O(1) time per lookup. For small values of b_j, where b_j << n + m, a linear scan through S to find the minimum is sufficiently fast, but a priority queue can be used to achieve sub-linear time insertion and replacement when b_j is large.
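In Python, the scan and stopping criterion of Algorithm 2 can be sketched as follows. This is an illustrative reimplementation, not the paper's C code: the dense weight row, the uses_beta indicator (which branch of the lookup rule applies), and the helper names are assumptions made so the snippet is self-contained.

```python
def sufficient_selection(W_row, alpha, beta, uses_beta, I_j, e, b_j):
    # W_row     : weights W(x_j, .) for row j (computed on demand in the
    #             real solver; dense here only to keep the sketch short).
    # alpha,beta: stored values from the previous iteration (alpha <= beta).
    # uses_beta : uses_beta[k] is True when the beta branch of the lookup
    #             rule applies to belief B_jk.
    # I_j       : indices of the c heaviest weights in row j, descending.
    # e         : indices of all nodes sorted by beta, descending.
    # Returns (alpha_j, beta_j, matched indices of row j).
    N, c = len(e), len(I_j)

    def belief(k):  # lookup rule from Section 2.2
        return W_row[k] + (beta[k] if uses_beta[k] else alpha[k])

    S = []                       # (belief, index) pairs, size <= b_j + 1
    seen = set()

    def offer(idx):
        if idx not in seen:
            seen.add(idx)
            S.append((belief(idx), idx))
            S.sort(reverse=True)
            del S[b_j + 1:]      # keep only the greatest b_j + 1 seen

    k, w_floor = 0, W_row[I_j[0]]
    while k < N:
        if k < c:
            w_floor = W_row[I_j[k]]      # least cached weight examined
            offer(I_j[k])
        offer(e[k])
        bound = w_floor + beta[e[k]]     # >= any belief not yet examined
        if len(S) > b_j and S[b_j][0] >= bound:
            break                        # sufficient: top b_j+1 are final
        k += 1

    top_indices = [idx for _, idx in S[:b_j]]
    return -S[b_j - 1][0], -S[b_j][0], top_indices
```

The returned matched indices correspond to step 9 of Algorithm 1, so the row of A can be filled without a second scan.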
2.4 Implementation Details

The implementation of Algorithms 1 and 2 used in the experiments of Section 3 is in C. To perform the initial iteration, during which the weight cache is constructed, our program uses the Quick Select algorithm, which features the same pivot-based partitioning strategy as Quick Sort, to perform selection in (average case) O(N) time per node (Cormen et al., 2001). For low-dimensional data and distance-based weights, we can run the same selection using a kd-tree and provide the index cache as an input to the program.[2]

[2] A newer C++ version of the solver is available at http://www.cs.columbia.edu/~bert/code/bmatching/.
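For the distance-based case just mentioned, a kd-tree yields the same cache without touching all N^2 pairs. A minimal sketch, assuming SciPy is available and that weights are negative Euclidean distances (so the heaviest weights are the nearest neighbors):

```python
from scipy.spatial import cKDTree

def kdtree_weight_cache(X, c):
    # With weights equal to negative Euclidean distance, the c heaviest
    # weights of each node are exactly its c nearest neighbors.
    tree = cKDTree(X)
    _, idx = tree.query(X, k=c + 1)  # k = c+1: the nearest hit is the point itself
    return idx[:, 1:]                # drop the self-match column
```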

2.5 Analysis

In this section, we analyze the correctness, space and running time requirements of the proposed algorithm. First, we verify that the bound from the sufficient selection procedure holds even though it is computed using only the β_j^t values, when many of the beliefs are actually computed using α_j^t values.

Claim 1. At each stage of the scan, where set S contains the b_j + 1 greatest beliefs corresponding to the first through k'th indices of (I)_j and e, the following properties are invariant: the current estimates bound the true values from below,

α̃_j^t ≤ α_j^t,   β̃_j^t ≤ β_j^t,

and the greatest unexplored belief is no greater than the sum of the least cached weight and the least β^{t-1} value seen so far,

W(x_j, x_u) + β_v^{t-1} ≥ max({B_jl^{t-1} | l ∈ {e_{k+1}, ..., e_{m+n}}}),   (4)

where u = I_jk and v = e_k.

Proof. The first two inequalities follow from the fact that the algorithm is selecting from a subset of the row but has not necessarily seen the full row yet. The third inequality (4) is the result of two bounds. First, the beliefs in the right-hand side can be expanded and bounded by ignoring the conditional in the belief update rule and always using β_l^{t-1}:

W(x_j, x_l) + β_l^{t-1} ≥ B_jl^{t-1}.

By definition α_l^{t-1} ≤ β_l^{t-1}, since the former is the negation of a larger value than the latter. A sufficient condition to guarantee Inequality (4) is then

W(x_j, x_u) + β_v^{t-1} ≥ max({W(x_j, x_l) + β_l^{t-1}}),

where l ranges over the remaining unseen indices as in (4). Since each component on the left-hand side has been explored in decreasing order, the maximization on the right can be relaxed into independent maximizations over each component, and neither can exceed the corresponding value on the left.

Thus, the algorithm will never stop too early. However, the running time of the selection operation depends on how early the stopping criterion is detected. In the worst case, the process examines every entry of the row, with some overhead checking for repeat comparisons. McAuley and Caetano (2009, 2010) showed that for random orderings of each dimension (and no truncated cache size), the expected number of belief comparisons necessary to find the maximum is O(√N), where, in our case, N = m + n = |V|. We show that selection is computable with O(√(bN)) expected comparisons. However, for problems where the orderings of each dimension are negatively correlated, the running time can be worse. In the case of b-matching, the orderings of the beliefs and potentials are in fact negatively correlated, but in a weak manner. We first establish the expected performance of the sufficient selection algorithm under the assumption of randomly ordered β values.

Theorem 1. Considering the element-wise sum of two real-valued vectors w and β of length N with independently random sort orders, the expected number of elements that must be compared to compute the selection of the b'th greatest entry σ_b({w_i + β_i | ∀i}) is √(bN).

Proof. The sufficient selection algorithm can be equivalently viewed as checking element-wise sums in the sort orders of the w and β vectors, and growing a set of k indices that have been examined. The algorithm can stop once it has seen b entries that are in the first k of both sort orders. We first consider the algorithm once it has examined k indices of each vector, and derive the expected number of entries that will be in both sets of k greatest entries. Since the sort orders of each set are random, the problem can be posed as a simple sampling scenario. Without loss of generality, consider the set of indices that correspond to the greatest k entries in w. Examining the greatest k elements of β is then equivalent to randomly sampling k indices from 1 to N without replacement. Thus, the probability of any of the k greatest entries of β being sampled is k/N, and, since there are k of these, the expected number of sampled entries that are in the greatest k entries of both vectors is k^2/N.
Finally, to determine the number of entries the algorithm must examine to have, in expectation, b entries in the top k, we simply solve the equation b = k^2/N for k, which yields that when k = √(bN), the algorithm will in the expected case observe b entries in the top k of both lists and therefore completes computation.

Applying the estimated running time to analysis of the full algorithm provides the following corollary.

Corollary 1. Assuming the β messages and the weight potentials are always randomly, independently ordered, and for constant b, the total running time for each iteration of belief propagation for b-matching with sufficient selection is O(N^1.5), and the total running time to solve b-matching is O(N^2.5).

It is important to point out the differences between the assumptions in Theorem 1 and real data scenarios, and why the assumptions do not always hold. When nodes represent actual objects or entities and the weights are determined by a function between nodes, the weight values have dependencies and are therefore not completely randomly ordered. Furthermore, the β values change during belief propagation according to rules that depend on the weights, and in some cases can cause the selection time to grow to O(N). Nevertheless, in many sampling settings and real data generating processes, the weights are random enough and the messages behave well enough that the algorithm yields significant speed improvements. Section 3 contains synthetic and real data experiments that demonstrate the significant speed improvement as well as a contrived, synthetic experiment where the speedup is less significant due to a special sampling process.
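The √(bN) behavior in Theorem 1 is easy to check empirically. The short Monte Carlo sketch below (illustrative; sizes and the random seed are arbitrary choices) draws two independent random orderings and reports how many positions must be scanned before b indices appear in the top k of both, which should be comparable to √(bN).

```python
import numpy as np

def entries_needed(N, b, rng):
    # Scan two independent random orderings in parallel and return the
    # k at which b indices have appeared among the first k of *both*.
    order_w, order_beta = rng.permutation(N), rng.permutation(N)
    seen_w, seen_beta, common = set(), set(), 0
    for k in range(N):
        for idx, seen, other in ((order_w[k], seen_w, seen_beta),
                                 (order_beta[k], seen_beta, seen_w)):
            seen.add(idx)
            if idx in other:
                common += 1
        if common >= b:
            return k + 1
    return N

rng = np.random.default_rng(0)
N, b = 10000, 3
trials = [entries_needed(N, b, rng) for _ in range(200)]
print(np.mean(trials), np.sqrt(b * N))  # the two values should be comparable
```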

Finally, the space requirement for this algorithm has been reduced from the O(N^2) beliefs (or messages) of the previous belief propagation algorithm to O(N) storage for the α and β values of each row. Naturally, this improvement is most significant in settings where the weights are computable from an efficient function, whereas if the weights are arbitrary, the input itself requires O(N^2) memory, so the memory reduction only makes the additional storage linear. In most machine learning applications, however, the weights are computed from functions of node descriptor pairs, such as Euclidean distances between vectors or kernel values. In these applications, the algorithm needs only to store the node descriptors, the α and β values and, during the computation of Algorithm 2, O(N) beliefs (which can be immediately deleted before computing the next row). The weight cache adds O(cN) space, where we consider c a user-selected constant.

The space reduction is also significant for the purposes of parallelization. The computation of belief propagation is easy to parallelize, but the communication costs between processors can be prohibitive. With the proposed algorithm, each computer in a cluster stores only a copy of the node descriptors and the current α and β values. At each iteration, the cluster must share the 2N updated α and β values. This is in contrast to previous formulations where O(N^2) messages or beliefs needed to be transmitted between computers at each iteration for full parallelization. Thus, when it is possible to provide each computer with a copy of the node descriptor data, an easy parallelization scheme is to split the row updates between cluster computers at each iteration.

3 EXPERIMENTS

This section describes empirical results from synthetic tests, which provide useful insight into the behavior of the algorithm, and a simple test on the MNIST handwritten digits data set, which demonstrates that the performance improvements apply to real data.

3.1 Synthetic Gaussian Data

In these experiments, the running time of the proposed algorithm is measured and compared against two baseline methods: the standard belief propagation algorithm, which is equivalent to setting the proposed algorithm's cache size to zero, and the Blossom V code by Kolmogorov (2009), which is considered to be a state-of-the-art maximum weight non-bipartite matching solver. For both experiments, node descriptors are sampled from zero-mean, spherical Gaussian distributions with variance 1.0, the weight function returns negative Euclidean distance, and we sample bipartitions of equal size (m = n = N/2).

Figure 1: Running Time Measurements on Synthetic Gaussian Data. Top: Square root of CPU time per iteration used to solve b-matchings of varying sizes. The default belief propagation algorithm is equivalent to cache size c = 0, where the running time appears to grow quadratically. Nonzero cache sizes are clearly sub-quadratic (sub-linear in the square root plot). Bottom: Count of belief lookups per iteration. The number of belief lookups serves as a surrogate measure of running time which is not affected by other processes running on the computer.
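The data-generating process for this first synthetic setting can be written in a few lines. The sketch below is illustrative only: the point count is an arbitrary example value, and the dense weight matrix is materialized purely for clarity, whereas the solver itself computes weights from the descriptors on demand.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000                                  # example size; m = n = N // 2
X = rng.normal(0.0, 1.0, size=(N, 2))     # zero-mean, unit-variance, R^2

# Negative Euclidean distance as the weight; -inf blocks edges inside
# each bipartition (and self-edges) so only cross edges can be matched.
W = -np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
half = N // 2
W[:half, :half] = -np.inf
W[half:, half:] = -np.inf
b = np.ones(N, dtype=int)                 # 1-matching: b_i = 1 for all i
```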
In the first experiment, points are sampled from R^2. Using different cache sizes, the running time of the algorithm is measured for varying point set sizes. We set b_i = 1, ∀i. We measure the running time using actual CPU time as well as a count of belief lookups. The square roots of per-iteration running times are drawn in Figure 1. It is clear that for a cache size of zero, where the algorithm is default belief propagation, the running time per iteration scales quadratically, and that for non-zero cache sizes, the running time scales sub-quadratically. This implies that, at least for random, iid, Gaussian data and Euclidean weights, the weights and β values are uncorrelated enough to achieve the random permutation case speedup.

For the second experiment, node descriptors are drawn from R^5, and we compare 1-matching performance between sufficient selection belief propagation, full belief propagation and Kolmogorov's Blossom V code. For sufficient selection, we set the cache size to c = 2√(m + n). In this case, there is no equivalent notion of per-iteration time for Blossom V, so we compare the full solution time. Full belief propagation and Blossom V seem to scale similarly, but sufficient selection improves the running time significantly.

For this comparison, it is important to note some differences between the problem classes that the compared codes solve: the algorithm behind Blossom V solves non-bipartite 1-matchings, whereas the proposed algorithm is specialized for bipartite b-matchings. Nevertheless, in this comparison, all algorithms are given bipartite 1-matchings. These tests were run on a personal computer with an 8-core 3 GHz Intel Xeon processor (though each run was single-threaded).

Figure 2: Comparison against Blossom V. Running times for solving varying sized bipartite 1-matching problems using Kolmogorov's Blossom V code, full belief propagation and sufficient selection belief propagation. Node descriptors are sampled from a spherical Gaussian in R^5 and weights are negative Euclidean distances. Full belief propagation tends to run faster than Blossom V, but not always. Belief propagation with sufficient selection is significantly faster for these random problems.

3.2 Synthetic Adversarial Example

In this section, we present an experiment that is an adversarial example for the sufficient selection algorithm. We construct an iid sampling scheme that generates data where the cached nearest neighbors of certain points will not be the b-matched neighbors until we cache Ω(N) neighbors. The data is generated by randomly sampling points uniformly from the surfaces of two hyperspheres in a high dimensional space, one with radius 1.0 and the other with radius 0.1. The result is that, due to concentration, the points on the outer hypersphere are closer to all points on the inner sphere than to any other points on the outer sphere, with high probability. Yet, the minimum distance b-matching will connect points according to which sphere they were sampled from. The distance between outer points and inner points will be in the range [0.9, 1.1], and the distance between outer points and other outer points will concentrate around √2 when the dimensionality is much larger than N (because each pair of vectors is nearly orthogonal with high probability). All outer points will rank the inner points as their nearest neighbors before any outer points, but due to the b-matching constraints, not enough edges are available from the inner points. This is an example where, for belief propagation to find the best b-matching, the α and β values must be negatively correlated with the weights.

Using cache sizes from 0 to m + n, where c = m + n allows the full sufficient selection, running times are compared for different sized inputs. From the arguments above, the sufficient selection should fail to improve upon the asymptotic time of full selection for all nodes on the outer hypersphere. Nevertheless, a constant time speedup is still achieved by exploiting order information. This may simply be because sufficient selection speeds up performance for the points on the inner hypersphere but not for the adversarially arranged points on the outer hypersphere.

Figure 3: High Dimensional Two Hypersphere Running Times. Even for a full cache size, the running time seems to still scale quadratically, albeit with a smaller constant factor.
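The adversarial sampling scheme and its distance claims can be checked directly. In the sketch below the ambient dimension and per-sphere sample counts are assumptions chosen only to make the concentration visible; it samples the two spheres and prints the cross-sphere and within-outer-sphere distances.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 500, 100          # dimension and per-sphere sample count (assumed)

def sphere_sample(n_points, radius):
    v = rng.normal(size=(n_points, d))
    return radius * v / np.linalg.norm(v, axis=1, keepdims=True)

outer, inner = sphere_sample(n, 1.0), sphere_sample(n, 0.1)

cross = np.linalg.norm(outer[:, None] - inner[None, :], axis=2)
within = np.linalg.norm(outer[:, None] - outer[None, :], axis=2)
within = within[~np.eye(n, dtype=bool)]
print(cross.min(), cross.max())   # stays inside roughly [0.9, 1.1]
print(within.mean(), np.sqrt(2))  # concentrates near sqrt(2) ~ 1.414
```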
3.3 Handwritten Digits

We perform timing tests on the MNIST digits data set (LeCun et al., 2001), which contains 60k training and 10k testing handwritten digit images. The images are centered and represented as 28 × 28 pixel grayscale images. We use principal components analysis (PCA) to reduce the 784 pixel dimensions of each image to the leading principal eigenvector projections. We use negative Euclidean distance between PCA-projected digits as edge weights, and time sufficient selection belief propagation on a subsampled data set with varying cache sizes.
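A minimal sketch of the preprocessing just described, assuming the digit images are already loaded as a flat array (the number of retained components and the loading step are left out, since the text above does not pin them down):

```python
import numpy as np

def pca_project(images, n_components):
    # images: (n, 784) flattened grayscale digits; returns PCA projections.
    X = images - images.mean(axis=0)
    # Principal directions from the SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T

def edge_weight(u, v):
    # Negative Euclidean distance between two PCA-projected digits.
    return -np.linalg.norm(u - v)
```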

In particular, for this test, we sample 10% of both the training and testing sets, resulting in 6000 training and 1000 testing digits. We generate feasible b-matching constraints by setting the target degree b ∈ {1, ..., 5} for the training points and the target degree b_te for testing points to b_te = 6b (since there are six times as many training points). Since there are 6 million candidate edges between training and testing examples, any algorithm that stores and updates beliefs or states for each edge, such as the original belief propagation algorithm described by Huang and Jebara (2007) or the Blossom V algorithm by Kolmogorov (2009), cannot be run on most computers without the use of expensive virtual memory swapping. Thus, we only compare the running times of linear memory b-matching belief propagation as described in Section 2.2 using different cache sizes.

These timing tests were run on a Mac Pro with an 8-core 3 GHz Intel Xeon processor, each b-matching job running on only a single core. The results show that for a small cache size, the solution time is reduced from around an hour to fewer than ten minutes. Interestingly, the running time for larger b values is less, which is because belief propagation seems to converge in fewer iterations. For larger cache sizes, we achieve minimal further improvement in running time; it seems that once the cache size is large enough, the algorithm finishes selection before running out of cached weights. Finally, with a fixed cache size, finding the minimum distance matching for the full MNIST data set, which contains six hundred million candidate edges between training and testing examples, took approximately five hours for b = 1 and for b = 4. The statistics from each run are summarized in Table 1. As in the synthetic examples, we count the number of belief lookups during the entire run and compare against the total number that would have been necessary had a standard selection algorithm been used (which is (m + n)^2 per iteration). The running time is roughly 50 to 100 times faster than the estimated time for belief propagation with naive selection.

Figure 4: Minimum Euclidean Distance b-matching Subsampled MNIST Digit Running Times. Weighted b-matching is solved on a subset of the MNIST data set. Running times are measured for various target degrees b and b_te, as well as weight cache sizes. See Table 1 for running time measurements on the full MNIST data set.

Table 1: Running Time Statistics on Full MNIST Data Set. Matching the full MNIST training set to the testing set considers 70,000 nodes and 600 million edges. The table columns are, from left to right, the target degrees b and b_te for training and testing nodes, raw running time for b-matching in minutes, the total number of belief lookups during the entire run, and the percentage of the belief lookups that would have been necessary using naive belief propagation (% Full).

b | b_te | Time (min.) | Belief Lookups  | % Full
1 | 6    | 285.77      | 4.5992 x 10^10  | 1.94%
4 | 24   | 306.76      | 5.228 x 10^10   | 1.11%

4 DISCUSSION

This article presented an enhanced belief propagation algorithm that solves maximum weight b-matching. The enhancements yield significant improvements in space requirement and running time. The space requirement is reduced from quadratic to linear, and the running time is reduced from O(N^3) to O(N^2.5) under certain assumptions. Empirical performance is consistent with the theoretical analysis, yet the theoretical analysis needs restrictive assumptions, so relaxing these to more realistic scenarios remains future work.

Further speed and space improvements may be possible by conceding exactness in favor of an approximation scheme.
For example, node descriptors can be stored using hashing schemes that preserve the reconstruction of node distances (Karatzoglou et al., 2010). Additionally, the initial iteration requires essentially a k-nearest neighbor computation, for which there are various approximate methods with speed tradeoffs. Extra analysis is necessary, however, to provide the error bound for the resulting b-matching, as well as to ensure that belief propagation converges. Parallel versions of the proposed algorithm are yet to be implemented, and, while they seem theoretically straightforward, implementing the parallelization as efficiently as possible remains future work. Finally, because of this algorithm, the class of b-matching problems efficiently solvable is now much larger, so application of b-matching (and the algorithms that build on b-matching) to larger scale data is a significant direction of future research.

Acknowledgements

The authors acknowledge support from DHS Contract N661-9-C-8, Privacy Preserving Sharing of Network Trace Data (PPSNTD) Program, and thank Blake Shaw and Tiberio Caetano for helpful discussions.

References

M. Bayati, D. Shah, and M. Sharma. Maximum weight matching via max-product belief propagation. In Proc. of the IEEE International Symposium on Information Theory, 2005.

M. Bayati, C. Borgs, J. T. Chayes, and R. Zecchina. Belief-propagation for weighted b-matchings on arbitrary graphs and its relation to linear programs with integer solutions. CoRR, abs/0709.1190, 2007.

T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. McGraw-Hill Book Company, Cambridge, London, 2nd edition, 2001.

A. Danyluk, L. Bottou, and M. Littman, editors. Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, volume 382 of ACM International Conference Proceeding Series. ACM, 2009. ISBN 978-1-60558-516-1.

R. Duan and S. Pettie. Approximating maximum weight matching in near-linear time. In Proceedings of the 51st IEEE Symposium on Foundations of Computer Science (FOCS), 2010.

C. Fremuth-Paeger and D. Jungnickel. Balanced network flows. I. A unifying framework for design and analysis of matching algorithms. Networks, 33(1), 1999.

H. N. Gabow and R. E. Tarjan. Faster scaling algorithms for network problems. SIAM J. Comput., 18(5):1013-1036, 1989.

B. Huang and T. Jebara. Loopy belief propagation for bipartite maximum weight b-matching. In M. Meila and X. Shen, editors, Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, volume 2 of JMLR: W&CP, March 2007.

B. Huang and T. Jebara. Exact graph structure estimation with degree priors. In M. Wani, M. Kantardzic, V. Palade, L. Kurgan, and Y. Qi, editors, ICMLA, pages 111-118. IEEE Computer Society, 2009. ISBN 978-0-7695-3926-3.

T. Jebara and V. Shchogolev. B-matching for spectral clustering. In J. Fürnkranz, T. Scheffer, and M. Spiliopoulou, editors, ECML, volume 4212 of Lecture Notes in Computer Science, pages 679-686. Springer, 2006. ISBN 3-540-45375-X.

T. Jebara, J. Wang, and S.-F. Chang. Graph construction and b-matching for semi-supervised learning. In Danyluk et al. (2009), page 56. ISBN 978-1-60558-516-1.

A. Karatzoglou, A. Smola, and M. Weimer. Collaborative filtering on a budget. In Y. Teh and M. Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), volume 9, pages 389-396, 2010.

V. Kolmogorov. Blossom V: a new implementation of a minimum cost perfect matching algorithm. Mathematical Programming Computation, 1:43-67, 2009. ISSN 1867-2949. URL http://dx.doi.org/10.1007/s12532-009-0002-8.

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent Signal Processing, pages 306-351. IEEE Press, 2001.

J. McAuley and T. Caetano. Faster algorithms for max-product message-passing. CoRR, abs/0910.3301, 2009.

J. McAuley and T. Caetano. Exploiting data-independence for fast belief-propagation. In J. Fürnkranz and T. Joachims, editors, ICML, pages 767-774. Omnipress, 2010.

J. Salez and D. Shah. Optimality of belief propagation for random assignment problem. In C. Mathieu, editor, SODA, pages 187-196. SIAM, 2009.

S. Sanghavi, D. Malioutov, and A. Willsky. Linear programming analysis of loopy belief propagation for weighted matching. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 1273-1280, Cambridge, MA, 2007. MIT Press.

P. Sankowski. Maximum weight bipartite matching in matrix multiplication time. Theor. Comput. Sci., 410(44):4480-4488, 2009.

B. Shaw and T. Jebara. Minimum volume embedding. In M. Meila and X. Shen, editors, Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, volume 2 of JMLR: W&CP, March 2007.

B. Shaw and T. Jebara. Structure preserving embedding. In Danyluk et al. (2009), page 118. ISBN 978-1-60558-516-1.