Parallelizing A Convergent Approximate Inference Method
Ming Su (1) and Elizabeth Thompson (2)
Departments of (1) Electrical Engineering and (2) Statistics, University of Washington
{mingsu, eathomp}@u.washington.edu

Abstract. The ability to efficiently perform probabilistic inference tasks is critical to large-scale applications in statistics and artificial intelligence. Dramatic speedup might be achieved by appropriately mapping current inference algorithms to a parallel framework. Parallel exact inference methods still suffer from exponential complexity in the worst case. Approximate inference methods have been parallelized with good speedup. In this paper, we focus on a variant of the Belief Propagation algorithm. This variant has better convergence properties and is provably convergent under certain conditions. We show that this method is amenable to coarse-grained parallelization and propose techniques to parallelize it optimally without sacrificing convergence. Experiments on a shared-memory system demonstrate that near-ideal speedup is achieved with reasonable scalability.

Keywords: Graphical Model, Approximate Inference, Parallel Algorithm

1 Introduction

The ability to efficiently perform probabilistic inference tasks is critical to large-scale applications in statistics and artificial intelligence. In particular, such problems arise in the analysis of genetic data on large and complex pedigrees [1] or data at large numbers of markers across the genome [2]. Ever-evolving parallel computing technology suggests that dramatic speedup might be achieved by appropriately mapping existing sequential inference algorithms to a parallel framework. Exact inference methods, such as variable elimination (VE) and the junction tree algorithm, have been parallelized and reasonable speedup achieved [3-7]. However, the complexity of exact inference methods for a graphical model is exponential in the tree-width of the graph.
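For context, this is the familiar complexity bound for junction-tree inference (notation here is ours, not the paper's): on a model with $N$ variables, each with domain size $d$, and a triangulation of width $w$, message passing costs on the order of

```latex
O\!\left( N \cdot d^{\,w+1} \right)
```

so the running time is exponential in the tree-width $w$, and no parallel decomposition of the exact algorithm removes that exponent.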
For graphs with large tree-width, approximate methods are necessary. While it has been demonstrated empirically that loopy and generalized BP work extremely well in many applications [8], Yedidia et al. [9] have shown that these methods are not guaranteed to converge on loopy graphs. Recently, a promising parallel approximate inference method was presented by Gonzalez et al. [10], in which loopy Belief Propagation (BP)
was optimally parallelized, but without a guarantee of convergence. The UPS algorithm [11] has gained popularity due to its reasonably good performance and ease of implementation [12, 13]. More importantly, the convex relaxation method, which includes UPS as a special case, is guaranteed to converge under mild conditions [14]. In this paper, we develop an effective parallel generalized inference method with special attention to the UPS algorithm. Even though the generalized inference method possesses a structural parallelism that is straightforward to extract, ineffective task partitioning and sequencing can result in imbalanced load and excessive communication overhead. We focus on solving these two problems and on demonstrating the performance of the efficiently parallelized algorithms on large-scale problems using a shared-memory system.

2 Convex Relaxation Method and Subproblem Construction

The convex relaxation method relies on the notion of region graphs to facilitate the Bethe approximation. In the Bethe approximation, one minimizes the Bethe free energy function and uses its solution to obtain an estimate of the partition function and of the true marginal distributions [14]. The Bethe free energy is a function of terms known as pseudo-marginals. Definitions and examples of the Bethe approximation, Bethe region graphs and pseudo-marginals can be found in [9, 15]. The UPS algorithm and the convex relaxation method are based on the fact that if the graphical model admits a tree-structured Bethe region graph, the associated Bethe approximation is exact [9, 15]; that is, minimization of the Bethe free energy is a convex optimization problem. We obtain a convex subproblem by fixing the pseudo-marginals associated with a selected subset of inner regions to a constant vector. The convex relaxation method works by first finding a sequence of such convex subproblems and then repeatedly solving them until convergence.
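For reference, the Bethe free energy over pseudo-marginals $b$ takes the standard form given in [9] (the notation below is illustrative, not copied from the paper):

```latex
F_{\mathrm{Bethe}}(b) \;=\; \sum_{a} \sum_{x_a} b_a(x_a)\,
    \ln \frac{b_a(x_a)}{f_a(x_a)}
\;-\; \sum_{i} (d_i - 1) \sum_{x_i} b_i(x_i)\, \ln b_i(x_i)
```

where $a$ ranges over outer regions with factors $f_a$, $i$ over inner (single-variable) regions, and $d_i$ is the number of outer regions containing $i$. The second sum contributes the concave terms that can destroy convexity; fixing the pseudo-marginals $b_i$ of a selected subset of inner regions turns those terms into constants, which is what makes each subproblem convex.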
Graphically, the subproblems are defined over a sequence of tree-structured subgraphs. Simple schemes for finding these subgraphs in grid graphs are proposed in [11]. However, these schemes are not optimal and cannot be extended to general graphs. We present a hypergraph spanning tree algorithm that is more effective and is applicable to general graphs. With the hypergraph representation, the problem of finding these subgraphs, which otherwise requires ad hoc treatment in bipartite region graphs, becomes well-defined. Definitions of hypergraphs, hyperedges, hypergraph spanning trees and hyperforests can be found in [16]. In the hypergraph representation, nodes and hyperedges correspond to outer regions and inner regions, respectively. Specifically, an inner region can be regarded as a set whose elements are the adjacent outer regions. In the Greedy Sequencing procedure developed in [14], all outer regions are included in each subproblem. The sequence of tree-structured subgraphs corresponds to a sequence of spanning hypertrees. In general, a spanning tree in a hypergraph may not exist, and even determining its existence is strongly NP-complete [16].

Fig. 1. (a) MapReduce flowchart for a sequence of size 2; (b) Coarsening by contracting edges 3, 4 and 5.

We develop a heuristic, hyperspan, by extending Kruskal's minimum spanning tree algorithm for ordinary graphs. We apply hyperspan repeatedly to obtain a sequence of spanning hyperforests. In this context, the convergence criterion of [14] translates to the condition that every hyperedge must appear in at least one spanning forest. The Greedy Sequencing procedure guarantees that, even in the worst case, the convergence criterion is satisfied. Interestingly, for a grid graph model of arbitrary size, the greedy sequencing procedure returns a sequence of size two, which is optimal.

3 Parallel and Distributed Inference

In the greedy sequencing procedure, if a subproblem is defined on a forest rather than on a tree, we can run Iterative Scaling (IS) on the disconnected components independently and consequently in parallel. This suggests a natural way of extracting coarse-grained parallelism uniformly across the sequence of subproblems. The basic idea is to partition the hypertree, or even the hyperforest, into a prescribed number t of components and assign the computation associated with each component to a separate processing unit. No communication cost is incurred among the independent computation tasks. This maps to a coarse-grained MapReduce framework [17], as shown in Figure 1(a). Note that synchronization, accomplished by software barriers, is still required at the end of each inner iteration. In this paper, we focus only on mapping the algorithm to a shared-memory system. Task partitioning is performed using the multilevel hypergraph partitioning program hmetis [18]. Compared to alternative programs, it has a much shorter solution time and, more importantly, it produces balanced partitions with significantly fewer cut edges.
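The paper does not give pseudocode for hyperspan or for the Greedy Sequencing procedure; the sketch below is one plausible Kruskal-style reading, in which a hyperedge is accepted only when the nodes (outer regions) it joins currently lie in pairwise-distinct components, and rounds are repeated, prioritizing not-yet-covered hyperedges, until every hyperedge has appeared in some spanning hyperforest. The union-find structure and the tie-breaking order are our assumptions.

```python
class DSU:
    """Union-find over node ids, with path halving."""
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, x):
        while self.p[x] != x:
            self.p[x] = self.p[self.p[x]]
            x = self.p[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.p[ra] = rb

def hyperspan(num_nodes, hyperedges):
    """Kruskal-style heuristic: greedily grow a spanning hyperforest,
    accepting a hyperedge only if its nodes lie in distinct components.
    Returns indices (into `hyperedges`) of the accepted edges."""
    dsu, accepted = DSU(num_nodes), []
    for idx, edge in enumerate(hyperedges):
        roots = {dsu.find(v) for v in edge}
        if len(roots) == len(edge):        # no cycle introduced
            first, *rest = edge
            for v in rest:                 # merge all covered components
                dsu.union(first, v)
            accepted.append(idx)
    return accepted

def greedy_sequence(num_nodes, hyperedges):
    """Repeat hyperspan until every hyperedge appears in at least one
    spanning hyperforest (the convergence criterion of [14])."""
    remaining, seq = set(range(len(hyperedges))), []
    while remaining:
        # Try uncovered hyperedges first so each round covers at least one.
        order = sorted(range(len(hyperedges)), key=lambda i: i not in remaining)
        forest = hyperspan(num_nodes, [hyperedges[i] for i in order])
        chosen = [order[j] for j in forest]
        seq.append(chosen)
        remaining -= set(chosen)
    return seq
```

On a four-cycle, one round picks three of the four edges and a second round picks up the remaining one, so the criterion is met with a sequence of size two, consistent with the grid-graph observation above.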
The convergence criterion states that every hyperedge must appear in at least one spanning forest [14]. This means that no hyperedge may always be a cut edge. A simple technique, edge contraction, prevents a hyperedge from being a cut edge. When a hyperedge is contracted, it is replaced by a super node containing this edge and all nodes that are adjacent to this
edge. All other edges previously adjacent to any of these nodes become adjacent to the super node (Figure 1(b)). After partitioning once, we can contract a subset of the cut edges, producing a coarsened hypergraph; repartitioning the coarsened hypergraph will not place any cut on the contracted edges. Near-optimal speedup is achieved only with perfect load balancing. Knowing that IS solution time is proportional to the number of nodes, we perform weighted partitionings. The weight of a regular node is 1; the weight of a super node is the number of regular nodes it contains. Reasonable load balance is achieved through weighted partitioning when the average interaction between adjacent random variables is not too high. For high interaction, partitioning-based static load balancing (SLB) performs poorly. In Section 4, we show this effect and propose techniques to accommodate it. We adopted the common multithreading scheme in which n threads are created on an n-core system and each thread is assigned to a separate core. Thread synchronization ensures that all subproblems converge. We use non-blocking sends and blocking receives because they are more efficient for this implementation. For efficiency, pseudo-marginals are sent and received in one package rather than individually; sender and receiver use a predefined protocol to pack and unpack the aggregate into individual pseudo-marginal messages. Our experimental environment is a shared-memory 8-core system with 2 Intel Xeon Quad Core E GHz processors running Debian Linux. We implemented the algorithms in the Java programming language using MPJ Express, an open-source Java message passing interface (MPI) library that allows application developers to write and execute parallel applications on multicore processors and computer clusters/clouds.
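The edge-contraction step of Figure 1(b), together with the super-node weights used for weighted partitioning, can be sketched as follows. The representation (hyperedges as sets of integer node ids, weights as a dict, the smallest absorbed id reused as the super node's id) is our assumption, not the paper's data format.

```python
def contract_hyperedge(hyperedges, weights, target):
    """Contract hyperedge `hyperedges[target]`: its nodes collapse into a
    single super node whose weight is the sum of the absorbed node weights,
    and every other hyperedge adjacent to an absorbed node becomes adjacent
    to the super node instead. Returns (new_hyperedges, new_weights)."""
    absorbed = set(hyperedges[target])
    super_id = min(absorbed)                       # reuse smallest id
    new_weights = {v: w for v, w in weights.items() if v not in absorbed}
    new_weights[super_id] = sum(weights[v] for v in absorbed)
    new_edges = []
    for i, edge in enumerate(hyperedges):
        if i == target:
            continue                               # contracted edge vanishes
        remapped = {super_id if v in absorbed else v for v in edge}
        if len(remapped) > 1:                      # drop fully swallowed edges
            new_edges.append(remapped)
    return new_edges, new_weights
```

A repartitioning of the coarsened hypergraph then cannot cut the contracted edge, since all of its former endpoints now sit inside one (heavier) vertex, which is exactly what the weighted-partitioning step accounts for.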
4 Experiments and Results

The selected class of test problems are Ising models, with joint distribution

P(x) \propto \exp\Bigl( \sum_{i \in V} \alpha_i x_i \;+\; \sum_{(i,j) \in E} \beta_{ij} x_i x_j \Bigr),

where V and E are the nodes and edges of the graph. The α_i are drawn uniformly from [-1, 1] and the β_ij uniformly from [-β, β]. When β > 1, loopy BP fails to converge even for small graphs. Due to synchronization, the slowest task determines the overall performance. The SLB introduced in Section 3 performs worse as β increases. In practice, we apply two runtime techniques to mitigate the problem. First, a dynamic load balancing (DLB) scheme is developed. Instead of partitioning the graph into n components and distributing them to n threads, we partition the graph into more components than threads and put them into a task pool. At runtime, each thread fetches a task from the pool once it finishes its current task. The use of each core is maximized and the length of the bottleneck task is shortened. The second technique is bottleneck task early termination (ET). A thread is terminated when all other threads have become idle and no task is left in the pool. However, terminating a task prematurely has two undesirable effects. First, it breaks the convergence requirement. Second, it may change the convergence rate. In order to ensure
convergence, we can occasionally switch back to non-ET mode, especially when oscillation of messages is detected.

Fig. 2. (a) Load balance: DLB & ET vs. SLB. Normalized load (relative to the largest) shown for each core, for three cases: 2 cores (upper left), 4 cores (upper right) and 8 cores (bottom). (b) Speedup: DLB & ET vs. SLB.

With β = 1.1, we randomly generated 100 problems. The number of cores ranges from 2 up to 8 to demonstrate both raw speedup and scalability. Speedup is defined as the ratio of sequential to parallel elapsed time. At this interaction level, the sequential run time exceeds 1 minute, motivating parallelization, and SLB starts performing poorly. Figure 2(a) shows that with SLB, poor balance results irrespective of the number of cores used. This is dramatically mitigated by DLB and ET. Notice that almost perfect balance is achieved for a small number of cores (2, 4), but with 8 cores the load is less balanced. The average speedup over the 100 problems is shown in Figure 2(b), both for SLB and for DLB with ET. DLB and ET universally improved the speedup, and the improvement became more prominent as the number of cores increased. With DLB and ET, the speedup approaches the ideal case until the number of cores reaches 6. We attribute this drop in speedup to two factors. First, as shown in Figure 2(a), even with DLB and ET, load becomes less balanced as the number of cores increases. Second, there is an increased level of resource contention for memory bandwidth: the BP algorithm accesses memory frequently, and as more tasks run in parallel, the number of concurrent memory accesses also increases.

5 Discussion

In this paper, we proposed a heuristic for subproblem construction. This heuristic has been shown to be effective and is provably optimal for grid graphs.
Thorough testing on a complete set of benchmark networks will be important in evaluating the performance of the heuristic. Our parallel implementation is at the algorithmic level, which means it can be combined with lower-level parallelization techniques proposed by other researchers. Experiments on a shared-memory system exhibit near-ideal speedup with reasonable scalability.
Further exploration is necessary to demonstrate that the speedup scales up in practice on large distributed-memory systems, such as clusters.

Acknowledgments. This work is supported by NIH grant HG

References

1. Cannings C, Thompson EA, and Skolnick MH. (1978) Probability functions on complex pedigrees. Advances in Applied Probability 10.
2. Abecasis GR, Cherny SS, Cookson WO, and Cardon LR. (2002) Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Nature Genetics 30.
3. Shachter RD, and Andersen SK. (1994) Global Conditioning for Probabilistic Inference in Belief Networks. UAI.
4. Pennock D. (1998) Logarithmic Time Parallel Bayesian Inference. UAI.
5. Kozlov A, and Singh J. (1994) A Parallel Lauritzen-Spiegelhalter Algorithm for Probabilistic Inference. In Proceedings of the 1994 Conference on Supercomputing.
6. Namasivayam VK, Pathak A, and Prasanna VK. (2006) Scalable Parallel Implementation of Bayesian Network to Junction Tree Conversion for Exact Inference. In 18th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 06).
7. Xia Y, and Prasanna VK. (2008) Parallel exact inference on the Cell Broadband Engine processor. In Proceedings of the 2008 ACM/IEEE Conference on Supercomputing.
8. Botetz B. (2007) Efficient belief propagation for vision using linear constraint nodes. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
9. Yedidia JS, Freeman WT, and Weiss Y. (2000) Generalized belief propagation. In NIPS. MIT Press.
10. Gonzalez J, Low Y, Guestrin C, and O'Hallaron D. (2009) Distributed Parallel Inference on Large Factor Graphs. UAI.
11. Teh YW, and Welling M. (2001) The unified propagation and scaling algorithm. In NIPS.
12. Carbonetto P, de Freitas N, and Barnard K. (2004) A statistical model for general contextual object recognition. In ECCV.
13. Xie Z, Gao J, and Wu X. (2009) Regional category parsing in undirected graphical models. Pattern Recognition Letters, 30(14).
14. Su M. (2010) On the Convergence of Convex Relaxation Method and Distributed Optimization of Bethe Free Energy. In Proceedings of the 11th International Symposium on Artificial Intelligence and Mathematics (ISAIM), Fort Lauderdale, Florida.
15. Heskes T. (2002) Stable fixed points of loopy belief propagation are local minima of the Bethe free energy. In NIPS.
16. Tomescu I, and Zimand M. (1994) Minimum spanning hypertrees. Discrete Applied Mathematics, 54.
17. Dean J, and Ghemawat S. (2004) MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of the Sixth Symposium on Operating System Design and Implementation, San Francisco, CA.
18. Karypis G, and Kumar V. (1998) hmetis: A Hypergraph Partitioning Package.
Summary s Cluster Tree Elimination Importance of s Solving Topological structure dene key features for a wide class of problems CSP: Inference in acyclic network is extremely ecient (polynomial) Idea:
More information6.2 DATA DISTRIBUTION AND EXPERIMENT DETAILS
Chapter 6 Indexing Results 6. INTRODUCTION The generation of inverted indexes for text databases is a computationally intensive process that requires the exclusive use of processing resources for long
More informationParallel Exact Inference on Multicore Using MapReduce
Parallel Exact Inference on Multicore Using MapReduce Nam Ma Computer Science Department University of Southern California Los Angeles, CA 9009 Email: namma@usc.edu Yinglong Xia IBM T.J. Watson Research
More informationImproving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets
Improving the Random Forest Algorithm by Randomly Varying the Size of the Bootstrap Samples for Low Dimensional Data Sets Md Nasim Adnan and Md Zahidul Islam Centre for Research in Complex Systems (CRiCS)
More informationMultithreaded Algorithms Part 1. Dept. of Computer Science & Eng University of Moratuwa
CS4460 Advanced d Algorithms Batch 08, L4S2 Lecture 11 Multithreaded Algorithms Part 1 N. H. N. D. de Silva Dept. of Computer Science & Eng University of Moratuwa Announcements Last topic discussed is
More informationDynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle
Dynamic Fine Grain Scheduling of Pipeline Parallelism Presented by: Ram Manohar Oruganti and Michael TeWinkle Overview Introduction Motivation Scheduling Approaches GRAMPS scheduling method Evaluation
More informationGraph Partitioning for Scalable Distributed Graph Computations
Graph Partitioning for Scalable Distributed Graph Computations Aydın Buluç ABuluc@lbl.gov Kamesh Madduri madduri@cse.psu.edu 10 th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering
More informationMultigrid Pattern. I. Problem. II. Driving Forces. III. Solution
Multigrid Pattern I. Problem Problem domain is decomposed into a set of geometric grids, where each element participates in a local computation followed by data exchanges with adjacent neighbors. The grids
More informationEscola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, , São Paulo, SP - Brazil
Generalizing Variable Elimination in Bayesian Networks FABIO GAGLIARDI COZMAN Escola Politécnica, University of São Paulo Av. Prof. Mello Moraes, 2231, 05508-900, São Paulo, SP - Brazil fgcozman@usp.br
More informationSupplementary Material: The Emergence of. Organizing Structure in Conceptual Representation
Supplementary Material: The Emergence of Organizing Structure in Conceptual Representation Brenden M. Lake, 1,2 Neil D. Lawrence, 3 Joshua B. Tenenbaum, 4,5 1 Center for Data Science, New York University
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning
ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Markov Random Fields: Inference Exact: VE Exact+Approximate: BP Readings: Barber 5 Dhruv Batra
More informationUsing Combinatorial Optimization within Max-Product Belief Propagation
Using Combinatorial Optimization within Max-Product Belief Propagation John Duchi Daniel Tarlow Gal Elidan Daphne Koller Department of Computer Science Stanford University Stanford, CA 94305-9010 {jduchi,dtarlow,galel,koller}@cs.stanford.edu
More informationPairwise Clustering and Graphical Models
Pairwise Clustering and Graphical Models Noam Shental Computer Science & Eng. Center for Neural Computation Hebrew University of Jerusalem Jerusalem, Israel 994 fenoam@cs.huji.ac.il Tomer Hertz Computer
More informationDecision Problems. Observation: Many polynomial algorithms. Questions: Can we solve all problems in polynomial time? Answer: No, absolutely not.
Decision Problems Observation: Many polynomial algorithms. Questions: Can we solve all problems in polynomial time? Answer: No, absolutely not. Definition: The class of problems that can be solved by polynomial-time
More informationPart II. C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS
Part II C. M. Bishop PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS Converting Directed to Undirected Graphs (1) Converting Directed to Undirected Graphs (2) Add extra links between
More informationA Comparative Study for Efficient Synchronization of Parallel ACO on Multi-core Processors in Solving QAPs
2 IEEE Symposium Series on Computational Intelligence A Comparative Study for Efficient Synchronization of Parallel ACO on Multi-core Processors in Solving Qs Shigeyoshi Tsutsui Management Information
More informationD-Separation. b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, are in the set C.
D-Separation Say: A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked by C if it contains a node such that either a) the arrows on the path meet either
More informationA Parallel Genetic Algorithm for Maximum Flow Problem
A Parallel Genetic Algorithm for Maximum Flow Problem Ola M. Surakhi Computer Science Department University of Jordan Amman-Jordan Mohammad Qatawneh Computer Science Department University of Jordan Amman-Jordan
More informationAdaptive-Mesh-Refinement Pattern
Adaptive-Mesh-Refinement Pattern I. Problem Data-parallelism is exposed on a geometric mesh structure (either irregular or regular), where each point iteratively communicates with nearby neighboring points
More informationDynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers
Dynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers A. Srivastava E. Han V. Kumar V. Singh Information Technology Lab Dept. of Computer Science Information Technology Lab Hitachi
More informationParallel Methods for Convex Optimization. A. Devarakonda, J. Demmel, K. Fountoulakis, M. Mahoney
Parallel Methods for Convex Optimization A. Devarakonda, J. Demmel, K. Fountoulakis, M. Mahoney Problems minimize g(x)+f(x; A, b) Sparse regression g(x) =kxk 1 f(x) =kax bk 2 2 mx Sparse SVM g(x) =kxk
More informationMaximum Clique Conformance Measure for Graph Coloring Algorithms
Maximum Clique Conformance Measure for Graph Algorithms Abdulmutaleb Alzubi Jadarah University Dept. of Computer Science Irbid, Jordan alzoubi3@yahoo.com Mohammad Al-Haj Hassan Zarqa University Dept. of
More informationPerformance impact of dynamic parallelism on different clustering algorithms
Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu
More informationMore details on Loopy BP
Readings: K&F: 11.3, 11.5 Yedidia et al. paper from the class website Chapter 9 - Jordan Loopy Belief Propagation Generalized Belief Propagation Unifying Variational and GBP Learning Parameters of MNs
More informationParallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs
Parallel Direct Simulation Monte Carlo Computation Using CUDA on GPUs C.-C. Su a, C.-W. Hsieh b, M. R. Smith b, M. C. Jermy c and J.-S. Wu a a Department of Mechanical Engineering, National Chiao Tung
More informationDESIGN AND ANALYSIS OF ALGORITHMS GREEDY METHOD
1 DESIGN AND ANALYSIS OF ALGORITHMS UNIT II Objectives GREEDY METHOD Explain and detail about greedy method Explain the concept of knapsack problem and solve the problems in knapsack Discuss the applications
More informationSLS Methods: An Overview
HEURSTC OPTMZATON SLS Methods: An Overview adapted from slides for SLS:FA, Chapter 2 Outline 1. Constructive Heuristics (Revisited) 2. terative mprovement (Revisited) 3. Simple SLS Methods 4. Hybrid SLS
More informationResidual Splash for Optimally Parallelizing Belief Propagation
Joseph E. Gonzalez Carnegie Mellon University Yucheng Low Carnegie Mellon University Carlos Guestrin Carnegie Mellon University Abstract As computer architectures move towards multicore we must build a
More informationReview of the Robust K-means Algorithm and Comparison with Other Clustering Methods
Review of the Robust K-means Algorithm and Comparison with Other Clustering Methods Ben Karsin University of Hawaii at Manoa Information and Computer Science ICS 63 Machine Learning Fall 8 Introduction
More informationQuasi-Dynamic Network Model Partition Method for Accelerating Parallel Network Simulation
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 565-871 1-5 E-mail: {o-gomez,oosaki,imase}@ist.osaka-u.ac.jp QD-PART (Quasi-Dynamic network model PARTition
More informationPractical Near-Data Processing for In-Memory Analytics Frameworks
Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard
More information6 : Factor Graphs, Message Passing and Junction Trees
10-708: Probabilistic Graphical Models 10-708, Spring 2018 6 : Factor Graphs, Message Passing and Junction Trees Lecturer: Kayhan Batmanghelich Scribes: Sarthak Garg 1 Factor Graphs Factor Graphs are graphical
More informationA Parallel Algorithm for Exact Structure Learning of Bayesian Networks
A Parallel Algorithm for Exact Structure Learning of Bayesian Networks Olga Nikolova, Jaroslaw Zola, and Srinivas Aluru Department of Computer Engineering Iowa State University Ames, IA 0010 {olia,zola,aluru}@iastate.edu
More informationGeometric Registration for Deformable Shapes 3.3 Advanced Global Matching
Geometric Registration for Deformable Shapes 3.3 Advanced Global Matching Correlated Correspondences [ASP*04] A Complete Registration System [HAW*08] In this session Advanced Global Matching Some practical
More informationHEURISTIC ALGORITHMS FOR THE GENERALIZED MINIMUM SPANNING TREE PROBLEM
Proceedings of the International Conference on Theory and Applications of Mathematics and Informatics - ICTAMI 24, Thessaloniki, Greece HEURISTIC ALGORITHMS FOR THE GENERALIZED MINIMUM SPANNING TREE PROBLEM
More informationScalable Inference in Hierarchical Generative Models
Scalable Inference in Hierarchical Generative Models Thomas Dean Department of Computer Science Brown University, Providence, RI 02912 Abstract Borrowing insights from computational neuroscience, we present
More informationA Parallel Evolutionary Algorithm for Discovery of Decision Rules
A Parallel Evolutionary Algorithm for Discovery of Decision Rules Wojciech Kwedlo Faculty of Computer Science Technical University of Bia lystok Wiejska 45a, 15-351 Bia lystok, Poland wkwedlo@ii.pb.bialystok.pl
More information