Parallel Algorithm for Multilevel Graph Partitioning and Sparse Matrix Ordering


George Karypis and Vipin Kumar
Brian Shi, CSci 8314, 03/09/2017

Outline
- Introduction
- Graph Partitioning Problem
- Multilevel Graph Partitioning Algorithm
- Parallelizing the Multilevel Graph Partitioning Algorithm
- Initial Partitioning Phase
- Uncoarsening Phase
- Parallel Refinement Algorithm
- Parallel Multilevel Sparse Matrix Ordering Algorithm
- Performance, Scaling Analysis, and Results

Graph Partitioning Problem
- The graph partitioning problem is NP-complete.
- Definition: Let $G = (V, E)$ be a graph with $|V| = n$. A p-way partition of $G$ consists of subsets $V_1, \dots, V_p \subseteq V$ such that $V$ is the disjoint union of $V_1, \dots, V_p$, $|V_i| \approx n/p$ for each $i$, and the edge-cut (the number of edges whose endpoints lie in different subsets) is minimized.
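
The two quantities in the definition, edge-cut and part sizes, can be computed directly. The following is a minimal sketch; the function names `edge_cut` and `part_sizes` and the edge-list representation are assumptions made for illustration, not notation from the slides.

```python
from collections import Counter

def edge_cut(edges, part):
    """Count the edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

def part_sizes(part, p):
    """Number of vertices assigned to each of the p parts."""
    counts = Counter(part.values())
    return [counts.get(i, 0) for i in range(p)]

# A 6-cycle split into two contiguous halves.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

print(edge_cut(edges, part))   # 2: edges (2, 3) and (5, 0) are cut
print(part_sizes(part, 2))     # [3, 3]: each part has n/p vertices
```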

Multilevel Graph Partitioning Algorithm
Consists of three main phases:
1. Coarsening Phase: A matching of G is constructed and the matched edges are contracted.
2. Initial Partitioning Phase: A partition of the smaller, coarsened graph is computed.
3. Uncoarsening Phase: The partitioned graph is uncontracted and the partition is mapped back to the original graph.

Coarsening Phase
- A sequence of smaller graphs $\{G_l\}$ is constructed by finding maximal matchings, starting from $G_0 = G$.
- Graph matching: Let $G = (V, E)$ be an undirected graph. A matching of $G$ is a subset of edges $M \subseteq E$ such that no vertex in $V$ is incident to more than one edge in $M$.
- The endpoints of each matched edge are contracted into a single vertex.
- Matching schemes: Random Matching, Heavy Edge Matching.
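
One coarsening step can be sketched as follows, assuming the graph is stored as a symmetric dictionary `adj[u][v] = weight`. The names `heavy_edge_matching` and `contract`, the random visiting order, and the tie-breaking rule are illustrative assumptions. Each unmatched vertex is paired with its heaviest unmatched neighbor, and matched pairs are then collapsed, with the weights of parallel edges summed.

```python
import random

def heavy_edge_matching(adj, seed=0):
    """Return a maximal matching as {u: mate}; mate[u] == u means unmatched."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)                      # vertices are visited in random order
    mate = {}
    for u in order:
        if u in mate:
            continue
        free = [(w, v) for v, w in adj[u].items() if v not in mate and v != u]
        if free:
            _, v = max(free)                # heaviest edge to a free neighbor
            mate[u], mate[v] = v, u
        else:
            mate[u] = u                     # no free neighbor is left
    return mate

def contract(adj, mate):
    """Collapse matched pairs; returns (coarse graph, fine-to-coarse map)."""
    cmap, next_id = {}, 0
    for u in adj:
        if u not in cmap:
            cmap[u] = cmap[mate[u]] = next_id
            next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            cu, cv = cmap[u], cmap[v]
            if cu != cv:                    # contracted edges disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, cmap

# Path 0 - 1 - 2 - 3 with edge weights 1, 5, 1.
adj = {0: {1: 1}, 1: {0: 1, 2: 5}, 2: {1: 5, 3: 1}, 3: {2: 1}}
mate = heavy_edge_matching(adj)
coarse, cmap = contract(adj, mate)
print(mate, coarse)
```

Which pairs get matched depends on the visiting order, so the coarse graph has either two or three vertices in this example.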

Partitioning and Uncoarsening
1. Initial Partitioning Phase: A partition of the smaller, coarsened graph is computed.
2. A serial algorithm is used, such as Greedy Graph Growing Partitioning (GGGP).
3. The partitioned graph is projected back to the original graph.
4. Refinement algorithms are applied to reduce the edge-cut and improve load balance.

Parallelizing the Multilevel Graph Partitioning Algorithm
- Assumption: p-way partitioning with p processors.
- Parallelism is exploited in the recursive nature of the bisection algorithm: all p processors perform the first bisection, then each group of p/2 processors bisects one of the halves, and so on.
- Goal: a parallel algorithm for graph bisection.
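
The recursive scheme can be sketched as a function that halves both the vertex set and the processor group at every level. This is an illustration only: it assumes p is a power of two, the name `recursive_bisection` is hypothetical, and `split_in_half` is a placeholder that ignores edges, where a real implementation would call the multilevel bisection. The two recursive calls run one after the other here, whereas in the parallel algorithm they would proceed at the same time on disjoint processor groups.

```python
def recursive_bisection(vertices, procs, bisect):
    """Split `vertices` into len(procs) parts; returns {processor: vertex list}."""
    if len(procs) == 1:
        return {procs[0]: vertices}
    left, right = bisect(vertices)          # every processor in `procs` cooperates here
    half = len(procs) // 2
    result = recursive_bisection(left, procs[:half], bisect)
    result.update(recursive_bisection(right, procs[half:], bisect))
    return result

def split_in_half(vertices):
    """Placeholder bisection that ignores the graph structure."""
    mid = len(vertices) // 2
    return vertices[:mid], vertices[mid:]

parts = recursive_bisection(list(range(16)), list(range(4)), split_in_half)
print(parts)   # four parts of four vertices each, after log2(4) = 2 levels
```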

Coarsening Phase: Main Idea
- Assume $p = 2^{2r}$ processors arranged in a $\sqrt{p} \times \sqrt{p}$ 2-D array, and let $(V_0, E_0) = (V, E)$.
- Distribute the vertices into $\sqrt{p}$ subsets $V_0^0, V_0^1, \dots, V_0^{\sqrt{p}-1}$.
- Processor $P_{ij}$ holds the subset of edges in $E_0$ whose incident vertices lie in $V_0^i$ and $V_0^j$.
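
The 2-D distribution can be illustrated with a small sketch. It assumes a cyclic mapping in which vertex v belongs to subset v mod sqrt(p); the slide does not say how vertices are assigned to subsets, so this mapping and the name `distribute_edges` are assumptions. Each undirected edge is stored in both orientations, like the blocks of a symmetric adjacency matrix.

```python
import math

def distribute_edges(edges, p):
    """Map each edge to the grid processors P_ij that store it."""
    q = math.isqrt(p)
    assert q * q == p, "p must be a perfect square"
    grid = {(i, j): [] for i in range(q) for j in range(q)}
    for u, v in edges:
        # Assumed cyclic mapping: vertex x belongs to subset x % q.
        grid[(u % q, v % q)].append((u, v))
        grid[(v % q, u % q)].append((v, u))
    return grid

# A 6-cycle plus the chord (0, 2), on a 2 x 2 processor grid.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 2)]
grid = distribute_edges(edges, 4)
for proc, local in sorted(grid.items()):
    print(proc, local)
```

Only the chord (0, 2) joins two vertices of the same subset, so it is the only edge that lands on a diagonal processor and can take part in a local matching there.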

Coarsening Phase
- The local matchings $M_0^i$ are computed by the diagonal processors $P_{ii}$.
- After completion, each $P_{ii}$ performs two broadcasts, one along its row and one along its column, to distribute $M_0^i$.
- The overall matching is $M_0 = \bigcup_i M_0^i$, and contracting it gives the next graph $G_1 = (V_1, E_1)$.
- Each $P_{ij}$ then holds the edges with incident vertices in $V_1^i$ and $V_1^j$.
- Once $G_k$ is coarse enough, the number of working processors is reduced by folding.

Initial Partitioning Phase
- The coarsest graph is contained in a single processor, so it can be partitioned sequentially using GGGP.
- GGGP is run several times from different random starting vertices.
- The coarsest graph can be copied to multiple processors so that these trials run concurrently.
- The partition with the smallest edge-cut is kept.
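
A simplified graph-growing bisection with several trials might look like the sketch below. It is not the exact GGGP of the paper: the graph is unweighted, the gain rule (edges into the region minus edges out of it) and the names `gggp` and `best_of_trials` are assumptions for illustration. The trials are independent of each other, which is why they can be handed to different processors; here they simply run in sequence.

```python
import random

def edge_cut(adj, side):
    """Number of edges with endpoints on different sides."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def gain(adj, region, v):
    """Edges from v into the region minus edges from v to the outside."""
    inside = sum(1 for w in adj[v] if w in region)
    return inside - (len(adj[v]) - inside)

def gggp(adj, start):
    """Grow part 0 from `start` until it holds half of the vertices."""
    region = {start}
    while len(region) < len(adj) // 2:
        frontier = {v for u in region for v in adj[u]} - region
        if not frontier:                    # disconnected graph: take any vertex
            frontier = set(adj) - region
        region.add(max(sorted(frontier), key=lambda v: gain(adj, region, v)))
    return {v: 0 if v in region else 1 for v in adj}

def best_of_trials(adj, trials=4, seed=0):
    """Run GGGP from several random start vertices and keep the smallest cut."""
    rng = random.Random(seed)
    starts = rng.sample(sorted(adj), min(trials, len(adj)))
    return min((gggp(adj, s) for s in starts), key=lambda side: edge_cut(adj, side))

# Two triangles joined by the single edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best = best_of_trials(adj)
print(edge_cut(adj, best))   # 1: only the bridge (2, 3) is cut
```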

Uncoarsening Phase
- The partition of each coarse graph is projected back, level by level, to the original graph.
- Refinements are made at each step of the projection.
- Serial algorithms (Kernighan-Lin variants) are used while the coarse graph resides in one processor.
- The processor folding is then reversed.

Parallel Refinement Algorithm
- Each $P_{ij}$ computes the local gain $lg_v$ for each $v \in V_0^j$.
- The total gain $g_v$ of $v$ is computed via a sum reduction along the columns.
- Each $P_{ii}$ selects a set of vertices $U_i \subseteq V_0^i$ with positive gain.
- $U_i$ is broadcast row-wise and column-wise.
- The local gains are recomputed and the process repeats.
Balancing partitions
- Start vertex swaps from the heavier part of the partition.
- Use a load-balancing iteration if there is more than 2% imbalance.
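
The gain computation can be simulated sequentially. In this sketch the gain of a vertex is taken to be its number of cut edges minus its number of uncut edges, the usual Kernighan-Lin style gain for unweighted graphs. The cyclic vertex mapping (v mod q) and the function names are assumptions. Each processor $P_{ij}$ sees only the edges between two vertex subsets, so it can compute only a partial gain; summing the partial gains down column j gives the total.

```python
def local_gains(entries, side):
    """Partial gains seen by one processor P_ij.
    entries: list of (u, v) with u in subset i and v in subset j."""
    lg = {}
    for u, v in entries:
        lg[v] = lg.get(v, 0) + (1 if side[u] != side[v] else -1)
    return lg

def total_gains(grid, q, side):
    """Sum the partial gains over each column, as a column reduction would."""
    g = {}
    for j in range(q):
        for i in range(q):
            for v, x in local_gains(grid[(i, j)], side).items():
                g[v] = g.get(v, 0) + x
    return g

# A 4-cycle plus the chord (0, 2), on a 2 x 2 grid.
q = 2
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
grid = {(i, j): [] for i in range(q) for j in range(q)}
for u, v in edges:
    grid[(u % q, v % q)].append((u, v))     # assumed cyclic vertex mapping
    grid[(v % q, u % q)].append((v, u))

side = {0: 0, 1: 0, 2: 1, 3: 1}
gains = total_gains(grid, q, side)
print(gains)                                 # vertices 0 and 2 have gain 1, vertices 1 and 3 have gain 0
print(sorted(v for v, x in gains.items() if x > 0))   # [0, 2]
```

Moving vertex 0 to the other side lowers the edge-cut from 3 to 2, which matches its gain of 1.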

Parallel Multilevel Sparse Matrix Ordering Algorithm
- Assume a bisection has already been constructed.
- Let A and B be the sets of boundary vertices on the two sides of the bisection.
- A vertex separator needs to be constructed from these boundary vertices.
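
Any vertex cover of the cut edges is a vertex separator, since removing it removes every edge that crosses the bisection. The sequential sketch below uses a greedy cover followed by a pruning pass, so that no vertex in the result is redundant. The greedy heuristic is a stand-in chosen for brevity and is not the cover computation used in the paper; the name `vertex_separator` is hypothetical.

```python
def vertex_separator(edges, side):
    """Greedy vertex cover of the cut edges, pruned of redundant vertices."""
    cut = [(u, v) for u, v in edges if side[u] != side[v]]
    uncovered, cover = set(cut), set()
    while uncovered:
        degree = {}
        for u, v in uncovered:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        best = max(sorted(degree), key=degree.get)   # covers the most cut edges
        cover.add(best)
        uncovered = {e for e in uncovered if best not in e}
    for x in sorted(cover):
        # x is redundant if the other endpoint of each of its cut edges is in the cover.
        if all((v if u == x else u) in cover for u, v in cut if x in (u, v)):
            cover.discard(x)
    return cover

# Bisection {0, 1, 2} | {3, 4, 5} with cut edges (2, 3), (2, 4), (1, 3).
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3), (2, 4), (1, 3)]
side = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
sep = vertex_separator(edges, side)
print(sep)   # {1, 2}
```

Here the boundary sets are A = {1, 2} and B = {3, 4}; once vertices 1 and 2 are removed, no edge connects the two sides.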

Parallel Vertex Cover Algorithm
- Processors are arranged in a 2-D array.
- $A^i = A \cap V_0^i$ and $B^i = B \cap V_0^i$.
- Each $P_{ij}$ stores $A^i$ and $B^j$ and locally computes a minimal cover of the edges between $A^i$ and $B^j$.
- Denote this local cover by $A_c^{ij} \cup B_c^{ij}$, where $A_c^{ij} \subseteq A$ and $B_c^{ij} \subseteq B$.
- The union of the local covers $A_c^{ij} \cup B_c^{ij}$ across all processors forms a cover.

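Parallel Vertex Cover Algorithm: Constructing a Minimal Cover
- Broadcast $B_c^j = \bigcup_i B_c^{ij}$ to the processors in the same column.
- Each $P_{ij}$ removes from $A_c^{ij}$ the vertices whose edges are already covered by $B_c^j$, giving a reduced set $\tilde{A}_c^{ij} \subseteq A_c^{ij}$.
- Broadcast $A_c^i = \bigcup_j \tilde{A}_c^{ij}$ to the processors in the same row.
- Each $P_{ij}$ removes from $B_c^j$ the vertices whose edges are already covered by $A_c^i$; the union of the remaining vertices is the new $B_c^j$.
- A minimal cover is then obtained as $S = \left( \bigcup_{i=0}^{\sqrt{p}-1} A_c^i \right) \cup \left( \bigcup_{j=0}^{\sqrt{p}-1} B_c^j \right)$.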

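Scaling Analysis
- Let $n = |V_0|$ and $m = |E_0|$.
- Time to broadcast: $T_{broadcast} = O\left(\frac{n}{\sqrt{p}}\right)$.
- Time for bisection: $T_{bisection} = O\left(\frac{m}{p}\right) + O\left(\frac{n}{\sqrt{p}}\right)$.
- Overall runtime of the p-way partitioning algorithm: $T_{partition} = \left( O\left(\frac{m}{p}\right) + O\left(\frac{n}{\sqrt{p}}\right) \right) \log(p)$.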

Test Cases

Results

References
- G. Karypis and V. Kumar, "A parallel algorithm for multilevel graph partitioning and sparse matrix ordering," Journal of Parallel and Distributed Computing, vol. 48, no. 1, pp. 71-95, 1998.
- G. Karypis and V. Kumar, "Multilevel graph partitioning schemes," ICPP (3), pp. 113-122, 1995.
- A. Grama, G. Karypis, V. Kumar, and A. Gupta, Introduction to Parallel Computing. Addison-Wesley, second ed., 2003.