CS 140: Sparse Matrix-Vector Multiplication and Graph Partitioning


Parallel sparse matrix-vector product

Lay out the matrix and vectors by rows: y(i) = sum over j of A(i,j)*x(j), computing only the terms with A(i,j) ≠ 0.

Algorithm: each processor i
- Broadcast x(i)
- Compute y(i) = A(i,:)*x

Communication volume: v = p*n (way too much!)

Reducing communication volume:
1. Only send each processor the parts of x it actually needs.
2. Reorder the matrix for better locality by graph partitioning.

[figure: the rows of A and the entries of x and y distributed across processors P0-P3]
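Below is a minimal serial sketch of the row-wise layout (illustrative only, not the course's code); the loop over i plays the role of processor i, and the matrix size and density are arbitrary choices.

    import numpy as np
    from scipy.sparse import random as sprandom

    n, p = 16, 4                              # matrix size and "processor" count
    A = sprandom(n, n, density=0.2, format='csr')
    x = np.ones(n)
    y = np.zeros(n)

    rows_per_proc = n // p
    for i in range(p):                        # each iteration acts as processor i
        lo, hi = i * rows_per_proc, (i + 1) * rows_per_proc
        # In the naive algorithm every processor sees all of x (the broadcast);
        # the CSR format ensures only nonzero terms A(i,j)*x(j) are computed.
        y[lo:hi] = A[lo:hi, :] @ x

    assert np.allclose(y, A @ x)              # matches the global product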

2D block decomposition for 5-point stencil

n stencil cells, p processors; each processor has a patch of n/p cells.
- Block row (or block col) layout: v = 2 * p * sqrt(n)
- 2-dimensional block layout: v = 4 * sqrt(p) * sqrt(n)
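A quick numeric check of the two formulas, with n and p chosen arbitrarily:

    from math import isqrt

    n, p = 1_000_000, 64                      # n stencil cells, p processors
    v_block_row = 2 * p * isqrt(n)            # each processor trades 2 boundary rows of length sqrt(n)
    v_2d_block = 4 * isqrt(p) * isqrt(n)      # p patches, each trading 4 sides of length sqrt(n/p)
    print(v_block_row, v_2d_block)            # 128000 vs 32000: the 2D layout wins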

Graphs and Sparse Matrices

A sparse matrix is a representation of a (sparse) graph:
- Matrix entries are edge weights.
- The number of nonzeros per row is the vertex degree.
- Edges represent data dependencies in matrix-vector multiplication.

[figure: a 6-vertex example graph and the 6x6 structure matrix it corresponds to]
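As a small sketch of the correspondence (the edge list here is arbitrary, not the slide's exact graph), the row pointer of a CSR matrix gives the vertex degrees directly:

    import numpy as np
    from scipy.sparse import csr_matrix

    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    row = [i for i, j in edges] + [j for i, j in edges]   # symmetric pattern
    col = [j for i, j in edges] + [i for i, j in edges]
    A = csr_matrix((np.ones(len(row)), (row, col)), shape=(6, 6))

    degrees = np.diff(A.indptr)               # nonzeros per row = vertex degree
    print(degrees)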

Sparse Matrix-Vector Multiplication

Graph partitioning

Assigns subgraphs to processors; determines parallelism and locality.
- Tries to make the subgraphs all the same size (load balance).
- Tries to minimize edge crossings (communication).
Exact minimization is NP-complete.

[figure: two partitions of the same graph, with edge crossings = 6 and edge crossings = 10]
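Both objectives are easy to evaluate for a given partition; here is a small sketch (the edge list and part assignment are made-up data):

    def cut_and_balance(edges, part):
        """part maps each vertex to 0 or 1."""
        cut = sum(1 for u, v in edges if part[u] != part[v])   # edge crossings
        size0 = sum(1 for side in part.values() if side == 0)
        return cut, (size0, len(part) - size0)                 # communication, load balance

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    print(cut_and_balance(edges, {0: 0, 1: 0, 2: 1, 3: 1}))    # (3, (2, 2))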

Applications of graph partitioning
- Telephone network design: the original application! 1970 algorithm due to Kernighan & Lin.
- Sparse matrix times vector multiplication: to solve Ax = b, partial differential equations, eigenvalue problems, ... Take N = {1, ..., n}, (j,k) in E if A(j,k) is nonzero, W_N(j) = # nonzeros in row j, W_E(j,k) = 1.
- Data mining and clustering
- Physical mapping of DNA
- VLSI layout
- Sparse Gaussian elimination: reorder matrix rows and columns to decrease fill in the factors.
- Load balancing while minimizing communication
- ...

Graph partitioning demo

> load meshes
> gplotg(airfoil,axy)
> specdice(airfoil,axy,5)
> meshdemo

[figures: spectral bisection of the airfoil mesh and a 32-way partition]

Partitioning by Repeated Bisection

To partition into 2^k parts, bisect the graph recursively k times.

Recursive Bisection

Recursive bisection approach: partition the data into two sets, then recursively subdivide each set into two sets. Only minor modifications are needed to allow P ≠ 2^n. (Umit V. Catalyurek)
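A sketch of the recursion, assuming some bisect(vertices) routine (spectral, iterative swapping, geometric, ...) that splits a vertex set in two; the function names and signatures are illustrative:

    def partition(vertices, k, bisect):
        """Split vertices into k parts by repeated bisection."""
        if k == 1:
            return [vertices]
        left, right = bisect(vertices)
        half = k // 2
        # For k not a power of 2, bisect would need to split unevenly in
        # proportion to half : (k - half) -- the "minor modifications" above.
        return partition(left, half, bisect) + partition(right, k - half, bisect)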

CS 240A: Graph and hypergraph partitioning (excerpts)

Thanks to Aydin Buluc, Umit Catalyurek, Alan Edelman, and Kathy Yelick for some of these slides. The whole CS 240A partitioning lecture is at http://cs.ucsb.edu/~gilbert/cs240a/slides/cs240a-partitioning.pdf

Graph partitioning in practice

Graph partitioning heuristics have been an active research area for many years, often motivated by partitioning for parallel computation. Some techniques:
- Iterative swapping (Kernighan-Lin, Fiduccia-Mattheyses)
- Spectral partitioning (uses eigenvectors of the Laplacian matrix of the graph)
- Geometric partitioning (for meshes with specified vertex coordinates)
- Breadth-first search (fast but dated)
Many popular modern codes (e.g. Metis, Chaco, Zoltan) use multilevel iterative swapping.

Iterative swapping: Kernighan/Lin, Fiduccia/Mattheyses

Take an initial partition and iteratively improve it.
- Kernighan/Lin (1970): cost = O(|N|^3), but simple.
- Fiduccia/Mattheyses (1982): cost = O(|E|), but more complicated.

Start with a weighted graph and a partition N = A ∪ B with |A| = |B|. Let T = cost(A,B) = Σ {weight(e) : e connects nodes in A and B}. Find subsets X of A and Y of B with |X| = |Y| such that swapping X and Y decreases the cost: newA = (A - X) ∪ Y, newB = (B - Y) ∪ X, and newT = cost(newA, newB) < cost(A,B). Compute newT efficiently for many possible X and Y (there isn't time to try them all), then choose the smallest.
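The per-node quantity driving these heuristics is the "gain" of moving a node across the cut; the walkthrough on the following slides tracks exactly this. A minimal sketch (adj and part are assumed dictionary representations, not anything defined in the slides):

    def gain(v, part, adj):
        # adj[v]: set of v's neighbors; part[u]: u's current side (0 or 1)
        external = sum(1 for u in adj[v] if part[u] != part[v])  # cut edges at v
        internal = sum(1 for u in adj[v] if part[u] == part[v])  # uncut edges at v
        return external - internal     # moving v changes the cut size by -gain(v)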

Simplified Fiduccia-Mattheyses: Example (1)

Red nodes are in Part1; black nodes are in Part2. The initial partition into two parts is arbitrary. In this case it cuts 8 edges. The initial node gains are shown in red.
[figure: the example graph on nodes a-h with its initial node gains]
Nodes tentatively moved (and cut size after each pair): none (8);

Simplified Fiduccia-Mattheyses: Example (2)

The node in Part1 with largest gain is g. We tentatively move it to Part2 and recompute the gains of its neighbors. Tentatively moved nodes are hollow circles. After a node is tentatively moved, its gain doesn't matter any more.
Nodes tentatively moved (and cut size after each pair): none (8); g,

Simplified Fiduccia-Mattheyses: Example (3)

The node in Part2 with largest gain is d. We tentatively move it to Part1 and recompute the gains of its neighbors. After this first tentative swap, the cut size is 4.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4);

Simplified Fiduccia-Mattheyses: Example (4)

The unmoved node in Part1 with largest gain is f. We tentatively move it to Part2 and recompute the gains of its neighbors.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f,

Simplified Fiduccia-Mattheyses: Example (5)

The unmoved node in Part2 with largest gain is c. We tentatively move it to Part1 and recompute the gains of its neighbors. After this tentative swap, the cut size is 5.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f, c (5);

Simplified Fiduccia-Mattheyses: Example (6)

The unmoved node in Part1 with largest gain is b. We tentatively move it to Part2 and recompute the gains of its neighbors.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f, c (5); b,

Simplified Fiduccia-Mattheyses: Example (7)

There is a tie for largest gain between the two unmoved nodes in Part2. We choose one (say e) and tentatively move it to Part1. It has no unmoved neighbors, so no gains are recomputed. After this tentative swap the cut size is 7.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f, c (5); b, e (7);

Simplified Fiduccia-Mattheyses: Example (8)

The unmoved node in Part1 with the largest gain (the only one) is a. We tentatively move it to Part2. It has no unmoved neighbors, so no gains are recomputed.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f, c (5); b, e (7); a,

Simplified Fiduccia-Mattheyses: Example (9)

The unmoved node in Part2 with the largest gain (the only one) is h. We tentatively move it to Part1. The cut size after the final tentative swap is 8, the same as it was before any tentative moves.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f, c (5); b, e (7); a, h (8)

Simplified Fiduccia-Mattheyses: Example (10)

After every node has been tentatively moved, we look back at the sequence and see that the smallest cut was 4, after swapping g and d. We make that swap permanent and undo all the later tentative swaps. This is the end of the first improvement step.
Nodes tentatively moved (and cut size after each pair): none (8); g, d (4); f, c (5); b, e (7); a, h (8)

Simplified Fiduccia-Mattheyses: Example (11)

Now we recompute the gains and do another improvement step starting from the new size-4 cut. The details are not shown. The second improvement step doesn't change the cut size, so the algorithm ends with a cut of size 4. In general, we keep doing improvement steps as long as the cut size keeps getting smaller.
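Putting the walkthrough together, here is a compact sketch of one improvement step (a simplified FM pass on dictionary-based data; a real implementation keeps bucket lists of gains to reach the O(|E|) bound instead of recomputing the cut each time):

    def fm_improvement_step(adj, part):
        """One pass: tentatively move every node, then keep the best prefix."""
        def cut():
            return sum(1 for v in adj for u in adj[v] if part[u] != part[v]) // 2
        def gain(v):
            return (sum(1 for u in adj[v] if part[u] != part[v])
                    - sum(1 for u in adj[v] if part[u] == part[v]))

        moved, history = [], [(cut(), [])]
        side = 0                                  # alternate Part1 / Part2
        while len(moved) < len(adj):
            candidates = [v for v in adj if v not in moved and part[v] == side]
            if not candidates:                    # this side is exhausted
                side = 1 - side
                continue
            v = max(candidates, key=gain)         # best unmoved node on this side
            part[v] = 1 - part[v]                 # tentative move
            moved.append(v)
            history.append((cut(), list(moved)))
            side = 1 - side
        best_cut, best_prefix = min(history, key=lambda t: t[0])
        for v in moved:                           # undo moves past the best prefix
            if v not in best_prefix:
                part[v] = 1 - part[v]
        return best_cut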

Spectral Bisection

Based on the theory of Fiedler (1970s); rediscovered several times in different communities.
- Motivation I: analogy to a vibrating string.
- Motivation II: continuous relaxation of a discrete optimization problem.
- Implementation: eigenvectors via the Lanczos algorithm.
To optimize sparse matrix-vector multiply, we graph partition; to graph partition, we find an eigenvector of a matrix; to find an eigenvector, we do sparse matrix-vector multiply. No free lunch...

Laplacian Matrix

Definition: The Laplacian matrix L(G) of a graph G is a symmetric matrix, with one row and column for each node. It is defined by
- L(G)(i,i) = degree of node i (number of incident edges)
- L(G)(i,j) = -1 if i ≠ j and there is an edge (i,j)
- L(G)(i,j) = 0 otherwise

Example (nodes 1-5, with edges 1-2, 1-3, 2-3, 3-4, 3-5, and 4-5):

           [  2 -1 -1  0  0 ]
           [ -1  2 -1  0  0 ]
    L(G) = [ -1 -1  4 -1 -1 ]
           [  0  0 -1  2 -1 ]
           [  0  0 -1 -1  2 ]
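A sketch that builds L(G) for this example directly from the definition (0-indexed edge list):

    import numpy as np

    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]   # the graph above
    n = 5
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1                   # diagonal: degrees accumulate per incident edge
        L[j, j] += 1
        L[i, j] = L[j, i] = -1         # off-diagonal: -1 for each edge
    print(L)                           # matches the matrix above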

Properties of Laplacian Matrix

Theorem: L(G) has the following properties.
- L(G) is symmetric. This implies the eigenvalues of L(G) are real, and its eigenvectors are real and orthogonal.
- The rows of L sum to zero: let e = [1, ..., 1]^T, the column vector of all ones; then L(G)*e = 0.
- The eigenvalues of L(G) are nonnegative: 0 = λ1 <= λ2 <= ... <= λn.
- The number of connected components of G is equal to the number of λi that are 0.
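The theorem is easy to check numerically on the example Laplacian (a small verification sketch; tolerances are arbitrary):

    import numpy as np

    L = np.array([[ 2, -1, -1,  0,  0],
                  [-1,  2, -1,  0,  0],
                  [-1, -1,  4, -1, -1],
                  [ 0,  0, -1,  2, -1],
                  [ 0,  0, -1, -1,  2]], dtype=float)

    eigvals = np.linalg.eigvalsh(L)                 # real, ascending (L is symmetric)
    print(np.allclose(L @ np.ones(5), 0))           # rows sum to zero: L*e = 0
    print(np.all(eigvals >= -1e-12))                # eigenvalues nonnegative
    print(np.sum(np.abs(eigvals) < 1e-9))           # one zero eigenvalue: G is connected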

Spectral Bisection Algorithm

- Compute the eigenvector v2 corresponding to λ2 of L(G).
- Partition the nodes around the median of v2.

Why in the world should this work?
- Intuition: a vibrating string or membrane.
- Heuristic: continuous relaxation of a discrete optimization problem.
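A minimal sketch of the algorithm on the 5-node example (a dense eigensolver for clarity; real codes use Lanczos on the sparse matrix, as noted above):

    import numpy as np

    L = np.array([[ 2, -1, -1,  0,  0],
                  [-1,  2, -1,  0,  0],
                  [-1, -1,  4, -1, -1],
                  [ 0,  0, -1,  2, -1],
                  [ 0,  0, -1, -1,  2]], dtype=float)

    vals, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    v2 = vecs[:, 1]                      # Fiedler vector, paired with lambda_2
    part = v2 >= np.median(v2)           # split the nodes around the median of v2
    print(part.astype(int))              # 0/1 side labels for the 5 nodes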

Motivation for Spectral Bisection: Vibrating String

Think of G = a 1D mesh as masses (nodes) connected by springs (edges), i.e. a string that can vibrate. A vibrating string has modes of vibration, or harmonics. Label each node - or + according to the sign of the mode there, partitioning the nodes into N- and N+. The same idea works for other graphs (e.g. a planar graph ~ a trampoline).

2nd eigenvector of L(planar mesh)

Multilevel Partitioning

If G is too big for our algorithms, what can we do?
(1) Replace G by a coarse approximation Gc, and partition Gc instead.
(2) Use the partition of Gc to get a rough partitioning of G, then iteratively improve it.
What if Gc is still too big? Apply the same idea recursively.

Multilevel Partitioning - High Level Algorithm

(N+, N-) = Multilevel_Partition(N, E)
    -- recursive partitioning routine returns N+ and N-, where N = N+ ∪ N-
    if N is small
        (1) Partition G = (N, E) directly to get N = N+ ∪ N-
        Return (N+, N-)
    else
        (2) Coarsen G to get an approximation Gc = (Nc, Ec)
        (3) (Nc+, Nc-) = Multilevel_Partition(Nc, Ec)
        (4) Expand (Nc+, Nc-) to a partition (N+, N-) of N
        (5) Improve the partition (N+, N-)
        Return (N+, N-)
    endif

How do we Coarsen? Expand? Improve?
[figure: the recursion as a V-cycle, with steps (2,3) descending to the direct partition (1) at the bottom, then (4) and (5) on the way back up]
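The same control flow as a Python sketch, with the five routines passed in as parameters (all of these names and signatures are illustrative assumptions):

    def multilevel_partition(N, E, direct, coarsen, expand, improve, threshold=100):
        """The five numbered steps match the slide above."""
        if len(N) <= threshold:
            return direct(N, E)                    # (1) e.g. spectral bisection
        Nc, Ec, mapping = coarsen(N, E)            # (2) e.g. maximal matching
        Ncp, Ncm = multilevel_partition(           # (3) recurse on the coarse graph
            Nc, Ec, direct, coarsen, expand, improve, threshold)
        Np, Nm = expand(Ncp, Ncm, mapping)         # (4) project back to G
        return improve(Np, Nm, E)                  # (5) e.g. FM improvement steps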

Coarsening by Maximal Matching
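A sketch of this coarsening step: greedily match each unmatched vertex with an unmatched neighbor, then merge each matched pair into one coarse vertex (the dictionary representation and names are assumptions, not the slide's code):

    def coarsen_by_matching(adj):
        """adj: dict vertex -> set of neighbors. Returns (coarse adj, fine-to-coarse map)."""
        matched = {}
        for v in adj:                                  # greedy maximal matching
            if v in matched:
                continue
            partner = next((u for u in adj[v] if u not in matched), None)
            if partner is not None:
                matched[v] = partner
                matched[partner] = v
        mapping, next_id = {}, 0                       # collapse pairs to coarse vertices
        for v in adj:
            if v not in mapping:
                mapping[v] = next_id
                if v in matched:
                    mapping[matched[v]] = next_id
                next_id += 1
        coarse_adj = {c: set() for c in range(next_id)}
        for v, nbrs in adj.items():                    # keep edges between distinct coarse vertices
            for u in nbrs:
                if mapping[u] != mapping[v]:
                    coarse_adj[mapping[v]].add(mapping[u])
        return coarse_adj, mapping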

Example of Coarsening

At the bottom of the recursion, partition the coarsest graph, using spectral bisection, say.
[figure: the 5-node example graph and its Laplacian L(G) from the earlier Laplacian slide]

Expand a partition of Gc to a partition of G

After each expansion, improve the partition by iterative swapping.
[figure: the Fiduccia-Mattheyses example graph before and after an improvement step, with node gains]