Graph Coarsening and Clustering on the GPU
1 Graph Coarsening and Clustering on the GPU
B. O. Fagginger Auer and R. H. Bisseling, Utrecht University, February 14, 2012
5 Introduction
We will discuss coarsening and greedy clustering of graphs.
Clustering: isolating related groups of vertices in a graph.
Relevant in: social networks, epidemiology, papers, metabolism, and ecosystems (Newman & Girvan, 2004).
Our primary interests are speed and parallelisation.
8 Modularity clustering
Let $G = (V, E)$ be a graph with edge weights $\omega : E \to \mathbb{R}_{>0}$.
A clustering of $G$ is a partitioning $\mathcal{C}$ of $V$: $V = \bigcup_{C \in \mathcal{C}} C$ as a disjoint union.
The quality of a clustering is measured by its modularity $\operatorname{mod}(\mathcal{C})$, introduced in 2004 by Newman and Girvan.
10 Modularity clustering
Clustering modularity is defined as
\[
\operatorname{mod}(\mathcal{C}) := \frac{\sum_{C \in \mathcal{C}} \sum_{\{u,v\} \in E,\; u,v \in C} \omega(\{u,v\})}{\sum_{e \in E} \omega(e)} - \frac{\sum_{C \in \mathcal{C}} \bigl( \sum_{v \in C} \zeta(v) \bigr)^2}{4 \bigl( \sum_{e \in E} \omega(e) \bigr)^2}.
\]
Here, vertex weights $\zeta : V \to \mathbb{R}_{\geq 0}$ are defined as
\[
\zeta(v) := \sum_{\{u,v\} \in E} \omega(\{u,v\}).
\]
13 Modularity clustering
$\operatorname{mod}(\mathcal{C})$ can be rewritten to
\[
\frac{1}{4\Omega^2} \sum_{C \in \mathcal{C}} \Bigl( \zeta(C)\,\bigl(2\Omega - \zeta(C)\bigr) - 2\Omega \sum_{C' \in \mathcal{C},\, C' \neq C} \omega(\operatorname{cut}(C, C')) \Bigr).
\]
Here, $\Omega := \sum_{e \in E} \omega(e)$, $\zeta(C) := \sum_{v \in C} \zeta(v)$, and $\operatorname{cut}(C, C') := \{\{u,v\} \in E \mid u \in C \text{ and } v \in C'\}$.
To calculate modularity, we only need to keep track of summed vertex weights of clusters and summed edge weights between clusters.
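The two forms of the modularity can be checked against each other numerically. The following is a minimal sketch, assuming an undirected graph without self-loops stored as a dictionary from vertex pairs to weights; the function names and the example graph (two triangles joined by one edge) are illustrative and not taken from the slides.

```python
def cluster_weights(edges, clusters):
    """Return Omega, the summed vertex weight zeta(C) per cluster, and a vertex-to-cluster map."""
    omega_total = sum(edges.values())
    zeta = {}
    for (u, v), w in edges.items():
        # zeta(v) is the summed weight of the edges incident to v.
        zeta[u] = zeta.get(u, 0.0) + w
        zeta[v] = zeta.get(v, 0.0) + w
    label = {v: i for i, c in enumerate(clusters) for v in c}
    zeta_c = [sum(zeta.get(v, 0.0) for v in c) for c in clusters]
    return omega_total, zeta_c, label


def modularity_definition(edges, clusters):
    """Modularity from the original definition (internal weight minus squared cluster weights)."""
    omega_total, zeta_c, label = cluster_weights(edges, clusters)
    internal = sum(w for (u, v), w in edges.items() if label[u] == label[v])
    return internal / omega_total - sum(z * z for z in zeta_c) / (4 * omega_total ** 2)


def modularity_rewritten(edges, clusters):
    """Modularity from the rewritten form, using only zeta(C) and cut weights."""
    omega_total, zeta_c, label = cluster_weights(edges, clusters)
    # Every cut edge lies in both cut(C, C') and cut(C', C), so it is counted twice.
    cut_total = 2 * sum(w for (u, v), w in edges.items() if label[u] != label[v])
    balance = sum(z * (2 * omega_total - z) for z in zeta_c)
    return (balance - 2 * omega_total * cut_total) / (4 * omega_total ** 2)


# Hypothetical example: two unit-weight triangles joined by the edge {2, 3}.
edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
         (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
clusters = [{0, 1, 2}, {3, 4, 5}]
print(modularity_definition(edges, clusters), modularity_rewritten(edges, clusters))
```

Both functions should print the same value for any clustering, which illustrates why tracking only cluster weights and inter-cluster edge weights suffices.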
17 Merging clusters
Merge two clusters $C, C' \in \mathcal{C}$ into a single larger cluster $C \cup C'$.
Then $\zeta(C \cup C') = \zeta(C) + \zeta(C')$ and the modularity is increased by (eq. (4) of Ovelgönne et al., 2010)
\[
\frac{1}{2\Omega^2} \Bigl( 2\Omega\, \omega(\operatorname{cut}(C, C')) - \zeta(C)\,\zeta(C') \Bigr).
\]
This suggests a greedy agglomerative strategy (e.g. Zhu et al., 2008).
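The increase can be derived directly from the definition of modularity, since a merge only affects the terms belonging to the two merged clusters. The derivation below is a sketch under the assumption that the graph has no self-loops; the closing numbers are an illustrative example, not results from the slides.

```latex
% Merging C and C' turns the edges of cut(C, C') into internal edges and
% replaces zeta(C)^2 + zeta(C')^2 by (zeta(C) + zeta(C'))^2.
\begin{align*}
\Delta \operatorname{mod}
  &= \frac{\omega(\operatorname{cut}(C, C'))}{\Omega}
     - \frac{\bigl(\zeta(C) + \zeta(C')\bigr)^2 - \zeta(C)^2 - \zeta(C')^2}{4\Omega^2} \\
  &= \frac{\omega(\operatorname{cut}(C, C'))}{\Omega}
     - \frac{2\,\zeta(C)\,\zeta(C')}{4\Omega^2} \\
  &= \frac{1}{2\Omega^2} \Bigl( 2\Omega\, \omega(\operatorname{cut}(C, C')) - \zeta(C)\,\zeta(C') \Bigr).
\end{align*}
% Example: Omega = 7, a unit-weight edge between two vertices with zeta = 2 each:
% Delta mod = (2 * 7 * 1 - 2 * 2) / (2 * 49) = 10 / 98 = 5 / 49.
```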
23 Agglomerative clustering
1. Start with all vertices being a separate cluster: $\mathcal{C} = \{\{v\} \mid v \in V\}$.
2. Find a heavy matching of clusters with edge weights $\frac{1}{2\Omega^2} \bigl( 2\Omega\, \omega(\operatorname{cut}(C, C')) - \zeta(C)\,\zeta(C') \bigr)$.
3. Merge all matched clusters, summing $\zeta$ and $\omega$ (graph coarsening).
4. Go to step 2 until only a single cluster remains.
5. Return the encountered clustering with the highest modularity.
We make use of parallelism in steps 2 and 3.
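The five steps can be sketched sequentially as follows. This is an illustration only: it replaces the parallel GPU matching with a simple sequential greedy matching (heaviest gain first), assumes integer vertex ids and no self-loops, and stops early if the graph is disconnected. The function name `agglomerative_clustering` is hypothetical.

```python
def agglomerative_clustering(edges):
    """edges: dict {(u, v): weight} with integer vertex ids and no self-loops.

    Returns (best modularity, list of clusters as sets)."""
    omega = sum(edges.values())
    members, zeta, nbrs = {}, {}, {}
    for (u, v), w in edges.items():
        for x in (u, v):
            members.setdefault(x, {x})
            nbrs.setdefault(x, {})
            zeta[x] = zeta.get(x, 0.0) + w
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w

    def gain(c, d):
        # Modularity increase when merging clusters c and d.
        return (2 * omega * nbrs[c][d] - zeta[c] * zeta[d]) / (2 * omega ** 2)

    # Step 1: singleton clusters have no internal edges.
    mod = -sum(z * z for z in zeta.values()) / (4 * omega ** 2)
    best_mod, best = mod, [set(m) for m in members.values()]

    while len(members) > 1:
        # Step 2: sequential greedy heavy matching on the gain weights.
        candidates = sorted(((gain(c, d), c, d)
                             for c in nbrs for d in nbrs[c] if c < d), reverse=True)
        matched, pairs = set(), []
        for g, c, d in candidates:
            if c not in matched and d not in matched:
                matched.update((c, d))
                pairs.append((g, c, d))
        if not pairs:  # no edges left between clusters (disconnected graph)
            break
        # Step 3: merge matched clusters, summing zeta and omega.
        for g, c, d in pairs:
            mod += g
            members[c] |= members.pop(d)
            zeta[c] += zeta.pop(d)
            for e, w in nbrs.pop(d).items():
                del nbrs[e][d]
                if e != c:
                    nbrs[c][e] = nbrs[c].get(e, 0.0) + w
                    nbrs[e][c] = nbrs[e].get(c, 0.0) + w
        # Step 5 bookkeeping: remember the best clustering encountered so far.
        if mod > best_mod:
            best_mod, best = mod, [set(m) for m in members.values()]
    return best_mod, best


# Hypothetical example: two unit-weight triangles joined by the edge {2, 3}.
example = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
           (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
best_mod, best = agglomerative_clustering(example)
print(best_mod, best)
```

Because the matching merges at most one pair per cluster per round, the sketch does not necessarily recover the two triangles on this example, which is in line with the quality loss the slides attribute to the absence of local refinement.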
29 Agglomerative clustering (netherlands)
[Figure sequence: clustering of the netherlands graph after 0, 11, 21, 26, and 33 iterations, followed by the final clustering.]
30 Star graphs Agglomerative clustering slows down on star graphs.
31 Star graphs Merging vertices with the same neighbours is bad for clustering.
32 Star graphs So we merge multiple satellites to the same centre.
35 Centre potential
To identify star centres and satellites, we propose a centre potential.
This potential is defined for vertices $v$ as
\[
\operatorname{cp}(v) := \frac{\deg(v)^2}{\sum_{\{u,v\} \in E} \deg(u)}.
\]
We use $\operatorname{cp}(\cdot)$ to identify satellites and match these to centres.
39 Centre potential
For a star graph where $k$ satellites are connected to a clique of $l$ vertices with $0 < l < k$, we have that
\[
\operatorname{cp}(\text{satellite}) \leq \tfrac{1}{2}, \qquad \operatorname{cp}(\text{centre}) \geq \tfrac{4}{3},
\]
and $\operatorname{cp}(\text{satellite}) \to 0$ and $\operatorname{cp}(\text{centre}) \to \infty$ as $k \to \infty$.
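The bounds can be illustrated on small star graphs. The sketch below assumes the centre potential is the squared degree divided by the summed neighbour degrees, and that every satellite is connected to all $l$ clique vertices; the helper names `star_graph` and `centre_potential` are hypothetical.

```python
def star_graph(l, k):
    """Clique on vertices 0..l-1 (centres) plus k satellites, each joined to every centre."""
    adj = {v: set() for v in range(l + k)}
    for a in range(l):
        for b in range(a + 1, l):
            adj[a].add(b)
            adj[b].add(a)
        for s in range(l, l + k):
            adj[a].add(s)
            adj[s].add(a)
    return adj


def centre_potential(adj):
    """cp(v) = deg(v)^2 / (sum of deg(u) over neighbours u of v); isolated vertices are skipped."""
    deg = {v: len(neigh) for v, neigh in adj.items()}
    return {v: deg[v] ** 2 / sum(deg[u] for u in adj[v]) for v in adj if deg[v] > 0}


# One centre with four satellites, and two centres with three satellites.
cp_single = centre_potential(star_graph(1, 4))
cp_double = centre_potential(star_graph(2, 3))
print(cp_single[0], cp_single[1])  # centre, satellite
print(cp_double[0], cp_double[2])  # centre, satellite
```

In both examples the centres have a potential above 4/3 and the satellites a potential of at most 1/2, so a threshold between the two values separates them.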
45 GPU coarsening
Input: a graph $G = (V, E, \omega, \zeta)$ and a map $\mu : V \to \mathbb{N}$.
Output: a graph $G' = (V', E', \omega', \zeta')$ and a surjective map $\pi : V \to V'$ such that:
- $E' = \{\{\pi(u), \pi(v)\} \mid \{u, v\} \in E\}$ (collapse edges),
- $\omega'(e') = \sum_{\pi(e) = e'} \omega(e)$ (sum edge weights),
- $\zeta'(v') = \sum_{\pi(v) = v'} \zeta(v)$ (sum vertex weights),
- $\pi(u) = \pi(v)$ if and only if $\mu(u) = \mu(v)$ (compress $\mu$ to $\pi$).
50 GPU coarsening (implementation)
Implemented using the CUDA Thrust library.
View $G$ as a collection of adjacency lists for each vertex.
First, we create $\pi$ and $\pi^{-1}$ from $\mu$.
Then, we create the new adjacency lists and weights for $G'$.
Use $\mu$, $\pi$, $\pi^{-1}$, and a bookkeeping array $\rho$ in global GPU memory.
61 GPU coarsening (algorithm)
[Figure sequence: contents of the arrays $\rho$, $\mu$, $\pi^{-1}$, and $\pi$ after each step.]
1. Initialise $\rho$ sequentially and store $\mu$.
2. Sort by increasing $\mu$-value (sort by key).
3. Determine different matched groups (adjacent not equal).
4. Extract boundaries for $\pi^{-1}$ (copy index if nonzero).
5. Perform scan to find $\pi$ indices (inclusive scan).
6. Extract $\pi$ as $\pi(\rho(i)) = \mu(i)$ (scatter).
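The sort, flag, scan, and scatter steps can be mimicked sequentially in plain Python. This is a sketch of the data flow only, not the Thrust implementation: the function name `compress_matching`, the flag on the first sorted position, the 0-based group numbering, and the sentinel at the end of $\pi^{-1}$ are all assumptions made for the example.

```python
from itertools import accumulate


def compress_matching(mu):
    """Turn arbitrary group labels mu[v] into consecutive coarse indices.

    Returns (rho, pi_inv, pi): rho lists the vertices sorted by label,
    pi_inv[c] is the first position in rho belonging to coarse vertex c
    (with a final sentinel equal to len(mu)), and pi[v] is the coarse index of v."""
    n = len(mu)
    # Initialise rho sequentially, then sort it by increasing mu-value (sort by key).
    rho = sorted(range(n), key=lambda v: mu[v])
    keys = [mu[v] for v in rho]
    # Adjacent not equal: flag every position where a new group starts.
    flags = [1 if i == 0 or keys[i] != keys[i - 1] else 0 for i in range(n)]
    # Copy index if nonzero: group boundaries form pi^{-1}.
    pi_inv = [i for i in range(n) if flags[i]] + [n]
    # Inclusive scan of the flags gives the (1-based) group number per position.
    group = [s - 1 for s in accumulate(flags)]
    # Scatter: pi(rho(i)) = group index at sorted position i.
    pi = [0] * n
    for i in range(n):
        pi[rho[i]] = group[i]
    return rho, pi_inv, pi


rho, pi_inv, pi = compress_matching([5, 2, 5, 7, 2])
print(rho, pi_inv, pi)
# prints [1, 4, 0, 2, 3] [0, 2, 4, 5] [1, 0, 1, 2, 0]
```

Each of the list comprehensions corresponds to one data-parallel primitive, which is why the procedure maps well onto a library of parallel building blocks.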
67 GPU coarsening (algorithm)
Construct the adjacency lists of $G'$ for $i' \in V'$ in parallel.
- Sum vertex weights: $\zeta'(i') = \zeta(\rho(\pi^{-1}(i'))) + \ldots + \zeta(\rho(\pi^{-1}(i' + 1) - 1))$.
- Gather weighted neighbours of $\rho(\pi^{-1}(i'))$ to $\rho(\pi^{-1}(i' + 1) - 1)$, $(i_1, \omega_1, i_2, \omega_2, \ldots, i_k, \omega_k)$, and relabel them to $(\pi(i_1), \omega_1, \pi(i_2), \omega_2, \ldots, \pi(i_k), \omega_k)$.
- Sort the neighbour list by index.
- Compress the neighbour list by replacing subsequences $(j, \omega_1, j, \omega_2, \ldots, j, \omega_l)$ with $(j, \omega_1 + \omega_2 + \ldots + \omega_l)$.
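The construction of the coarse adjacency lists can be sketched as a loop over coarse vertices, where each loop iteration stands in for the work done in parallel per coarse vertex. The function name `coarsen` and the list-of-pairs adjacency format are assumptions for this example; self-loops created by merging adjacent vertices are kept, and in this adjacency-list form they carry the internal edge weight once per endpoint.

```python
def coarsen(adj, zeta, rho, pi_inv, pi):
    """adj[v] is a list of (neighbour, weight) pairs; zeta[v] is the vertex weight.

    rho, pi_inv, and pi describe the grouping: coarse vertex c consists of the
    vertices rho[pi_inv[c]], ..., rho[pi_inv[c + 1] - 1], and pi[v] is the coarse index of v."""
    coarse_adj, coarse_zeta = [], []
    for c in range(len(pi_inv) - 1):
        group = [rho[s] for s in range(pi_inv[c], pi_inv[c + 1])]
        # Sum vertex weights of the merged vertices.
        coarse_zeta.append(sum(zeta[v] for v in group))
        # Gather the weighted neighbours and relabel them with pi.
        gathered = [(pi[u], w) for v in group for (u, w) in adj[v]]
        # Sort the neighbour list by index.
        gathered.sort(key=lambda pair: pair[0])
        # Compress runs with the same index by summing their weights.
        compressed = []
        for j, w in gathered:
            if compressed and compressed[-1][0] == j:
                compressed[-1] = (j, compressed[-1][1] + w)
            else:
                compressed.append((j, w))
        coarse_adj.append(compressed)
    return coarse_adj, coarse_zeta


# Hypothetical example: unit-weight path 0-1-2-3 with 0 matched to 1 and 2 matched to 3.
adj = [[(1, 1)], [(0, 1), (2, 1)], [(1, 1), (3, 1)], [(2, 1)]]
zeta = [1, 2, 2, 1]
coarse_adj, coarse_zeta = coarsen(adj, zeta, rho=[0, 1, 2, 3], pi_inv=[0, 2, 4], pi=[0, 0, 1, 1])
print(coarse_adj, coarse_zeta)
```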
72 GPU clustering (parallelism)
We perform matching in parallel on the GPU to obtain $\mu$.
Calculate centre potentials and match satellites in parallel.
Coarsen in parallel as described previously.
This gives us a fine-grained shared-memory parallel clustering algorithm.
We do not perform local improvement (Kernighan-Lin): changing cluster weights makes parallelising this very hard.
77 Results
We created an implementation on the GPU using CUDA and on the CPU using TBB.
Results are averaged over 16 runs.
Timings include matching and CPU-GPU data transfer, but not disk I/O.
Test set: 10th DIMACS challenge.
Test hardware: dual quad-core Xeon E5620 and an NVIDIA Tesla C2050 (thanks: the Little Green Machine project).
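78 Results (scaling)
[Figure: Clustering time scaling. Relative clustering time (%) against the number of CPU threads, compared with linear scaling.]
79 Results (time)
[Figure: Clustering time. Clustering time (s) against the number of graph edges $|E|$ for CUDA and TBB, with a reference line proportional to $10^{-7}\,|E|$.]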
81 Results (quality)
[Table: modularity obtained by CUDA, TBB, and Ovelgönne et al. (2010), with columns $|V|$ and $|E|$, for graphs including karate, jazz ($|V| = 198$), and PGP ($|V| = 10{,}680$); the remaining entries did not survive transcription.]
Lower quality, because we do not use local refinement.
85 Results (time)
Our algorithm is very fast for large graphs.
CUDA: road_central, $|V| = 14{,}081{,}816$, $|E| = 16{,}933{,}413$, modularity clustering in 4.6 seconds.
TBB: uk-2002, $|V| = 18{,}520{,}486$, $|E| = 261{,}787{,}258$, modularity clustering in 31 seconds.
Comparison to state of the art: DIMACS challenge.
89 Conclusion
We presented a fine-grained shared-memory parallel clustering algorithm.
This algorithm is suitable for both multi-core CPUs and GPUs.
We propose the centre potential to deal with star-like graphs.
The algorithm is very fast, but quality could be improved by parallel local refinement.
90 Questions
Any questions?
92 GPU matching problems
Performing matching in parallel is problematic.
The requirement that matched edges are disjoint leads to serialisation.
95 GPU matching problems
Suppose we match vertices simultaneously: each vertex finds an unmatched neighbour, but together they generate an invalid matching.
99 GPU matching
To solve this we create two groups of vertices: blue and red.
Blue vertices propose.
Red vertices respond.
Proposals that were responded to are matched.
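The colour, propose, respond, and match phases can be simulated sequentially. The sketch below is not the GPU kernel: the random colouring with a fixed seed, the rule that a blue vertex proposes to its heaviest red neighbour, the rule that a red vertex answers its heaviest proposal, and the fixed number of rounds are all assumptions chosen for illustration. The function name `propose_respond_matching` is hypothetical.

```python
import random


def propose_respond_matching(adj, rounds=20, seed=0):
    """adj[v] is a dict {neighbour: weight}. Returns partner, with partner[v] == u
    exactly when v and u are matched to each other."""
    rng = random.Random(seed)
    partner = {}
    for _ in range(rounds):
        # Colour: every unmatched vertex becomes blue ("b") or red ("r").
        free = [v for v in adj if v not in partner]
        colour = {v: rng.choice("br") for v in free}
        # Propose: each blue vertex proposes to its heaviest red neighbour.
        proposal = {}
        for v in free:
            if colour[v] == "b":
                reds = [u for u in adj[v] if colour.get(u) == "r"]
                if reds:
                    proposal[v] = max(reds, key=lambda u: adj[v][u])
        # Respond: each red vertex responds to the heaviest proposal it received.
        response = {}
        for b, r in proposal.items():
            if r not in response or adj[r][b] > adj[r][response[r]]:
                response[r] = b
        # Match: proposals that were responded to become matched pairs.
        for r, b in response.items():
            partner[r], partner[b] = b, r
    return partner


# Hypothetical example: weighted path 0-1-2-3.
path = {0: {1: 1.0}, 1: {0: 1.0, 2: 5.0}, 2: {1: 5.0, 3: 1.0}, 3: {2: 1.0}}
partner = propose_respond_matching(path)
print(partner)
```

Since every blue vertex makes at most one proposal and every red vertex gives at most one response, no vertex can end up in two matched pairs, which is the property that simultaneous unrestricted matching fails to guarantee.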
103 GPU matching (implementation)
The graph (neighbour ranges, indices, and weights) is stored as a triplet of 1D textures on the GPU.
We create one thread for each vertex in $V$.
Each vertex $v \in V$ only updates its colour/matching value $\pi(v)$ and its proposal/response value $\sigma(v)$.
Both $\pi$ and $\sigma$ are stored in 1D arrays in global memory.
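A triplet of neighbour ranges, indices, and weights resembles a compressed sparse row layout. The sketch below builds such a triplet from an edge list, assuming the ranges are stored as offsets so that the neighbours of vertex v occupy positions ranges[v] to ranges[v + 1] - 1; the slides do not specify the exact layout, and the function name `to_csr` is hypothetical.

```python
def to_csr(n, edges):
    """edges: list of (u, v, weight) for an undirected graph on vertices 0..n-1.

    Returns (ranges, indices, weights) with every edge stored in both directions."""
    degree = [0] * n
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    # Prefix sums of the degrees give the start of each neighbour range.
    ranges = [0] * (n + 1)
    for v in range(n):
        ranges[v + 1] = ranges[v] + degree[v]
    indices = [0] * ranges[n]
    weights = [0.0] * ranges[n]
    fill = ranges[:n]  # next free slot per vertex (slicing makes a copy)
    for u, v, w in edges:
        for a, b in ((u, v), (v, u)):
            indices[fill[a]] = b
            weights[fill[a]] = w
            fill[a] += 1
    return ranges, indices, weights


ranges, indices, weights = to_csr(3, [(0, 1, 2.0), (1, 2, 3.0)])
print(ranges, indices, weights)
# prints [0, 1, 3, 4] [1, 0, 2, 1] [2.0, 2.0, 3.0, 3.0]
```

With this layout a thread assigned to vertex v reads only its own contiguous slice of the index and weight arrays.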
116 GPU matching (algorithm)
[Figure sequence: contents of the arrays $\pi$ and $\sigma$ as the algorithm cycles through its four phases: Colour, Propose, Respond, and Match.]
PuLP: Scalable Multi-Objective Multi-Constraint Partitioning for Small-World Networks George M. Slota 1,2 Kamesh Madduri 2 Sivasankaran Rajamanickam 1 1 Sandia National Laboratories, 2 The Pennsylvania
More information3/1/2010. Acceleration Techniques V1.2. Goals. Overview. Based on slides from Celine Loscos (v1.0)
Acceleration Techniques V1.2 Anthony Steed Based on slides from Celine Loscos (v1.0) Goals Although processor can now deal with many polygons (millions), the size of the models for application keeps on
More informationParallel Graph Component Labelling with GPUs and CUDA
Parallel Graph Component Labelling with GPUs and CUDA K.A. Hawick, A. Leist and D.P. Playne Institute of Information and Mathematical Sciences Massey University Albany, North Shore 102-904, Auckland, New
More informationLecture 6: Spectral Graph Theory I
A Theorist s Toolkit (CMU 18-859T, Fall 013) Lecture 6: Spectral Graph Theory I September 5, 013 Lecturer: Ryan O Donnell Scribe: Jennifer Iglesias 1 Graph Theory For this course we will be working on
More informationAdjacent: Two distinct vertices u, v are adjacent if there is an edge with ends u, v. In this case we let uv denote such an edge.
1 Graph Basics What is a graph? Graph: a graph G consists of a set of vertices, denoted V (G), a set of edges, denoted E(G), and a relation called incidence so that each edge is incident with either one