NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D.

Size: px

Start display at page:

Download "NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D."

Johnathan Davidson
5 years ago
Views:

1 NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV 2016 Joe Eaton Ph.D.

2 Agenda Accelerated Computing nvgraph New Features Coming Soon Dynamic Graphs GraphBLAS 2

3 ACCELERATED COMPUTING 10x Performance & 5x Energy Efficiency GPU Accelerator # GPU Developers CPU 3

4 PERFORMANCE GAP CONTINUES TO GROW GFLOPS Peak Double Precision FLOPS GB/s Peak Memory Bandwidth Pascal Pascal K K K M2090 K M2090 M M NVIDIA GPU x86 CPU NVIDIA GPU x86 CPU 4

5 GRAPHS ARE FUNDAMENTAL Tight connection between data and graphs Data View Data Element/ Entity Entity Attributes Binary Relation (1 to 1) N-ary Relation (many to 1) Relation Attributes Group of relations over entities Graph View Graph Vertex Vertex labels Graph Edge Hypergraph edge Edge labels Sets of Vertices and Edges 5

6 NVGRAPH Easy Onramp to GPU Accelerated Graph Analytics GPU Optimized Algorithms Reduced cost & Increased performance Standard formats and primitives Semi-rings, load-balancing Performance Constantly Improving 6

7 Iterations/s nvgraph Accelerated Graph Analytics nvgraph: 3.4x Speedup 3 nvgraph for high performance graph analytics Deliver results up to 3x faster than CPU-only Solve graphs with up to 2 Billion edges on a single GPU (M40) Accelerates a wide range of graph analytics applications: 2 1 2x12 Core Xeon E5 v2 nvgraph on P100 PageRank Single Source Shortest Path Search Robotic Path Planning IP Routing Single Source Widest Path Recommendation Engines Power Network Planning Chip Design / EDA Social Ad Placement Logistics & Supply Chain Planning Traffic sensitive routing developer.nvidia.com/nvgraph 0 PageRank on Twitter 1.5B edge dataset nvgraph on P100 GraphMat on 2 socket 12-core Xeon E v2 CPU,@ 2.70 GHz 7

8 Motivating example Power law graph: wiki2003.bin 455,436 vertices (n) 2,033,173 edges (nnz) sparsity = Cusparse csrmv time: 8.05 ms Merge Path csrmv time: 1.08 ms ~7.45x faster! PSG Cluster, K40 8

9 SEMI-RINGS Definition / Axioms Set R with two binary operators: + and * that satisfy: 1. (R, +) is associative, commutative with additive identity 0 (0 + a = a) 2. (R, *) is associative with multiplicative identity 1 (1 * a = a) 3. Left and Right multiplication is distributive over addition 4. Additive identity 0 = multiplicative null operator (0 * a = a * 0 = 0) 9

10 SEMI-RINGS Examples SEMIRING SET PLUS TIMES 0 1 Real R + * 0 1 MinPlus R {, } min + 0 MaxMin R {, } max min - Boolean {0, 1}

11 APPLICATIONS Pagerank (+, *) Ideal application: runs on web and social graphs Each iteration involves computing: y = A x Standard csrmv PlusTimes Semiring α = 1.0 (multiplicative identity) β = 0.0 (multiplicative nullity) //sw/gpgpu/naga/src/pagerank.cpp 11

12 APPLICATIONS Single Source Shortest Path (min, +) Common Usage Examples: Path-finding algorithms: Navigation Modeling Communications Network Breadth first search building block Graph 500 Benchmark

13 Common Usage Examples: Maximum bipartite graph matching Graph partitioning Minimum cut Common application areas: power grids chip circuits APPLICATIONS Widest Path (max,min) 13

14 PROPERTY GRAPHS Many simple graphs overlaid 14

15 SUBGRAPH EXTRACTION Focus on a specific area 15

16 Partitioning Clustering BFS Graph Contraction COMING SOON Features in next release 16

17 PARTITIONING AND CLUSTERING Spectral Min Edge Cut Partition 17

18 GTEPS GTEPS BREADTH FIRST SEARCH Key subroutine in several graph algorithms, naturally leads to random access MPI Version implementations: pack or use a bitmap to exchange frontier at end of each step NVSHMEM version: directly updates the frontier map at target using atomics Benefits with smaller graphs (likely behavior with strong scaling) 4 P100 GPUs alltoall connected with NVLink cumpi NVSHMEM cumpi NVSHMEM % 33% 20% 13% % 18% 8% Scale 1x4 Process Grid Scale 2x2 Process Grid 18

19 Graph Contraction 19

20 Graph Contraction 20

21 DYNAMIC GRAPHS custinger brings STINGER to GPUs Oded Green presented at HPEC 2016 custinger: Supporting Dynamic Graph Algorithms for GPUs 21

22 GRAPHBLAS nvgraph is part of a GraphBLAS implementation SPMV and SPMM Semi-rings are the first parts Draft 1.0 of GraphBLAS spec 22

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016

WHAT S NEW IN CUDA 8. Siddharth Sharma, Oct 2016 WHAT S NEW IN CUDA 8 Siddharth Sharma, Oct 2016 WHAT S NEW IN CUDA 8 Why Should You Care >2X Run Computations Faster* Solve Larger Problems** Critical Path Analysis * HOOMD Blue v1.3.3 Lennard-Jones liquid