Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Size: px
Start display at page:

Download "Mosaic: Processing a Trillion-Edge Graph on a Single Machine"

Transcription

1 Mosaic: Processing a Trillion-Edge Graph on a Single Machine Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim Georgia Institute of Technology Best Student EuroSys 7 June 8, 07 Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

2 Large-scale graph processing is ubiquitous Social networks Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

3 Large-scale graph processing is ubiquitous Social networks Genome analysis Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

4 Large-scale graph processing is ubiquitous Social networks Genome analysis Graphs enable Machine Learning Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

5 Powerful, heterogeneous machines Terabytes of RAM on multiple sockets Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

6 Powerful, heterogeneous machines Terabytes of RAM on multiple sockets Powerful many-core coprocessors Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

7 Powerful, heterogeneous machines Terabytes of RAM on multiple sockets Powerful many-core coprocessors Fast, large-capacity Non-volatile Memory Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

8 Powerful, heterogeneous machines Take advantage of heterogeneous machine to process tera-scale graphs Terabytes of RAM on multiple sockets Powerful many-core coprocessors Fast, large-capacity Non-volatile Memory Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

9 Table of contents Graph Processing: Sample Application Design Mosaic Architecture Graph Encoding API Evaluation Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

10 Graph Processing: Applications Community Detection Find Common Friends Find Shortest Paths Estimate Impact of Vertices (webpages, users,... )... Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

11 Example: Pagerank Calculate impact of each vertex Pagerank v = α u Neighborhood(v) Pagerank u degree u + ( α) Simple Algorithm: In each iteration, distribute current impact along out-edges, weighted by degree Sum up all incoming impacts new impact for next iteration Weight new impact with regularization factor α = 0.8 Repeat until no changes Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 6 / 8

12 Pagerank: Graphical example ) Initialize Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

13 Pagerank: Graphical example ) Propagate along outgoing edges Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

14 Pagerank: Graphical example ) Sum up incoming contributions Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

15 Pagerank: Graphical example ) Apply regularization: x Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

16 Pagerank: Graphical example ) Update outgoing edges for second iteration Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

17 Pagerank: Graphical example 6) Repeat until stabilized Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

18 Mosaic: Design space Graph Processing has many faces: Single Machine Out-of-core In memory Cluster Out-of-core In memory Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 8 / 8

19 Mosaic: Design space Graph Processing has many faces: Single Machine Out-of-core Cheap, but potentially slow In memory Fast, but limited graph size Cluster Out-of-core Large graphs, but expensive & slow In memory Large graphs & fast, but very expensive Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 8 / 8

20 Mosaic: Design space Graph Processing has many faces: Single Machine Out-of-core Cheap, but potentially slow In memory Fast, but limited graph size Cluster Out-of-core Large graphs, but expensive & slow In memory Large graphs & fast, but very expensive Single machine, out-of-core is most cost-effective Goal: Good performance and large graphs! Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 8 / 8

21 Mosaic: Design goals Goal Run algorithms on very large graphs on a single machine using coprocessors Enabled by: Common, familiar API (vertex/edge-centric) Encoding: Lossless compression Cache locality Processing on isolated subgraphs Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 9 / 8

22 Architecture of Mosaic Usage of Xeon Phi & NVMe Involvement of Host Global vertex state <current state>... <next state>... stripped Host Processors (Xeon) ( 6) I I... T T edge processing NVMe... Meta transfer Tile transfer fetch Xeon Phi receive ( 6 cores)... per Xeon Phi ( ) PCIe Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

23 Graph encoding: Idea Compression Split graph into subgraphs, use local (short) identifiers Cache locality Inside subgraphs: Sort by access order Between subgraphs: Overlap vertex sets Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

24 Background: Column first Locality for write Multiple sequential reads Target vertex Source vertex P P P P P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

25 Background: Row first Locality for read Multiple sequential writes Target vertex Source vertex P P P P P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

26 Background: Hilbert order Space-filling curve Provides locality between adjacent data points Target vertex Source vertex P P P P P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

27 From global to local: Tiles Convert graph to set of tiles ) Start with adjacency Matrix: 6 ➏ ➑ ➒ ➋ ➊ ➎ ➌ ➐ ➍ Target vertex Source vertex ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

28 From global to local: Tiles Convert graph to set of tiles ) Use first edge in tile T : 6 ➏ ➑ ➒ ➋ ➊ ➎ ➌ ➐ ➍ Target vertex (global) Source vertex (global) ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Tile- (T ) ➊ meta (I ) (,) (local) (,) : local vertex id (,) : local global id ➊ : local edge store order Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

29 From global to local: Tiles Convert graph to set of tiles ) Use more edges in tile T : 6 ➏ ➑ ➒ ➋ ➊ ➎ ➌ ➐ ➍ Target vertex (global) Source vertex (global) ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Tile- (T ) ➊ ➋ meta (I ) (,) (,) (,) (local) : local vertex id (,) : local global id ➊ : local edge store order Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

30 From global to local: Tiles Convert graph to set of tiles ) Use more edges in tile T : 6 ➏ ➑ ➒ ➋ ➊ ➎ ➌ ➐ ➍ Target vertex (global) Source vertex (global) ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Tile- (T ) ➊ ➋ ➍ meta (I (,) (,) ( (,) ) (local),) : local vertex id (,) : local global id ➊ : local edge store order Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

31 From global to local: Tiles Convert graph to set of tiles ) Use more edges in tile T : 6 ➏ ➑ ➒ ➋ ➊ ➎ ➌ ➐ ➍ Target vertex (global) Source vertex (global) ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Tile- (T ) ➊ ➋ ➍ ➌ meta (I (,) (,) ( (,) ) (local),) : local vertex id (,) : local global id ➊ : local edge store order Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

32 From global to local: Tiles Convert graph to set of tiles 6) Next edges do not fit in T anymore, construct T : 6 ➏ ➑ ➒ ➋ ➊ ➎ ➌ ➐ ➍ Target vertex (global) Source vertex (global) ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Tile- (T ) ➊ ➋ ➍ ➌ meta (I (,) (,) ( (,) ) (local),) Tile- (T ) ➎ ➐ ➑ ➒ ➏ meta (I (,) (local) (,6) ( (,) ),) : local vertex id (,) : local global id ➊ : local edge store order Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

33 Locality with Hilbert-ordered tiles Overlapping sets of sources and targets Target vertex Source vertex ➊ ➐ ➒ ➑ ➋ ➍ P P P P ➌ ➎ ➏ P P P P P P P P P P P P Global adjacency matrix Partition (S = ) Better locality than row-first or column-first Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 6 / 8

34 From global to local: Data structure ) Split original graphs into two subgraphs: ➊ ➋ ➌ ➍ ➎ ➏ ➐ ➑ ➒ 6 ➊ ➋ ➌ ➍ ➎ ➏ ➐ ➑ ➒ Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

35 From global to local: Data structure ) Internal data structure of T : ➊ ➋ 6 ➑ ➏ ➒ ➎ ➌➐ ➍... ➎ ➐ ➑ ➒ ➏ Index local global 6 ➎ ➏ ➐ ➑ ➒ src tgt Edges B B sorted by Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

36 From global to local: Data structure ) Compress edges: Compressed sparse rows ➊ ➋ 6 ➑ ➏ ➒ ➎ ➌➐ ➍... ➎ ➐ ➑ ➒ ➏ Index local global 6 ➎ ➏ ➐ ➑ ➒ src tgt src (tgt,#tgt) Edges B B sorted by ➎ ➏ ➐ ➑ ➒ (, ) (, ) B B #tgt B compressed Efficient, local encoding, sequentially accessed Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

37 From global to local: Summary Better locality Efficient encoding of local graphs Effect: up to 68% reduction in data size: Graph #vertices #edges Raw data Mosaic size (red.) rmat 6.8 M 0. B.0 GB. GB (.0%) twitter.6 M. B 0.9 GB 7.7 GB ( 9.%) rmat7. M. B 6.0 GB. GB ( 0.6%) uk M.7 B 7.9 GB 8.7 GB ( 68.8%) hyperlink,7.6 M 6. B 80.0 GB. GB ( 68.%) rmat-trillion,9.9 M,000.0 B 8,000.0 GB,86.7 GB ( 9.8%) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 8 / 8

38 API: Pagerank example Pull: Gather per edge information Reduce: Combine results from multiple subgraphs Apply: Calculate non-associative regularization Edge-centric operation Local graph processing on Tile // On edge processor (co-processor) // Edge e = (Vertex src, Vertex tgt) def Pull(Vertex src, Vertex tgt): return src.val / src.out_degree // On edge processor/global reducers (both) def Reduce(Vertex v, Vertex v ): return v.val + v.val // On global reducers (host) def Apply(Vertex v): v.val = ( - α) + α v.val Formula: Pagerank v = α ( u Neighborhood(v) Global graph processing Vertex-centric operation Pagerank u degree u ) + ( α) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 9 / 8

39 API: Pagerank example ) Start with T Host current state next state T meta (I (,) (,) ( (,) ),) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

40 API: Pagerank example ) Execute Pull along all edges in T Host current state next state T Pull(e) meta (I (,) (,) ( (,) ),) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

41 API: Pagerank example ) Reduce all updates from T onto next state Host current state next state T meta (I (,) (,) ( (,) ),) Reduce(v) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

42 API: Pagerank example ) Switch to T Host current state next state T meta (I (,) (,6) ( (,) ),) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

43 API: Pagerank example ) Pull all updates Host current state next state T Pull(e) meta (I (,) (,6) ( (,) ),) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

44 API: Pagerank example 6) Reduce updates from T Host current state next state T meta (I (,) (,6) ( (,) ),) Reduce(v) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

45 API: Pagerank example 7) All tiles processed, Apply processed updates Host current state Apply(v) next state T meta (I (,) (,6) ( (,) ),) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

46 API: Pagerank example 8) Switch current and next state, clear next state for second iteration Host current state next state T meta (I (,) (,) ( (,) ),) Xeon Phi Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

47 Evaluation Questions: Preprocessing Cost Performance (in comparison) Impact of Design Decisions Scalability Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

48 Evaluation - Setup Single Server: sockets, cores each 768GB RAM Xeon Phi (KNC, 6 cores) 6 NVMe (.TB each) 7 Algorithms 6 Datasets ( real world, synthetic) Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

49 Preprocessing Mosaic needs explicit preprocessing step - min for small datasets, minutes for webgraph, hours for trillion edges But: Can be amortized during execution: GridGraph: Mosaic faster after twitter: 0 iterations uk007: 8 iterations X-Stream: Mosaic faster after twitter: 8 iterations uk007: iterations Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

50 Performance comparison Comparison to other single machine engines with Pagerank: Runtime (seconds) Mosaic GridGraph X-Stream GraphChi 0 rmat twitter rmat7 uk007-0 Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

51 Performance comparison Comparison to other single machine engines with Pagerank: log Runtime (seconds) 00 0 Mosaic GridGraph X-Stream GraphChi 0. rmat twitter rmat7 uk007-0 Mosaic outperforms other system by.7 to 8.6 Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

52 Hilbert-ordered tiles: Cache locality Cache misses and execution times for three different strategies Cache Misses (%) Hilbert Row-First Column-First Runtime (s) Pagerank BFS WCC 0 Pagerank BFS WCC Hilbert-ordered tiles have up to % better cache locality, up to % reduction in runtime Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

53 Evaluation - Scalability Dimensions Add Xeon Phis/NVMes Add threads on each Xeon Phi # iterations / minute # pair of Phi+NVMe Xeon Phi threads Mosaic scales well when adding threads/xeon Phis Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 6 / 8

54 Conclusion Mosaic, a graph processing engine for trillion edge graphs on a single machine Hilbert-ordered tiles allow: Enable localized processing on coprocessors Optimizes cache locality Enables compression Code is open-source: Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 7 / 8

55 Thank you! Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 8 / 8

56 Evaluation - Selective scheduling Skip subgraphs without active vertices Especially useful for traversal algorithms: BFS, Connected Components,... 0k 8k # active tiles 6k k k Tiles Time w/o Selective Scheduling Time Selective Scheduling. 0. Time (sec) 0k # iteration. speedup for BFS on twitter graph Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 9 / 8

57 Evaluation - Dynamic load balancing Runtime (seconds) Effect of choosing the wrong load balancing scheme Mosaic has light-weight balancing scheme optimal one optimal*0 0 Pagerank BFS WCC SpMV TC Up to.8 improvement in running time by choosing correct load balancing scheme Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 0 / 8

58 Mosaic architecture Host and Xeon Phi component Streaming-based design Global vertex state <current state>... <next state>... stripped Host Processors (Xeon) ( ) Meta transfer... (T, ) LF LR (T, ) (T, ) I... local... T EP ( 6 cores) I I... Tile core transfer (cache locality) T T... T t+... T T... Tile Prefetching Active tiles NVMe processing (concurrent I/O) Xeon Phi I (local, own memory) per socket (T, ) GR GR... per Xeon Phi ( ) T I / Global reducers (memory locality) PCIe messaging via ring buffer data transfer tile meta local data per tile process Steffen Maass Mosaic: Trillion Edges on a Single Machine June 8, 07 / 8

Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing

Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap, Steffen Maass, Heeseung Jo, Taesoo Kim Virginia Tech, ebay, Georgia

More information

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. Xiaowei ZHU Tsinghua University

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. Xiaowei ZHU Tsinghua University GridGraph: Large-Scale Graph Processing on a Single Machine Using -Level Hierarchical Partitioning Xiaowei ZHU Tsinghua University Widely-Used Graph Processing Shared memory Single-node & in-memory Ligra,

More information

NUMA-aware Graph-structured Analytics

NUMA-aware Graph-structured Analytics NUMA-aware Graph-structured Analytics Kaiyuan Zhang, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems Shanghai Jiao Tong University, China Big Data Everywhere 00 Million Tweets/day 1.11

More information

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation

More information

GraphChi: Large-Scale Graph Computation on Just a PC

GraphChi: Large-Scale Graph Computation on Just a PC OSDI 12 GraphChi: Large-Scale Graph Computation on Just a PC Aapo Kyrölä (CMU) Guy Blelloch (CMU) Carlos Guestrin (UW) In co- opera+on with the GraphLab team. BigData with Structure: BigGraph social graph

More information

GraFBoost: Using accelerated flash storage for external graph analytics

GraFBoost: Using accelerated flash storage for external graph analytics GraFBoost: Using accelerated flash storage for external graph analytics Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu and Arvind MIT CSAIL Funded by: 1 Large Graphs are Found Everywhere in Nature

More information

Scalable GPU Graph Traversal!

Scalable GPU Graph Traversal! Scalable GPU Graph Traversal Duane Merrill, Michael Garland, and Andrew Grimshaw PPoPP '12 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming Benwen Zhang

More information

X-Stream: Edge-centric Graph Processing using Streaming Partitions

X-Stream: Edge-centric Graph Processing using Streaming Partitions X-Stream: Edge-centric Graph Processing using Streaming Partitions Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel {amitabha.roy, ivo.mihailovic, willy.zwaenepoel}@epfl.ch EPFL Abstract X-Stream is a system

More information

Efficiently Multiplying Sparse Matrix - Sparse Vector for Social Network Analysis

Efficiently Multiplying Sparse Matrix - Sparse Vector for Social Network Analysis Efficiently Multiplying Sparse Matrix - Sparse Vector for Social Network Analysis Ashish Kumar Mentors - David Bader, Jason Riedy, Jiajia Li Indian Institute of Technology, Jodhpur 6 June 2014 Ashish Kumar

More information

X-Stream: A Case Study in Building a Graph Processing System. Amitabha Roy (LABOS)

X-Stream: A Case Study in Building a Graph Processing System. Amitabha Roy (LABOS) X-Stream: A Case Study in Building a Graph Processing System Amitabha Roy (LABOS) 1 X-Stream Graph processing system Single Machine Works on graphs stored Entirely in RAM Entirely in SSD Entirely on Magnetic

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

A Framework for Processing Large Graphs in Shared Memory

A Framework for Processing Large Graphs in Shared Memory A Framework for Processing Large Graphs in Shared Memory Julian Shun Based on joint work with Guy Blelloch and Laxman Dhulipala (Work done at Carnegie Mellon University) 2 What are graphs? Vertex Edge

More information

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah Jure Leskovec (@jure) Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah 2 My research group at Stanford: Mining and modeling large social and information networks

More information

Report. X-Stream Edge-centric Graph processing

Report. X-Stream Edge-centric Graph processing Report X-Stream Edge-centric Graph processing Yassin Hassan hassany@student.ethz.ch Abstract. X-Stream is an edge-centric graph processing system, which provides an API for scatter gather algorithms. The

More information

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem

More information

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU

Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Parallelization of Shortest Path Graph Kernels on Multi-Core CPUs and GPU Lifan Xu Wei Wang Marco A. Alvarez John Cavazos Dongping Zhang Department of Computer and Information Science University of Delaware

More information

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs Da Zheng, Disa Mhembere, Randal Burns Department of Computer Science Johns Hopkins University Carey E. Priebe Department of Applied

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

PEGASUS: A peta-scale graph mining system Implementation. and observations. U. Kang, C. E. Tsourakakis, C. Faloutsos

PEGASUS: A peta-scale graph mining system Implementation. and observations. U. Kang, C. E. Tsourakakis, C. Faloutsos PEGASUS: A peta-scale graph mining system Implementation and observations U. Kang, C. E. Tsourakakis, C. Faloutsos What is Pegasus? Open source Peta Graph Mining Library Can deal with very large Giga-,

More information

GPU Sparse Graph Traversal. Duane Merrill

GPU Sparse Graph Traversal. Duane Merrill GPU Sparse Graph Traversal Duane Merrill Breadth-first search of graphs (BFS) 1. Pick a source node 2. Rank every vertex by the length of shortest path from source Or label every vertex by its predecessor

More information

Efficient and Simplified Parallel Graph Processing over CPU and MIC

Efficient and Simplified Parallel Graph Processing over CPU and MIC Efficient and Simplified Parallel Graph Processing over CPU and MIC Linchuan Chen Xin Huo Bin Ren Surabhi Jain Gagan Agrawal Department of Computer Science and Engineering The Ohio State University Columbus,

More information

Wedge A New Frontier for Pull-based Graph Processing. Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018

Wedge A New Frontier for Pull-based Graph Processing. Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018 Wedge A New Frontier for Pull-based Graph Processing Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018 Graph Processing Problems modelled as objects (vertices) and connections between

More information

Automatic Scaling Iterative Computations. Aug. 7 th, 2012

Automatic Scaling Iterative Computations. Aug. 7 th, 2012 Automatic Scaling Iterative Computations Guozhang Wang Cornell University Aug. 7 th, 2012 1 What are Non-Iterative Computations? Non-iterative computation flow Directed Acyclic Examples Batch style analytics

More information

RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine

RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine Guoqing Harry Xu Kai Wang, Zhiqiang Zuo, John Thorpe, Tien Quang Nguyen, UCLA Nanjing University Facebook

More information

Fast Iterative Graph Computation: A Path Centric Approach

Fast Iterative Graph Computation: A Path Centric Approach Fast Iterative Graph Computation: A Path Centric Approach Pingpeng Yuan*, Wenya Zhang, Changfeng Xie, Hai Jin* Services Computing Technology and System Lab. Cluster and Grid Computing Lab. School of Computer

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

Graph Partitioning for Scalable Distributed Graph Computations

Graph Partitioning for Scalable Distributed Graph Computations Graph Partitioning for Scalable Distributed Graph Computations Aydın Buluç ABuluc@lbl.gov Kamesh Madduri madduri@cse.psu.edu 10 th DIMACS Implementation Challenge, Graph Partitioning and Graph Clustering

More information

Pregel. Ali Shah

Pregel. Ali Shah Pregel Ali Shah s9alshah@stud.uni-saarland.de 2 Outline Introduction Model of Computation Fundamentals of Pregel Program Implementation Applications Experiments Issues with Pregel 3 Outline Costs of Computation

More information

Understanding Manycore Scalability of File Systems. Changwoo Min, Sanidhya Kashyap, Steffen Maass Woonhak Kang, and Taesoo Kim

Understanding Manycore Scalability of File Systems. Changwoo Min, Sanidhya Kashyap, Steffen Maass Woonhak Kang, and Taesoo Kim Understanding Manycore Scalability of File Systems Changwoo Min, Sanidhya Kashyap, Steffen Maass Woonhak Kang, and Taesoo Kim Application must parallelize I/O operations Death of single core CPU scaling

More information

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia How well do CPU, GPU and Hybrid Graph Processing Frameworks Perform? Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia Networked Systems

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

Efficient graph computation on hybrid CPU and GPU systems

Efficient graph computation on hybrid CPU and GPU systems J Supercomput (2015) 71:1563 1586 DOI 10.1007/s11227-015-1378-z Efficient graph computation on hybrid CPU and GPU systems Tao Zhang Jingjie Zhang Wei Shu Min-You Wu Xiaoyao Liang Published online: 21 January

More information

Exploring the Hidden Dimension in Graph Processing

Exploring the Hidden Dimension in Graph Processing Exploring the Hidden Dimension in Graph Processing Mingxing Zhang, Yongwei Wu, Kang Chen, *Xuehai Qian, Xue Li, and Weimin Zheng Tsinghua University *University of Shouthern California Graph is Ubiquitous

More information

NXgraph: An Efficient Graph Processing System on a Single Machine

NXgraph: An Efficient Graph Processing System on a Single Machine NXgraph: An Efficient Graph Processing System on a Single Machine Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li and Huazhong Yang Tsinghua National Laboratory for Information Science and Technology,

More information

NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D.

NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D. NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV 2016 Joe Eaton Ph.D. Agenda Accelerated Computing nvgraph New Features Coming Soon Dynamic Graphs GraphBLAS 2 ACCELERATED COMPUTING 10x Performance

More information

Optimizing Cache Performance for Graph Analytics. Yunming Zhang Presentation

Optimizing Cache Performance for Graph Analytics. Yunming Zhang Presentation Optimizing Cache Performance for Graph Analytics Yunming Zhang 6.886 Presentation Goals How to optimize in-memory graph applications How to go about performance engineering just about anything In-memory

More information

Graph-Processing Systems. (focusing on GraphChi)

Graph-Processing Systems. (focusing on GraphChi) Graph-Processing Systems (focusing on GraphChi) Recall: PageRank in MapReduce (Hadoop) Input: adjacency matrix H D F S (a,[c]) (b,[a]) (c,[a,b]) (c,pr(a) / out (a)), (a,[c]) (a,pr(b) / out (b)), (b,[a])

More information

Extreme-scale Graph Analysis on Blue Waters

Extreme-scale Graph Analysis on Blue Waters Extreme-scale Graph Analysis on Blue Waters 2016 Blue Waters Symposium George M. Slota 1,2, Siva Rajamanickam 1, Kamesh Madduri 2, Karen Devine 1 1 Sandia National Laboratories a 2 The Pennsylvania State

More information

NXgraph: An Efficient Graph Processing System on a Single Machine

NXgraph: An Efficient Graph Processing System on a Single Machine NXgraph: An Efficient Graph Processing System on a Single Machine arxiv:151.91v1 [cs.db] 23 Oct 215 Yuze Chi, Guohao Dai, Yu Wang, Guangyu Sun, Guoliang Li and Huazhong Yang Tsinghua National Laboratory

More information

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming

More information

The GraphGrind Framework: Fast Graph Analytics on Large. Shared-Memory Systems

The GraphGrind Framework: Fast Graph Analytics on Large. Shared-Memory Systems Memory-Centric System Level Design of Heterogeneous Embedded DSP Systems The GraphGrind Framework: Scott Johan Fischaber, BSc (Hon) Fast Graph Analytics on Large Shared-Memory Systems Jiawen Sun, PhD candidate

More information

Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O

Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O Squeezing out All the Value of Loaded Data: An Out-of-core Graph Processing System with Reduced Disk I/O Zhiyuan Ai, Mingxing Zhang, and Yongwei Wu, Department of Computer Science and Technology, Tsinghua

More information

AS an alternative to distributed graph processing, singlemachine

AS an alternative to distributed graph processing, singlemachine JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, AUGUST 018 1 CLIP: A Disk I/O Focused Parallel Out-of-core Graph Processing System Zhiyuan Ai, Mingxing Zhang, Yongwei Wu, Senior Member, IEEE, Xuehai

More information

Graph Processing. Connor Gramazio Spiros Boosalis

Graph Processing. Connor Gramazio Spiros Boosalis Graph Processing Connor Gramazio Spiros Boosalis Pregel why not MapReduce? semantics: awkward to write graph algorithms efficiency: mapreduces serializes state (e.g. all nodes and edges) while pregel keeps

More information

From Think Like a Vertex to Think Like a Graph. Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, John McPherson

From Think Like a Vertex to Think Like a Graph. Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, John McPherson From Think Like a Vertex to Think Like a Graph Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, John McPherson Large Scale Graph Processing Graph data is everywhere and growing

More information

Harp-DAAL for High Performance Big Data Computing

Harp-DAAL for High Performance Big Data Computing Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big

More information

Distributed computing: index building and use

Distributed computing: index building and use Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 60 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Pregel: A System for Large-Scale Graph Processing

More information

A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms

A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms Gurbinder Gill, Roshan Dathathri, Loc Hoang, Keshav Pingali Department of Computer Science, University of Texas

More information

SociaLite: A Datalog-based Language for

SociaLite: A Datalog-based Language for SociaLite: A Datalog-based Language for Large-Scale Graph Analysis Jiwon Seo M OBIS OCIAL RESEARCH GROUP Overview Overview! SociaLite: language for large-scale graph analysis! Extensions to Datalog! Compiler

More information

CS 5220: Parallel Graph Algorithms. David Bindel

CS 5220: Parallel Graph Algorithms. David Bindel CS 5220: Parallel Graph Algorithms David Bindel 2017-11-14 1 Graphs Mathematically: G = (V, E) where E V V Convention: V = n and E = m May be directed or undirected May have weights w V : V R or w E :

More information

Callisto-RTS: Fine-Grain Parallel Loops

Callisto-RTS: Fine-Grain Parallel Loops Callisto-RTS: Fine-Grain Parallel Loops Tim Harris, Oracle Labs Stefan Kaestle, ETH Zurich 7 July 2015 The following is intended to provide some insight into a line of research in Oracle Labs. It is intended

More information

GPU Sparse Graph Traversal

GPU Sparse Graph Traversal GPU Sparse Graph Traversal Duane Merrill (NVIDIA) Michael Garland (NVIDIA) Andrew Grimshaw (Univ. of Virginia) UNIVERSITY of VIRGINIA Breadth-first search (BFS) 1. Pick a source node 2. Rank every vertex

More information

GraphZ: Improving the Performance of Large-Scale Graph Analytics on Small-Scale Machines

GraphZ: Improving the Performance of Large-Scale Graph Analytics on Small-Scale Machines GraphZ: Improving the Performance of Large-Scale Graph Analytics on Small-Scale Machines Zhixuan Zhou Henry Hoffmann University of Chicago, Dept. of Computer Science Abstract Recent programming frameworks

More information

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark by Nkemdirim Dockery High Performance Computing Workloads Core-memory sized Floating point intensive Well-structured

More information

Graphs (Part II) Shannon Quinn

Graphs (Part II) Shannon Quinn Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation

More information

GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine

GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine Peng Sun, Yonggang Wen, Ta Nguyen Binh Duong and Xiaokui Xiao School of Computer Science and Engineering, Nanyang

More information

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen

Graph Partitioning for High-Performance Scientific Simulations. Advanced Topics Spring 2008 Prof. Robert van Engelen Graph Partitioning for High-Performance Scientific Simulations Advanced Topics Spring 2008 Prof. Robert van Engelen Overview Challenges for irregular meshes Modeling mesh-based computations as graphs Static

More information

Everything you always wanted to know about multicore graph processing but were afraid to ask

Everything you always wanted to know about multicore graph processing but were afraid to ask Everything you always wanted to know about multicore graph processing but were afraid to ask Jasmina Malicevic, Baptiste Lepers, and Willy Zwaenepoel, EPFL https://www.usenix.org/conference/atc17/technical-sessions/presentation/malicevic

More information

Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations

Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations Order or Shuffle: Empirically Evaluating Vertex Order Impact on Parallel Graph Computations George M. Slota 1 Sivasankaran Rajamanickam 2 Kamesh Madduri 3 1 Rensselaer Polytechnic Institute, 2 Sandia National

More information

Domain-specific Programming on Graphs

Domain-specific Programming on Graphs Lecture 16: Domain-specific Programming on Graphs Parallel Computer Architecture and Programming Last time: Increasing acceptance of domain-specific programming systems Challenge to programmers: modern

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

PathGraph: A Path Centric Graph Processing System

PathGraph: A Path Centric Graph Processing System 1 PathGraph: A Path Centric Graph Processing System Pingpeng Yuan, Changfeng Xie, Ling Liu, Fellow, IEEE, and Hai Jin Senior Member, IEEE Abstract Large scale iterative graph computation presents an interesting

More information

VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC

VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC VENUS: Vertex-Centric Streamlined Graph Computation on a Single PC Jiefeng Cheng 1, Qin Liu 2, Zhenguo Li 1, Wei Fan 1, John C.S. Lui 2, Cheng He 1 1 Huawei Noah s Ark Lab, Hong Kong {cheng.jiefeng,li.zhenguo,david.fanwei,hecheng}@huawei.com

More information

Big Data Analytics Using Hardware-Accelerated Flash Storage

Big Data Analytics Using Hardware-Accelerated Flash Storage Big Data Analytics Using Hardware-Accelerated Flash Storage Sang-Woo Jun University of California, Irvine (Work done while at MIT) Flash Memory Summit, 2018 A Big Data Application: Personalized Genome

More information

Graphs. Edges may be directed (from u to v) or undirected. Undirected edge eqvt to pair of directed edges

Graphs. Edges may be directed (from u to v) or undirected. Undirected edge eqvt to pair of directed edges (p 186) Graphs G = (V,E) Graphs set V of vertices, each with a unique name Note: book calls vertices as nodes set E of edges between vertices, each encoded as tuple of 2 vertices as in (u,v) Edges may

More information

Parallel Combinatorial BLAS and Applications in Graph Computations

Parallel Combinatorial BLAS and Applications in Graph Computations Parallel Combinatorial BLAS and Applications in Graph Computations Aydın Buluç John R. Gilbert University of California, Santa Barbara SIAM ANNUAL MEETING 2009 July 8, 2009 1 Primitives for Graph Computations

More information

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Graphs / Networks CSE 6242/ CX Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech CSE 6242/ CX 4242 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John

More information

Big Data Management and NoSQL Databases

Big Data Management and NoSQL Databases NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic

More information

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation Voldemort Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/29 Outline 1 2 3 Smruti R. Sarangi Leader Election 2/29 Data

More information

Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs

Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs Locality-Aware Software Throttling for Sparse Matrix Operation on GPUs Yanhao Chen 1, Ari B. Hayes 1, Chi Zhang 2, Timothy Salmon 1, Eddy Z. Zhang 1 1. Rutgers University 2. University of Pittsburgh Sparse

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 (Fall 2018) Part 4: Analyzing Graphs (1/2) October 4, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These slides are

More information

Basic Communication Operations (Chapter 4)

Basic Communication Operations (Chapter 4) Basic Communication Operations (Chapter 4) Vivek Sarkar Department of Computer Science Rice University vsarkar@cs.rice.edu COMP 422 Lecture 17 13 March 2008 Review of Midterm Exam Outline MPI Example Program:

More information

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs Sungpack Hong 2, Nicole C. Rodia 1, and Kunle Olukotun 1 1 Pervasive Parallelism Laboratory, Stanford University

More information

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink

Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Optimizing Out-of-Core Nearest Neighbor Problems on Multi-GPU Systems Using NVLink Rajesh Bordawekar IBM T. J. Watson Research Center bordaw@us.ibm.com Pidad D Souza IBM Systems pidsouza@in.ibm.com 1 Outline

More information

Lecture 13: March 25

Lecture 13: March 25 CISC 879 Software Support for Multicore Architectures Spring 2007 Lecture 13: March 25 Lecturer: John Cavazos Scribe: Ying Yu 13.1. Bryan Youse-Optimization of Sparse Matrix-Vector Multiplication on Emerging

More information

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25 Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small

More information

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016 Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List

More information

Heterogeneous Memory Subsystem for Natural Graph Analytics

Heterogeneous Memory Subsystem for Natural Graph Analytics Heterogeneous Memory Subsystem for Natural Graph Analytics Abraham Addisie, Hiwot Kassa, Opeoluwa Matthews, and Valeria Bertacco Computer Science and Engineering, University of Michigan Email: {abrahad,

More information

Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices

Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices Evaluation of Asynchronous Offloading Capabilities of Accelerator Programming Models for Multiple Devices Jonas Hahnfeld 1, Christian Terboven 1, James Price 2, Hans Joachim Pflug 1, Matthias S. Müller

More information

LFGraph: Simple and Fast Distributed Graph Analytics

LFGraph: Simple and Fast Distributed Graph Analytics LFGraph: Simple and Fast Distributed Graph Analytics Imranul Hoque VMware, Inc. ihoque@vmware.com Indranil Gupta University of Illinois, Urbana-Champaign indy@illinois.edu Abstract Distributed graph analytics

More information

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Seunghwa Kang David A. Bader 1 A Challenge Problem Extracting a subgraph from

More information

GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs

GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs GTS: A Fast and Scalable Graph Processing Method based on Streaming Topology to GPUs Min-Soo Kim, Kyuhyeon An, Himchan Park, Hyunseok Seo, and Jinwook Kim Department of Information and Communication Engineering

More information

Domain-specific programming on graphs

Domain-specific programming on graphs Lecture 25: Domain-specific programming on graphs Parallel Computer Architecture and Programming CMU 15-418/15-618, Fall 2016 1 Last time: Increasing acceptance of domain-specific programming systems Challenge

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

CuSha: Vertex-Centric Graph Processing on GPUs

CuSha: Vertex-Centric Graph Processing on GPUs CuSha: Vertex-Centric Graph Processing on GPUs Farzad Khorasani, Keval Vora, Rajiv Gupta, Laxmi N. Bhuyan HPDC Vancouver, Canada June, Motivation Graph processing Real world graphs are large & sparse Power

More information

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context

Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context 1 Apache Spark is a fast and general-purpose engine for large-scale data processing Spark aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes

More information

swsptrsv: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architecture Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu

swsptrsv: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architecture Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu swsptrsv: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architecture 1,3 2 1,3 1,3 Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu 1 2 3 Outline 1. Background 2. Sunway architecture

More information

arxiv: v1 [cs.db] 21 Oct 2017

arxiv: v1 [cs.db] 21 Oct 2017 : High-performance external graph analytics Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu and Arvind Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology

More information

When Graph Meets Big Data: Opportunities and Challenges

When Graph Meets Big Data: Opportunities and Challenges High Performance Graph Data Management and Processing (HPGDM 2016) When Graph Meets Big Data: Opportunities and Challenges Yinglong Xia Huawei Research America 11/13/2016 The International Conference for

More information

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor

Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Optimization of Lattice QCD with CG and multi-shift CG on Intel Xeon Phi Coprocessor Intel K. K. E-mail: hirokazu.kobayashi@intel.com Yoshifumi Nakamura RIKEN AICS E-mail: nakamura@riken.jp Shinji Takeda

More information

Conflict-Free Vectorization of Associative Irregular Applications with Recent SIMD Architectural Advances

Conflict-Free Vectorization of Associative Irregular Applications with Recent SIMD Architectural Advances Conflict-Free Vectorization of Associative Irregular Applications with Recent SIMD Architectural Advances Peng Jiang Gagan Agrawal Department of Computer Science and Engineering The Ohio State University,

More information

LA3: A Scalable Link- and Locality-Aware Linear Algebra-Based Graph Analytics System

LA3: A Scalable Link- and Locality-Aware Linear Algebra-Based Graph Analytics System LA3: A Scalable Link- and Locality-Aware Linear Algebra-Based Graph Analytics System Yousuf Ahmad, Omar Khattab, Arsal Malik, Ahmad Musleh, Mohammad Hammoud Carnegie Mellon University in Qatar {myahmad,

More information

Leveraging Flash in HPC Systems

Leveraging Flash in HPC Systems Leveraging Flash in HPC Systems IEEE MSST June 3, 2015 This work was performed under the auspices of the U.S. Department of Energy by under Contract DE-AC52-07NA27344. Lawrence Livermore National Security,

More information

Tree-Based Density Clustering using Graphics Processors

Tree-Based Density Clustering using Graphics Processors Tree-Based Density Clustering using Graphics Processors A First Marriage of MRNet and GPUs Evan Samanas and Ben Welton Paradyn Project Paradyn / Dyninst Week College Park, Maryland March 26-28, 2012 The

More information

CGraph: A Correlations-aware Approach for Efficient Concurrent Iterative Graph Processing

CGraph: A Correlations-aware Approach for Efficient Concurrent Iterative Graph Processing : A Correlations-aware Approach for Efficient Concurrent Iterative Graph Processing Yu Zhang, Xiaofei Liao, Hai Jin, and Lin Gu, Huazhong University of Science and Technology; Ligang He, University of

More information

Chaos: Scale-out Graph Processing from Secondary Storage

Chaos: Scale-out Graph Processing from Secondary Storage Chaos: Scale-out Graph Processing from Secondary Storage Amitabha Roy Laurent Bindschaedler Jasmina Malicevic Willy Zwaenepoel Intel EPFL, Switzerland firstname.lastname@{intel.com, epfl.ch } Abstract

More information

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC Kai Wang, Guoqing Xu, University of California, Irvine Zhendong Su,

More information

One Trillion Edges. Graph processing at Facebook scale

One Trillion Edges. Graph processing at Facebook scale One Trillion Edges Graph processing at Facebook scale Introduction Platform improvements Compute model extensions Experimental results Operational experience How Facebook improved Apache Giraph Facebook's

More information

Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays

Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays Out-Of-Core Sort-First Parallel Rendering for Cluster-Based Tiled Displays Wagner T. Corrêa James T. Klosowski Cláudio T. Silva Princeton/AT&T IBM OHSU/AT&T EG PGV, Germany September 10, 2002 Goals Render

More information