
1 GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian Tsinghua University University of Southern California Stanford University

2 Outline Motivation (graph applications; Processing-In-Memory; drawbacks of the current solution), GraphP, Evaluation

3 Graph Applications Social network analytics Recommendation system Bioinformatics

4 Challenges High bandwidth requirement Small amount of computation per vertex Data movement overhead

5 Challenges High bandwidth requirement Small amount of computation per vertex Data movement overhead [Figure: data moves from memory through the L3/L2/L1 cache hierarchy to the compute core]

6 PIM: Processing-In-Memory

7 PIM: Processing-In-Memory Idea: Computation logic inside memory Advantage: High memory bandwidth

8 PIM: Processing-In-Memory Idea: Computation logic inside memory Advantage: High memory bandwidth Example: Hybrid Memory Cubes (HMC)

9 PIM: Processing-In-Memory Idea: Computation logic inside memory Advantage: High memory bandwidth Example: Hybrid Memory Cubes (HMC) [Figure: multiple HMC cubes, each with memory dies stacked on a compute logic die; 320 GB/s intra-cube bandwidth, 4 x 120 GB/s inter-cube links]

10-15 HMC: Hybrid Memory Cubes [Bar chart comparing bandwidth (GB/s): intra-cube (320 GB/s) vs. inter-cube vs. inter-group; inter-cube and especially inter-group bandwidth are much lower] Bottleneck: inter-cube communication

16 Outline Motivation (graph applications; Processing-In-Memory; drawbacks of the current solution), GraphP, Evaluation

17-19 Current Solution: Tesseract First PIM-based graph processing architecture. Programming model: vertex program. Partition: based on the vertex program. [Ahn, J., Hong, S., Yoo, S., Mutlu, O., and Choi, K. A scalable processing-in-memory accelerator for parallel graph processing. ISCA 2015.]

20-22 PageRank in Vertex Program

    for (v: vertices) {
      update = 0.85 * v.rank / v.out_degree;
      for (w: edges.destination) {
        put(w.id, function{ w.next_rank += update; });
      }
    }
    barrier();
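For concreteness, here is a minimal runnable sketch of this per-edge update style in plain Python (the graph representation and the local put helper are hypothetical stand-ins, not Tesseract's API; the teleport term of PageRank is omitted, as on the slide):

    # One PageRank iteration in the Tesseract-style vertex-program model:
    # every out-edge of every vertex generates one put() to its destination.
    def pagerank_step(out_edges, rank, out_degree):
        next_rank = {v: 0.0 for v in rank}

        def put(dst, delta):          # stand-in for the remote function call
            next_rank[dst] += delta

        for v, dests in out_edges.items():
            update = 0.85 * rank[v] / out_degree[v]
            for w in dests:           # one message per edge
                put(w, update)
        return next_rank              # barrier(): the iteration ends here

    out_edges = {0: [1, 2], 1: [2], 2: [0]}               # toy graph
    rank = {0: 1.0, 1: 1.0, 2: 1.0}
    out_degree = {v: len(d) for v, d in out_edges.items()}
    print(pagerank_step(out_edges, rank, out_degree))

When v and w are mapped to different cubes, each such put() crosses the cube boundary, which is why the total communication tracks the number of cross-cube edges (slide 26).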

23-26 Graph Partition [Figure: graph vertices split across two cubes, hmc0 and hmc1; edges within a cube are intra-cube edges, edges between cubes are inter-cube edges; a put(w.id, function{ w.next_rank += update; }) along an inter-cube edge becomes inter-cube communication] Communication = number of cross-cube edges
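A small sketch of that count under an assumed modulo placement of vertices onto cubes (cube_of is a hypothetical mapping; Tesseract's actual vertex-to-cube mapping may differ):

    # Communication per iteration = number of edges whose endpoints land on
    # different cubes under the chosen vertex partition.
    NUM_CUBES = 16

    def cube_of(v):
        return v % NUM_CUBES          # assumed placement rule for illustration

    def cross_cube_edges(edges):
        return sum(1 for u, w in edges if cube_of(u) != cube_of(w))

    edges = [(0, 1), (0, 17), (17, 33), (2, 18)]   # toy edge list
    print(cross_cube_edges(edges))                  # edges crossing cube boundaries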

27-30 Drawback of Tesseract Excessive data communication. Why? Programming Model -> Graph Partition -> Data Communication: Tesseract fixes the programming model (vertex programs) first, the graph partition follows from that choice, and the data communication follows from the partition.

31 Outline Motivation GraphP Evaluation

32 GraphP Consider graph partition first. Graph partition: source-cut. Programming model: two-phase vertex program. Reduces inter-cube communication.

33-37 Source-Cut Partition [Figure: the same graph partitioned across hmc0 and hmc1; for each edge whose source lives in the other cube, a replica of the source vertex is created locally (2 replicas in the example), so every edge becomes an intra-cube edge]
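A sketch of how such a partition could be built (hypothetical helper names; following the slides, each edge is stored in the cube of its destination and a remote source gets a local replica there):

    # Source-cut sketch: edges live with their destination; sources that live in
    # another cube are replicated locally, so every edge becomes intra-cube.
    from collections import defaultdict

    def source_cut(edges, cube_of):
        local_edges = defaultdict(list)   # cube id -> edges stored in that cube
        replicas = defaultdict(set)       # vertex -> cubes holding a replica of it
        for u, w in edges:
            c = cube_of(w)
            local_edges[c].append((u, w))
            if cube_of(u) != c:
                replicas[u].add(c)
        return local_edges, replicas

    edges = [(0, 1), (0, 17), (17, 33), (2, 18)]
    local_edges, replicas = source_cut(edges, lambda v: v % 16)
    print(sum(len(cubes) for cubes in replicas.values()))   # messages per iteration

In this toy example one replica of vertex 0 covers both of its cross-cube edges, so per-iteration traffic is one message instead of two; in general it is at most one message per (vertex, remote cube) pair.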

38-39 Two-Phase Vertex Program

    //apply updates from previous iterations
    for (r: replicas) {
      r.next_rank = 0.85 * r.next_rank / r.out_degree;
    }

40-44 Two-Phase Vertex Program

    for (v: vertices) {
      update = 0;
      for (u: edges.sources) {
        update += u.rank;
      }
      for (r: replicas) {
        put(r.id, function{ r.next_rank = update; });
      }
    }
    barrier();
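A minimal Python sketch of one iteration under this model (the in-edge lists and replica table are made-up illustrative inputs; the sketch abstracts the cubes into one address space and simply counts the puts that would cross cube boundaries):

    # Phase 1 (apply): each vertex/replica turns the value received last iteration
    # into the contribution it offers locally.
    # Phase 2 (generate): each vertex gathers from its local in-edge sources and
    # pushes the new value once to every cube that holds a replica of it.
    def two_phase_step(in_sources, replicas, rank, out_degree):
        contrib = {u: 0.85 * rank[u] / out_degree[u] for u in rank}   # phase 1
        next_rank = {}
        messages = 0
        for v, sources in in_sources.items():                         # phase 2
            next_rank[v] = sum(contrib[u] for u in sources)
            messages += len(replicas.get(v, ()))    # one put per remote replica
        return next_rank, messages                   # barrier() ends the iteration

    in_sources = {0: [2], 1: [0], 2: [0, 1]}   # local in-edge sources per vertex
    replicas = {0: {1}}                         # example: vertex 0 replicated in cube 1
    rank = {0: 1.0, 1: 1.0, 2: 1.0}
    out_degree = {0: 2, 1: 1, 2: 1}
    print(two_phase_step(in_sources, replicas, rank, out_degree))

The message count scales with the number of replicas rather than with the number of cross-cube edges, which is the "strictly less data communication" claim on the next slide.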

45 Benefits Strictly less data communication Enables architecture optimizations

46-47 Less Communication [Figure comparing inter-cube messages per iteration: Tesseract sends one message per cross-cube edge; GraphP sends one message per replica, i.e., at most one per (vertex, remote cube) pair]

48 Broadcast Optimization

    for (r: replicas) {
      put(r.id, function{ r.next_rank = update; });
    }
    barrier();

[Figure: the same value (4) is put to every replica, so the per-replica puts can be combined into a single broadcast]

49 Naïve Broadcast [Figure: the source cube sends 15 point-to-point messages, one to every destination cube]

50 Hierarchical communication [Figure: the source cube sends only 3 inter-group messages, one per remote group; the value is then forwarded to destination cubes within each group]
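A sketch of the message counts (assuming 16 cubes organized as 4 groups of 4, which matches the evaluation configuration; the grouping itself is an assumption of this example):

    # Naive broadcast: the source cube sends one point-to-point message to every
    # other cube.  Hierarchical broadcast: one message per remote group, then the
    # value is forwarded between cubes inside each group.
    def naive_broadcast(num_cubes):
        return num_cubes - 1                                   # all point-to-point

    def hierarchical_broadcast(num_groups, cubes_per_group):
        inter_group = num_groups - 1                           # expensive links
        intra_group = num_groups * (cubes_per_group - 1)       # cheap local hops
        return inter_group, intra_group

    print(naive_broadcast(16))            # 15 point-to-point messages
    print(hierarchical_broadcast(4, 4))   # (3, 12): only 3 cross a group boundary

The total number of messages is similar, but only 3 of them traverse the low-bandwidth inter-group links.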

51 Other Optimizations Computation/communication overlap Leveraging low-power state of SerDes Please see the paper for more details

52 Outline Motivation GraphP Evaluation

53 Evaluation Methodology Simulation infrastructure: zsim with HMC support; ORION for NoC energy modeling. Configurations (same as Tesseract): 16 HMCs; interconnect: Dragonfly and 2D Mesh; 512 single-issue in-order cores at 1 GHz.

54-55 Workloads 4 graph algorithms: Breadth-First Search, Single-Source Shortest Path, Weakly Connected Components, PageRank. 5 real-world graphs: Wiki-Vote (WV), ego-Twitter (TT), soc-Slashdot0902 (SD), Amazon0302 (AZ), ljournal-2008 (LJ).

56-58 Performance [Speedup chart, normalized to a DDR3 baseline, for DDR3, Tesseract (SOTA), GraphP-SC, and GraphP-SC-BRD; the annotated gains attribute the bulk of the speedup to PIM memory bandwidth (DDR3 to Tesseract), a further 1.7x to the data partition, and a further factor below 1.1x to the broadcast optimization]

59 Communication Amount [Stacked bars of communication normalized to Tesseract, split into intra-group and inter-group traffic: Tesseract = 100% (51.8% intra-group + 48.2% inter-group); GraphP-SC and GraphP-SC-BRD reduce total communication to under 10%, with segments of 7.1%, 1.7%, 7.0%, and 0.4%]

60 Energy consumption [Bars normalized to Tesseract: Tesseract 100.0%, GraphP-SC 24.9%, GraphP-SC-BRD 15.9%]

61 Other results Bandwidth utilization Scalability Replication overhead Please see the paper for more details

62-67 Conclusions We propose GraphP, a new PIM-based graph processing framework. Key contributions: data partition as a first-order design consideration; source-cut partition; two-phase vertex program; enables additional architecture optimizations. GraphP drastically reduces inter-cube communication and improves energy efficiency.

68 GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, Xuehai Qian Tsinghua University University of Southern California Stanford University

69 Workload Size & Capacity 128 GB (16 x 8 GB), ~16 billion edges; ~400 million edges (SNAP); ~7 billion edges (WebGraph)

70 Two-Phase Vertex Program Equivalent expressiveness to vertex programs
