GraFBoost: Using accelerated flash storage for external graph analytics

Size: px
Start display at page:

Download "GraFBoost: Using accelerated flash storage for external graph analytics"

Transcription

1 GraFBoost: Using accelerated flash storage for external graph analytics Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu and Arvind MIT CSAIL Funded by: 1

2 Large Graphs are Found Everywhere in Nature Human neural network Structure of the internet TB to 100s of TB in size! Social networks 1) Connectomics graph of the brain - Source: Connectomics and Graph Theory Jon Clayden, UCL 2) (Part of) the internet - Source: Criteo Engineering Blog 3) The Graph of a Social Network Source: Griff s Graphs 2

3 Storage for Graph Analytics Extremely sparse Irregular structure TB DRAM of DRAM Terabytes in size $$$ $8000/TB, 200W Our goal: $ $500/TB, 10W 3

4 Random Access Challenge in Flash Storage Flash DRAM Bandwidth: 3.6 GB/s ~100 GB/s Access Granularity: 8192 Bytes 128 Bytes Wastes performance by not using most of fetched page Using 8 bytes in a 8192 Byte page uses 1/1024 of bandwidth! 4

5 Two Pillars for Flash in GraFBoost Flash Storage Sort-Reduce Algorithm Hardware Acceleration Sequentialize Random Access Reduce Overhead of Sort-Reduce System resources Power consumption Performance 5

6 Vertex-Centric Programming Model Popular model for efficient parallism and distribution Vertex program only sees neighbors Algorithm executed in terms of disjoint iterations Vertex program is executed on one or more Active Vertices Active Vertices

7 Algorithmic Representation of a Vertex Program Iteration for each v src in ActiveList do for each e(v src, v dst ) in G do ev = edge_program (v src,. val, e. weight) v dst. next_val = vertex_update(v dst. next_val, ev) end for end for Random read-modify update into vertex data Sort-reduce algorithm solves this issue 7

8 General Problem of Irregular Array Updates Updating an array x with a stream of update requests xs and update function f For each < idx, arg > in xs: x idx = f(x idx, arg) X xs f Random Updates 8

9 Solution Part One - Sort Sort xs according to index X xs Sort Sorted xs Sequential Updates Much better than naïve random updates Terabyte graphs can generate terabyte logs Significant sorting overhead 9

10 Solution Part Two - Reduce Associative update function f can be interleaved with sort merge merge 1 7 xs e.g., (A + B) + C = A + (B + C) Reduced Overhead 10

11 Normalized size of update stream Big Benefits from Interleaving Reduction Ratio of data copied at each sort phase Kron32 Kron32 WDC Kron30 WDC Twitter 90% Merge Iteration 11

12 Accelerated GraFBoost Architecture In-storage accelerator reduces data movement and cost Software Host (Server/PC/Laptop) FPGA Accelerator-Aware Flash Management Multirate Aggregator Multirate 16-to-1 Merge-Sorter Wire-Speed On-chip Sorter Flash Edge Data Vertex Data Partially Sort- Reduced Files Active Vertices 12

13 Accelerated GraFBoost Architecture In-storage accelerator reduces data movement and cost Software Host (Server/PC/Laptop) FPGA Edge Program Vertex Value Update Log (xs) 1GB DRAM Sort-Reduce Accelerator Edge Property Flash Edge Data Vertex Data Partially Sort- Reduced Files Active Vertices 13

14 Evaluated Graph Analytic Systems In-memory Semi-External External GraphLab (IN) FlashGraph (SE1) X-Stream (SE2) Projected system with 2x memory bandwidth GraphChi (EX) GraFSoft GraFBoost GraFBoost2 Distributed GraphLab: a framework for machine learning and data mining in the cloud, VLDB 2012 FlashGraph: Processing billion-node graphs on an array of commodity SSDs, FAST 2015 X-Stream: edge-centric graph processing using streaming partitions, SOSP 2013 GraphChi: Large-scale graph computation on just a PC, USENIX

15 Evaluation Environment + 32-core Xeon 128 GB RAM 5x 0.5TB PCIe Flash $8000 All software experiments 4-core i5 4 GB RAM $400 + Virtex 7 FPGA 1TB custom flash 1GB on-board RAM $1000 $??? 15

16 Evaluation Result Overview In-memory Semi-External External GraFBoost Large graphs: Medium graphs: Fail Fail Slow Fast Fail Fast Slow Fast Small graphs: Fast Fast Slow Fast GraFBoost has very low resource requirement Memory, CPU, Power 16

17 Normalized Performance Results with a Large Graph: Synthetic Scale 32 Kronecker Graph 0.5 TB in text, 4 Billion vertices GraphLab (IN) out of memory GraphChi (EX) did not finish x 2.8x 10x GraFSoft 0 PR BFS BC SE1 SE2 GraFBoost GraFBoost2 17

18 Normalized Performance Results with a Large Graph: Web Data Commons Web Crawl 4 2 TB in text, 3 Billion vertices GraphLab (IN) out of memory GraphChi (EX) did not finish GraFSoft 0 PR BFS BC SE1 SE2 GraFBoost GraFBoost2 18

19 Normalized Performance Results with a Large Graph: Web Data Commons Web Crawl 4 2 TB in text, 3 Billion vertices GraphLab (IN) out of memory GraphChi (EX) did not finish 3 Only GraFBoost succeeds in both graphs 2 1 GraFBoost can run still larger graphs! GraFSoft 0 PR BFS BC SE1 SE2 GraFBoost GraFBoost2 19

20 Normalized Performance Results with Smaller Graphs: Breadth-First Search TB 0.04 Billion Fastest! Slowest 0.09 TB 0.3 Billion Slowest 0.3 TB 1 Billion Slowest twitter kron28 kron30 IN SE1 SE2 EX GraFBoost GraFBoost2 GraFSoft 20

21 Normalized Performance Results with a Medium Graph: Against an In-Memory Cluster Synthesized Kronecker Scale TB in text, 0.3 Billion vertices PR BFS 5xIN SE1 SE2 EX GraFBoost GraFSoft 21

22 GB Threads Watts GraFBoost Reduces Resource Requirements Conventional GraFBoost External analytics Conventional GraFBoost Hardware Acceleration Conventional GraFBoost External Analytics + Hardware Acceleration 22

23 Future Work Open-source GraFBoost Cleaning up code for users! Acceleration using Amazon F1 Commercial accelerated storage Collaborating with Samsung More applications using sort-reduce Bioinformatics collaboration with Barcelona Supercomputering Center 23

24 Thank You Flash Storage Sort-Reduce Algorithm Hardware Acceleration 24

25 25

26 BlueDBM Cluster Architecture Host Server (24-Core) FPGA (VC707) minflash minflash 1GB RAM Ethernet Host Server (24-Core) FPGA (VC707) minflash minflash 1 TB PCIe 4GB/s FMC 10Gbps network 8 Host Server (24-Core) FPGA (VC707) minflash minflash Uniform latency of 100 µs! 26

27 The BlueDBM Cluster BlueDBM Storage Device 27

Big Data Analytics Using Hardware-Accelerated Flash Storage

Big Data Analytics Using Hardware-Accelerated Flash Storage Big Data Analytics Using Hardware-Accelerated Flash Storage Sang-Woo Jun University of California, Irvine (Work done while at MIT) Flash Memory Summit, 2018 A Big Data Application: Personalized Genome

More information

BlueDBM: An Appliance for Big Data Analytics*

BlueDBM: An Appliance for Big Data Analytics* BlueDBM: An Appliance for Big Data Analytics* Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King(Quanta) BigData@CSAIL Annual Meeting

More information

GraFBoost: Using accelerated flash storage for external graph analytics

GraFBoost: Using accelerated flash storage for external graph analytics : Using accelerated flash storage for external graph analytics Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu and Arvind Department of Electrical Engineering and Computer Science Massachusetts Institute

More information

HADP Talk BlueDBM: An appliance for Big Data Analytics

HADP Talk BlueDBM: An appliance for Big Data Analytics HADP Talk BlueDBM: An appliance for Big Data Analytics Sang-Woo Jun* Ming Liu* Sungjin Lee* Jamey Hicks+ John Ankcorn+ Myron King+ Shuotao Xu* Arvind* *MIT Computer Science and Artificial Intelligence

More information

arxiv: v1 [cs.db] 21 Oct 2017

arxiv: v1 [cs.db] 21 Oct 2017 : High-performance external graph analytics Sang-Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu and Arvind Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology

More information

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18

Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Accelerating PageRank using Partition-Centric Processing Kartik Lakhotia, Rajgopal Kannan, Viktor Prasanna USENIX ATC 18 Outline Introduction Partition-centric Processing Methodology Analytical Evaluation

More information

NUMA-aware Graph-structured Analytics

NUMA-aware Graph-structured Analytics NUMA-aware Graph-structured Analytics Kaiyuan Zhang, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems Shanghai Jiao Tong University, China Big Data Everywhere 00 Million Tweets/day 1.11

More information

FPGP: Graph Processing Framework on FPGA

FPGP: Graph Processing Framework on FPGA FPGP: Graph Processing Framework on FPGA Guohao DAI, Yuze CHI, Yu WANG, Huazhong YANG E.E. Dept., TNLIST, Tsinghua University dgh14@mails.tsinghua.edu.cn 1 Big graph is widely used Big graph is widely

More information

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. Xiaowei ZHU Tsinghua University

GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. Xiaowei ZHU Tsinghua University GridGraph: Large-Scale Graph Processing on a Single Machine Using -Level Hierarchical Partitioning Xiaowei ZHU Tsinghua University Widely-Used Graph Processing Shared memory Single-node & in-memory Ligra,

More information

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs

FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs Da Zheng, Disa Mhembere, Randal Burns Department of Computer Science Johns Hopkins University Carey E. Priebe Department of Applied

More information

Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Mosaic: Processing a Trillion-Edge Graph on a Single Machine Mosaic: Processing a Trillion-Edge Graph on a Single Machine Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim Georgia Institute of Technology Best Student Paper @ EuroSys

More information

BlueDBM: An Appliance for Big Data Analytics

BlueDBM: An Appliance for Big Data Analytics BlueDBM: An Appliance for Big Data Analytics Sang-Woo Jun Ming Liu Sungjin Lee Jamey Hicks John Ankcorn Myron King Shuotao Xu Arvind Department of Electrical Engineering and Computer Science Massachusetts

More information

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC

GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC GraphQ: Graph Query Processing with Abstraction Refinement -- Scalable and Programmable Analytics over Very Large Graphs on a Single PC Kai Wang, Guoqing Xu, University of California, Irvine Zhendong Su,

More information

Optimizing Cache Performance for Graph Analytics. Yunming Zhang Presentation

Optimizing Cache Performance for Graph Analytics. Yunming Zhang Presentation Optimizing Cache Performance for Graph Analytics Yunming Zhang 6.886 Presentation Goals How to optimize in-memory graph applications How to go about performance engineering just about anything In-memory

More information

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah Jure Leskovec (@jure) Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah 2 My research group at Stanford: Mining and modeling large social and information networks

More information

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Jialiang Zhang, Soroosh Khoram and Jing Li 1 Outline Background Big graph analytics Hybrid

More information

NOHOST: A New Storage Architecture for Distributed Storage Systems. Chanwoo Chung

NOHOST: A New Storage Architecture for Distributed Storage Systems. Chanwoo Chung NOHOST: A New Storage Architecture for Distributed Storage Systems by Chanwoo Chung B.S., Seoul National University (2014) Submitted to the Department of Electrical Engineering and Computer Science in

More information

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research 1 The world s most valuable resource Data is everywhere! May. 2017 Values from Data! Need infrastructures for

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

A Framework for Processing Large Graphs in Shared Memory

A Framework for Processing Large Graphs in Shared Memory A Framework for Processing Large Graphs in Shared Memory Julian Shun Based on joint work with Guy Blelloch and Laxman Dhulipala (Work done at Carnegie Mellon University) 2 What are graphs? Vertex Edge

More information

Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing

Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing Julienne: A Framework for Parallel Graph Algorithms using Work-efficient Bucketing Laxman Dhulipala Joint work with Guy Blelloch and Julian Shun SPAA 07 Giant graph datasets Graph V E (symmetrized) com-orkut

More information

Database Acceleration Solution Using FPGAs and Integrated Flash Storage

Database Acceleration Solution Using FPGAs and Integrated Flash Storage Database Acceleration Solution Using FPGAs and Integrated Flash Storage HK Verma, Xilinx Inc. August 2017 1 FPGA Analytics in Flash Storage System In-memory or Flash storage based DB reduce disk access

More information

Memory-Based Cloud Architectures

Memory-Based Cloud Architectures Memory-Based Cloud Architectures ( Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking -) *%'+,#$)

More information

Report. X-Stream Edge-centric Graph processing

Report. X-Stream Edge-centric Graph processing Report X-Stream Edge-centric Graph processing Yassin Hassan hassany@student.ethz.ch Abstract. X-Stream is an edge-centric graph processing system, which provides an API for scatter gather algorithms. The

More information

RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine

RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine RStream:Marrying Relational Algebra with Streaming for Efficient Graph Mining on A Single Machine Guoqing Harry Xu Kai Wang, Zhiqiang Zuo, John Thorpe, Tien Quang Nguyen, UCLA Nanjing University Facebook

More information

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia How well do CPU, GPU and Hybrid Graph Processing Frameworks Perform? Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia Networked Systems

More information

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010

Moneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010 Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed

More information

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock

High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock Yangzihao Wang University of California, Davis yzhwang@ucdavis.edu March 24, 2014 Yangzihao Wang (yzhwang@ucdavis.edu)

More information

Persistent Memory. High Speed and Low Latency. White Paper M-WP006

Persistent Memory. High Speed and Low Latency. White Paper M-WP006 Persistent Memory High Speed and Low Latency White Paper M-WP6 Corporate Headquarters: 3987 Eureka Dr., Newark, CA 9456, USA Tel: (51) 623-1231 Fax: (51) 623-1434 E-mail: info@smartm.com Customer Service:

More information

Communication for PIM-based Graph Processing with Efficient Data Partition. Mingxing Zhang, Youwei Zhuo (equal contribution),

Communication for PIM-based Graph Processing with Efficient Data Partition. Mingxing Zhang, Youwei Zhuo (equal contribution), GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition Mingxing Zhang, Youwei Zhuo (equal contribution), Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis,

More information

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale. January 2016 Credit Card Fraud prevention is among the most time-sensitive and high-value of IT tasks. The databases

More information

SociaLite: A Datalog-based Language for

SociaLite: A Datalog-based Language for SociaLite: A Datalog-based Language for Large-Scale Graph Analysis Jiwon Seo M OBIS OCIAL RESEARCH GROUP Overview Overview! SociaLite: language for large-scale graph analysis! Extensions to Datalog! Compiler

More information

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture

ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture ForeGraph: Exploring Large-scale Graph Processing on Multi-FPGA Architecture Guohao Dai 1, Tianhao Huang 1, Yuze Chi 2, Ningyi Xu 3, Yu Wang 1, Huazhong Yang 1 1 Tsinghua University, 2 UCLA, 3 MSRA dgh14@mails.tsinghua.edu.cn

More information

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets

Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets Nadathur Satish, Narayanan Sundaram, Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming

More information

X-Stream: Edge-centric Graph Processing using Streaming Partitions

X-Stream: Edge-centric Graph Processing using Streaming Partitions X-Stream: Edge-centric Graph Processing using Streaming Partitions Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel {amitabha.roy, ivo.mihailovic, willy.zwaenepoel}@epfl.ch EPFL Abstract X-Stream is a system

More information

Efficient graph computation on hybrid CPU and GPU systems

Efficient graph computation on hybrid CPU and GPU systems J Supercomput (2015) 71:1563 1586 DOI 10.1007/s11227-015-1378-z Efficient graph computation on hybrid CPU and GPU systems Tao Zhang Jingjie Zhang Wei Shu Min-You Wu Xiaoyao Liang Published online: 21 January

More information

The Future of High Performance Computing

The Future of High Performance Computing The Future of High Performance Computing Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant Comparing Two Large-Scale Systems Oakridge Titan Google Data Center 2 Monolithic supercomputer

More information

X-Stream: A Case Study in Building a Graph Processing System. Amitabha Roy (LABOS)

X-Stream: A Case Study in Building a Graph Processing System. Amitabha Roy (LABOS) X-Stream: A Case Study in Building a Graph Processing System Amitabha Roy (LABOS) 1 X-Stream Graph processing system Single Machine Works on graphs stored Entirely in RAM Entirely in SSD Entirely on Magnetic

More information

New Approach to Unstructured Data

New Approach to Unstructured Data Innovations in All-Flash Storage Deliver a New Approach to Unstructured Data Table of Contents Developing a new approach to unstructured data...2 Designing a new storage architecture...2 Understanding

More information

Evaluation of the Chelsio T580-CR iscsi Offload adapter

Evaluation of the Chelsio T580-CR iscsi Offload adapter October 2016 Evaluation of the Chelsio T580-CR iscsi iscsi Offload makes a difference Executive Summary As application processing demands increase and the amount of data continues to grow, getting this

More information

When Graph Meets Big Data: Opportunities and Challenges

When Graph Meets Big Data: Opportunities and Challenges High Performance Graph Data Management and Processing (HPGDM 2016) When Graph Meets Big Data: Opportunities and Challenges Yinglong Xia Huawei Research America 11/13/2016 The International Conference for

More information

Jignesh M. Patel. Blog:

Jignesh M. Patel. Blog: Jignesh M. Patel Blog: http://bigfastdata.blogspot.com Go back to the design Query Cache from Processing for Conscious 98s Modern (at Algorithms Hardware least for Hash Joins) 995 24 2 Processor Processor

More information

A 101 Guide to Heterogeneous, Accelerated, Data Centric Computing Architectures

A 101 Guide to Heterogeneous, Accelerated, Data Centric Computing Architectures A 101 Guide to Heterogeneous, Accelerated, Centric Computing Architectures Allan Cantle President & Founder, Nallatech Join the Conversation #OpenPOWERSummit 2016 OpenPOWER Foundation Buzzword & Acronym

More information

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO FS Albis Streaming

More information

FAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan

FAWN. A Fast Array of Wimpy Nodes. David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan FAWN A Fast Array of Wimpy Nodes David Andersen, Jason Franklin, Michael Kaminsky*, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan Carnegie Mellon University *Intel Labs Pittsburgh Energy in computing

More information

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader A New Parallel Algorithm for Connected Components in Dynamic Graphs Robert McColl Oded Green David Bader Overview The Problem Target Datasets Prior Work Parent-Neighbor Subgraph Results Conclusions Problem

More information

Practical Near-Data Processing for In-Memory Analytics Frameworks

Practical Near-Data Processing for In-Memory Analytics Frameworks Practical Near-Data Processing for In-Memory Analytics Frameworks Mingyu Gao, Grant Ayers, Christos Kozyrakis Stanford University http://mast.stanford.edu PACT Oct 19, 2015 Motivating Trends End of Dennard

More information

Data Intensive Computing SUBTITLE WITH TWO LINES OF TEXT IF NECESSARY PASIG June, 2009

Data Intensive Computing SUBTITLE WITH TWO LINES OF TEXT IF NECESSARY PASIG June, 2009 Data Intensive Computing SUBTITLE WITH TWO LINES OF TEXT IF NECESSARY PASIG June, 2009 Presenter s Name Simon CW See Title & and Division HPC Cloud Computing Sun Microsystems Technology Center Sun Microsystems,

More information

Searching for Meaning in the Era of Big Data and IoT

Searching for Meaning in the Era of Big Data and IoT Searching for Meaning in the Era of Big Data and IoT Trung Tran MIT Lincoln Labs GraphEx Conference 11 May 2016 Distribution Statement A MTO Strategy EM Spectrum Tactical Information Extraction Globalization

More information

Application-Managed Flash

Application-Managed Flash Application-Managed Flash Sungjin Lee*, Ming Liu, Sangwoo Jun, Shuotao Xu, Jihong Kim and Arvind *Inha University Massachusetts Institute of Technology Seoul National University Operating System Support

More information

3D NAND Technology Scaling helps accelerate AI growth

3D NAND Technology Scaling helps accelerate AI growth 3D NAND Technology Scaling helps accelerate AI growth Jung Yoon, Ranjana Godse IBM Supply Chain Engineering Andrew Walls IBM Flash Systems August 2018 1 Agenda 3D-NAND Scaling & AI Flash density trend

More information

GAIL The Graph Algorithm Iron Law

GAIL The Graph Algorithm Iron Law GAIL The Graph Algorithm Iron Law Scott Beamer, Krste Asanović, David Patterson GAP Berkeley Electrical Engineering & Computer Sciences gap.cs.berkeley.edu Graph Applications Social Network Analysis Recommendations

More information

Toward a Memory-centric Architecture

Toward a Memory-centric Architecture Toward a Memory-centric Architecture Martin Fink EVP & Chief Technology Officer Western Digital Corporation August 8, 2017 1 SAFE HARBOR DISCLAIMERS Forward-Looking Statements This presentation contains

More information

Big Data Systems on Future Hardware. Bingsheng He NUS Computing

Big Data Systems on Future Hardware. Bingsheng He NUS Computing Big Data Systems on Future Hardware Bingsheng He NUS Computing http://www.comp.nus.edu.sg/~hebs/ 1 Outline Challenges for Big Data Systems Why Hardware Matters? Open Challenges Summary 2 3 ANYs in Big

More information

The 3D-Memory Evolution

The 3D-Memory Evolution The 3D-Memory Evolution ISC 2015 /, Director Marcom + SBD EMEA Legal Disclaimer This presentation is intended to provide information concerning computer and memory industries. We do our best to make sure

More information

Breadth First Search on Cost efficient Multi GPU Systems

Breadth First Search on Cost efficient Multi GPU Systems Breadth First Search on Cost efficient Multi Systems Takuji Mitsuishi Keio University 3 14 1 Hiyoshi, Yokohama, 223 8522, Japan mits@am.ics.keio.ac.jp Masaki Kan NEC Corporation 1753, Shimonumabe, Nakahara

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

Hardware NVMe implementation on cache and storage systems

Hardware NVMe implementation on cache and storage systems Hardware NVMe implementation on cache and storage systems Jerome Gaysse, IP-Maker Santa Clara, CA 1 Agenda Hardware architecture NVMe for storage NVMe for cache/application accelerator NVMe for new NVM

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

DDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1

DDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1 1 DDN DDN Updates DataDirect Neworks Japan, Inc Nobu Hashizume DDN Storage 2018 DDN Storage 1 2 DDN A Broad Range of Technologies to Best Address Your Needs Your Use Cases Research Big Data Enterprise

More information

Apache Spark Graph Performance with Memory1. February Page 1 of 13

Apache Spark Graph Performance with Memory1. February Page 1 of 13 Apache Spark Graph Performance with Memory1 February 2017 Page 1 of 13 Abstract Apache Spark is a powerful open source distributed computing platform focused on high speed, large scale data processing

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing

Serial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.

More information

PHX: Memory Speed HPC I/O with NVM. Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan

PHX: Memory Speed HPC I/O with NVM. Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan PHX: Memory Speed HPC I/O with NVM Pradeep Fernando Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan Node Local Persistent I/O? Node local checkpoint/ restart - Recover from transient failures ( node restart)

More information

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Parallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple

More information

15-388/688 - Practical Data Science: Big data and MapReduce. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Big data and MapReduce. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Big data and MapReduce J. Zico Kolter Carnegie Mellon University Spring 2018 1 Outline Big data Some context in distributed computing map + reduce MapReduce MapReduce

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Part 6 Input/Output Israel Koren ECE568/Koren Part.6. CPU performance keeps increasing 26 72-core Xeon

More information

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage DDN DDN Updates Data DirectNeworks Japan, Inc Shuichi Ihara DDN A Broad Range of Technologies to Best Address Your Needs Protection Security Data Distribution and Lifecycle Management Open Monitoring Your

More information

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins

Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications

More information

Efficient and Simplified Parallel Graph Processing over CPU and MIC

Efficient and Simplified Parallel Graph Processing over CPU and MIC Efficient and Simplified Parallel Graph Processing over CPU and MIC Linchuan Chen Xin Huo Bin Ren Surabhi Jain Gagan Agrawal Department of Computer Science and Engineering The Ohio State University Columbus,

More information

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz

BDCC: Exploiting Fine-Grained Persistent Memories for OLAP. Peter Boncz BDCC: Exploiting Fine-Grained Persistent Memories for OLAP Peter Boncz NVRAM System integration: NVMe: block devices on the PCIe bus NVDIMM: persistent RAM, byte-level access Low latency Lower than Flash,

More information

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010

Parallelizing FPGA Technology Mapping using GPUs. Doris Chen Deshanand Singh Aug 31 st, 2010 Parallelizing FPGA Technology Mapping using GPUs Doris Chen Deshanand Singh Aug 31 st, 2010 Motivation: Compile Time In last 12 years: 110x increase in FPGA Logic, 23x increase in CPU speed, 4.8x gap Question:

More information

Enabling Cost-effective Data Processing with Smart SSD

Enabling Cost-effective Data Processing with Smart SSD Enabling Cost-effective Data Processing with Smart SSD Yangwook Kang, UC Santa Cruz Yang-suk Kee, Samsung Semiconductor Ethan L. Miller, UC Santa Cruz Chanik Park, Samsung Electronics Efficient Use of

More information

ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator

ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator ExtraV: Boosting Graph Processing Near Storage with a Coherent Accelerator Jinho Lee, Heesu Kim *, Sungjoo Yoo *, Kiyoung Choi *, H. Peter Hofstee,, Gi-Joon Nam, Mark R. Nutter, and Damir Jamsek IBM Research

More information

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file

More information

LegUp: Accelerating Memcached on Cloud FPGAs

LegUp: Accelerating Memcached on Cloud FPGAs 0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are

More information

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction

More information

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe

Cloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)

More information

Domain-specific Programming on Graphs

Domain-specific Programming on Graphs Lecture 16: Domain-specific Programming on Graphs Parallel Computer Architecture and Programming Last time: Increasing acceptance of domain-specific programming systems Challenge to programmers: modern

More information

An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms

An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms An Algorithmic Approach to Communication Reduction in Parallel Graph Algorithms Harshvardhan, Adam Fidel, Nancy M. Amato, Lawrence Rauchwerger Parasol Laboratory Dept. of Computer Science and Engineering

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP)

In-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) In-Memory Data Management for Enterprise Applications BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) What is an In-Memory Database? 2 Source: Hector Garcia-Molina

More information

DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture

DCS-ctrl: A Fast and Flexible Device-Control Mechanism for Device-Centric Server Architecture DCS-ctrl: A Fast and Flexible ice-control Mechanism for ice-centric Server Architecture Dongup Kwon 1, Jaehyung Ahn 2, Dongju Chae 2, Mohammadamin Ajdari 2, Jaewon Lee 1, Suheon Bae 1, Youngsok Kim 1,

More information

arxiv: v1 [cs.dc] 21 Jan 2016

arxiv: v1 [cs.dc] 21 Jan 2016 Efficient Processing of Very Large Graphs in a Small Cluster Da Yan 1, Yuzhen Huang 2, James Cheng 3, Huanhuan Wu 4 Department of Computer Science and Engineering, The Chinese University of Hong Kong {

More information

Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory

Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory Let Software Decide: Matching Application Diversity with One- Size-Fits-All Memory Mattan Erez The University of Teas at Austin 2010 Workshop on Architecting Memory Systems March 1, 2010 iggest Problems

More information

Next Generation Architecture for NVM Express SSD

Next Generation Architecture for NVM Express SSD Next Generation Architecture for NVM Express SSD Dan Mahoney CEO Fastor Systems Copyright 2014, PCI-SIG, All Rights Reserved 1 NVMExpress Key Characteristics Highest performance, lowest latency SSD interface

More information

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management

X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management X10 specific Optimization of CPU GPU Data transfer with Pinned Memory Management Hideyuki Shamoto, Tatsuhiro Chiba, Mikio Takeuchi Tokyo Institute of Technology IBM Research Tokyo Programming for large

More information

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University RAMCloud and the Low- Latency Datacenter John Ousterhout Stanford University Most important driver for innovation in computer systems: Rise of the datacenter Phase 1: large scale Phase 2: low latency Introduction

More information

CuSha: Vertex-Centric Graph Processing on GPUs

CuSha: Vertex-Centric Graph Processing on GPUs CuSha: Vertex-Centric Graph Processing on GPUs Farzad Khorasani, Keval Vora, Rajiv Gupta, Laxmi N. Bhuyan HPDC Vancouver, Canada June, Motivation Graph processing Real world graphs are large & sparse Power

More information

A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms

A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms Gurbinder Gill, Roshan Dathathri, Loc Hoang, Keshav Pingali Department of Computer Science, University of Texas

More information

All-Flash Storage System

All-Flash Storage System All-Flash Storage System June 2016 Jungsoo Kim Manager, SK Telecom Agenda SKT Storage Solution R&D Introduction Our approaches in developing storage system AF-Media details Computing Board Storage Module

More information

Terabyte Sort on FPGA-Accelerated Flash Storage

Terabyte Sort on FPGA-Accelerated Flash Storage 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines Terabyte on FPGA-Accelerated Flash Storage Sang-Woo Jun, Shuotao Xu, Arvind Department of Electrical Engineering

More information

An Approach for Hybrid-Memory Scaling Columnar In-Memory Databases

An Approach for Hybrid-Memory Scaling Columnar In-Memory Databases An Approach for Hybrid-Memory Scaling Columnar In-Memory Databases *Bernhard Höppner, Ahmadshah Waizy, *Hannes Rauhe * SAP SE Fujitsu Technology Solutions GmbH ADMS 4 in conjunction with 4 th VLDB Hangzhou,

More information

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141

EECS 151/251A Fall 2017 Digital Design and Integrated Circuits. Instructor: John Wawrzynek and Nicholas Weaver. Lecture 14 EE141 EECS 151/251A Fall 2017 Digital Design and Integrated Circuits Instructor: John Wawrzynek and Nicholas Weaver Lecture 14 EE141 Outline Parallelism EE141 2 Parallelism Parallelism is the act of doing more

More information

Flash Trends: Challenges and Future

Flash Trends: Challenges and Future Flash Trends: Challenges and Future John D. Davis work done at Microsoft Researcher- Silicon Valley in collaboration with Laura Caulfield*, Steve Swanson*, UCSD* 1 My Research Areas of Interest Flash characteristics

More information

Accelerating Enterprise Search with Fusion iomemory PCIe Application Accelerators

Accelerating Enterprise Search with Fusion iomemory PCIe Application Accelerators WHITE PAPER Accelerating Enterprise Search with Fusion iomemory PCIe Application Accelerators Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents

More information