When Graph Meets Big Data: Opportunities and Challenges

Similar documents
EYWA: a Distributed Graph Engine in the Huawei MIND Platform. Yinglong Xia Huawei Research America 02/09/2017

DataSToRM: Data Science and Technology Research Environment

Practical Near-Data Processing for In-Memory Analytics Frameworks

Dynamic Graph Query Support for SDN Management. Ramya Raghavendra IBM TJ Watson Research Center

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Chapter 1. Social Media and Social Computing. October 2012 Youn-Hee Han

Challenges for Data Driven Systems

E6895 Advanced Big Data Analytics Lecture 4:

E6885 Network Science Lecture 10: Graph Database (II)

Windows 10 IoT Overview. Microsoft Corporation

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

NKN is a new kind of network connectivity protocol & ecosystem powered by blockchain for an open, decentralized, and shared Internet.

Big Data, Complex Data

Smart Sustainable Cities. Trends and Real-World Opportunities

Harp-DAAL for High Performance Big Data Computing

GraFBoost: Using accelerated flash storage for external graph analytics

Recent Advances in Heterogeneous Computing using Charm++

Hybrid Implementation of 3D Kirchhoff Migration

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah

TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa

Massive Data Analysis

DIGITAL CENTRAL ASIA SOUTH ASIA (CASA) PROGRAM. Transport and ICT Global Practice World Bank

Apache Spark Graph Performance with Memory1. February Page 1 of 13

NUMA-aware Graph-structured Analytics

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V

Mosaic: Processing a Trillion-Edge Graph on a Single Machine

Netezza The Analytics Appliance

Real-time Fraud Detection with Innovative Big Graph Feature. Gaurav Deshpande, VP Marketing, TigerGraph; Mingxi Wu, VP Engineering, TigerGraph

Transport and ICT Global Practice Smart Connections for All Sandra Sargent, Senior Operations Officer, Transport & ICT GP, The World Bank

Latent Space Model for Road Networks to Predict Time-Varying Traffic. Presented by: Rob Fitzgerald Spring 2017

Tech Data s Acquisition of Avnet Technology Solutions

New Approach to Unstructured Data

A Highly Efficient Runtime and Graph Library for Large Scale Graph Analytics

Introduction to Networks and Business Intelligence

Automated Context and Incident Response

GPU Sparse Graph Traversal

Analyzing a social network using Big Data Spatial and Graph Property Graph

Dynamic Infrastructure: Building a Smarter Planet. Laura Voglino Vice President, Marketing System & Technology Group Growth Market

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Solution Brief. A Key Value of the Future: Trillion Operations Technology. 89 Fifth Avenue, 7th Floor. New York, NY

From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA (S5133)

High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock

Intelligent Enterprise meets Science of Where. Anand Raisinghani Head Platform & Data Management SAP India 10 September, 2018

A New Parallel Algorithm for Connected Components in Dynamic Graphs. Robert McColl Oded Green David Bader

Orange Smart Cities. Smart Metering and Smart Grid : how can a telecom operator contribute? November

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

Graph Database and Analytics in a GPU- Accelerated Cloud Offering

NVGRAPH,FIREHOSE,PAGERANK GPU ACCELERATED ANALYTICS NOV Joe Eaton Ph.D.

Video-Aware Networking: Automating Networks and Applications to Simplify the Future of Video

IBM Power Systems: Open innovation to put data to work Dexter Henderson Vice President IBM Power Systems

Welcome to the Industrial Internet Forum

HPC and IT Issues Session Agenda. Deployment of Simulation (Trends and Issues Impacting IT) Mapping HPC to Performance (Scaling, Technology Advances)

Sensor networks. Ericsson

COUNTRY PROFILE. Taiwan

COUNTRY PROFILE. Korea Republic

Toward a Memory-centric Architecture

The Digital Operator How do operators transform to a full Service Strategy

Transform to Your Cloud

What do you see as GSMA s

Reduce costs and enhance user access with Lenovo Client Virtualization solutions

OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS

e BOOK Do you feel trapped by your database vendor? What you can do to take back control of your database (and its associated costs!

An Advanced Graph Processor Prototype

World Bank regional broadband programs and proposed Central Asian regional fiber optic network (CARFON) Juan Navas-Sabater Program Leader

Intel Workstation Platforms

COUNTRY PROFILE. Malaysia

Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit

Modernize Your Infrastructure

AWS & Intel: A Partnership Dedicated to fueling your Innovations. Thomas Kellerer BDM CSP, Intel Central Europe

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

Building digital societies in Asia: mobile government and m-services

CERN openlab & IBM Research Workshop Trip Report

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

CIO Forum Maximize the value of IT in today s economy

Graph Databases. Graph Databases. May 2015 Alberto Abelló & Oscar Romero

Doug Couto Texas A&M Transportation Technology Conference 2017 College Station, Texas May 4, 2017

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia

GPU Sparse Graph Traversal. Duane Merrill

Strong performance in a growing market

Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search

The Evolution of Big Data Platforms and Data Science

NVIDIA DGX SYSTEMS PURPOSE-BUILT FOR AI

Morgan Stanley Digital Day. London, March

Oracle Spatial and Graph: Benchmarking a Trillion Edges RDF Graph ORACLE WHITE PAPER NOVEMBER 2016

Complex Systems Simulations on the GPU

Data Center Engineering Acceleration Efficiency Interoperability HCL ERS DATA CENTER ENGINEERING SERVICES

Storage Class Memory in Scalable Cognitive Systems

Thinking cities. Khalil Laaboudi. Smart & Sustainable Cities. Global Marketing

Mining Web Data. Lijun Zhang

Bringing cyber to the Board of Directors & C-level and keeping it there. Dirk Lybaert, Proximus September 9 th 2016

DATACENTER SERVICES DATACENTER

IBM DeepFlash Elastic Storage Server

REALISING JORDAN S MOBILE FUTURE 28 APRIL 2014 AMMAN, JORDAN

Achieving Horizontal Scalability. Alain Houf Sales Engineer

OpenFog Reference Architecture. Presented by Dr. Maria Gorlatova OpenFog Consortium Communications Working Group Co-chair, Technical Committee Member

Mining Web Data. Lijun Zhang

5G Initiative 5G Roadmap Working Group Proposal for Contribution

IT Redefined. Hans Timmerman CTO EMC Nederland. Copyright 2015 EMC Corporation. All rights reserved.

PRESENTATION TITLE GOES HERE. Understanding Architectural Trade-offs in Object Storage Technologies

SMART. Investing in urban innovation

Transcription:

High Performance Graph Data Management and Processing (HPGDM 2016) When Graph Meets Big Data: Opportunities and Challenges Yinglong Xia Huawei Research America 11/13/2016 The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 16)

2 Introduction

3

Recent Growth Huawei has business in over 170 countries, with 150,000 employees, approximately 70,000 of which are engaged in Research & development. Huawei operates a global network of 14 regional headquarters, 16 R&D Centers, 28 Innovation Centers jointly operated with customers, and 45 Training Centers. Revenue 4 Cash flow http://www.huawei.com/en/about-huawei Net Profits

5 Graph Analytics Basics

Brief History 2016 61.6 million V 1.47 billion E 40 million V 300 million E Neuronal network @ Human Brain Project 89 billion V & 100 trillion E 6 N.T. Bliss, Confronting the Challenges of Graphs and Networks, Lincoln Laboratory Journal, 2013

Complex Network Analysis Real world complex networks include WWW, Social Network, Biological network, Citation Network, Power Grid, Food Web, Metabolic network, etc. 7 Import properties/metrics: - Small-world effect - Betweenness - Eccentricity/Centrality - Transitivity - Resilience - Community structure - Clustering coefficient - Matching index Complex network models: - Poisson random graph - degree~poisson - Small world effect - Watts and Strogatz graph - Transitivity - Small world effect - Barabasi and Albert graph - Small world - Power law

Diversity in Graph Technology Dynamic graph helps analyze the spatial and temporal influence over the entities in the network RDF graph enables knowledge inference over linked data Streaming graph monitors sentiment propagation over time and how the graph structure can impact 8 Property graph is widely used as a data storage model to manage the properties of entities as well as the interconnections Vertex ID Edge property Edge label Graph technology leads to rich analytic abilities Graphical models leverages statistics to inference latent factors in a complex system

9 Industrial Use Cases

Financial Risk Management Motivation 56 million small business owners in China 1/3 want to finance their ventures Weight to 24.3% of the GDP 16.3% received loads from banks 3/4 citizens in China has no credit history Credit Scoring Traditional credit scoring organizations Emerging credit scoring organizations Identifying Fraud by Graph Traditional approach Strong variables Weak variables Individual regression collective regression Auditing Visit gamble forum Called lier in someone s comment Analyze one s social connections Credit default swap record Emerging approach Customer profiling Precise engagement Anti-fraud control Credit scoring in realtime Post-load management Lost-customer engage Samsung Group crossownership structure in Gephi network visualization Auditing such labyrinthian beasts is enough to bring a grown auditor to tears. This is where network graph analytics can be invaluable 10

Public Security Use Case -1 Use Case -2 Insider Thread Solution Knowledge graph Heterogeneous data Information integration Complex search Security analysis 11 Advantage of Graph for Security Graph is natural to link data from different sources for exploration Graph can combine with probabilistic models for inference Graph helps find anomalous behaviors

Telecom Fraud In May 2016, Communications Fraud Control Association (CFCA) and the Forum for International Irregular Network Access (FIINA), operators ranging from AT&T, Vodafone, Korea Telecom to Orange and Deutsche Telekom shed light on how old and newer forms of fraud are detected and combatted 12 Anti-Fraud platform proposed by TMForum

13 Existing Graph Systems

Some Existing Products Visualization Analytics ScaleGraph Frameworks Flink/Gelly Storage 14

Neo4j System Architecture and Storage Format Neo s declarative Traversals Core API query language Cypher Graph structure and data buffers Vertex/Edge Cache LFU-protocol FS Cache i.e. mmap Transaction log for TX roll-back Disk Thread local diffs changes in a TX HA Record files High Availability based on TX Link edges inclined to a vertex using the relationship data structure, imposing some performance issue for handling celebrates in power-law graphs e.g. social network Easy to implement horizontal partitioning in FS 15

Titan System Architecture and Storage Format Store Manager Transaction store Relations Index Store 16

OrientDB System Architecture Graph JDBC Support distributed platforms, offering key-value store, docdb, and graphdb in one system DocDB based storage 17

Glance at Graph Computing Engines Spark/GraphX GraphChi 18

Graph in ONOS HotSDN 2014 19

20 Challenges

Challenges - Performance Understand performance bottleneck by breaking down the execution time Bottleneck comes from the memory sub-system DTLB is inefficient Cache performs well Cache MPKI rate is high 3 different types of graph computing, with focus on structural traversal, property processing, and graph editing, respectively Core graph algorithms from 21 real-world use cases 21

Challenges Input Sensitivity Impact from graph topology Performance is inconsistent across different graph types 22 Power-law graph results in imbalanced workload due to dense vertices Dense subgraph, sparse backbone Dense subgraph can be converted into matrices Iterative update in a subgraph Road net is easy to decompose Property type matters More time spent on property management Computing performance can be negatively impacted

Challenges - Impact of H/W Accelerator GPU can be helpful Sufficient acceleration by GPU Requires re-design of the Speedup of NVIDIA Tesla K40 over 16-core Intel Xeon E5-2670 algorithms Challenges Data must be transferred to GPU Cost of Host to Device data transfer Difficulty in putting large graph into GPU (Double buffering) Sensitive to input graph data 23 Memory divergency shows higher sensitivity for graph computing on GPU

Challenges - Scale-out Issue Poor data locality and difficult partitioning result in challenges in scaling out the computing Scale-out challenges can be degraded performance when #core is 100~1000 seen in Graph500 analysis Single machine with big memory can help Must be cautious to use many computing nodes Analysis of data from Graph500 *from Peter Kogge 24

25 Breakthrough

Graph Platform for Smart Big Data Visualization Dynamic Graph Vis Property Vis Large Graph Vis Analytics Community Centrality Bayes Net Anomaly detection Label propagation Matching Ego Feature Max Flow Graph engines Streaming Graph Graphical Model Hyper Graph Basic Engine Incremental Update Data Management Structure Management Property Management Metadata Management Permission Control Infrastructure Single Machine Cluster GPU Server Cloud 26

Unified Graph Data Access Patterns step 1 shard 1 (1, 2) shard 2 (3,4) shard 3 (5,6) src dst value 1 2 0.3 3 2 0.2 4 1 1.4 5 1 0.5 2 0.6 6 2 0.8 src dst value 1 3 0.4 2 3 0.3 3 4 0.8 5 3 0.2 6 4 1.9 src dst value 2 5 0.6 3 5 0.9 6 1.2 4 5 0.3 5 6 1.1 equivalent 1 2 3 4 5 6 1.4 1 2 3 4 5 6 0.3 0.2 0.5 0.6 0.8 0.4 0.3 0.2 0.8 1.9 0.6 0.9 1.2 0.3 1.1 27 Iteration i step 2 step 3 src dst value 1 2 0.3 3 2 0.2 4 1 1.4 5 1 0.5 2 0.6 6 2 0.8 src dst value 1 2 0.3 3 2 0.2 4 1 1.4 5 1 0.5 2 0.6 6 2 0.8 src dst value 1 3 0.4 2 3 0.3 3 4 0.8 5 3 0.2 6 4 1.9 src dst value 1 3 0.4 2 3 0.3 3 4 0.8 5 3 0.2 6 4 1.9 src dst value 2 5 0.6 3 5 0.9 6 1.2 4 5 0.3 5 6 1.1 src dst value 2 5 0.6 3 5 0.9 6 1.2 4 5 0.3 5 6 1.1 observation on PSW data access patterns inspires highly efficient sharding representation 1 2 3 4 5 6 1 2 3 4 5 6 0.3 0.2 1.4 0.5 0.6 0.8 0.3 0.2 1.4 0.5 0.6 0.8 0.4 0.3 0.2 0.4 0.3 0.2 0.8 1.9 0.8 1.9 0.6 0.9 1.2 0.3 1.1 0.6 0.9 1.2 0.3 1.1

Experiments Disk read bandwidth over time by GraphChi and EdgesSet. Edge-Set showed up to 2x aggregate bandwidth and more constant IO usage. 28 Performance improvement of SSSP against GraphChi including data ingestion Execution time breakdown on Pagerank running Twitter-2010 Execution time breakdown on Pagerank running Twitter-2010

Opportunities in Graph Technology for Big Data Develop high performance graph computing kernels and primitives Graph500 technique based architecture-awareness for graph computing Heterogeneous computing and computing near-data technology Reinvent graph technology for supporting cognitive computing One open platform with multiple graph and graph-related technologies Integral consideration on graphical model, streaming graphs, etc. for AI/IoT Offer vertical solutions to break through separation among technique stacks Holistic solution for rapidly building industry-level graph analytics solutions Incorporating with market segmentation, such as security, finance, etc. Collaborations and Standardization Foster collaboration with relevant professional communities to educate the market Developing domain or cross-domain standardizations 29

THANK YOU Yinglong Xia yinglong.xia.2010@ieee.org