Extending In-Memory Relational Database Engines with Native Graph Support

Similar documents
arxiv: v2 [cs.db] 12 Oct 2017

Extending In-Memory Relational Database Engines with Native Graph Support

Adaptive Parallel Compressed Event Matching

Motivation and basic concepts Storage Principle Query Principle Index Principle Implementation and Results Conclusion

Wedge A New Frontier for Pull-based Graph Processing. Samuel Grossman and Christos Kozyrakis Platform Lab Retreat June 8, 2018

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs

Big Data Management and NoSQL Databases

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Graph Data Management

Advanced Database Systems

SociaLite: A Datalog-based Language for

Relational Query Optimization. Highlights of System R Optimizer

Neo4J: Graph Database

Accelerating XML Structural Matching Using Suffix Bitmaps

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

Cost Estimation of Spatial k-nearest-neighbor Operators

How Achaeans Would Construct Columns in Troy. Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte

CMSC 424 Database design Lecture 18 Query optimization. Mihai Pop

Overview of Data Exploration Techniques. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Improving Query Plans. CS157B Chris Pollett Mar. 21, 2005.

Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Avoiding Sorting and Grouping In Processing Queries

Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling

Principles of Data Management. Lecture #12 (Query Optimization I)

Overview of Query Processing. Evaluation of Relational Operations. Why Sort? Outline. Two-Way External Merge Sort. 2-Way Sort: Requires 3 Buffer Pages

Keyword query interpretation over structured data

Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction

Overview of Query Evaluation

Lecture Query evaluation. Combining operators. Logical query optimization. By Marina Barsky Winter 2016, University of Toronto

E6885 Network Science Lecture 10: Graph Database (II)

Data-Intensive Computing with MapReduce

BLAS : An Efficient XPath Processing System

Principles of Data Management. Lecture #9 (Query Processing Overview)

Plan for today. Query Processing/Optimization. Parsing. A query s trip through the DBMS. Validation. Logical plan

CISC 7610 Lecture 4 Approaches to multimedia databases. Topics: Graph databases Neo4j syntax and examples Document databases

Apache Spark Graph Performance with Memory1. February Page 1 of 13

Fast, In-Memory Analytics on PPDM. Calgary 2016

Evaluating XPath Queries

CISC 7610 Lecture 4 Approaches to multimedia databases. Topics: Document databases Graph databases Metadata Column databases

Path-Hop: efficiently indexing large graphs for reachability queries. Tylor Cai and C.K. Poon CityU of Hong Kong

Red Fox: An Execution Environment for Relational Query Processing on GPUs

University of Waterloo. Storing Directed Acyclic Graphs in Relational Databases

Chap 6: Spatial Networks

Chapter 13: Query Processing

Keyword query interpretation over structured data

Chapter 12: Query Processing

Jignesh M. Patel. Blog:

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

GPU Sparse Graph Traversal

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

Optimization Overview

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Relational Query Optimization

Exploiting Predicate-window Semantics over Data Streams

Hadoop vs. Parallel Databases. Juliana Freire!

Outline. Database Management and Tuning. Index Data Structures. Outline. Index Tuning. Johann Gamper. Unit 5

Chapter 12: Query Processing

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Chapter 14 Query Optimization

Supporting Fuzzy Keyword Search in Databases

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

HyPer-sonic Combined Transaction AND Query Processing

Chapter 4: SQL. Basic Structure

Scylla Open Source 3.0

Cache-Aware Database Systems Internals. Chapter 7

Tanuj Kr Aasawat, Tahsin Reza, Matei Ripeanu Networked Systems Laboratory (NetSysLab) University of British Columbia

Simba: Towards Building Interactive Big Data Analytics Systems. Feifei Li

On Graph Query Optimization in Large Networks

Recursive query facilities in relational databases: a survey

Query Processing & Optimization

High-Performance Holistic XML Twig Filtering Using GPUs. Ildar Absalyamov, Roger Moussalli, Walid Najjar and Vassilis Tsotras

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

9/23/2009 CONFERENCES CONTINUOUS NEAREST NEIGHBOR SEARCH INTRODUCTION OVERVIEW PRELIMINARY -- POINT NN QUERIES

Query Processing with K-Anonymity

High-Performance Graph Primitives on the GPU: Design and Implementation of Gunrock

Column-Stores vs. Row-Stores: How Different Are They Really?

20762B: DEVELOPING SQL DATABASES

DATABASE TECHNOLOGY - 1MB025

Guoping Wang and Chee-Yong Chan Department of Computer Science, School of Computing National University of Singapore VLDB 14.

Database System Concepts

CIS 601 Graduate Seminar Presentation Introduction to MapReduce --Mechanism and Applicatoin. Presented by: Suhua Wei Yong Yu

Pathfinder: XQuery Compilation Techniques for Relational Database Targets

Fundamentals of Database Systems

Towards Declarative and Efficient Querying on Protein Structures

Design and Implementation of Bit-Vector filtering for executing of multi-join qureies

Graph Databases. Guilherme Fetter Damasio. University of Ontario Institute of Technology and IBM Centre for Advanced Studies IBM Corporation

Microsoft. [MS20762]: Developing SQL Databases

Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion

HYRISE In-Memory Storage Engine

2.2.2.Relational Database concept

Data Filtering Using Reverse Dominance Relation in Reverse Skyline Query

Transcription:

Extending In-Memory Relational Database Engines with Native Graph Support EDBT 18 Mohamed S. Hassan 1 Tatiana Kuznetsova 1 Hyun Chai Jeong 1 Walid G. Aref 1 Mohammad Sadoghi 2 1 Purdue University West Lafayette, IN, USA 2 Exploratory Systems Lab (ExpoLab) 2 University of California Davis, CA, USA

Graphs are Ubiquitous 2 Road Network Biological Network Social Network Datacenter Network

Specialized Graph Databases 3 Specialized graph databases can handle graph query-workloads Vital queries include shortest-path and reachability queries

Why Relational Databases for Graph Support? 4 Specialized graph systems are not as mature as RDBMSs Relational databases are widely-adopted Graphs and RDBMSs Relational data can have latent graph structures Graphs can be represented in terms of relational tables Graph queries are essential in many applications Queries can also involve relations n E.g., for every patient, say P, in selected areas, find the nearest hospital to Patient P How can an RDBMS effectively and efficiently handle graph query workloads?

Graph Support in RDBMSs 5 Why is it challenging? There is an impedance mismatch between the relational model and the graph model Graph support w.r.t. RDBMSs has two extremes: Native Relational-Core Native Graph-Core Native G+R Core [Proposed]

Native Relational-Core 6 Use a vanilla RDBM Encode graphs in relational schema Support limited graph queries Translate the supported graph queries into SQL or procedural SQL E.g., SQLGraph [SIGMOD 15], Grail [CIDR 15] Disadvantages Several graph queries are inefficient to evaluate using pure SQL Graphs are encoded in complex schema Relational Queries (SQL) Relational Data Relational Database Graph Queries Results SQL Translation Layer Graph Encoded into Relational Tables

Native Graph-Core 7 Build on top of an RDBMS Extract graphs from the RDBMS Store graphs and process queries outside the realm of the RDBMS E.g., Ringo [SIGMOD 15], GraphGen [VLDB 15, SIGMOD 17] Disadvantages Graph updates require re-extracting the graphs Queries cannot reference any non-extracted relational data Graph Extraction and Materialization Engine Graph Extraction Queries (SQL) Relational Data Extracted Graphs Relational Database Results Graph Queries

The Relational Model vs. the Graph Model 8 Graph-core approach +ve: Queries involving graph traversals are efficiently handled in the graph model (e.g., shortest paths) -ve: Not as pervasive and mature as RDBMSs Relational-core approach +ve: Mature and pervasive -ve: Either many temporary inserts/deletes/updates, or too many joins to traverse a graph n Intermediate-result size and cardinality estimation Can the best of the two worlds be combined? Support native graph processing inside an RDBMS

Proposed Approach: Native G+R Core 9 Assume graphs with relational schema Enable graphs to be defined as native database objects Store graphs in non-relational structures optimized for graph operations Extend the SQL language Queries can compose relational and graph operations Cross-Data-Model QEPs Graph updates are supported Graph-Relational Queries (SQL) σ Relational Data π Graph and Relational Operators in the Same QEP GraphOp Graph Views (Topology + Tuple Pointers) Graph Construction Relational Database Results

GRFusion: Realizing the G+R Approach 10 We realized the G+R approach in an open-source in-memory RDBMS, VoltDB We refer to the realization as GRFusion Declarative Graph-Relational Queries Query Parser Query Optimizer Plan Executor Graph-Relational Query Engine Relational Data Graph Views In-Memory Relational Database

Create Graph View 11 Create-Graph-View statement Creates a named graph database object that can be referenced in queries Defines the relational sources of the graph s vertexes/edges Martializes the topology of the graph in the main-memory as a singleton graph structure

12 Graph-View of a Social Network

13 Graph-View Structure [Traversal Index]

14 Declarative Graph-Relational Queries

The PATHS Construct Extended SQL 15 Appears in the FROM clause and references a graph view Select From MyGraphView.PATHS P PATHS represents a set of lazy-evaluated paths A path is a set of consecutive edges, each edge has two endpoint vertexes E.g., (V:attributes) (:E:attributes)à(V:attributes).. A path is a tuple with the following properties: Length StartVertex EndVertex Vertexes Edges

The PathScan Operator 16 PathScan is a logical operator that acts on a graph-view Has three corresponding physical operators: BFScan, DFScan, SPScan The output of PathScan is a tuple that extends the standard relational tuple Hence, the output can be ingested by any relational operator PathScan accepts the id of the vertex to start traversal from Otherwise, all the vertexes will be considered as start vertexes Filters can be pushed ahead of PathScan operators E.g., P.PathLength = 2

Friends-of-Friends Query Example 17 For all the users working as lawyers, retrieve the last name of their friends of friends, where the friendships happened after 1/1/2000

18 QEP of the Friends-of-Friends Query

Reachability Query Example 19 Check if Protein X interacts directly (i.e., by an edge) or indirectly (i.e., by a path) with Protein Y through either a covalent or a stable interaction type.

20 Shortest-Path Queries with Relational Predicates

Evaluating GRFusion 21 Experimental setup Single node running Linux kernel version 3.17.7 n 32 cores of Intel Xeon 2.90 GHz n 384 GB of RAM VoltDB version 6.7 Comparing to Native Relational-Core: SQLGraph [SIGMOD 15], Grail [CIDR 15] Specialized graph systems: Neo4j, Titan Disk-cost is mitigated by running over ram disk

Evaluating GRFusion (Cont d) 22 Graph queries Reachability queries (using breadth-first-search) Reachability queries with filtering predicates Shortest path queries (using Dijkstra s algorithm) Subgraph queries (e.g., count triangles) Datasets

23 Constrained-Reachability Queries (String Dataset)

24 SSSP Queries Tiger Dataset

A Note on the Performance Gains of GRFusion 25 Table scan or index scan/seek Direct pointers are more efficient Relational joins Large intermediate results Inaccurate cardinality estimation σ σ σ etable T3 etable T1 etable T2

Conclusions 26 The G+R approach allows composing relational and graph operations E.g., by allowing graph-valued functions GRFusion proposes and realizes how an RDBMS can be extended to support graphs as native objects GRFusion outperforms the state-of-the-art by one to four orders-ofmagnitude query-time speedup The SQL language of GRFusion allows writing declarative path-queries with relational predicates For relational recursive queries, GRFusion allows an RDBMS to avoid Large intermediate results Inaccurate cardinality estimation that may lead to non-optimal join-algorithm selection

27 Thank You!

The VERTEXES Construct 28 Appears in the FROM clause and references a graph view Select From MyGraphView.VERTEXES v VERTEXES represents the vertexes of a graph view A vertex is a tuple with the following properties: Id FanIn FanOut Property for each vertex attribute

The EDGES Construct 29 Appears in the FROM clause and references a graph view Select From MyGraphView.EDGES v EDGES represents the edges of a graph view An edge is a tuple with the following properties: Id StartVertexId EndVertexId Property for each edge attribute

Vertex Query Example 30 Retrige the Birthdate and the number of friends of each user in the social network with last name = Smith