Graph Databases. Graph Databases. May 2015 Alberto Abelló & Oscar Romero

Similar documents
Introduction to Graph Databases

Introduction to NoSQL by William McKnight

Big Data Management and NoSQL Databases

Chapter 24 NOSQL Databases and Big Data Storage Systems

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

L24: NoSQL (continued) CS3200 Database design (sp18 s2) 4/12/2018

CISC 7610 Lecture 4 Approaches to multimedia databases. Topics: Document databases Graph databases Metadata Column databases

Graph Database. Relation

CIB Session 12th NoSQL Databases Structures

Introduction to Graph Data Management

Introduction to NoSQL Databases

HBase vs Neo4j. Technical overview. Name: Vladan Jovičić CR09 Advanced Scalable Data (Fall, 2017) Ecolé Normale Superiuere de Lyon

OLAP Introduction and Overview

AllegroGraph for Flexibility in the Enterprise and on the Web. Jans Aasman Franz Inc

CISC 7610 Lecture 4 Approaches to multimedia databases. Topics: Graph databases Neo4j syntax and examples Document databases

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases

CISC 7610 Lecture 2b The beginnings of NoSQL

MapReduce-II. September 2013 Alberto Abelló & Oscar Romero 1

Road to a Multi-model Database -- making PostgreSQL the most popular and versatile database

New Approach to Graph Databases

An overview of Graph Categories and Graph Primitives

E6885 Network Science Lecture 10: Graph Database (II)

Hands-on immersion on Big Data tools

Topics. History. Architecture. MongoDB, Mongoose - RDBMS - SQL. - NoSQL

Big Data Analytics using Apache Hadoop and Spark with Scala

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

Graph Databases. Big Data Course. Antonio Maccioni. 24 April Rome. locatedin

Graph Data Management Systems in New Applications Domains. Mikko Halin

Distributed Non-Relational Databases. Pelle Jakovits

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

Introduction to Temporal Database Research. Outline

Big Data Big Mess? Ein Versuch einer Positionierung

Unit 10 Databases. Computer Concepts Unit Contents. 10 Operational and Analytical Databases. 10 Section A: Database Basics

Big Data Analytics. Rasoul Karimi

Introduction III. Graphs. Motivations I. Introduction IV

LAB 2 Notes. Conceptual Design ER. Logical DB Design (relational) Schema Refinement. Physical DD

EF6 - Version: 1. Entity Framework 6

Logic: The Big Picture. Axiomatizing Arithmetic. Tautologies and Valid Arguments. Graphs and Trees

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Computer Vision Group Prof. Daniel Cremers. 4. Probabilistic Graphical Models Directed Models

CS-580K/480K Advanced Topics in Cloud Computing. NoSQL Database

Databases 2 (VU) ( / )

GRAPH DB S & APPLICATIONS

A NoSQL Introduction for Relational Database Developers. Andrew Karcher Las Vegas SQL Saturday September 12th, 2015

CIS Advanced Databases Group 14 Nikita Ghare Pratyoush Srivastava Prakriti Vardhan Chinmaya Kelkar

Non-Relational Databases. Pelle Jakovits

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Relational databases

Databases : Lectures 11 and 12: Beyond ACID/Relational databases Timothy G. Griffin Lent Term 2013

Challenges for Data Driven Systems

Course Content MongoDB

High-Level Data Models on RAMCloud

Lecture 1: Introduction and Motivation Markus Kr otzsch Knowledge-Based Systems

Data Partitioning and MapReduce

DB2 NoSQL Graph Store

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Describe The Differences In Meaning Between The Terms Relation And Relation Schema

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

Databases and Information Retrieval Integration TIETS42. Kostas Stefanidis Autumn 2016

KNOWLEDGE GRAPHS. Lecture 1: Introduction and Motivation. TU Dresden, 16th Oct Markus Krötzsch Knowledge-Based Systems

Inf 496/596 Topics in Informatics: Analysis of Social Network Data

Social Network Analysis

/ Cloud Computing. Recitation 6 October 2 nd, 2018

RELATIONAL DATABASE AND GRAPH DATABASE: A COMPARATIVE ANALYSIS

Extreme Computing. NoSQL.

Introduction to Hadoop. High Availability Scaling Advantages and Challenges. Introduction to Big Data

NOSQL Databases and Neo4j

A Global In-memory Data System for MySQL Daniel Austin, PayPal Technical Staff

Lecture 0: Course Intro

Databases : Lecture 1 2: Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Application development with relational and non-relational databases

Graphs. A graph is a data structure consisting of nodes (or vertices) and edges. An edge is a connection between two nodes

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

Study of NoSQL Database Along With Security Comparison

Relational Model. Courses B0B36DBS, A4B33DS, A7B36DBS: Database Systems. Lecture 02: Martin Svoboda

Introduction to Data Management. Lecture #2 (Big Picture, Cont.) Instructor: Chen Li

«Computer Science» Requirements for applicants by Innopolis University

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Big Data Hadoop Course Content

NoSQL Databases. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 4/22/15

Search Engines and Time Series Databases

Data Modeling with Neo4j. Stefan Armbruster, Neo Technology (slides from Michael Hunger)

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

HyperGraphDB. Motivation, Architecture and Applications

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 NoSQL Databases

L22: NoSQL. CS3200 Database design (sp18 s2) 4/5/2018 Several slides courtesy of Benny Kimelfeld

Graph Analytics in the Big Data Era

CS 5220: Parallel Graph Algorithms. David Bindel

What is a multi-model database and why use it?

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations

11/22/2016. Chapter 9 Graph Algorithms. Introduction. Definitions. Definitions. Definitions. Definitions

September 2013 Alberto Abelló & Oscar Romero 1

Chapter 9 Graph Algorithms

Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Approach for Mapping Ontologies to Relational Databases

Transcription:

Graph Databases 1

Knowledge Objectives 1. Describe what a graph database is 2. Explain the basics of the graph data model 3. Enumerate the best use cases for graph databases 4. Name two pros and cons of graph databases in front of relational databases 5. Name two pros and cons of graph databases in front of other NOSQL options 2

Understanding Objectives 1. Simulate traversing processing in a relational database and compare it with a graph database 3

Application Objectives Model simple graph databases following the property graph data model Implement graphs in Neo4J and use Cypher for traversing them 4

Graph Databases in a Nutshell Occurrence-oriented May contain millions of instances Big Data! It is a form of schemaless databases There is no explicit database schema Data (and its relationships) may quickly vary Objects and relationships as first-class citizens An object o relates (through a relationship r) to another object o Both objects and relationships may contain properties Built on top of the graph theory Euler (18 th century) More natural and intuitive than the relational model 5

Notation (I) A graph G is a set of nodes and edges: G (N, E) N - Nodes (or vertices): n1, n2, Nm E - Edges are represented as pairs of nodes: (n1, n2) An edge is said to be incident to n1 and n2 Also, n1 and n2 are said to be adjacent An edge is drawn as a line between n1 and n2 Directed edges entail direction: from n1 to n2 An edge is said to be multiple if there is another edge relating exactly the same nodes An hyperedge is an edge inciding in more than 2 nodes. Multigraph: If it contains at least one multiple edge. Simple graph: If it does not contain multiple edges. Hypergraph: A graph allowing hyperedges. 6

Notation (II) Size (of a graph): #edges Degree (of a node): #(incident edges) The degree of a node denotes the node adjacency The neighbourhood of a node are all its adjacent nodes Out-degree (of a node): #(edges leaving the node) Sink node: A node with 0 out-degree In-degree (of a node): #(incoming edges reaching the node) Source node: A node with 0 in-degree Cliques and trees are specific kinds of graphs Clique: Every node is adjacent to every other node Tree: A connected acyclic simple graph May 2015 Alberto Abelló & Oscar Romero 7

The Property Graph Data Model Two main constructs: nodes and edges Nodes represent entities, Edges relate pairs of nodes, and may represent different types of relationships. Nodes and edges might be labeled May have a set of properties represented as attributes (key-value pairs) Further assumptions: Edges are directed, Multi-graphs are allowed. Note: in some definitions edges are not allowed to have attributes 8

Example of Graph Database http://grouplens.org/datasets/movielens What movies did Lewis Gilbert direct? What movies did receive a rating lower than 60 by the audience? How would a RDBMS perform this query? May 2015 Alberto Abelló & Oscar Romero 9

Activity Objective: Understand the graph data model capabilities Tasks: 1. (5 ) With a teammate think of the following: I. Think of three-four queries that naturally suit the graph data model II. Think of three-four queries that do not suit the graph data model that nicely 2. (5 ) Think tank: So, what kind of queries graph databases are thought for? 10

GDBs Keystone: Traversal Navigation The ability to rapidly traverse structures to an arbitrary depth (e.g., tree structures, cyclic structures) and with an arbitrary path description (e.g. friends that work together, roads below a certain congestion threshold). Marko Rodriguez Totally opposite to set theory (on which relational databases are based on) Sets of elements are operated by means of the relational algebra May 2015 Alberto Abelló & Oscar Romero 11

Traversing Data in a RDBMS In the relational theory, it is equivalent to joining data (schema level) and select data (based on a value) SELECT * FROM user u, user_order uo, orders o, items i WHERE u.user = uo.user AND uo.orderid = o.orderid AND i.lineitemid = i.lineitemid AND u.user = Alice Cardinalities: User : 5.000.000 UserOrder : 100.000.000 Orders : 1.000.000.000 Item : 35.000 Query Cost?! May 2015 Alberto Abelló & Oscar Romero 12

Traversing Data in a Graph Database Cardinalities: User : 5.000.000 Orders : 1.000.000.000 Item : 35.000 Query Cost?! O(N) May 2015 Alberto Abelló & Oscar Romero 13

Typical Graph Operations Content-based queries The value is relevant Get a node, get the value of a node / edge attribute, etc. A typical case are summarization queries (i.e., aggregations) Topological queries Only the graph topology is considered Typically, several business problems (such as fraud detection, trend prediction, product recommendation, network routing or route optimization) are solved using graph algorithms exploring the graph topology Computing the centrality of a node in a social network an analyst can detect influential people or groups for targeting a marketing campaign audience For a telecommunication operator, being able to detect central nodes of an antenna network helps optimizing the routing and load balancing across the infrastructure Hybrid queries 14

Implementation of the Operations Note that the operations presented are conceptual: agnostic of the technology The implementation of the ops depends on: The graph database data structure Typical exploration algorithms used A* Breath / depth-first search Constraint programming 15

Implementation of Graphs (I) 16

Implementation of Graphs (II) Adjacency matrix (baseline) 17

Implementation of Graphs (III) Linked Lists (adjacency list) Neo4J One physical file to store all nodes (in-memory) Fixed-size record: 9 bytes in length Fast look-up: O(1) Record for node with id 5 Each record is as follows: 1 byte (metadata; e.g., in-use?) 2-5 bytes: id first relationship 6-9 bytes: id first property May 2015 Alberto Abelló & Oscar Romero 18

Linked Lists Neo4J Two files: relationship and property files. Both contain records of fixed size Cache with Least Frequently Used policy Relationship file (similarly for properties) Metadata, id starting node, id end node, id type, ids of the previous and following relationship of the starting node and ending node, id first property 19

Linked Lists - Titan Works on top of Cassandra or Hbase Properties and edges are stored as column:value Sort key ~ key design 20

Linked Lists - Sparksee Each node and edge identified by a oid The graph is represented as a set of lists Two lists, Heads (e1, v1) and Tails (e1, v2) to represent edges and nodes One list, Labels (e1, ACTS), to represent the type of each element Lists of attributes. A list per kind of attribute: Atitle (v1, The Big Bang Theory ) The lists are transformed into value sets (value, <values>) The first element is always a node or an edge 21

Other popular graph databases Pregel Parallel graph processing Giraph Extends Pregel with several new features, including sharded aggr. Similar to vertical fragmentation OrientDB Mixed document-graph DB ArangoDB Mixed document-graph DB 22

What Is Still Missing? Graph data management is getting mature enough as to be used However, RDF, RDFS and OWL are something else than graph data management Semantics related to properties Inference allowed by means of exploiting such semantics There is no current framework on top of graph data management addressing such problem Many academic prototypes 23

Summary A graph database is a database, thus every GDBMS (Graph Database Management System) must choose its: Concurrency control Transactions (ACID Vs. BASE) Replication / fragmentation strategy Position with regard to the CAP theorem Etc. Graph databases do not scale well Since there is no schema, edges are implemented as pointers and thus affected by data distribution Algorithms to fragment graphs and distribute Be aware of graph databases in the near future They are de facto standard for Linked Data Becoming more and more popular for Open Data 24

Bibliography Ian Robinson et al. Graph Databases. O Reilly, 2013 http://graphdatabases.com Resource Description Framework (RDF). W3C Recommendation. http://www.w3.org/tr/rdf-concepts OWL 2 Web Ontology Language (OWL). W3C Recommendation. http://www.w3.org/tr/owl2-overview 25

Resources http://linkeddata.org http://www.neo4j.org http://www.sparsity-technologies.com http://thinkaurelius.github.io/titan 26