Multiple Graphs Updatable views & Choices

Similar documents
CIP Multiple Graphs. Presentation at ocig 8 on April 11, Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j)

opencypher.org

CIP Configurable Pattern Matching Semantics. Stefan Plantikow, Mats Rydberg, Petra Selmer

The Cypher Language 2017

Oracle Database 11g: SQL and PL/SQL Fundamentals

Oracle Database: SQL and PL/SQL Fundamentals NEW

Oracle Database: SQL and PL/SQL Fundamentals Ed 2

Schema and Constraints

From Cypher 9 to GQL: Conceptual overview of Multiple Named Graphs and Composable Queries

Property Graph Querying roadmap and initial scope

Relational Model: History

MTA Database Administrator Fundamentals Course

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

0. Overview of this standard Design entities and configurations... 5

T-SQL Training: T-SQL for SQL Server for Developers

Review -Chapter 4. Review -Chapter 5

5. Single-row function

G-CORE: A Core for Future Graph Query Languages

Querying Data with Transact SQL

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Oracle Database 10g: Introduction to SQL

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Oracle Database: SQL and PL/SQL Fundamentals

Oracle Database: Introduction to SQL Ed 2

Mobile MOUSe MTA DATABASE ADMINISTRATOR FUNDAMENTALS ONLINE COURSE OUTLINE

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata

Introduction to Computer Science and Business

Teradata SQL Features Overview Version

6232B: Implementing a Microsoft SQL Server 2008 R2 Database

Programming Languages Third Edition. Chapter 7 Basic Semantics

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 27-1

Querying Data with Transact-SQL

Oracle Developer Track Course Contents. Mr. Sandeep M Shinde. Oracle Application Techno-Functional Consultant

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

Creating and Managing Tables Schedule: Timing Topic

Querying Data with Transact-SQL

MySQL for Beginners Ed 3

Ian Kenny. November 28, 2017

Course Prerequisites: This course requires that you meet the following prerequisites:

6232B: Implementing a Microsoft SQL Server 2008 R2 Database

CS352 Lecture - Introduction to SQL

Provenance Information in the Web of Data

Teiid Designer User Guide 7.5.0

Chapter 11 Object and Object- Relational Databases

What Every Xtext User Wished to Know Industry Experience of Implementing 80+ DSLs

The SQL database language Parts of the SQL language

Querying Microsoft SQL Server 2014

Database Management System Dr. S. Srinath Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Information Systems Engineering. SQL Structured Query Language DML Data Manipulation (sub)language

Table of Contents. PDF created with FinePrint pdffactory Pro trial version

Chapter 9: Relational DB Design byer/eer to Relational Mapping Relational Database Design Using ER-to- Relational Mapping Mapping EER Model

Course Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led

Analytics: Server Architect (Siebel 7.7)

Algorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)

DATABASE TECHNOLOGY - 1MB025

Querying Data with Transact-SQL

FedDW Global Schema Architect

Midterm Review. Winter Lecture 13

DATABASTEKNIK - 1DL116

Final Exam Review 2. Kathleen Durant CS 3200 Northeastern University Lecture 23

1 Writing Basic SQL SELECT Statements 2 Restricting and Sorting Data

Venezuela: Teléfonos: / Colombia: Teléfonos:

Lecture 3 SQL. Shuigeng Zhou. September 23, 2008 School of Computer Science Fudan University

20761B: QUERYING DATA WITH TRANSACT-SQL

20461: Querying Microsoft SQL Server 2014 Databases

Table of Contents Chapter 1 - Introduction Chapter 2 - Designing XML Data and Applications Chapter 3 - Designing and Managing XML Storage Objects

Ch. 10: Name Control

Oracle Syllabus Course code-r10605 SQL

Oracle PLSQL. Course Summary. Duration. Objectives

normalization are being violated o Apply the rule of Third Normal Form to resolve a violation in the model

II. Structured Query Language (SQL)

Understanding Impact of J2EE Applications On Relational Databases. Dennis Leung, VP Development Oracle9iAS TopLink Oracle Corporation

CS558 Programming Languages

Basic Structure Set Operations Aggregate Functions Null Values Nested Subqueries Derived Relations Views Modification of the Database Data Definition

MANAGING DATA(BASES) USING SQL (NON-PROCEDURAL SQL, X401.9)

20761 Querying Data with Transact SQL

Querying Data with Transact-SQL

Principles of Programming Languages

Q&As Querying Data with Transact-SQL (beta)

PGQL: a Property Graph Query Language

Querying Data with Transact-SQL

Chapter 3. Algorithms for Query Processing and Optimization

QMF: Query Management Facility

Data Modeling with the Entity Relationship Model. CS157A Chris Pollett Sept. 7, 2005.

Optional SQL Feature Summary

Chapter 19 Query Optimization

The Assent Materials Declaration Tool User Manual

arxiv: v3 [cs.db] 9 Nov 2018

20464 Developing Microsoft SQL Server Databases

ER-to-Relational Mapping

Chapter 11 - Data Replication Middleware

8) A top-to-bottom relationship among the items in a database is established by a

Chapter 7. Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel

Course Outline Faculty of Computing and Information Technology

EECS 647: Introduction to Database Systems

Oracle Zero Data Loss Recovery Appliance (ZDLRA)

Writing Queries Using Microsoft SQL Server 2008 Transact-SQL. Overview

Databasesystemer, forår 2006 IT Universitetet i København. Forelæsning 9: Mere om SQL. 30. marts Forelæser: Esben Rune Hansen

Introduction to Computer Science and Business

Duration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Intermediate SQL Server

Transcription:

CIP2017-06-18 and related Multiple Graphs Updatable views & Choices Presentation at ocim 4 on May 22-24, 2018 Stefan Plantikow, Andrés Taylor, Petra Selmer (Neo4j)

History Started working on multiple graphs early 2017 Parallel work in LDBC query language task force Built first non-prototype proposal for ocim 3 in Nancy (Nov 2017) Lots of great feedback but outcome incomplete Inspired by Nancy we started to reconstruct the MG proposal Starting from first principles (data model, execution model, language semantics) Incrementally add orthogonal concern Aim for simple, minimal design Reduce scope of work Today: Primer for core of new proposal draft => Input to GQL

Outline Multiple property graphs model Query processing model (Catalog and working graph) Working with multiple graphs (Matching, updating, returning graphs) Graph operations (Graph projection and graph union) Updating the catalog Examples Discussion (Status, timeline, next steps)

Why multiple graphs? - Combining and transforming graphs from multiple sources Graph Management - Versioning, snapshotting, computing difference graphs - Graph views for access control - Provisioning applications with tailored graph views - Shaping and integrating heterogeneous graph data Graph Modeling - Roll-up and drill-down at different levels of detail 4

Multiple Property Graphs Model

Cypher today: (Single) property graph model Graph Database System (e.g. a cluster) Application Server Client 1 Client 2 Client 3 The (single) Graph 6

Cypher 10: Multiple property graphs model Graph Database System (e.g. a cluster) Application Server Client 1 Client 2 Client 3 Multiple Graphs 7

Single Property Graph Model A single graph Labeled nodes with properties Typed, directed relationships with properties Multiple Property Graphs Model Multiple graphs Labeled nodes with properties and typed, directed relationships with properties existing in a single graph The nodes of a relationship R must be part of the same graph as R; i.e. each graph is independent 8

Graph and entity ids The graph(entity) function returns the graph id of the entity An entity (a node or a relationship) in Cypher 10 needs to track its graph (to differentiate whether or not two entities are from the same graph) graph(entity) is a value that uniquely identifies a graph within the implementation This is a two-way process: a graph will also be aware of the entities it contains The id(entity) function returns the (entity) id of entity in graph(entity) id(entity) is a value that uniquely identifies an entity within its graph and types (i.e. nodes and relationships do not share the id space) The format of ids is implementation specific This does not change between Cypher 9 and Cypher 10 9

Id validity and comparability Validity of ids describes the duration for which an id is not assigned to another object. Validity of graph and entity ids are only guaranteed for the duration of a query! Validity of entity ids is only guaranteed from the matching of an entity to the consumption of a query result. In other words: for the duration of the query. Validity, entity ids are guaranteed to not be re-used per graph for the duration of the query. These are the minimal lifetime guarantees mandated by Cypher 10. Implementations may provide extended validity guarantees as they see fit. Comparability of ids Graph and entity ids are suggested to be comparable values, i.e. can be compared with <, <=, =, >=, > 10

Query Processing Model

Query processor Query Processor = Query Processing Service (e.g. a single server or a cluster) G1 G2 G3 Catalog 12

Graph catalog A graph catalog maps graph names to graphs A qualified graph name is an (optionally dotted) identifier A graph name may be contextual; i.e. only work inside a specific server database... There are no further restrictions placed on graphs; e.g. graphs may be mutable or immutable exist only for a limited time (session, transaction,...) visible to all or only some of the users of the system... 13

Syntax for qualified graph names Qualified graph name syntax follows function name syntax Parts graph namespace (optional) graph name Syntax foo.bar.baz `foo`.`bar` `foo.bar` `foo.bar`.`baz.grok` (namespace foo.bar, graph baz.grok) Number of periods can be 0 to * Namespace and name separation is implementation-dependent Last part is always a suffix of the graph name 14

Working graph 1. A clause is always operating within the context of a working graph a. by reading from it b. by updating it 2. During the course of executing a query a. the working graph may be assigned by name (i.e. assigned to a graph from the catalog) b. the working graph may be assigned by graph projection (i.e. assigned by creating a new working graph) c. the working graph may be returned as a query result d. the graph id of the current working graph may be obtained using the graph() function 15

Initial working graph Determined by Query Processor This allows for implementation freedom Example This includes having no initial working graph Queries that need an initial working graph fail if no working graph has been set In Neo4j: Upon establishing a session via Bolt, an initial working graph to be used by subsequent queries is established (perhaps per user etc.) In CAPS: Cypher queries are always executed explicitly against a graph which implicitly becomes the initial working graph 16

Working graph operations Assign working graph by name (from catalog lookup) Access multiple graphs within one query Assign working graph by graph projection (next section) Consolidate data from one or more data sets into one graph Return (working) graph from query Construct temporary graph via multiple update queries (e.g. during interactive workflow) Return (export) graphs (e.g. system graph) 17

Working with multiple graphs

Select working graph // Set the working graph to the graph foo in the catalog for reading FROM foo // Set the working graph to the graph foo in the catalog for updating UPDATE foo // Change between reading and updating FROM GRAPH UPDATE GRAPH In any case, Does not consume the current cardinality Retains all variable bindings 19

Example: Reading from multiple graphs Which friends to invite to my next dinner party? [1] FROM social_graph [2] MATCH (a:person {ssn: $ssn})-[:friend]-(friend) [3] FROM salary_graph [4] MATCH (f:entity {ssn: friend.ssn})<-[:income]-(event) [5] WHERE $startdate < event.date < $enddate [6] RETURN friend.name, sum(event.amount) as incomes 20

Matching with multiple graphs FROM persons MATCH (a:person) FROM cars MATCH (b:car)-[:owned_by]->(a) // Does NOT match Updating multiple graphs FROM persons MATCH (a:person) UPDATE cars CREATE (b:car)-[:owned_by]->(a) // Error

Handling multiple graphs: Copy patterns How to copy data between graphs? FROM persons MATCH (a:person) UPDATE cars CREATE (b:car)-[:owned_by]->(x COPY OF a) // This works COPY OF node Copies labels, properties COPY OF rel Copies relationship type, properties (ignores start, end!) Also usable in matching: MATCH (COPY OF a)... 22

Handling multiple graphs: Equivalence How to work with data from different graphs? FROM persons MATCH (a {name: $name}) FROM people MATCH (b {name: $name}) WHERE content(a) ~ content(b) RETURN count(*) New function for comparing by content content // labels + properties New operator for equivalence ~ (NULL ~ NULL => true) 23

Returning a graph Cypher 10 queries may return a graph Three variations of consuming a graph returned by a query a. Deliver graph to client b. Pass graph to another query; e.g. within a subquery (not covered in these slides) c. Store graph in the catalog; i.e. registering a returned graph under a new name Delivering a result graph from the query to the client is always by value a. All nodes with their labels and properties b. All relationships with their relationship type, start node, end node, and their properties c. Result graph is completely independent from graphs managed by the query processor 24

Syntax for returning a graph // Returns the working graph... RETURN GRAPH // Removing the current cardinality while keeping the working graph... WITH GRAPH... // Return a graph from the catalog FROM graphname RETURN GRAPH 25

Graph operations

Graph operations Graph construction Common graph union Disjoint graph union

Graph construction Graph construction dynamically constructs a new working graph for querying, storing in the catalog, later updating using entities from other graphs (this is called replication in this proposal) Simple example MATCH (a)-[:knows {from: "Berlin"}]->(b) CONSTRUCT MERGE (a), (b) // replication, aka "shared entities" CREATE (a)-[:met_in_berlin]->(b) RETURN GRAPH

Compare and contrast CAPS Cloning, shared entities have a different id precise too verbose, somewhat confusing for the user G-Core Entities as references to base data but shared in views right model idea lack of updatable view semantics limited construction capabilities

Compare and contrast Cypher 10/ GQL Similar to G-Core but uses DML syntax for construction and pays attention to identity and equality semantics formulation of provenance tracking construction behavior updatable view behavior

Replicating nodes Take an original entity and create a representative replica in the constructed graph MATCH (a) CONSTRUCT MERGE (a) Replicating the same original multiple times still only creates a single replica MATCH (root)-[:parent_of*]->(child) CONSTRUCT MERGE (root), (child) CREATE (root)-[:ancestor_of]->(child)...

Replicating relationships Replicating a relationship r implicitly replicates startnode(r) and endnode(r) Replication is powerful enough to reconstruct whole subgraphs! FROM car_owners MATCH (a:person {city: "Berlin"})-[r:OWNERS]->(c:Car) CONSTRUCT MERGE r // cars and their owners from berlin RETURN GRAPH

Replicating paths and complex values Cloning a path p clones all nodes(p) and rels(p) More generally, cloning a value clones all contained entities! MATCH (a:person), (b:person) MATCH p = (a)-[:knows*]-(b) CONSTRUCT MERGE a, b, p // must be variables (not any expr)! CREATE (a)<-[:start]-(x:path {steps: size(p)})-[:end]->(b) RETURN GRAPH

Replicating graphs Replicate all entities from given graphs into a new graph CONSTRUCT MERGE GRAPH social_graph // Sugar: CONSTRUCT ON social_graph Replicate all entities from working graph into new graph TRANSFORM // Alternative: CONSTRUCT ON GRAPH Replicate a graph and a contained entity still only creates a single replica for the entity MATCH (a) WITH * LIMIT 1 TRANSFORM MERGE (a) // creates only one a (not two) in the new graph // by re-using replicas already constructed by TRANSFORM

Multi-step graph construction So far, we've only looked at one step of graph construction. What about? [1] MATCH (a) // single node a [2] CONSTRUCT [3] MERGE (a) // replicate node a at most once [4] TRANSFORM [5] MERGE (a) // replicate node a at most once [6]... There is at most one replica per original. This is achieved via provenance tracking. It's useful for updatable views and graph union.

Provenance tracking aka entity sharing Data model: Entities belong to one and only one graph Node #1 in Graph #1 Provenance graph: Tracks entities across graph construction Node #2 in Graph #2 is a replica of Node #1 in Graph #1 Entity values: References to a replica group with the same root n references Node #1 in Graph #1 and all of its replicas (e.g. Node #2 in Graph #2) graph(n) - Graph of root, e.g. Graph #1 id(n) - id of root, e.g. #1 a=b - graph(a) = graph(b) && id(a) = id (b)

Updatable views CONSTRUCT... // build the view... WITH GRAPH UPDATE GRAPH... // update the view...

Updates during graph construction General rule: Maintain logical consistency regarding provenance tracking CREATE Create in the constructed graph MERGE Merge in the constructed graph SET REMOVE Not allowed on replicas [DETACH] DELETE Delete from the constructed graph

Updates after graph construction General rule: Only allow updates that can be propagated safely CREATE Not allowed MERGE SET REMOVE Not allowed Update replica group [DETACH] DELETE Update all graphs

General form of graph construction CONSTRUCT [ON graph1, graph2,...] TRANSFORM [MERGE GRAPH [name]] [MERGE variable, pattern,... * ] <updating clause 1> <updating clause 2>... WITH... WITH GRAPH RETURN... RETURN GRAPH

Common graph union Entities are always replicated CONSTRUCT... RETURN GRAPH UNION CONSTRUCT... RETURN GRAPH More graph operations: INTERSECT, EXCEPT

Disjoint graph union All entities (including replicas) are copied (i.e. this breaks provenance tracking) CONSTRUCT... RETURN GRAPH UNION ALL CONSTRUCT... RETURN GRAPH

Updating the catalog

Catalog side-effects CREATE GRAPH foo // Create new graph 'foo' DELETE GRAPH foo // Delete graph 'foo' COPY foo TO bar // Copy graph 'foo' with schema RENAME foo TO bar // Rename graph 'foo' to 'bar' TRUNCATE foo // Remove data but keep schema in 'foo' // Extensions ALIAS foo TO bar // Aliasing... GRAPH foo TO bar // Error if 'foo' is not a graph... GRAPH TO bar // Use working graph 44

Create new base graph (snapshot) in catalog // create a new base graph in the catalog CREATE GRAPH mygraph // copy content of graph as new graph into the catalog CREATE GRAPH bar { <some query that returns a graph> } These are always top-level clauses; i.e. they may not be nested 45

Copy data from one base graph to another We can accomplish this by using DML and hand crafting the queries MATCH (a)-[r]->(b) UPDATE foo CREATE... MERGE... SET... 46

global graphs & view definitions CREATE GRAPH name {... RETURN GRAPH } CREATE QUERY name {... RETURN GRAPH } CREATE QUERY name(params) { WITH a, b MATCH (a)-[:x]->(b)<-[:y]-(c {name: $param}) RETURN GRAPH }

local graphs & view definitions GRAPH name1 {... RETURN GRAPH } QUERY name2 {... RETURN GRAPH } QUERY name3(params) {... RETURN GRAPH } FROM... MATCH...

Schema updates Need to select the graph SETUP foo CREATE CONSTRAINT... CREATE INDEX...... // Alternative: Use UPDATE 49

Catalog function catalog() catalog(graph(e)) Current working graph name in catalog or NULL Name of graph of e in catalog or NULL

Nested subqueries

Anatomy of a Cypher query WITH expected, variables... RETURN variables RETURN GRAPH

Let's nest tabular subqueries CALL { // without: input driving table ignored WITH expected, variables... // without: updating side-effect RETURN variables }

Let's nest graph subqueries FROM { WITH expected, variables... RETURN GRAPH }

Let's parameterize // call per grouping key $param CALL PER foo+bar AS $param { WITH expected, variables... RETURN GRAPH }

Summary Multiple graphs: select, construct, return Working with disjoint base graphs from the catalog: ~, content, COPY OF Working with updatable, overlapping, dynamically constructed graph views Graph operations like graph union based on provenance tracking in views Catalog management Full composition via parameterized nested subqueries

Status Working on three CIPs Multiple graphs CIP (This talk, also: larger examples) Equivalence operator CIP Query composition CIP Features: views, nested subqueries, equivalence, more graph operations Future work: More elaborate updatable views, graph aggregation Goal is to finalize draft of multiple graphs CIP for next ocim or GQL draft Design heavily influenced by feedback from CAPS project and other implementation concerns

Feedback Please feedback by commenting on slides or CIP or opencypher implementers slack Thanks!