Analyzing a social network using Big Data Spatial and Graph Property Graph

Similar documents
Graph Databases nur ein Hype oder das Ende der relationalen Welt? DOAG 2016

Overview of Oracle Big Data Spatial and Graph Property Graph

Apply Graph and Deep Learning to Recommendation and Network Intrusion Detection

Analyzing Blockchain and Bitcoin Transaction Data as Graph

Graph Analytics and Machine Learning A Great Combination Mark Hornick

Using Graphs to Analyze Big Linked Data

Analyzing Blockchain and Bitcoin Transaction Data as Graph

Session 7: Oracle R Enterprise OAAgraph Package

Build Recommender Systems, Detect Network Intrusion, and Integrate Deep Learning with Graph Technologies

Introduction to Graph Analytics and Oracle Cloud Service

Analysing the Panama Papers with Oracle Big Data Spatial and Graph

Oracle Big Data Spatial and Graph Property Graph: Features and Performance ORACLE TECHNICAL WHITEPAPER DECEMBER 2017

PGQL: a Property Graph Query Language

Deep Learning und Graphenanalyse im Einsatz gegen Hacker

What Are They Talking About These Days? Analyzing Topics with Graphs

Introducing Oracle Machine Learning

Open And Linked Data Oracle proposition Subtitle

Continuous delivery of Java applications. Marek Kratky Principal Sales Consultant Oracle Cloud Platform. May, 2016

Oracle Machine Learning Notebook

Link Analysis in the Cloud

Big Data Management and NoSQL Databases

Kevin Madden Chief Software Engineer

Combining Graph and Machine Learning Technology using R

Safe Harbor Statement

Javaentwicklung in der Oracle Cloud

MySQL InnoDB Cluster. MySQL HA Made Easy! Miguel Araújo Senior Software Developer MySQL Middleware and Clients. FOSDEM 18 - February 04, 2018

NoSQL + SQL = MySQL Get the Best of Both Worlds

<Insert Picture Here> A Brief Introduction to Live Object Pattern

MySQL Cluster Web Scalability, % Availability. Andrew

Oracle Machine Learning Notebook

Autonomous Data Warehouse in the Cloud

Oracle APEX 18.1 New Features

Traversing Graph Databases with Gremlin. Marko A. Rodriguez Graph Systems Architect.

NOSQL Databases and Neo4j

Using the MySQL Document Store

Data-Intensive Computing with MapReduce

Data-and-Compute Intensive Processing: Middle-tier or Database? Trade-Offs and Case Study. Kuassi Mensah Marcelo Ochoa Oracle

Big Data, Complex Data

NoSQL + SQL = MySQL. Nicolas De Rico Principal Solutions Architect

Oracle Secure Backup 12.2 What s New. Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Introduction to NoSQL by William McKnight

Distributed Graph Storage. Veronika Molnár, UZH

MySQL & NoSQL: The Best of Both Worlds

Analyzing Flight Data

Dynamic Graph Query Support for SDN Management. Ramya Raghavendra IBM TJ Watson Research Center

Disclaimer MULTIMODEL DATABASE WITH ORACLE DATABASE 18C

E6895 Advanced Big Data Analytics Lecture 4:

CIB Session 12th NoSQL Databases Structures

Deploying Spatial Applications in Oracle Public Cloud

Brad Dayley. Sams Teach Yourself. NoSQL with MongoDB. SAMS 800 East 96th Street, Indianapolis, Indiana, USA

Easier than Excel: Social Network Analysis of DocGraph with Gephi

Introduction to Graph Databases

Presented by Sunnie S Chung CIS 612

Socrates: A System for Scalable Graph Analytics C. Savkli, R. Carr, M. Chapman, B. Chee, D. Minch

Building Real-time Data in Web Applications with Node.js

HEALTH CARE FOR THE ELDERLY USING. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Mix n Match Async and Group Replication for Advanced Replication Setups. Pedro Gomes Software Engineer

Neo4j.rb. Graph Database. The Natural Way to Persist Data? Andreas Kollegge. Andreas Ronge

Research challenges in data-intensive computing The Stratosphere Project Apache Flink

Graph Databases. Graph Databases. May 2015 Alberto Abelló & Oscar Romero

University of Maryland. Tuesday, March 2, 2010

Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016

Managing and Mining Billion Node Graphs. Haixun Wang Microsoft Research Asia

MongoDB An Overview. 21-Oct Socrates

E6885 Network Science Lecture 10: Graph Database (II)

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Oracle NoSQL Database at OOW 2017

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

NOSQL DATABASE CLOUD SERVICE. Flexible Data Models. Zero Administration. Automatic Scaling.

G(B)enchmark GraphBench: Towards a Universal Graph Benchmark. Khaled Ammar M. Tamer Özsu

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Modern and Fast: A New Wave of Database and Java in the Cloud. Joost Pronk Van Hoogeveen Lead Product Manager, Oracle

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis

MySQL as a Document Store. Ted Wennmark

Dealing with Data Especially Big Data

Graph Data Management

Steven Edouard SDET, US - DX Audience West Microsoft Bruno Terkaly Principal Software Engineer - Microsoft

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Architect.

Introduction to Oracle NoSQL Database

Press Release Writing Tips and Tricks for the Enterprise Technology Space

Digital Excellence. Inventer de nouveaux services et nouveaux usages Témoignage de la ville de Marseille

Graph Algorithms using Map-Reduce. Graphs are ubiquitous in modern society. Some examples: The hyperlink structure of the web

MySQL High Availability

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

What s New with SpaDal and Graph?

BIG DATA COURSE CONTENT

Graphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Everything You Need to Know About MySQL Group Replication

Jordan Boyd-Graber University of Maryland. Thursday, March 3, 2011

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

A Highly Efficient Runtime and Graph Library for Large Scale Graph Analytics

Cloud Computing & Visualization

GeoSPARQL Support and Other Cool Features in Oracle 12c Spatial and Graph Linked Data Seminar Culture, Base Registries & Visualisations

Graph Data Management Systems in New Applications Domains. Mikko Halin

Jakarta EE Meets NoSQL at the Cloud Age

MySQL Document Store. How to replace a NoSQL database by MySQL without effort but with a lot of gains?

State of the Dolphin Developing new Apps in MySQL 8

Transcription:

Analyzing a social network using Big Data Spatial and Graph Property Graph Oskar van Rest Principal Member of Technical Staff Gabriela Montiel-Moreno Principal Member of Technical Staff

Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle. Confidential Oracle Internal/Restricted/Highly Restricted 2

Program Agenda 1 2 3 4 5 Introduction Property Graph Data Model & BDSG Architecture Oracle Big Data Spatial and Graph Core Features Graph Analytics using PGX Graph Analytics Engine HoL: Analyzing a social network using Property Graphs 3

Graph Database Definition Graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. 1 C B D A F E 1. http://en.wikipedia.org/wiki/graph_database 4

Why do we care? Graphs are intuitive and flexible Easy to navigate, easy to form a path, natural to visualize Enables views and queries that would be expensive on other databases Graphs are everywhere Road networks, power grids, biological networks Social Networks Knowledge graphs (RDF, OWL) 5

Graph Use Case Scenarios Fraud detection Find parties in insurance data who are on both sides of multiple claims, who live near each other Internet of Things Manage graph of interconnected devices and predict the effect of an disruptions across network Cyber Security Find entry points and affected machines Border Control Analyze flight histories of a suspicious passenger. Indentify his co-travelers, co-traveler s co-travelers,

Program Agenda 1 2 3 4 5 Introduction Property Graph Data Model & BDSG Architecture Oracle Big Data Spatial and Graph Core Features Graph Analytics using PGX Graph Analytics Engine HoL: Analyzing a social network using Property Graphs 7

Property Graph Data Model name= marko age = 29 weight=0.5 1 7 knows 2 8 name = vadas age = 27 9 weight=0.4 created weight=1.0 knows name= lop lang = java 11 4 10 created 5 (example from https://github.com/tinkerpop/gremlin/wiki/defining-a-property-graph) 3 created weight=0.4 weight=1.0 name= ripple lang = java name= josh age = 32 created weight=0.2 12 6 name= peter age = 35 A set of vertices each vertex has a unique identifier each vertex has a set of outgoing/incoming edges each vertex has a collection of key-value properties A set of edges each edge has a unique identifier each edge has a head/tail vertex each edge has a label that denotes the type of relationship between two vertices each edge has a collection of key-value properties Blueprints Java APIs Implementations Neo4j, Titan, InfiniteGraph, Dex, Sail, MongoDB 8

Oracle Big Data Spatial and Graph Architecture Graph Analytics Parallel In-Memory Graph Analytics (PGX) Java APIs REST/Web Service Python, Perl, PHP, Ruby, Javascript, Property Graph formats Graph Data Access Layer (DAL) GraphML Apache Blueprints & Lucene/SolrCloud RDF (RDF/XML, GMLN- Triples, N-Quads, Graph-SON TriG,N3,JSON) Flat Files Java APIs CSV Relational Scalable and Persistent Storage Management Apache HBase Oracle NoSQL Database 9

Program Agenda 1 2 3 4 5 Introduction Property Graph Data Model & BDSG Architecture Oracle Big Data Spatial and Graph Core Features Graph Analytics using PGX Graph Analytics Engine HoL: Analyzing a social network using Property Graphs 10

Data Access (APIs) Blueprints 2.3.0, Gremlin 2.3.0, Rexster 2.3.0 Groovy shell for accessing property graph data REST APIs (through Rexster integration) PGQL (Property Graph Query Languge) 2/28/2017 11

Text Search through Apache Lucene/Solr Use text indexing to access vertices or edges Eg. find person with given name as starting point for reachability analysis oraclepropertygraph.createkeyindex( name, Vertex.class); oraclepropertygraph.getvertices( name, *Obama*, true); Based on Apache Solr/Solr Cloud Highly scaleable through sharding and replication Uses Apache Lucene under the covers open source text search engine library inverted index, ranked searching, fuzzy matching Supports manual and auto indexing of Graph elements 12

Support for Cytoscape Open Source Visualization

Integration with Tom Sawyer Perspectives via property graph REST APIs

Python Interface Installation property_graph/pyopg/readme Usage cd ${ORACLE_HOME}/md/property_graph/pyopg./pyopg.sh ipython notebook 15

Program Agenda 1 2 3 4 5 Introduction Property Graph Data Model & BDSG Architecture Oracle Big Data Spatial and Graph Core Features Graph Analytics using PGX Graph Analytics Engine HoL: Analyzing a social network using Property Graphs 16

Graph Analytics workloads Computational Graph Analytics Graph Pattern Matching Connected Components Modularity Conductance Shortest Path Pagerank Clustering Coefficient Centrality Coloring Spanning Tree Compute certain values on nodes and edges While (repeatedly) traversing or iterating on the graph In certain procedural ways Given a description of a pattern Find every sub-graph that matches it ex friend friend friend Typical graph analysis systems do not support both 17

In-Memory Analyst (PGX) PGX is the in-memory, parallel graph analytics engine of Oracle Big Data Spatial and Graph Approaches Reads snapshot of graph data from database (or file) Support delta-update from transactional changes in database Processes analytic requests efficiently in-memory Supports remote clients via REST Analytic Analytic Request Request In-Memory Analyst (PGX) File REST API Bulk Read Oracle Big Data Spatial and Graph Database Analytic Analytic Request Analytic Request Analytic Request Transactional Request Request Confidential Oracle Internal 18

Computational Analytics: Built-in Package Rich set of built-in parallel graph algorithms and parallel graph mutation operations Detecting Components and Communities Tarjan s, Kosaraju s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang s Spacification Evaluating Community Structures Link Prediction Conductance, Modularity Clustering Coefficient (Triangle Counting) Adamic-Adar SALSA (Twitter s Who-to-follow) Ranking and Walking Path-Finding Other Classics Pagerank, Personalized Pagerank, Betwenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants) Hop-Distance (BFS) Dijkstra s, Bi-directional Dijkstra s Bellman-Ford s Vertex Cover Minimum Spanning-Tree (Prim s) a b e d g c Create Bipartite Graph b d e Filtered Subgraph i i g Left Set: a,b,e The original graph a b c d e Sort-By-Degree (Renumbering) e b d i a f c g h i f g h a b c d e i f g h Create Undirected Graph a b c d e g Simplify Graph i f h 19

Pattern matching using PGQL SQL-like syntax but with graph pattern description and property access Interactive (real-time) analysis Supporting aggregates, comparison, such as max, min, order by, group by Finding a given pattern in graph Fraud detection Anomaly detection Subgraph extraction... Proposed for standardization by Oracle Specification available on-line Open-sourced front-end (i.e. parser) https://github.com/oracle/pgql-lang Recursive path querying 20

PGQL Example query Find all instances of a given pattern/template in data graph Fast, scaleable query mechanism SELECT v3.name, v3.age FROM mygraph WHERE (v1:person WITH name = Amber ) [:friendof]-> (v2:person) [:knows]-> (v3:person) :Person{100} name = Amber age = 25 :friendof {2513} since = 08/01/2014 :worksat{1831} startdate = 09/01/2015 :friendof{1173} :Company{777} name = Oracle location = Redwood City data graph mygraph query Query: Find all people who are known to friends of Amber. :Person{200} name = Paul age = 30 :knows{2200} :Person{300} name = Heather age = 27 https://github.com/oracle/pgql-lang/ http://pgql-lang.org/ 21

In-Memory Analyst (PGX) in Apache Zeppelin Create notebooks with paragraphs that run graph queries or graph algorithms 22

Apache Zeppelin Integration Apache Zeppelin is a multi-purpose notebook for data analysis and visualization similar to ipython/jupyter Lots of language bindings and interpreters built in -> JVM based Very active development community Easy extensible 23

In-Memory Analyst (PGX) in Apache Zeppelin Web Browser Notebook HTTPS Zeppelin Server In-Memory Analyst (PGX) Interpreter HTTPS In-Memory Analyst (PGX) Server 24

Program Agenda 1 2 3 4 5 Introduction Property Graph Data Model & BDSG Architecture Oracle Big Data Spatial and Graph Core Features Graph Analytics using PGX Graph Analytics Engine HoL: Analyzing a social network using Property Graphs 25