Session 7: Oracle R Enterprise OAAgraph Package

Similar documents
Graph Analytics and Machine Learning A Great Combination Mark Hornick

Combining Graph and Machine Learning Technology using R

Deep Learning und Graphenanalyse im Einsatz gegen Hacker

Analyzing a social network using Big Data Spatial and Graph Property Graph

Introducing Oracle Machine Learning

Graph Databases nur ein Hype oder das Ende der relationalen Welt? DOAG 2016

Fault Detection using Advanced Analytics at CERN's Large Hadron Collider: Too Hot or Too Cold BIWA Summit 2016

Safe Harbor Statement

What Are They Talking About These Days? Analyzing Topics with Graphs

Oracle R Technologies

Oracle R Enterprise Platform and Configuration Requirements Oracle R Enterprise runs on 64-bit platforms only.

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

ORAAH Change List Summary. ORAAH Change List Summary

C5##54&6*"6*1%2345*D&'*E2)2*F"4G)&"69

Oracle Big Data Spatial and Graph Property Graph: Features and Performance ORACLE TECHNICAL WHITEPAPER DECEMBER 2017

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Using Graphs to Analyze Big Linked Data

Callisto-RTS: Fine-Grain Parallel Loops

Turning Relational Database Tables into Spark Data Sources

Oracle Big Data Fundamentals Ed 1

Learning R Series Session 5: Oracle R Enterprise 1.3 Integrating R Results and Images with OBIEE Dashboards Mark Hornick Oracle Advanced Analytics

Introducing Oracle R Enterprise 1.4 -

PGQL: a Property Graph Query Language

Moving Databases to Oracle Cloud: Performance Best Practices

Efficient and Scalable Friend Recommendations

Apply Graph and Deep Learning to Recommendation and Network Intrusion Detection

Oracle Machine Learning Notebook

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Resource and Performance Distribution Prediction for Large Scale Analytics Queries

Session 4: Oracle R Enterprise Embedded R Execution SQL Interface Oracle R Technologies

DBAs can use Oracle Application Express? Why?

DATA INTEGRATION PLATFORM CLOUD. Experience Powerful Data Integration in the Cloud

Analyzing Flight Data

Why do we need graph processing?

Trouble-free Upgrade to Oracle Database 12c with Real Application Testing

Webinar Series TMIP VISION

Do-It-Yourself 1. Oracle Big Data Appliance 2X Faster than

How to Troubleshoot Databases and Exadata Using Oracle Log Analytics

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

1Z Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions

Copyright 2018, Oracle and/or its affiliates. All rights reserved.

Blended Learning Outline: Cloudera Data Analyst Training (171219a)

<Insert Picture Here> DBA s New Best Friend: Advanced SQL Tuning Features of Oracle Database 11g

Verarbeitung von Vektor- und Rasterdaten auf der Hadoop Plattform DOAG Spatial and Geodata Day 2016

Jure Leskovec Including joint work with Y. Perez, R. Sosič, A. Banarjee, M. Raison, R. Puttagunta, P. Shah

Oracle Big Data Science

MySQL for Developers Ed 3

What s New in MySQL 5.7 Geir Høydalsvik, Sr. Director, MySQL Engineering. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Oracle Machine Learning Notebook

Apache Giraph. for applications in Machine Learning & Recommendation Systems. Maria Novartis

Performance Innovations with Oracle Database In-Memory

Graph Data Management

Deploying Spatial Applications in Oracle Public Cloud

Visualization and text mining of patent and non-patent data

Oracle Data Integrator 12c New Features

MySQL & NoSQL: The Best of Both Worlds

Oracle Big Data Discovery

Page 1. Goals for Today" Background of Cloud Computing" Sources Driving Big Data" CS162 Operating Systems and Systems Programming Lecture 24

Oracle Enterprise Data Quality - Roadmap

Shark: SQL and Rich Analytics at Scale. Michael Xueyuan Han Ronny Hajoon Ko

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 8

Oracle Big Data Science IOUG Collaborate 16

Analysing the Panama Papers with Oracle Big Data Spatial and Graph

Social Network Analysis

Oracle Big Data Fundamentals Ed 2

Copyright 2013, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 12

Databricks, an Introduction

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

MySQL for Developers Ed 3

Data-Intensive Computing with MapReduce

SociaLite: A Datalog-based Language for

Introduction to Graph Analytics and Oracle Cloud Service

Databases 2 (VU) ( / )

Oracle Enterprise Manager 12c IBM DB2 Database Plug-in

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

MySQL Group Replication. Bogdan Kecman MySQL Principal Technical Engineer

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

State of the Dolphin Developing new Apps in MySQL 8

Oracle Identity Manager 11gR2-PS2 Hands-on Workshop Tech Deep Dive DB Schema, Backup & Restore, Bulkload, Reports, Archival & Purge

Performance and Load Testing R12 With Oracle Applications Test Suite

Overview of Oracle Big Data Spatial and Graph Property Graph

Oracle NoSQL Database at OOW 2017

HEALTH CARE FOR THE ELDERLY USING. Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Unifying Big Data Workloads in Apache Spark

CS6200 Information Retreival. The WebGraph. July 13, 2015

Real Time Summarization. Copyright 2014, Oracle and/or its affiliates. All rights reserved.

Oracle Big Data SQL. Release 3.2. Rich SQL Processing on All Data

Prototyping Data Intensive Apps: TrendingTopics.org

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem.

An Exploratory Journey Into Network Analysis A Gentle Introduction to Network Science and Graph Visualization

Oracle Data Integrator 12c: Integration and Administration

Epilog: Further Topics

Introduction To Graphs and Networks. Fall 2013 Carola Wenk

Data Science Training

MySQL Cluster Web Scalability, % Availability. Andrew

Getting Started with ORE - 1

Transcription:

Session 7: Oracle R Enterprise 1.5.1 OAAgraph Package Oracle Spatial and Graph PGX Graph Algorithms Oracle R Technologies Mark Hornick Director, Oracle Advanced Analytics and Machine Learning July 2017

Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle s products remains at the sole discretion of Oracle.

Topics Graph Analysis and Machine Learning ORE integration with Oracle Spatial and Graph option for PGX OAAgraph interface OAAgraph examples

Graph Analysis And Machine Learning

Graph Analysis A methodology in data analysis Represent your data as a graph Data entities become nodes Relationships become edges Analyze fine-grained relationships through the graph Navigate multi-hop relationships quickly Without computing expensive joins repeatedly

Graph Analysis Inter-relationships between data and networks are growing in importance Graphs are everywhere Facebook (friends of friends), Twitter, LinkedIn, etc. Most data has inter-relationships that contain insights Two major types of graph algorithms Computational Graph Analytics: Analysis of entire Graph Influencer ID, community detect, pattern machine, recommendations Graph Pattern Matching Queries that find sub-graphs fitting relationship patterns Structure finding, ranking communities, path finding frien d spouse frien d frien d 6

Graph Analysis and Machine Learning Graph analysis can augment Machine Learning Raw Data Graph View Typical machine learning techniques create/train models based on observed features Graph analysis can provide additional strong signals That make predictions more accurate D1 D2 D3 Feature1 Feature 2 Feature 3 Feature 4 Feature 5 Feature 6 Feature 7 e.g. Can you identify groups of close customers from their call graph in order to predict customer churn? Predictive Model

ORE integration with Oracle Spatial and Graph option for PGX

What is PGX? PGX (Parallel Graph Analytics) An in-memory graph analysis engine Originated from Oracle Labs Provides fast, parallel graph analysis Built-in Algorithm Packages Graph Query (Pattern-Matching) Custom Algorithm Compilation (Advanced Use case) Integrated with Oracle Product(s) Oracle Big Data Spatial and Graph (with BDA) Property Graph Support at RDBMS 12.2c (Planned) 35+ graph algorithms Exceeds open source tool capabilities OTN Technology Preview Oracle Big Data and Spatial and Graph 9

PGX Graph Algorithms Ranking Pagerank (+ variants) Vertex Betweenness Centrality (including approximations) Closeness Centrality Eigenvector Centrality Degree Centrality Hyperlink-Induced Topic Search (HITS) Path Finding Dijkstra (+ variants) Bellman Ford (+ variants) Hop Distance (+ variants) Fattest path Partitioning Weakly and Strongly Connected Components Conductance and Modularity Community Detection Recommendation Twitter s whom-to-follow Matrix Factorization Other Breadth First Search with filter Triangle Counting Degree Distribution K-core Adamic Adar

OAAgraph An R interface integrating PGX and ORE/ORAAH for Machine Learning

Why an R interface to Graph? Single, unified interface across complementary technologies Work with R data.frames and convenient functions across ML and graph Results returned as R data.frames allows further processing in R env R users take advantage of multiple, powerful technologies Highly scalable PGX engine on both Oracle Database and Hadoop Integrated with Oracle R Enterprise, part of Oracle Database Advanced Analytics option Integrated with Oracle R Advanced Analytics for Hadoop, part of Oracle Big Data Connectors 12

Graph Analytics Machine Learning Compute graph metric(s) Add to structured data Build predictive model using graph metric Explore graph or compute new metrics using ML result Add to graph Build model(s) and score or classify data

OAAgraph Architecture with Oracle Database OAAgraph gives remote control of PGX server PGX loads graph from database (ore.frames) R Client PGX Server Client ORE OAAgraph Oracle Database OAAgraph is an additional R package that comes with Oracle R Enterprise Database Server

OAAgraph Architecture with Spark/Hadoop OAAgraph gives remote control of PGX server PGX loads graph via SPARK data frames R Client PGX Server Client ORAAH OAAgraph OAAgraph is also available with Oracle R Advanced Analytics for Hadoop Hadoop & Spark HDFS / Hive / Oracle Big Data (Hadoop) Cluster

Execution Overview (ORE) Initialization and Connection R Client PGX Server Client ORE OAAgraph Oracle Database # Connect R client to # Oracle Database using ORE R> ore.connect(..) # Connect to PGX server # using OAAgraph R> oaa.init(..) R> oaa.graphconnect(...) Database Server

Execution Overview (ORE) Data Source Graph data represented as two tables Nodes and Edges Multiple graphs stored in database Using user-specified, unique table names PGX Server Node Table Edge Table Oracle Database Node ID Node Prop 1 (name) Node Prop 2 (age) 1238 John 39 From Node To Node Edge Prop 1 (relation) 1238 1299 Likes 1299 4818 FriendOf node1 edge1 node2 edge2 Database Server 1299 Paul 41 4818 1299 6637 FriendOf

Execution Overview (ORE) Loading Graph R Client PGX Server Client ORE OAAgraph # Load graph into PGX: # Graph load happens at the server side. # Returns OAAgraph object, which is a # proxy (remote handle) for the graph in PGX R> mygraph <- oaa.graph (NodeTable, EdgeTable,...) node Oracle Database edge Database Server

Execution Overview (ORE) Running Graph Algorithm R Client PGX Server Client ORE OAAgraph Oracle Database # e.g. compute Pagerank for every node in the graph # Execution occurs in PGX server side R> result1<- pagerank (mygraph,... ) Database Server

Execution Overview (ORE) Iterating remote values with cursor R Client pagerank PGX Server Client ORE OAAgraph Oracle Database # e.g. compute Pagerank for every node in the graph # Execution occurs in PGX server side R> result1<- pagerank (mygraph,... ) Database Server # Return value is a cursor object # for the computed result: # client can get local data frames by oaa.next() R> df <- oaa.next(result1, 10)

Execution Overview (ORE) Querying the graph R Client x 0.05 y z 0.2 0.001 w 0.01 Client ORE OAAgraph # Query graph using a SQL syntax pattern specification R> q_result <- oaa.cursor(mygraph, SELECT n.name, m.name, n.pagerank, m.pagerank WHERE (n WITH pagerank < 0.1) -> (m), n.pagerank > m.pagerank ORDER BY n.pagerank ) # Returns a cursor to examine results R> df <- oaa.next(q_result, 10) PGX Server Oracle Database Database Server

Execution Overview (ORE) Exporting the result to DB R Client PGX Server Client ORE OAAgraph # Export result to DB as Table(s) R> oaa.create(mygraph, nodetablename = node, nodeproperties = c( pagerank, ), ) NODE S Oracle Database EDGES Database Server

Execution Overview (ORE) Continuing analysis with ORE Machine Learning R Client PGX Server Client ORE OAAgraph # Machine Learning analysis can be applied # to the exported tables identified using ore.frames R> model <- ore.odmkmmeans(formula = ~., data = NODES, num.centers = 5, ) R> scores <- predict(model, NODES, ) NODES Oracle Database EDGES Database Server

PGX Performance X86 Server Xeon E5-2660 2.2Ghz 2 socket x 8 cores x 2HT 256GB DRAM #Nodes #Edges Closeness Centrality (PGX) Closeness Centrality (igraph) Betweenness Centrality (PGX) Betweenness Centrality (igraph) Epinion 75,879 811,480 17 sec 6.27 min 45 sec 11.5 min Google 107,614 2,4476,570 15 sec 132 min 30 min 290 min Churner 3,000,000 16,519,402 26 min 5+ days 20 hrs 5+ days Patent 3,774,768 33,037,894 6.6 hrs 5+ days 48 hrs 5+ days LiveJ 4,846,609 85,702,474 5.8 hrs 5+ days 60 hrs 5+ days Effectively does not complete! 24

OAAgraph Examples

Example: Connecting to PGX library(ore) library(oaagraph) dbhost <- "<DATABASE_HOST>" dbuser <- "<DATABASE_USERNAME>" dbpassword <- "<DATABASE_PASSWORD>" dbsid <- "<DATABASE_SID>" pgxbaseurl <- "<PGX_BASE_URL>" ore.connect(host = dbhost, user = dbuser, password = dbpassword, sid = dbsid) oaa.graphconnect(pgxbaseurl = pgxbaseurl, dbhost = dbhost, dbsid = dbsid, dbuser = dbuser, dbpassword = dbpassword)

Example: Create some data node and edge tables #-- Create the node table in Oracle Database VID <- c(1, 2, 3, 4, 5) NP1 <- c("node1", "node2", "node3", "node4", "node5") NP2 <- c(111.11, 222.22, 333.33, 444.44, 555.55) NP3 <- c(1, 2, 3, 4, 5) nodes <- data.frame(vid, NP1, NP2, NP3) ore.drop(table="my_nodes") ore.create(nodes, table = "MY_NODES") #-- Create the edge table in Oracle Database EID <- c(1, 2, 3, 4, 5) SVID <- c(1, 3, 3, 2, 4) DVID <- c(2, 1, 4, 3, 2) EP1 <- c("edge1", "edge2", "edge3", "edge4", "edge5") EL <- c("label1", "label2", "label3", "label4", "label5") edges <- data.frame(eid, SVID, DVID, EP1, EL) ore.drop(table="my_edges") ore.create(edges, table = "MY_EDGES")

Example: Create some data node and edge tables #-- Create a graph in PGX from the node and edge tables in the database graph <- oaa.graph(my_edges, MY_NODES, "mypgxgraph") names(graph, "nodes") names(graph, "edges") #-- See result of counttriangles function, which gives an overview of the #-- number of connections between nodes in neighborhoods counttriangles(graph, sortverticesbydegree=false) #-- See results from degree algorithm variants, note the graph nodes #-- are augmented with new properties as indicated by the 'name' argument degree(graph, name = "OutDegree") degree(graph, name = "InDegree", variant = "in") degree(graph, name = "InOutDegree", variant = "all")

Example: Load graph from database tables #-- Create a graph in PGX from the node and edge tables in the database graph <- oaa.graph(my_edges, MY_NODES, "mypgxgraph") names(graph, "nodes") names(graph, "edges") #-- See result of counttriangles function, which gives an overview of the #-- number of connections between nodes in neighborhoods counttriangles(graph, sortverticesbydegree=false) #-- See results from degree algorithm variants, note the graph nodes #-- are augmented with new properties as indicated by the 'name' argument degree(graph, name = "OutDegree") degree(graph, name = "InDegree", variant = "in") degree(graph, name = "InOutDegree", variant = "all")

Example: Create cursors to access the results #-- Create a cursor including the degree properties cursor <- oaa.cursor(graph, c("outdegree", "InOutDegree", "InDegree"), "nodes") oaa.next(cursor, 5) #-- Create a cursor over the degree properties using #-- the PGX SQL-like query language PGQL cursor <- oaa.cursor(graph, query = "select n.outdegree, n.inoutdegree, n.indegree where (n) order by n.outdegree desc") #-- View the first 5 entries from the cursor oaa.next(cursor, 5)

Example: Create cursors to access the results #-- See results from the pagerank algorithm pagerankcursor <- pagerank(graph, 0.085, 0.1, 100) oaa.next(pagerankcursor, 5) #-- Create a cursor over the pagerank property using PGQL cursor <- oaa.cursor(graph, query = "select n.pagerank where (n) order by n.pagerank desc") oaa.next(cursor, 5) #-- This could be done using the R interface as well... cursor <- oaa.cursor(graph, "pagerank", ordering="desc") oaa.next(cursor, 5)

Example: More graph analytics and create snapshots #-- Compute the adamic adar index for edges topedges <- adamicadarcounting(graph) oaa.next(topedges) #-- List any graph snapshots available oaa.graphsnapshotlist() #-- Export a binary snapshot of the whole graph into Oracle Database #-- and view the listing again oaa.graphsnapshotpersist(graph, nodeproperties = TRUE, edgeproperties = TRUE) oaa.graphsnapshotlist()

Example: Read snapshots and create tables from graphs #-- Read the snapshot back into memory graph2 <- oaa.graphsnapshot("mypgxgraph") #-- Export the graph nodes and specific node properties from memory into a database table oaa.create(graph2, nodetablename = "RANKED_NODES", nodeproperties = TRUE) #-- Export both nodes and edges as tables from memory into the database, #-- but only export the pagerank node property oaa.create(graph2, nodetablename = "RANKED_GRAPH_N", nodeproperties = c("np1", "pagerank"), edgetablename = "RANKED_GRAPH_E") #-- Export graph edges and their properties from memory into a database table oaa.create(graph2, edgetablename = "RANKED_EDGES", edgeproperties = TRUE) #-- Free the graphs at the PGX server oaa.rm(graph) oaa.rm(graph2)

Learn More about Oracle s R Technologies http://oracle.com/goto/r 35