E6885 Topics in Signal Processing -- Network Science
E6885 Network Science Lecture 10: Graph Database (II)
Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University
November 18th, 2013
Course Structure
Class Date | Lecture | Topics Covered
09/09/13 | 1 | Overview of Network Science
09/16/13 | 2 | Network Representation and Feature Extraction
09/23/13 | 3 | Network Partitioning, Clustering and Visualization
09/30/13 | 4 | Network Analysis Use Case
10/07/13 | 5 | Network Sampling, Estimation, and Modeling
10/14/13 | 6 | Network Topology Inference
10/21/13 | 7 | Network Information Flow
10/28/13 | 8 | Dynamic & Probabilistic Networks and Graph Database
11/11/13 | 9 | Final Project Proposal Presentation
11/18/13 | 10 | Graph Databases II
11/25/13 | 11 | Information Diffusion in Networks
12/02/13 | 12 | Large-Scale Network Processing System
12/09/13 | 13 | Final Project Presentation I
RDF and SPARQL
Resource Description Framework (RDF)
A W3C standard since 1999.
Triples: Instance Identifier, Property Name, Property Value.
Example: if a company has nine of part p1234 in stock, a simplified triple representing this might be {p1234 instock 9}.
In a proper RDF version of this triple, the representation is more formal: resources and property names are identified by uniform resource identifiers (URIs).
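To make the step from the simplified triple to a URI-based one concrete, here is a minimal Python sketch. The two namespace URIs are hypothetical and chosen only for illustration; the output line follows the N-Triples style of writing one triple per line, with the integer value written as a typed literal.

```python
from typing import NamedTuple, Union

class Triple(NamedTuple):
    subject: str              # instance identifier
    predicate: str            # property name
    obj: Union[str, int]      # property value (kept as a literal here)

# Simplified form from the slide: {p1234 instock 9}
simple = Triple("p1234", "instock", 9)

# Hypothetical namespaces (assumptions, not from the lecture).
ITEM_NS = "http://example.com/items#"
VOCAB_NS = "http://example.com/vocab#"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def to_uri_triple(t: Triple) -> Triple:
    """Qualify the identifier and the property name with URIs; keep the value."""
    return Triple(ITEM_NS + t.subject, VOCAB_NS + t.predicate, t.obj)

def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples-style line.

    Integers become typed literals; any other value is written as a
    plain string literal with backslashes and quotes escaped.
    """
    if isinstance(t.obj, int):
        value = f'"{t.obj}"^^<{XSD_INTEGER}>'
    else:
        escaped = str(t.obj).replace("\\", "\\\\").replace('"', '\\"')
        value = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {value} ."

formal = to_uri_triple(simple)
print(to_ntriples(formal))
```

The point of the sketch is only that the three-part shape stays the same; what changes is that names become globally unique identifiers.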
An example of a complete description
Advantages of RDF
Virtually any RDF software can parse the lines shown above as a self-contained, working data file.
You can declare properties if you want. The RDF Schema standard lets you declare classes and relationships between properties and classes.
The flexibility that comes from not depending on schemas is the first key to RDF's value.
Splitting the triples across several lines does not affect their collective meaning, which makes sharding of data collections easy.
Multiple datasets can be combined into a usable whole with simple concatenation.
Because property names are URIs, as in the inventory dataset, datasets that share a vocabulary are easy to aggregate.
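The sharding and concatenation points can be illustrated with a small Python sketch. It uses a simplified one-triple-per-line text format (an assumption for illustration, not real RDF syntax) and made-up inventory data; the prefixed names `item:` and `inv:` are hypothetical.

```python
# Two hypothetical shards; every line is one self-contained triple.
shard_a = """\
item:p1234 inv:instock 9
item:p1234 inv:supplier sup:joesparts
"""

shard_b = """\
item:p5678 inv:instock 0
item:p1234 inv:instock 9
"""

def parse(text):
    """Parse 'subject predicate object' lines into a set of triples."""
    triples = set()
    for line in text.splitlines():
        if line.strip():
            s, p, o = line.split(None, 2)
            triples.add((s, p, o))
    return triples

# Concatenating the shards gives the same triples as merging the parsed sets;
# the triple that appears in both shards is simply stored once.
merged = parse(shard_a + shard_b)
assert merged == parse(shard_a) | parse(shard_b)

# Because both shards share the property name inv:instock, aggregation is a
# single filter over the merged set.
in_stock = {s: int(o) for s, p, o in merged if p == "inv:instock"}
print(in_stock)
```

Since each triple carries its full meaning on its own, the order of lines and the choice of which file holds which triple do not matter.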
SPARQL Query Language for RDF
The following SPARQL query asks for all property names and values associated with the fbd:s9483 resource:
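As a rough sketch of what such a query computes, the Python below treats a query as a triple pattern in which the subject is fixed and the property name and value are left as variables. Only the resource name fbd:s9483 comes from the slide; the data, the `fbv:` property names, and the `match` helper are hypothetical.

```python
# Hypothetical data; only the resource name fbd:s9483 appears in the lecture.
TRIPLES = [
    ("fbd:s9483", "fbv:name", "Joe's Parts"),
    ("fbd:s9483", "fbv:city", "Springfield"),
    ("fbd:p1234", "fbv:instock", "9"),
]

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Fix the subject, leave property and value open, and project those two columns.
# This mirrors a pattern of the shape { fbd:s9483 ?property ?value } (assumed shape).
rows = [(p, o) for _, p, o in match(TRIPLES, s="fbd:s9483")]
for prop, value in rows:
    print(prop, value)
```

A real SPARQL engine adds joins across several patterns, filters, and indexing, but the basic unit of work is this kind of pattern match.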
The SPARQL Query Result from the Previous Example
Another SPARQL Example
What is this query for?
Data
Open Source Software: Apache Jena
Property Graphs
Reference
A usual example
Query Example I
Query Examples II & III: computationally intensive
Graph Database Example
Execution Time in the Example of Finding Extended Friends (by Neo4j)
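Finding extended friends (friends, friends of friends, and so on) amounts to a depth-limited traversal from one person. The Python sketch below shows that traversal on a small made-up friendship graph; it is an illustration of the query's logic, not of how Neo4j executes it, and all names in it are hypothetical.

```python
from collections import deque

# Hypothetical undirected friendship graph, stored as adjacency sets.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}

def extended_friends(graph, start, max_depth):
    """Return {person: hop distance} for everyone within max_depth hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_depth:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    del dist[start]  # the starting person is not their own friend
    return dist

print(extended_friends(FRIENDS, "alice", 2))
```

Each extra level of depth multiplies the number of people reached, which is why the cost of this query grows quickly with depth in systems that must recompute each hop through joins.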
Modeling Order History as a Graph
A Query Language on Property Graphs: Cypher
Cypher Example
Other Cypher Clauses
Property Graph Example: Shakespeare
Creating the Shakespeare Graph
Query on the Shakespeare Graph
Another Query on the Shakespeare Graph
Chaining on the Query
Example: Email Interaction Graph
What's this query for?
Building Application Example: Collaborative Filtering
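One common way to phrase collaborative filtering on a graph is as a two-hop traversal followed by ranking: from a document to the users who accessed it, then from those users to the other documents they accessed. The Python sketch below shows that idea on a made-up download log; the data and function names are hypothetical, and this is not necessarily the query used in the lecture.

```python
from collections import Counter

# Hypothetical download log: user -> set of documents they downloaded.
DOWNLOADS = {
    "u1": {"d1", "d2", "d3"},
    "u2": {"d1", "d3"},
    "u3": {"d2", "d4"},
    "u4": {"d1", "d3", "d4"},
}

def build_reverse(downloads):
    """Build the document -> users direction of the same relationships."""
    doc_users = {}
    for user, docs in downloads.items():
        for doc in docs:
            doc_users.setdefault(doc, set()).add(user)
    return doc_users

def also_downloaded(doc, downloads, top_n=3):
    """Rank other documents by how many users share them with `doc`."""
    doc_users = build_reverse(downloads)
    counts = Counter()
    for user in doc_users.get(doc, ()):      # hop 1: document -> users
        for other in downloads[user]:        # hop 2: user -> documents
            if other != doc:
                counts[other] += 1
    # Highest count first; ties broken by name so the ranking is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

print(also_downloaded("d1", DOWNLOADS))
```

In a graph database the two nested loops correspond to following relationships out of the starting node, so the work depends on the neighbourhood of the document rather than on the size of the whole log.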
How to make a graph database fast?
Use relationships, not indexes, for fast traversal
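One reading of this slide is that each node should hold direct references to its neighbours, so that a hop costs work proportional to the node's own relationships rather than a lookup over all edges. The Python sketch below contrasts the two approaches on a tiny made-up graph; the class and function names are hypothetical and the code makes no claim about any particular database's storage layout.

```python
class Node:
    """A node that stores direct references to its outgoing neighbours."""
    def __init__(self, name):
        self.name = name
        self.out = []

# Hypothetical directed edges.
EDGES = [("a", "b"), ("b", "c"), ("b", "d"), ("d", "e")]

# Build the reference-based graph once from the edge list.
nodes = {}
for src, dst in EDGES:
    for name in (src, dst):
        nodes.setdefault(name, Node(name))
    nodes[src].out.append(nodes[dst])

def hop_names(start, hops):
    """Follow stored references; each step touches only the frontier's own edges."""
    frontier = {start}
    for _ in range(hops):
        frontier = {nbr for node in frontier for nbr in node.out}
    return sorted(node.name for node in frontier)

def hop_names_by_scan(edges, start, hops):
    """Same result, but every step scans the whole edge table."""
    frontier = {start}
    for _ in range(hops):
        frontier = {dst for src, dst in edges if src in frontier}
    return sorted(frontier)

print(hop_names(nodes["a"], 2))
print(hop_names_by_scan(EDGES, "a", 2))
```

Both functions return the same nodes; the difference is that the scan version does work proportional to the total number of edges at every hop, while the reference version only visits the relationships attached to the current frontier. A real index would reduce the scan to a lookup, but that lookup still depends on the size of the indexed data.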
Storage Structure Example
Nodes and Relationships in the Object Cache
IBM System G
What is System G?
A Complete Set of Visualizations, Analytical Algorithms, Middleware and Data Stores Designed to Support Graph Applications
Rich Graph Algorithms/Functions: Primitives, Centralities, Communities, Graph Sampling, Network Info Flow, Shortest Paths, Ego Net Features, Graph Matching, Graph Query, Graph Search, Bayesian Networks, Latent Net Inference, Markov Networks
Multi Graph Type Support:
Few, very large graphs (e.g. social, Internet of things)
Many, many small graphs (e.g. protein, healthcare)
Large semantic graph (Semantic web, RDF, Graph search, Graph recommendation)
Large probabilistic graphical models: Bayesian networks, Markovian networks, HMMs, etc.
And More: Graph Visualizations, Graph Databases
Graph Middleware for Hardware Platform Optimization
Graph Data Interface and Processing Interface
Graph-Embedded Industry Solutions
Based on ~$21M research funding ==> 65+ research innovations/papers including 7 best paper awards
New: BigData 2013 Best Paper Award (http://www.ieeebigdata.org)
[Figure labels: Graphs; Graph Database (RDF / Property Graph); Attributes; Contextual Analysis; Topological Analytics; Collective Graph; Macro Collective Analysis; Graphical Models; Activity Graph; Micro & Reasoning; Cognitive Understanding]
Preliminary comparison for Recommendation & Visualization
IBM KnowledgeView 1-year Access Log: 72.3K users, 82.1K docs, and 1.74 million downloads
Recommendation ==> 2-hop traversal & ranking
For Visualization ==> 4-hop traversal & rankings
Query Time (sec) by App. Type
System | Collaborative Filtering for Recommendation* | Centroid Graph Extraction for Visualization
DB2 via SQL | 0.24 | 52.0 (cold), 50.6 (cache)
Oracle via SQL | 0.35 | 201.0 (cold), 42.0 (cache)
DB2RDF via SPARQL | TBD | TBD
Neo4j | 0.068 | 4.8 (cold), 1.2 (cache)
Titan (Berk. DB) | 0.281 | 17.3 (cold), 6.8 (cache)
Titan (HBase) | 0.414 | 24.2 (cold), 5.7 (cache)
GBase (HBase) | 0.201 | 27.0 (cold), 2.4 (cache)
System G Native Store | 0.015 | 4.2 (cold), 0.07 (cache)
Note: All numbers are preliminary.
An Emerging Benchmark Test Set: data generator of full social media activity simulation of any number of users
Next Bi-Annual Meeting: November 19
Questions?