Graph Distribution
Graph Database
  (SRC)-[Relation]->(DEST)
Graph Database
  Use cases: fraud detection, recommendation engines, social networks...
RedisGraph
  Property graph
  Labeled entities
  Schema-less
  Cypher query language
  Aggregations, arithmetic expressions, sort...
  Tabular resultset
Structure
Tables
  Person                    Visit          Country
  Name    Age   Height      SRC   DEST     Name     Population
  Roi     33    187         1     2        Israel   8.5M
  Hila    33    170         2     2        Japan    127M
  Shany   23    167         2     3        Italy    60M
  Amit    31    180         4     1
                            4     3
Documents
  ID: 1, Name: Roi, Age: 33, Height: 187, Visited: [6]
  ID: 6, Name: Japan, Population: 127M
Graph structure 101
Adjacency list
  Each node stores the list of nodes it is connected to.
  [diagram: 1 2 3 4 3 2 1 4]
Adjacency matrix
  Node i is connected to node j if A[i,j] = 1
  [diagram: 1 0 1 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 1 1 0 1 1 1 0 1 0 1 1 0 0]
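To make the two representations concrete, here is a minimal Python sketch that builds both from the same edge list. The four-node graph is hypothetical (it is not the graph drawn on the slide), and the names `edges`, `adj_list`, `A` and `connected` are chosen only for illustration.

```python
# Hypothetical directed graph on nodes 0..3 (not the graph from the slide).
edges = [(0, 1), (0, 2), (1, 2), (3, 0)]
n = 4

# Adjacency list: for every node, the list of nodes it points to.
adj_list = {node: [] for node in range(n)}
for src, dest in edges:
    adj_list[src].append(dest)

# Adjacency matrix: A[i][j] == 1 if and only if there is an edge from i to j.
A = [[0] * n for _ in range(n)]
for src, dest in edges:
    A[src][dest] = 1


def connected(i, j):
    """Constant-time lookup in the matrix, at the cost of n * n cells."""
    return A[i][j] == 1


print(adj_list)
for row in A:
    print(row)
print(connected(3, 0), connected(0, 3))
```

The list grows with the number of edges, while the matrix always takes n * n cells, which is the trade-off that matters once the graph gets large and sparse.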
Hexastore
  S = Subject, P = Predicate, O = Object
  6 orderings: SPO, SOP, PSO, POS, OSP, OPS
Graph structure: Hexastore
  Triplet: Michael (S), Boss (P), Jim (O)
  SPO:Michael:Boss:Jim
  SOP:Michael:Jim:Boss
  OPS:Jim:Boss:Michael
  OSP:Jim:Michael:Boss
  PSO:Boss:Michael:Jim
  POS:Boss:Jim:Michael
Node property set
  Entities: key-value store.
  Person node with attributes: { name: Bruce Buffer, age: 60, gender: male }
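The six keys above can be generated mechanically from a single triplet. The sketch below is only an illustration of the idea, not RedisGraph's implementation; `hexastore_keys` is a hypothetical helper name, and the prefix scan at the end stands in for what a sorted key-value store would do.

```python
from itertools import permutations


def hexastore_keys(subject, predicate, obj):
    """Return the six index keys for one (subject, predicate, object) triplet."""
    parts = {"S": subject, "P": predicate, "O": obj}
    keys = []
    for order in permutations("SPO"):
        prefix = "".join(order)  # e.g. "SPO", "OSP", ...
        keys.append(":".join([prefix] + [parts[c] for c in order]))
    return keys


keys = sorted(hexastore_keys("Michael", "Boss", "Jim"))
for key in keys:
    print(key)

# Any question with two known parts becomes a prefix scan over one ordering.
# Here: which objects are linked to subject Michael through predicate Boss?
prefix = "SPO:Michael:Boss:"
matches = [k[len(prefix):] for k in keys if k.startswith(prefix)]
print(matches)  # ['Jim']
```

Storing every triplet six times costs space, but it means any combination of known and unknown parts maps to a single ordered range of keys.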
Problem
  2 billion users
  338 friends per user on average
  676 billion edges
  152 terabytes ~= 1024 * 32 bytes per user + 64 * 2 bytes per edge
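The estimate can be checked with a few lines of arithmetic. This sketch uses only the figures from the slide and assumes decimal terabytes (10^12 bytes); the variable names are illustrative.

```python
users = 2_000_000_000
avg_friends = 338
bytes_per_user = 1024 * 32   # 32 KiB of properties per user
bytes_per_edge = 64 * 2      # 128 bytes per edge

edges = users * avg_friends
total_bytes = users * bytes_per_user + edges * bytes_per_edge

print(edges)                  # 676000000000
print(total_bytes / 10**12)   # 152.064
```

Roughly 65.5 TB of that is user properties and 86.5 TB is edges, so neither part fits comfortably on a single machine.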
Partitioning
Entities distribution
  [diagram: Property set 1, Property set 2, Graph index]
Query
  Find friends of mine who've visited places I've been to and are older than me.
  MATCH (ME:person)-[friend]->(F:person)-[visited]->(C:country)<-[visit]-(ME)
  WHERE ME.ID = 33 AND F.age > ME.age
  RETURN F.name, C.name
Graph traversal
  (ME:person), ME.ID = 33
  [diagram: Graph index]
Graph traversal
  (ME:person)-[friend]->(F:person)
  [diagram: Graph index]
Graph traversal
  (F:person)-[visited]->(C:country)
  [diagram: Graph index]
Graph traversal
  (C:country)<-[visit]-(ME)
  [diagram: Graph index]
Resultset
  Friend ID   Friend name   Country ID   Country name
  70          ?             25           ?
  92          ?             55           ?
  56          ?             4            ?
Query
  WHERE F.age > ME.age
  RETURN F.name, C.name
  NETWORK! Fetch age for ID 33.
  [diagram: Index, Entities]
Query example continued
  WHERE F.age > ME.age
  RETURN F.name, C.name
  NETWORK! Fetch name of every entity in (IDs).
  Entity's age > 29
  [diagram: Index, Entities]
Resultset
  Friend ID   Friend name   Country ID   Country name
  70          Noam          25           Japan
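The walkthrough above (traverse on the index using IDs only, then go over the network for attributes) can be sketched as a small simulation. This is a toy model under stated assumptions, not RedisGraph code: only ID 33, the age 29, the ID pairs 70/25, 92/55, 56/4 and the names Noam and Japan come from the slides. Every other age and name, and the names `EntityStore`, `FRIEND`, `VISIT` and `traverse`, are hypothetical.

```python
# Graph index: relations between IDs only, no attributes.
FRIEND = {33: [70, 92, 56]}
VISIT = {33: [25, 55, 4], 70: [25], 92: [55], 56: [4]}

# Entity attributes, living on other machines. Placeholder values are made up.
ENTITIES = {
    33: {"name": "person-33", "age": 29},
    70: {"name": "Noam", "age": 35},        # age is hypothetical
    92: {"name": "person-92", "age": 25},   # hypothetical
    56: {"name": "person-56", "age": 27},   # hypothetical
    25: {"name": "Japan"},
    55: {"name": "country-55"},             # hypothetical
    4: {"name": "country-4"},               # hypothetical
}


class EntityStore:
    """Stands in for the remote property sets; each fetch is one round trip."""

    def __init__(self, entities):
        self._entities = entities
        self.round_trips = 0

    def fetch(self, ids, attribute):
        self.round_trips += 1
        return {i: self._entities[i][attribute] for i in ids}


def traverse(me):
    """Index-only work: friends of `me` who visited a country `me` visited."""
    my_countries = set(VISIT.get(me, []))
    rows = []
    for friend in FRIEND.get(me, []):
        for country in VISIT.get(friend, []):
            if country in my_countries:
                rows.append((friend, country))
    return rows


store = EntityStore(ENTITIES)
rows = traverse(33)                                   # IDs only, no network
my_age = store.fetch([33], "age")[33]                 # NETWORK: age for ID 33
ages = store.fetch({f for f, _ in rows}, "age")       # NETWORK: friends' ages
rows = [(f, c) for f, c in rows if ages[f] > my_age]  # WHERE F.age > ME.age
names = store.fetch({x for row in rows for x in row}, "name")  # NETWORK: names
result = [(f, names[f], c, names[c]) for f, c in rows]

print(result)             # [(70, 'Noam', 25, 'Japan')]
print(store.round_trips)  # 3
```

Batching the attribute fetches keeps the number of round trips independent of how many rows the traversal produced, which is the point of keeping the index and the entities apart.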
Index distribution
  [diagram: Friend relation, Visit relation, Graph index]
Query
  Find all posts liked by friends of friends of mine, written by author X.
  MATCH (ME:person)-[friend]->(:person)-[friend]->(F:person)-[like]->(post)<-[author]-(A:author)
  WHERE ME.ID = 46 AND A.ID = 71070
  RETURN A.name, F.name
Query
  (ME:person)-[friend]->(:person)-[friend]->(F:person)
  1. Node X contains FRIEND relations.
  2. Seek to my ID in Node X (1 RPC). Retrieve a list of friend uids.
  3. Do multiple seeks, one per friend uid, to generate a list of friend-of-friend uids: result set 1.
  [diagram: Friend Index, Query executor]
Resultset 1: Friends of friends
  Friend ID   Friend name
  70          ?
  92          ?
  56          ?
Query
  (F:person)-[like]->(post)
  1. Node Y contains the posting list for predicate LIKE.
  2. Ship result set 1 to Node Y (1 RPC), and do seeks to generate a list of all posts liked by result set 1: result set 2.
  [diagram: Like Index, Resultset 1, Query executor]
Resultset 2: Liked posts
  Friend ID   Friend name   Post ID
  70          ?             534
  70          ?             431
  92          ?             8964
  56          ?             12
  56          ?             5356
Query
  (post)<-[author]-(a:author)
  Node Z contains relations for predicate AUTHOR.
  Ship result set 2 to Node Z (1 RPC).
  Seek to author X, and generate a list of posts authored by X: result set 3.
  [diagram: Author Index, Resultset 2, Query executor]
Resultset 4: Intersected resultset 2 and 3
  Friend ID   Friend name   Post ID   Author ID   Author name
  70          ?             534       71070       ?
  92          ?             8964      71070       ?
Query
  RETURN A.name, F.name
  Node N contains names for all uids.
  Ship result set 4 to Node N (1 RPC), and convert uids to names by doing multiple seeks.
  [diagram: Author Index, Resultset 4, Query executor]
Resultset 4: Intersected resultset 2 and 3
  Friend ID   Friend name   Post ID   Author ID   Author name
  70          Ailon         534       71070       Omri
  92          Boaz          8964      71070       Omri
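The four steps above, one node per predicate and one RPC per node, can be sketched as follows. This is a toy simulation and not the code of any real system: the `Node` class and its `call` method are hypothetical, as are the direct-friend IDs 11 and 12, post 777 and the placeholder name for ID 56. The remaining IDs and names are taken from the result sets on the slides.

```python
class Node:
    """One machine holding the whole index for a single predicate."""

    def __init__(self, predicate, index):
        self.predicate = predicate
        self.index = index
        self.rpcs = 0

    def call(self, work):
        """One RPC: ship `work` to the node and run all its seeks locally."""
        self.rpcs += 1
        return work(self.index)


node_x = Node("friend", {46: [11, 12], 11: [70, 92], 12: [56, 70]})
node_y = Node("like", {70: [534, 431], 92: [8964], 56: [12, 5356]})
node_z = Node("author", {71070: [534, 8964, 777]})
node_n = Node("name", {70: "Ailon", 92: "Boaz", 56: "person-56", 71070: "Omri"})

ME, AUTHOR = 46, 71070


def friends_of_friends(index, me):
    """Seek to `me`, then seek once per friend; deduplicate the result."""
    found = []
    for friend in index.get(me, []):
        for fof in index.get(friend, []):
            if fof != me and fof not in found:
                found.append(fof)
    return found


# Step 1, Node X: result set 1, friends of friends.
rs1 = node_x.call(lambda idx: friends_of_friends(idx, ME))

# Step 2, Node Y: result set 2, (friend, post) pairs liked by result set 1.
rs2 = node_y.call(lambda idx: [(f, p) for f in rs1 for p in idx.get(f, [])])

# Step 3, Node Z: result set 3, posts by the author, intersected into set 4.
rs3 = node_z.call(lambda idx: set(idx.get(AUTHOR, [])))
rs4 = [(f, p, AUTHOR) for f, p in rs2 if p in rs3]

# Step 4, Node N: convert uids to names.
names = node_n.call(
    lambda idx: {uid: idx[uid] for f, _, a in rs4 for uid in (f, a)}
)
result = [(names[a], names[f]) for f, _, a in rs4]  # RETURN A.name, F.name

total_rpcs = sum(node.rpcs for node in (node_x, node_y, node_z, node_n))
print(result)      # [('Omri', 'Ailon'), ('Omri', 'Boaz')]
print(total_rpcs)  # 4
```

Because each predicate lives entirely on one node, the number of RPCs follows the number of hops in the query rather than the size of the intermediate result sets.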
RedisGraph
  Not distributed, yet.
  Work in progress:
    Compact distributed index
    Concurrent, fast, independent traversals
@roilipman (you)-[ask]->(question)
Solutions: JanusGraph
  Successor of Titan.
  Relies on a storage backend, e.g. Cassandra.
  Provides a graph interface on top of a table.
  Delegates storing, replicating, distributing and persisting the graph to the underlying storage backend.
  Takes a mature application from a similar domain and introduces a new data type API on top of an existing data structure (not optimal).
Solutions: Dgraph
  Uses the concept of RDF N-Quads to represent connections, and Badger as its key-value store.
  Both the graph index and the entities are distributed.
Solutions: ArangoDB
  From my understanding, this multi-model database uses documents to represent all three data types: documents, key-value store and graph.
  I'm not sure how it distributes its data, but it uses Raft to ensure consistency.
  It is ACID.