TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing
|
|
- Morgan Bryan
- 6 years ago
- Views:
Transcription
1 TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing Sairam Gurajada, Stephan Seufert, Iris Miliaraki, Martin Theobald Databases & Information Systems Group ADReM Research Group Max-Planck Institute for Informatics University of Antwerp Saarbrücken, Germany Antwerp, Belgium 1 / 19
2 Resource Description Framework (RDF) RDF is a data model for representing information on the Web 2 / 19
3 Resource Description Framework (RDF) RDF is a data model for representing information on the Web RDF triples obtained from IE Barack Obama isa President of USA. Barack Obama bornin Honolulu. Barack Obama Nobel Peace Prize. Barack Obama memberof Democratic Party. 2 / 19
4 Resource Description Framework (RDF) RDF is a data model for representing information on the Web RDF triples obtained from IE Barack Obama isa President of USA. Barack Obama bornin Honolulu. Barack Obama Nobel Peace Prize. Barack Obama memberof Democratic Party. Barack Obama isa Singer isa Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party 2 / 19
5 Resource Description Framework (RDF) RDF is a data model for representing information on the Web RDF triples obtained from IE Barack Obama isa President of USA. Barack Obama bornin Honolulu. Barack Obama Nobel Peace Prize. Example datasets: YAGO (>120 MBarack facts), Obama DBpedia memberof (>400 MDemocratic facts), Party. Freebase,... isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party 2 / 19
6 TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing Scale of RDF Data Published Linked Open Data Cloud I As of 2011, there exists more than 30 billion triples in the linked data cloud (from more than 300 sources) I Billion Triple Challenge (BTC) published more than one billion triples in years 2008, 2010, 2011, 2012 Source: 3 / 19
7 Indexing and Querying RDF Data RDF triples are stored and indexed in a relational table (relational approach) SPARQL is the language suggested by W3C for querying RDF data SPARQL has many similarities with standard SQL SELECT-PROJECT-JOIN forms the main building blocks of SPARQL 4 / 19
8 Indexing and Querying RDF Data RDF triples are stored and indexed in a relational table (relational approach) SPARQL is the language suggested by W3C for querying RDF data SPARQL has many similarities with standard SQL SELECT-PROJECT-JOIN forms the main building blocks of SPARQL SPARQL Query: Find persons who are born in USA and a prize.. SELECT?person,?city,?prize WHERE { (R1)?person <bornin>?city. (R2)?city <locatedin> USA. (R3)?person <>?prize. } bornin locatedin?person?city USA?prize RDF Data: Subject Predicate Object Barack Obama bornin Honolulu Barack Obama Nobel Peace Prize Barack Obama Grammy Award Barack Obama memberof Republican Party Honolulu locatedin United States Barack Obama isa Singer John F. Kennedy bornin Brookline John F. Kennedy memberof Republican Party John F. Kennedy diedin Dallas Dallas locatedin United States Brookline locatedin United States / 19
9 Indexing and Querying RDF Data RDF triples are stored and indexed in a relational table (relational approach) SPARQL is the language suggested by W3C for querying RDF data SPARQL has many similarities with standard SQL SELECT-PROJECT-JOIN forms the main building blocks of SPARQL SPARQL Query: Find persons who are born in USA and a prize.. SELECT?person,?city,?prize WHERE { (R1)?person <bornin>?city. (R2)?city <locatedin> USA. (R3)?person <>?prize. }?person bornin locatedin?city USA?prize Results: (R1 R2 R3)?person?city?prize Barack Obama Honolulu Peace Nobel Prize Barack Obama Honolulu Grammy Award... RDF Data: Subject Predicate Object Barack Obama bornin Honolulu Barack Obama Nobel Peace Prize Barack Obama Grammy Award Barack Obama memberof Republican Party Honolulu locatedin United States Barack Obama isa Singer John F. Kennedy bornin Brookline John F. Kennedy memberof Republican Party John F. Kennedy diedin Dallas Dallas locatedin United States Brookline locatedin United States / 19
10 Current Approaches Efficiency thoroughly investigated in single-node setting crucial factors Join-order optimization, Join-ahead pruning, indexing layout, choice of operators Eg. Jena, Sesame, HexaStore, MonetDB-RDF, SW-Store, RDF-3X, TripleBit, BitMat, gstore,... 5 / 19
11 Current Approaches Efficiency thoroughly investigated in single-node setting crucial factors Join-order optimization, Join-ahead pruning, indexing layout, choice of operators Eg. Jena, Sesame, HexaStore, MonetDB-RDF, SW-Store, RDF-3X, TripleBit, BitMat, gstore,... Scalability recently a line of distributed systems has been proposed SHARD, H-RDF-3X, Trinity.RDF,... }{{}}{{} Relational-based Graph-based 5 / 19
12 Current Approaches Efficiency thoroughly investigated in single-node setting crucial factors Join-order optimization, Join-ahead pruning, indexing layout, choice of operators Eg. Jena, Sesame, HexaStore, MonetDB-RDF, SW-Store, RDF-3X, TripleBit, BitMat, gstore,... Scalability recently a line of distributed systems has been proposed SHARD, H-RDF-3X, Trinity.RDF,... }{{}}{{} Relational-based Graph-based Relational-based (Joins) vs Graph-based (Exploration) [distributed setting] SPARQL 1.0 requires a row-oriented output joins are inevitable Relational approaches suffer from large intermediate relations (inaccurate/insufficient statistics resulting in poor query plans) 5 / 19
13 Current Approaches Efficiency thoroughly investigated in single-node setting crucial factors Join-order optimization, Join-ahead pruning, indexing layout, choice of operators Eg. Jena, Sesame, HexaStore, MonetDB-RDF, SW-Store, RDF-3X, TripleBit, BitMat, gstore,... Scalability recently a line of distributed systems has been proposed SHARD, H-RDF-3X, Trinity.RDF,... }{{}}{{} Relational-based Graph-based Relational-based (Joins) vs Graph-based (Exploration) [distributed setting] SPARQL 1.0 requires a row-oriented output joins are inevitable Relational approaches suffer from large intermediate relations (inaccurate/insufficient statistics resulting in poor query plans) Graph-based systems use distributed graph exploration followed by relational joins Effective when graph exploration prunes a lot of bindings (in case of selective queries) 5 / 19
14 Current Approaches Efficiency thoroughly investigated in single-node setting crucial factors Join-order optimization, Join-ahead pruning, indexing layout, choice of operators Eg. Jena, Sesame, HexaStore, MonetDB-RDF, SW-Store, RDF-3X, TripleBit, BitMat, gstore,... Scalability recently a line of distributed systems has been proposed SHARD, H-RDF-3X, Trinity.RDF,... }{{}}{{} Relational-based Graph-based Relational-based (Joins) vs Graph-based (Exploration) [distributed setting] SPARQL 1.0 requires a row-oriented output joins are inevitable Relational approaches suffer from large intermediate relations (inaccurate/insufficient statistics resulting in poor query plans) Graph-based systems use distributed graph exploration followed by relational joins Effective when graph exploration prunes a lot of bindings (in case of selective queries) Problems with existing distributed relational-based RDF engines 1. Synchronous processing of joins 2. Dangling triples occur in intermediate relations but not in final results 5 / 19
15 1. Synchronous Processing of Joins Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) R 1 R 2 R 4 R 3 6 / 19
16 1. Synchronous Processing of Joins Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Synchronous case 1 : MR job1 MR job2 MR job3 MR job3 1 MR job1 2 MR job2 3 R 1 R 2 R 4 R 3 6 / 19
17 1. Synchronous Processing of Joins Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Synchronous case 1 : MR job1 MR job2 MR job3 Synchronous case 2 : MR job1 MR job2 MR job1 MR job R 1 R 2 R 4 R 3 6 / 19
18 1. Synchronous Processing of Joins Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Synchronous case 1 : MR job1 MR job2 MR job3 Synchronous case 2 : MR job1 MR job2 Synchronous case 3 : (MR job1 MR job2) MR job3 MR job1 MR job3 1 2 MR job2 3 R 1 R 2 R 4 MR job3 Slave 1 1 MR job3 R 3 Slave 2 1 MR job1 2 MR job2 3 MR job1 2 MR job2 3 Wait!! 6 / 19
19 1. Synchronous Processing of Joins Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Synchronous case 1 : MR job1 MR job2 MR job3 Synchronous case 2 : MR job1 MR job2 Synchronous case 3 : (MR job1 MR job2) MR job Our approach: Minimize dependancies among join operators R 1 and only synchronize when needed! (using asynchronous communication (MPICH2) protocol) R 2 R 4 Slave 1 1 R 3 Slave Asynchronous Wait!! 6 / 19
20 2. Pruning Dangling Triples Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) R 1 R 2 R 4 R 3 1 RDF-3X: a RISC-style Engine for RDF, VLDBJ / 19
21 2. Pruning Dangling Triples Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Sideways Information Passing (SIP) of RDF-3X 1 Powerful runtime pruning technique Shares information across join operators 1 Requires synchronization 2 3 R 1 R 2 R 4 R 3 1 RDF-3X: a RISC-style Engine for RDF, VLDBJ / 19
22 2. Pruning Dangling Triples Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Sideways Information Passing (SIP) of RDF-3X 1 Powerful runtime pruning technique Shares information across join operators 1 Requires synchronization 2 3 Our approach: Join-ahead pruning 1 Pre-partition the triples (into groups) R 1 R 2 R 4 R 3 1 RDF-3X: a RISC-style Engine for RDF, VLDBJ / 19
23 2. Pruning Dangling Triples Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Sideways Information Passing (SIP) of RDF-3X 1 Powerful runtime pruning technique Shares information across join operators 1 Requires synchronization 2 3 Our approach: Join-ahead pruning 1 Pre-partition the triples (into groups) 2 Query over groups to find the ones which are relevant to the query R 1 R 2 R 4 R 3 1 RDF-3X: a RISC-style Engine for RDF, VLDBJ / 19
24 2. Pruning Dangling Triples Consider a query with four relations R 1, R 2, R 3, R 4 with join order: (R 1 2 R 2 ) 1 (R 3 3 R 4 ) Sideways Information Passing (SIP) of RDF-3X 1 Powerful runtime pruning technique Shares information across join operators 1 Requires synchronization 2 3 Our approach: Join-ahead pruning 1 Pre-partition the triples (into groups) 2 Query over groups to find the ones which are relevant to the query 3 Scan & Join only triples from the relevant groups R 1 R 2 R 3 R 4 1 RDF-3X: a RISC-style Engine for RDF, VLDBJ / 19
25 2. Join-ahead Pruning via Graph Summarization isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven memof governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party 8 / 19
26 2. Join-ahead Pruning via Graph Summarization Locality-based grouping (using METIS) isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven memof governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party 8 / 19
27 2. Join-ahead Pruning via Graph Summarization Locality-based grouping (using METIS) isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven memof governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party Summary Graph memof governor locin, bornin locin isa,, bornin locin P 1 P 2 isa, locin, memof P 3 P 4 memof,bornin,bornin 8 / 19
28 2. Join-ahead Pruning via Graph Summarization Locality-based grouping (using METIS) isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven memof governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party Summary Graph memof governor locin, bornin locin isa,, bornin locin P 1 P 2 isa, locin, memof P 3 P 4 memof,bornin,bornin SPARQL Query: SELECT?city,?prize WHERE { Barack Obama <bornin>?city.?city <locatedin> USA. Barack Obama <>?prize. } Supernode Bindings:?city : P 1?prize : P 2, P 4 8 / 19
29 2. Join-ahead Pruning via Graph Summarization Locality-based grouping (using METIS) isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven memof governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party SPARQL Query: SELECT?city,?prize WHERE { Barack Obama <bornin>?city.?city <locatedin> USA. Barack Obama <governor>?city2. } Summary Graph memof governor locin, bornin locin isa,, bornin locin P 1 P 2 isa, locin, memof P 3 P 4 memof,bornin,bornin Supernode Bindings:!!! Empty Result?city : --?prize : -- 8 / 19
30 2. Join-ahead Pruning via Graph Summarization Locality-based grouping (using METIS) isa isa Barack Obama Singer Lady Gaga bornin memof Honolulu Democratic Party Grammy Award bornin locin USA New York locin locin locin locin Texas New Haven memof governor bornin Plains Nobel Peace prize George W Bush bornin memof Jimmy Carter Republican Party Summary Graph memof governor locin, bornin locin isa,, bornin locin P 1 P 2 isa, locin, memof P 3 P 4 memof,bornin,bornin SPARQL Query: SELECT?city,?prize WHERE { Barack Obama <bornin>?city.?city <locatedin> USA. Barack Obama <governor>?city2. } Supernode Bindings:!!! Empty Result?city : --?prize : -- False postives may occur but no false negatives!!! 8 / 19
31 TriAD Distributed RDF Store TriAD (for Triple Asynchronous and Distributed ) a distributed RDF Store 9 / 19
32 TriAD Distributed RDF Store TriAD (for Triple Asynchronous and Distributed ) a distributed RDF Store Main-memory backed six permutation distributed indexes (with locality-aware sharding and encoded summary information) 9 / 19
33 TriAD Distributed RDF Store TriAD (for Triple Asynchronous and Distributed ) a distributed RDF Store Asynchronous Processing of Joins via MPICH2 communication protocol Main-memory backed six permutation distributed indexes (with locality-aware sharding and encoded summary information) 9 / 19
34 TriAD Distributed RDF Store TriAD (for Triple Asynchronous and Distributed ) a distributed RDF Store Join-ahead pruning via Graph Summarization Asynchronous Processing of Joins via MPICH2 communication protocol Main-memory backed six permutation distributed indexes (with locality-aware sharding and encoded summary information) 9 / 19
35 TriAD Distributed RDF Store TriAD (for Triple Asynchronous and Distributed ) a distributed RDF Store Two-stage Query Optimization (distributed, multi-threading, and join-ahead pruning into account) Join-ahead pruning via Graph Summarization Asynchronous Processing of Joins via MPICH2 communication protocol Main-memory backed six permutation distributed indexes (with locality-aware sharding and encoded summary information) 9 / 19
36 TriAD Architecture RDF Data METIS Partitions Info Master Node Summary Graph SPARQL Query Encoded Triples RDF Parser Bidirectional Dictionaries Query Plan Local Query Processor Local SPO Indexes Partitioner Intermediate Results SPO SOP PSO OSP OPS POS Summary Triples Data Triples Horizontal Partitioning Encoded Triples Global Statistics Local Statistics Query Plan Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Supernodes MPICH2 Asynchronous Communication Protocol Query Optimizer Global Query Plan Intermediate Encoded Results Triples SPARQL Query Parser Query Graph Query Plan SPARQL Results Results Intermediate Results Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Slave 1 Slave 2 Slave n 10 / 19
37 TriAD Architecture Indexing 1 Encoded Triples RDF Data RDF Parser Bidirectional Dictionaries Query Plan Local Query Processor Local SPO Indexes Partitioner Intermediate Results SPO SOP PSO OSP OPS POS METIS Partitions Info Master Node Summary Graph Summary Triples Data Triples Horizontal Partitioning Encoded Triples Global Statistics Local Statistics Query Plan Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Supernodes MPICH2 Asynchronous Communication Protocol Query Optimizer Global Query Plan Intermediate Encoded Results Triples SPARQL Query Parser Query Plan SPARQL Query Query Graph SPARQL Results Results Intermediate Results Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Slave 1 Slave 2 Slave n 10 / 19
38 TriAD Architecture 2 Indexing 1 Encoded Triples RDF Data RDF Parser Bidirectional Dictionaries Query Plan Local Query Processor Local SPO Indexes Partitioner Intermediate Results SPO SOP PSO OSP OPS POS METIS Partitions Info Master Node Summary Graph Summary Triples Data Triples Horizontal Partitioning Encoded Triples Global Statistics Local Statistics Query Plan Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Supernodes MPICH2 Asynchronous Communication Protocol Query Optimizer Global Query Plan Intermediate Encoded Results Triples SPARQL Query Parser Query Plan SPARQL Query Query Graph SPARQL Results Results Intermediate Results Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Slave 1 Slave 2 Slave n 10 / 19
39 TriAD Architecture 2 Indexing 1 Encoded Triples RDF Data RDF Parser Bidirectional Dictionaries Query Plan Local Query Processor Local SPO Indexes Partitioner Intermediate Results SPO SOP PSO OSP OPS POS METIS Partitions Info Master Node Summary Graph Summary Triples Data Triples Horizontal Partitioning Encoded Triples Global Statistics Local Statistics Query Plan Local Query Processor Local SPO Indexes 3 SPO SOP PSO OSP OPS POS Supernodes MPICH2 Asynchronous Communication Protocol Query Optimizer Global Query Plan Intermediate Encoded Results Triples SPARQL Query Parser Query Plan SPARQL Query Query Graph SPARQL Results Results Intermediate Results Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Slave 1 Slave 2 Slave n 10 / 19
40 TriAD Architecture Encoded Triples RDF Data RDF Parser Bidirectional Dictionaries Query Plan Local Query Processor Local SPO Indexes Partitioner Intermediate Results SPO SOP PSO OSP OPS POS METIS Partitions Info Master Node Summary Graph Summary Triples Data Triples Horizontal Partitioning Encoded Triples Global Statistics Local Statistics Query Plan Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Supernodes MPICH2 Asynchronous Communication Protocol Intermediate Encoded Results Triples SPARQL Query Processing Query Optimizer Global Query Plan SPARQL Query Parser Query Plan SPARQL Query Query Graph SPARQL Results Results Intermediate Results Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Stage1 Slave 1 Slave 2 Slave n 10 / 19
41 TriAD Architecture Encoded Triples RDF Data RDF Parser Bidirectional Dictionaries Query Plan Local Query Processor Local SPO Indexes Partitioner Intermediate Results SPO SOP PSO OSP OPS POS METIS Partitions Info Master Node Summary Graph Summary Triples Data Triples Horizontal Partitioning Encoded Triples Global Statistics Local Statistics Query Plan Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Supernodes MPICH2 Asynchronous Communication Protocol Intermediate Encoded Results Triples SPARQL Query Processing Query Optimizer Global Query Plan SPARQL Query Parser Query Plan SPARQL Query Query Graph SPARQL Results Results Intermediate Results Local Query Processor Local SPO Indexes SPO SOP PSO OSP OPS POS Stage1 Stage2 Slave 1 Slave 2 Slave n 10 / 19
42 Indexing RDF graph partitioning using a locality-based non-overlapping partitioning algorithm i.e Each s (or o) of RDF triple s, p, o mapped to one supernode P s (or P o) 11 / 19
43 Indexing RDF graph partitioning using a locality-based non-overlapping partitioning algorithm i.e Each s (or o) of RDF triple s, p, o mapped to one supernode P s (or P o) Triple encoding Each triple s, p, o is encoded into two triples: data triple, summary triple Dictionary encoding: Each entity is assigned a globally unique id is computed by concatenating (supernode id) P s and a (local id) id s Eg. Barack Obama,, Nobel Prize Data triple: 1 1, 6, 4 3 Summary triple 1, 6, 4 11 / 19
44 Indexing RDF graph partitioning using a locality-based non-overlapping partitioning algorithm i.e Each s (or o) of RDF triple s, p, o mapped to one supernode P s (or P o) Triple encoding Each triple s, p, o is encoded into two triples: data triple, summary triple Dictionary encoding: Each entity is assigned a globally unique id is computed by concatenating (supernode id) P s and a (local id) id s Eg. Barack Obama,, Nobel Prize Data triple: 1 1, 6, 4 3 Summary triple 1, 6, 4 Locality-aware sharding and distributed indexing of data triples Each data triple is hash partitioned onto atmost two slaves and indexed in six permutations (in total) Eg. triple 1 1, 6, 4 3 is hashed on to slaves 1 mod n and 4 mod n (for n-slaves) 11 / 19
45 Indexing RDF graph partitioning using a locality-based non-overlapping partitioning algorithm i.e Each s (or o) of RDF triple s, p, o mapped to one supernode P s (or P o) Triple encoding Each triple s, p, o is encoded into two triples: data triple, summary triple Dictionary encoding: Each entity is assigned a globally unique id is computed by concatenating (supernode id) P s and a (local id) id s Eg. Barack Obama,, Nobel Prize Data triple: 1 1, 6, 4 3 Summary triple 1, 6, 4 Locality-aware sharding and distributed indexing of data triples Each data triple is hash partitioned onto atmost two slaves and indexed in six permutations (in total) Eg. triple 1 1, 6, 4 3 is hashed on to slaves 1 mod n and 4 mod n (for n-slaves) Summary graph index Summary triples are indexed in two-permutation adjacency list for efficient graph exploration 11 / 19
46 Query Processing In TriAD, query processing is performed in two stages Stage 1: Summary graph query processing (join-ahead pruning) Performed using graph exploration at master node to generate supernode bindings 12 / 19
47 Query Processing In TriAD, query processing is performed in two stages Stage 1: Summary graph query processing (join-ahead pruning) Performed using graph exploration at master node to generate supernode bindings Stage 2: Data graph query processing Done using relational joins in a distributed setup Inspired by the RISC style processing, we employ three physical operators: Distributed Index Scan (DIS): Invokes a parallel scan over a permutation list that is sharded across all slaves Distributed Merge Join (DMJ): If both input (or intermediate) relations are sorted according to the join key(s) in the query plan Distributed Hash Join (DHJ): If the input (or intermediate) relations are not sorted according to their join key(s) 12 / 19
48 Global Statistics & Query Optimization Precomputed statistics Computed in parallel at slaves and sent to master node Statistics: Individual cardinalities: s, p, o of SPO triples Pair cardinalities: (s, o), (s, p), (p, o) Join selectivities of predicate pairs (p 1, p 2 ) Similar statistics are computed over summary graph For a given query with patterns Q {R 1, R 2,..R n} Stage 1: Exploration Optimization We compute the exploration plan by using a bottom-up dynamic programming and the following cost model: Cost(R 1,..., R n) Card(R 1 ) + n i=2 ( Card(R i ) i j=1 ) Sel(R i, R j ) (1) 13 / 19
49 Global Statistics & Query Optimization Precomputed statistics Computed in parallel at slaves and sent to master node Statistics: Individual cardinalities: s, p, o of SPO triples Pair cardinalities: (s, o), (s, p), (p, o) Join selectivities of predicate pairs (p 1, p 2 ) Similar statistics are computed over summary graph For a given query with patterns Q {R 1, R 2,..R n} Stage 1: Exploration Optimization We compute the exploration plan by using a bottom-up dynamic programming and the following cost model: Cost(R 1,..., R n) Card(R 1 ) + n i=2 ( Card(R i ) i j=1 ) Sel(R i, R j ) (1) 13 / 19
50 Re-estimating cardinalities of a relations R i New cardinalities are re-estimated from the Stage 1 supernode bindings Card(R i ) := CQ s C s CQ o C o Card(R i) (on the fly computation) where C s, C o are the supernode bindings of R i and C Q s, C Q o are the supernode bindings of Q (R i Q) Stage 2: Global plan optimization A global plan is computed for stage 2 using re-estimated cardinalities and the following cost model Cost(Q) := Cost(Ri k) if R i denotes a DIS over permutation k; max(cost(q left ), Cost(Q right )) +Cost(Q left op Q right ) +Cost(Q left op Q right ) otherwise. The cost model captures the multi-threaded and distributed execution framework 14 / 19
51 Re-estimating cardinalities of a relations R i New cardinalities are re-estimated from the Stage 1 supernode bindings Card(R i ) := CQ s C s CQ o C o Card(R i) (on the fly computation) where C s, C o are the supernode bindings of R i and C Q s, C Q o are the supernode bindings of Q (R i Q) Stage 2: Global plan optimization A global plan is computed for stage 2 using re-estimated cardinalities and the following cost model Cost(Q) := Cost(Ri k) if R i denotes a DIS over permutation k; max(cost(q left ), Cost(Q right )) +Cost(Q left op Q right ) +Cost(Q left op Q right ) otherwise. The cost model captures the multi-threaded and distributed execution framework 14 / 19
52 Querying Optimization & Processing - Example SPARQL Query: Find persons who are born in USA and a prize.. SELECT?person,?city,?prize WHERE { R 1?person <bornin>?city. R 2?city <locatedin> USA. R 3?person <>?prize. R 4?prize <hasname>?name. } 15 / 19
53 Querying Optimization & Processing - Example SPARQL Query: Find persons who are born in USA and a prize.. SELECT?person,?city,?prize WHERE { R 1?person <bornin>?city. R 2?city <locatedin> USA. R 3?person <>?prize. R 4?prize <hasname>?name. } Stage 1 (Optimization) Summary Graph locin, bornin isa,, bornin locin memof governor locin P 1 P 2 isa, locin, memof P 3 P 4 memof,bornin,bornin Exploration Supernode Bindings:?person : P1, P2, P4?city : P1, P2, P4?prize : P2, P4 15 / 19
54 Querying Optimization & Processing - Example SPARQL Query: Find persons who are born in USA and a prize.. SELECT?person,?city,?prize WHERE { R 1?person <bornin>?city. R 2?city <locatedin> USA. R 3?person <>?prize. R 4?prize <hasname>?name. } Stage 1 (Optimization) Summary Graph locin, bornin isa,, bornin locin memof governor locin P 1 P 2 isa, locin, memof P 3 P 4?person DHJ(R 1,2,3,4 ) Cost:max(105,215)+130 Sharding:R 1,2,R 3,4 Cost:max(100,10)+5 DMJ(R 1,2) DMJ(R 3,4) Sharding: R 2?city?prize Cost:max(200,150)+15 Sharding: None Order: Cost: Slaves: Supernodes: Global Plan DIS(R 1) DIS(R 2) DIS(R 3) DIS(R 4) POS POS POS PSO [1,2] [1] [1,2] [1,2] [1,2,4] [1] [1,2,4] [1,2,4] memof,bornin Exploration Supernode Bindings:?person : P1, P2, P4?city : P1, P2, P4 Stage 2?prize : P2, P4 (Optimization),bornIn 15 / 19
55 Query Optimization & Processing Example (2) Slave 1 Slave 2 DHJ(R 1,2,3,4) Partial Results R 1,2 R 3,4 Partial Results DHJ(R 1,2,3,4) DMJ(R 1,2) DMJ(R 3,4) EP1 EP2 EP3 EP4 R 2 DMJ(R 1,2) DMJ(R 3,4) DIS(R 1 ) DIS(R 2 ) DIS(R ) DIS(R ) DIS(R ) DIS(R 3 ) 4 DIS(R 4 ) DIS(R ) (bornin,?o,?s) (locin,usa,?s) (,?o,?s) (hasname,?s,?o) (bornin,?o,?s) (locin,usa,?s) (,?o,?s) (hasname,?s,?o) P O S P O S P O S P S O P O S Stage 2: Distributed query execution P O S P S O 16 / 19
56 Evaluation We compared performance of TriAD with the following state-of-the-art systems Centralized systems RDF-3X, MonetDB-RDF, BitMat Distributed systems H-RDF-3X, Trinity.RDF, 4store, SHARD, Spark Datasets: (Synthetic) LUBM Million triples, 16GB raw data (Synthetic) LUBM Billion triples, 730GB raw data (Real world) BTC Billion triples, 231 GB raw data (Synthetic) WSDTS 109 Million triples, 15 GB raw data Benchmark queries for LUBM, BTC, & WSDTS datasets System Setup: TriAD, TriAD-SG is implemented in C++ Cluster setup: 12-nodes, 48GB RAM, 2 quad-core CPUs of 2.4GHz (HT enabled) 17 / 19
57 Evaluation - Large datasets LUBM Dataset 1.8 Billion triples (Query Performance in milli seconds) Geo.- Mean (in ms) 1e4 1e3 1e2 1e1 Warm/Main-memory Cold 1e0 TriAD TriAD-SG Trinity.RDF H-RDF-3X RDF-3X } {{ } Our Approach Queries Characteristics TriAD TriAD-SG Trinity.RDF H-RDF-3X RDF-3X Q1 Selective (6 joins) 7,631 2,146 12, e6 1.7e5 1.9e6 1.8e6 Q2 Non-Selective(1 join) 1,663 2,025 6, e5 4, e5 1.8e5 Q4 Selective (5 joins) Q7 Selective (6 joins) 14,895 16,863 31, e6 2.1e5 6.5e5 46, / 19
58 Summary TriAD is a fast distributed RDF engine built on top of asynchronous communication layer and multi-threaded execution framework efficient distributed and parallel join executions join-ahead pruning technique via graph summarization helps in pruning dangling triples and making query processing efficient distributed- and join-ahead pruning aware query optimizer so far reported fastest runtimes over three benchmark datasets: LUBM, BTC, WSDTS 19 / 19
59 Summary TriAD is a fast distributed RDF engine built on top of asynchronous communication layer and multi-threaded execution framework efficient distributed and parallel join executions join-ahead pruning technique via graph summarization helps in pruning dangling triples and making query processing efficient distributed- and join-ahead pruning aware query optimizer so far reported fastest runtimes over three benchmark datasets: LUBM, BTC, WSDTS Questions & Thank You!! 19 / 19
Advanced Data Management
Advanced Data Management Medha Atre Office: KD-219 atrem@cse.iitk.ac.in Aug 11, 2016 Assignment-1 due on Aug 15 23:59 IST. Submission instructions will be posted by tomorrow, Friday Aug 12 on the course
More informationDistributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1
Distributed Processing of Generalized Graph-Pattern Queries in SPARQL 1.1 Sairam Gurajada 1 and Martin Theobald 2 1 Max-Planck-Institute for Informatics, gurajada@mpi-inf.mpg.de 2 University of Ulm, martin.theobald@uni-ulm.de
More informationPERFORMANCE OF RDF QUERY PROCESSING ON THE INTEL SCC
MARC Symposium at ONERA'2012 1 PERFORMANCE OF RDF QUERY PROCESSING ON THE INTEL SCC Vasil Slavov, Praveen Rao, Dinesh Barenkala, Srivenu Paturi Department of Computer Science & Electrical Engineering University
More informationFast and Concurrent RDF Queries with RDMA-Based Distributed Graph Exploration
Fast and Concurrent RDF Queries with RDMA-Based Distributed Graph Exploration JIAXIN SHI, YOUYANG YAO, RONG CHEN, HAIBO CHEN, FEIFEI LI PRESENTED BY ANDREW XIA APRIL 25, 2018 Wukong Overview of Wukong
More informationAdvanced Data Management
Advanced Data Management Medha Atre Office: KD-29 atrem@cse.iitk.ac.in Aug 8, 206 Project Groups Groups for the course project are due on August 22, 206 8:00 IST. Instructions on how to submit project
More informationA Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data
A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data Ibrahim Abdelaziz Razen Harbi Zuhair Khayyat Panos Kalnis King Abdullah University of Science and Technology Saudi
More informationD2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine
D2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine RuleML Nov. 03, 2011 Mohamed Yahya Martin Theobald {myahya,mtb}@mpi-inf.mpg.de Max-Planck Institute for Informatics, Saarbrücken, Germany
More informationFedX: Optimization Techniques for Federated Query Processing on Linked Data. ISWC 2011 October 26 th. Presented by: Ziv Dayan
FedX: Optimization Techniques for Federated Query Processing on Linked Data ISWC 2011 October 26 th Presented by: Ziv Dayan Andreas Schwarte 1, Peter Haase 1, Katja Hose 2, Ralf Schenkel 2, and Michael
More informationShortest paths on large graphs: Systems, Algorithms, Applications
Shortest paths on large graphs: Systems, Algorithms, Applications Andrey Gubichev TU München January 2012 Andrey Gubichev Shortest paths on large graphs 1 / 53 Outline Introduction Systems Algorithms Applications
More informationScalable Indexing and Adaptive Querying of RDF Data in the cloud
Scalable Indexing and Adaptive Querying of RDF Data in the cloud Nikolaos Papailiou Computing Systems Laboratory, National Technical University of Athens npapa@cslab.ece.ntua.gr Panagiotis Karras Management
More informationAdaptive Workload-based Partitioning and Replication for RDF Graphs
Adaptive Workload-based Partitioning and Replication for RDF Graphs Ahmed Al-Ghezi and Lena Wiese Institute of Computer Science, University of Göttingen {ahmed.al-ghezi wiese}@cs.uni-goettingen.de Abstract.
More informationScalable Join Processing on Very Large RDF Graphs
Scalable Join Processing on Very Large RDF Graphs Thomas Neumann Max-Planck Institute for Informatics Saarbrücken, Germany neumann@mpi-inf.mpg.de Gerhard Weikum Max-Planck Institute for Informatics Saarbrücken,
More informationSempala. Interactive SPARQL Query Processing on Hadoop
Sempala Interactive SPARQL Query Processing on Hadoop Alexander Schätzle, Martin Przyjaciel-Zablocki, Antony Neu, Georg Lausen University of Freiburg, Germany ISWC 2014 - Riva del Garda, Italy Motivation
More informationgstore: A Graph-based SPARQL Query Engine
Noname manuscript No. (will be inserted by the editor) gstore: A Graph-based SPARQL Query Engine Lei Zou M. Tamer Özsu Lei Chen Xuchuan Shen Ruizhe Huang Dongyan Zhao the date of receipt and acceptance
More informationTechnical Report. Graph Indexing for SPARQL Query Optimization with Extended Characteristic Sets
Technical Report Graph Indexing for SPARQL Query Optimization with Extended Characteristic Sets Marios Meimaris m.meimaris@imis.athena-innovation.gr (1) Institute for the Management of Information Systems
More information7 Analysis of experiments
Natural Language Addressing 191 7 Analysis of experiments Abstract In this research we have provided series of experiments to identify any trends, relationships and patterns in connection to NL-addressing
More informationExternal Sorting for Index Construction of Large Semantic Web Databases
External ing for Index Construction of Large Semantic Web Databases Sven Groppe Institute of Information Systems (IFIS) University of Lübeck Ratzeburger Allee 16, D-25 Lübeck, Germany + 51 5 576 groppe@ifis.uni-luebeck.de
More informationCIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )
Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL
More informationOutline Introduction Triple Storages Experimental Evaluation Conclusion. RDF Engines. Stefan Schuh. December 5, 2008
December 5, 2008 Resource Description Framework SPARQL Giant Triple Table Property Tables Vertically Partitioned Table Hexastore Resource Description Framework SPARQL Resource Description Framework RDF
More informationGraph Analytics in the Big Data Era
Graph Analytics in the Big Data Era Yongming Luo, dr. George H.L. Fletcher Web Engineering Group What is really hot? 19-11-2013 PAGE 1 An old/new data model graph data Model entities and relations between
More informationA Distributed Graph Engine for Web Scale RDF Data
A Distributed Graph Engine for Web Scale RDF Data Kai Zeng Jiacheng Yang Haixun Wang Bin Shao Zhongyuan Wang, UCLA Columbia University Microsoft Research Asia Renmin University of China kzeng@cs.ucla.edu
More informationAn Efficient Approach to Triple Search and Join of HDT Processing Using GPU
An Efficient Approach to Triple Search and Join of HDT Processing Using GPU YoonKyung Kim, YoonJoon Lee Computer Science KAIST Daejeon, South Korea e-mail: {ykkim, yjlee}@dbserver.kaist.ac.kr JaeHwan Lee
More informationA Comparison of MapReduce Join Algorithms for RDF
A Comparison of MapReduce Join Algorithms for RDF Albert Haque 1 and David Alves 2 Research in Bioinformatics and Semantic Web Lab, University of Texas at Austin 1 Department of Computer Science, 2 Department
More informationSurvey of RDF Storage Managers
Survey of RDF Storage Managers Kiyoshi Nitta Yahoo JAPAN Research Tokyo, Japan knitta@yahoo-corp.jp Iztok Savnik University of Primorska & Institute Jozef Stefan, Slovenia iztok.savnik@upr.si Abstract
More informationPathfinder/MonetDB: A High-Performance Relational Runtime for XQuery
Introduction Problems & Solutions Join Recognition Experimental Results Introduction GK Spring Workshop Waldau: Pathfinder/MonetDB: A High-Performance Relational Runtime for XQuery Database & Information
More informationTriple Stores in a Nutshell
Triple Stores in a Nutshell Franjo Bratić Alfred Wertner 1 Overview What are essential characteristics of a Triple Store? short introduction examples and background information The Agony of choice - what
More informationEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management
More informationSPARQL-Based Applications for RDF-Encoded Sensor Data
SPARQL-Based Applications for RDF-Encoded Sensor Data Mikko Rinne, Seppo Törmä, Esko Nuutila http://cse.aalto.fi/instans/ 5 th International Workshop on Semantic Sensor Networks 12.11.2012 Department of
More informationEindhoven University of Technology MASTER. A framework for query opimization on value-based RDF indexes. Wolff, B.G.J.
Eindhoven University of Technology MASTER A framework for query opimization on value-based RDF indexes Wolff, B.G.J. Award date: 2013 Disclaimer This document contains a student thesis (bachelor's or master's),
More informationIncremental Export of Relational Database Contents into RDF Graphs
National Technical University of Athens School of Electrical and Computer Engineering Multimedia, Communications & Web Technologies Incremental Export of Relational Database Contents into RDF Graphs Nikolaos
More informationA Study of RDB-based RDF Data Management Techniques
A Study of RDB-based RDF Data Management Techniques Vahid Jalali, Mo Zhou, Yuqing Wu School of Informatics and Computing, Indiana University, Bloomington {vjalalib, mozhou, yugwu}@indianaedu Abstract RDF
More informationRainbow: A Distributed and Hierarchical RDF Triple Store with Dynamic Scalability
2014 IEEE International Conference on Big Data Rainbow: A Distributed and Hierarchical RDF Triple Store with Dynamic Scalability Rong Gu, Wei Hu, Yihua Huang National Key Laboratory for Novel Software
More informationAccelerating SPARQL Queries and Analytics on RDF Data
Accelerating SPARQL Queries and Analytics on RDF Data Dissertation by Razen Mohammad Al-Harbi In Partial Fulfillment of the Requirements For the Degree of Doctor of Philosophy King Abdullah University
More informationConceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.
Conceptual Modeling on Tencent s Distributed Database Systems Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Outline Introduction System overview of TDSQL Conceptual Modeling on TDSQL Applications Conclusion
More informationEfficient SPARQL Query Evaluation In a Database Cluster
2013 IEEE International Congress on Big Data Efficient SPARQL Query Evaluation In a Database Cluster Fang Du 1,2, Haoqiong Bian 1, Yueguo Chen 1, Xiaoyong Du 1 1 School of Information and DEKE lab, Renmin
More informationWORQ: Workload-Driven RDF Query Processing. Department of Computer Science Purdue University
WORQ: Workload-Driven RDF Query Processing Amgad Madkour Ahmed Aly Walid G. Aref (Purdue) (Google) (Purdue) Department of Computer Science Purdue University Introduction RDF Data Is Everywhere RDF is an
More informationMulti-threaded Queries. Intra-Query Parallelism in LLVM
Multi-threaded Queries Intra-Query Parallelism in LLVM Multithreaded Queries Intra-Query Parallelism in LLVM Yang Liu Tianqi Wu Hao Li Interpreted vs Compiled (LLVM) Interpreted vs Compiled (LLVM) Interpreted
More informationHYRISE In-Memory Storage Engine
HYRISE In-Memory Storage Engine Martin Grund 1, Jens Krueger 1, Philippe Cudre-Mauroux 3, Samuel Madden 2 Alexander Zeier 1, Hasso Plattner 1 1 Hasso-Plattner-Institute, Germany 2 MIT CSAIL, USA 3 University
More informationAn Empirical Evaluation of RDF Graph Partitioning Techniques
An Empirical Evaluation of RDF Graph Partitioning Techniques Adnan Akhter, Axel-Cyrille Ngonga Ngomo,, and Muhammad Saleem AKSW, Germany, {lastname}@informatik.uni-leipzig.de University of Paderborn, Germany
More informationManaging and Mining Billion Node Graphs. Haixun Wang Microsoft Research Asia
Managing and Mining Billion Node Graphs Haixun Wang Microsoft Research Asia Outline Overview Storage Online query processing Offline graph analytics Advanced applications Is it hard to manage graphs? Good
More informationQuestion Answering over Knowledge Bases: Entity, Text, and System Perspectives. Wanyun Cui Fudan University
Question Answering over Knowledge Bases: Entity, Text, and System Perspectives Wanyun Cui Fudan University Backgrounds Question Answering (QA) systems answer questions posed by humans in a natural language.
More informationProcessing of Very Large Data
Processing of Very Large Data Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Software Development Technologies Master studies, first
More informationRDFPath. Path Query Processing on Large RDF Graphs with MapReduce. 29 May 2011
29 May 2011 RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1 st Workshop on High-Performance Computing for the Semantic Web (HPCSW 2011) Martin Przyjaciel-Zablocki Alexander Schätzle
More informationRobust Discovery of Positive and Negative Rules in Knowledge-Bases
Robust Discovery of Positive and Negative Rules in Knowledge-Bases Paolo Papotti joint work with S. Ortona (Meltwater) and V. Meduri (ASU) http://www.eurecom.fr/en/publication/5321/detail/robust-discovery-of-positive-and-negative-rules-in-knowledge-bases
More informationAnswering Aggregate Queries Over Large RDF Graphs
1 Answering Aggregate Queries Over Large RDF Graphs Lei Zou, Peking University Ruizhe Huang, Peking University Lei Chen, Hong Kong University of Science and Technology M. Tamer Özsu, University of Waterloo
More informationOptimizer Challenges in a Multi-Tenant World
Optimizer Challenges in a Multi-Tenant World Pat Selinger pselinger@salesforce.come Classic Query Optimizer Concepts & Assumptions Relational Model Cost = X * CPU + Y * I/O Cardinality Selectivity Clustering
More informationOLAP over Federated RDF Sources
OLAP over Federated RDF Sources DILSHOD IBRAGIMOV, KATJA HOSE, TORBEN BACH PEDERSEN, ESTEBAN ZIMÁNYI. Outline o Intro and Objectives o Brief Intro to Technologies o Our Approach and Progress o Future Work
More informationScalaRDF: a Distributed, Elastic and Scalable In-Memory RDF Triple Store
ScalaRDF: a Distributed, Elastic and Scalable In-Memory RDF Triple Chunming Hu, Xixu Wang, Renyu Yang, Tianyu Wo State Key Laboratory of Software Development Environment (NLSDE) School of Computer Science
More informationGraph-Aware, Workload-Adaptive SPARQL Query Caching
Graph-Aware, Workload-Adaptive SPARQL Query Caching Nikolaos Papailiou Dimitrios Tsoumakos Panagiotis Karras Nectarios Koziris National Technical University of Athens, Greece {npapa, nkoziris}@cslab.ece.ntua.gr
More informationKoral: A Glass Box Profiling System for Individual Components of Distributed RDF Stores
Koral: A Glass Box Profiling System for Individual Components of Distributed RDF Stores Daniel Janke 1, Steffen Staab 1,2, and Matthias Thimm 1 1 Institute for Web Science and Technologies Universität
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationThe Pennsylvania State University The Graduate School RDF3X-MPI: A PARTITIONED RDF ENGINE FOR DATA-PARALLEL SPARQL QUERYING
The Pennsylvania State University The Graduate School RDF3X-MPI: A PARTITIONED RDF ENGINE FOR DATA-PARALLEL SPARQL QUERYING A Thesis in Computer Science and Engineering by Sai Krishnan Chirravuri c 2014
More informationCSE 544: Principles of Database Systems
CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;
More informationEfficient SPARQL Query Evaluation via Automatic Data Partitioning
Efficient SPARQL Query Evaluation via Automatic Data Partitioning Tao Yang #, Jinchuan Chen, Xiaoyan Wang#, Yueguo Chen, and Xiaoyong Du # # School Of Information, Renmin University of China Key Laboratory
More informationD2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine
D2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine Mohamed Yahya and Martin Theobald Max-Planck Institute for Informatics, Saarbrücken, Germany {myahya,mtb}@mpi-inf.mpg.de Abstract. Deductive
More informationPS2 out today. Lab 2 out today. Lab 1 due today - how was it?
6.830 Lecture 7 9/25/2017 PS2 out today. Lab 2 out today. Lab 1 due today - how was it? Project Teams Due Wednesday Those of you who don't have groups -- send us email, or hand in a sheet with just your
More informationThe architectures of triple-stores
The architectures of triple-stores Iztok Savnik University of Primorska, Slovenia Tutorial, DBKDA 2018, Nice. dbkda-18 1 Outline Introduction Triple data model Storage level representations Data distribution
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationEnabling fine-grained HTTP caching of SPARQL query results
Enabling fine-grained HTTP caching of SPARQL query results Gregory Todd Williams willig4@cs.rpi.edu @kasei 1 Jesse Weaver weavej3@cs.rpi.edu @jrweave 1 Overview Motivation for (HTTP) caching SPARQL Related
More informationRDF-TX: A Fast, User-Friendly System for Querying the History of RDF Knowledge Bases
RDF-TX: A Fast, User-Friendly System for Querying the History of RDF Knowledge Bases Shi Gao Jiaqi Gu Carlo Zaniolo University of California, Los Angeles {gaoshi, gujiaqi, zaniolo}@cs.ucla.edu ABSTRACT
More informationContinuous Graph Pattern Matching Over Knowledge Graph Streams
Continuous Graph Pattern Matching ver Knowledge Graph treams yed Gillani, Gauthier Picard, Frederique Laforest Laboratoire Hubert Curien & Institute Mines t-etienne, France DEB 2016 [utline] Knowledge
More informationSemantic Web Information Management
Semantic Web Information Management Norberto Fernández ndez Telematics Engineering Department berto@ it.uc3m.es.es 1 Motivation n Module 1: An ontology models a domain of knowledge n Module 2: using the
More informationA Framework for Performance Study of Semantic Databases
A Framework for Performance Study of Semantic Databases Xianwei Shen 1 and Vincent Huang 2 1 School of Information and Communication Technology, KTH- Royal Institute of Technology, Kista, Sweden 2 Services
More informationGhislain Fourny. Big Data 5. Wide column stores
Ghislain Fourny Big Data 5. Wide column stores Data Technology Stack User interfaces Querying Data stores Indexing Processing Validation Data models Syntax Encoding Storage 2 Where we are User interfaces
More informationProcessing SPARQL queries over distributed RDF graphs
The VLDB Journal DOI 10.1007/s00778-015-0415-0 REGULAR PAPER Processing SPARQL queries over distributed RDF graphs Peng Peng 1 Lei Zou 1 M. Tamer Özsu 2 Lei Chen 3 Dongyan Zhao 1 Received: 30 March 2015
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationMatrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data
Matrix "Bit"loaded: A Scalable Lightweight Join Query Processor for RDF Data Medha Atre, Vineet Chaoji,MohammedJ.Zaki, and James A. Hendler Dept. of Computer Science, Rensselaer Polytechnic Institute,
More informationarxiv: v1 [cs.db] 16 Feb 2018
PRoST: Distributed Execution of SPARQL Queries Using Mixed Partitioning Strategies Matteo Cossu elcossu@gmail.com Michael Färber michael.faerber@cs.uni-freiburg.de Georg Lausen lausen@informatik.uni-freiburg.de
More informationEfficiency. Efficiency: Indexing. Indexing. Efficiency Techniques. Inverted Index. Inverted Index (COSC 488)
Efficiency Efficiency: Indexing (COSC 488) Nazli Goharian nazli@cs.georgetown.edu Difficult to analyze sequential IR algorithms: data and query dependency (query selectivity). O(q(cf max )) -- high estimate-
More information7. Query Processing and Optimization
7. Query Processing and Optimization Processing a Query 103 Indexing for Performance Simple (individual) index B + -tree index Matching index scan vs nonmatching index scan Unique index one entry and one
More informationAn Entity Based RDF Indexing Schema Using Hadoop And HBase
An Entity Based RDF Indexing Schema Using Hadoop And HBase Fateme Abiri Dept. of Computer Engineering Ferdowsi University Mashhad, Iran Abiri.fateme@stu.um.ac.ir Mohsen Kahani Dept. of Computer Engineering
More informationRandom Walk TripleRush: Asynchronous Graph Querying and Sampling
Random Walk TripleRush: Asynchronous Graph Querying and Sampling Philip Stutz Department of Informatics University of Zurich Zurich, Switzerland stutz@ifi.uzh.ch Bibek Paudel Department of Informatics
More informationSPAR-Key: Processing SPARQL-Fulltext Queries to Solve Jeopardy! Clues
SPAR-Key: Processing SPARQL-Fulltext Queries to Solve Jeopardy! Clues Arunav Mishra 1, Sairam Gurajada 1, and Martin Theobald 2 1 Max Planck Institute for Informatics, Campus E1.4, Saarbrücken, Germany
More informationRDF in the clouds: a survey
The VLDB Journal (2015) 24:67 91 DOI 10.1007/s00778-014-0364-z REGULAR PAPER RDF in the clouds: a survey Zoi Kaoudi Ioana Manolescu Received: 28 October 2013 / Revised: 14 April 2014 / Accepted: 10 June
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 10. Graph databases Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Graph Databases Basic
More informationUniversity of Zurich. Hexastore: Sextuple Indexing for Semantic Web Data Management. Zurich Open Repository and Archive
University of Zurich Zurich Open Repository and Archive Winterthurerstr. 90 CH-8057 Zurich http://www.zora.uzh.ch Year: 2008 : Sextuple Indexing for Semantic Web Data Management Weiss, C; Karras, P; Bernstein,
More informationBitMat Scalable Indexing and Querying of Large RDF Graphs
BitMat Scalable Indexing and Querying of Large RDF Graphs (Technical Report) Medha Atre, Vineet Chaoji 2, Mohammed J. Zaki, and James A. Hendler Dept. of Computer Science, Rensselaer Polytechnic Institute,
More informationAn Extensible Framework for Query Optimization on TripleT-Based RDF Stores
An Extensible Framework for Query Optimization on TripleT-Based RDF Stores Bart G. J. Wolff Eindhoven University of Technology b.g.j.wolff@alumnus.tue.nl George H. L. Fletcher Eindhoven University of Technology
More informationarxiv: v4 [cs.db] 21 Mar 2016
Noname manuscript No. (will be inserted by the editor) Processing SPARQL Queries Over Distributed RDF Graphs Peng Peng Lei Zou M. Tamer Özsu Lei Chen Dongyan Zhao arxiv:4.6763v4 [cs.db] 2 Mar 206 the date
More informationIndex Construction. Dictionary, postings, scalable indexing, dynamic indexing. Web Search
Index Construction Dictionary, postings, scalable indexing, dynamic indexing Web Search 1 Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing
More informationGraph Database. Relation
Graph Distribution Graph Database SRC Relation DEST Graph Database Use cases: Fraud detection Recommendation engine Social networks... RedisGraph Property graph Labeled entities Schema less Cypher query
More informationRapid Processing of Synthetic Seismograms using Windows Azure Cloud
Rapid Processing of Synthetic Seismograms using Windows Azure Cloud Vedaprakash Subramanian, Liqiang Wang Department of Computer Science University of Wyoming En-Jui Lee, Po Chen Department of Geology
More informationFast and Concurrent RDF Queries with RDMA- Based Distributed Graph Exploration
Fast and Concurrent RDF Queries with RDMA- Based Distributed Graph Exploration Jiaxin Shi, Youyang Yao, Rong Chen, and Haibo Chen, Shanghai Jiao Tong University; Feifei Li, University of Utah https://www.usenix.org/conference/osdi16/technical-sessions/presentation/shi
More informationNoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE. Nicolas Buchschacher - University of Geneva - ADASS 2018
NoSQL Databases An efficient way to store and query heterogeneous astronomical data in DACE DACE https://dace.unige.ch Data and Analysis Center for Exoplanets. Facility to store, exchange and analyse data
More informationA Fast and High Throughput SQL Query System for Big Data
A Fast and High Throughput SQL Query System for Big Data Feng Zhu, Jie Liu, and Lijie Xu Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing, China 100190
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 24: MapReduce CSE 344 - Fall 2016 1 HW8 is out Last assignment! Get Amazon credits now (see instructions) Spark with Hadoop Due next wed CSE 344 - Fall 2016
More informationIngo Brenckmann Jochen Kirsten Storage Technology Strategists SAS EMEA Copyright 2003, SAS Institute Inc. All rights reserved.
Intelligent Storage Results from real life testing Ingo Brenckmann Jochen Kirsten Storage Technology Strategists SAS EMEA SAS Intelligent Storage components! OLAP Server! Scalable Performance Data Server!
More informationColumn Stores vs. Row Stores How Different Are They Really?
Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background
More informationSDA: Software-Defined Accelerator for general-purpose big data analysis system
SDA: Software-Defined Accelerator for general-purpose big data analysis system Jian Ouyang(ouyangjian@baidu.com), Wei Qi, Yong Wang, Yichen Tu, Jing Wang, Bowen Jia Baidu is beyond a search engine Search
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationIntroduction to NoSQL Databases
Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction
More informationRDF-3X: a RISC-style Engine for RDF
RDF-3X: a RISC-style Engine for RDF Thomas Neumann Max-Planck-Institut für Informatik Saarbrücken, Germany neumann@mpi-inf.mpg.de Gerhard Weikum Max-Planck-Institut für Informatik Saarbrücken, Germany
More informationMassive Scalability With InterSystems IRIS Data Platform
Massive Scalability With InterSystems IRIS Data Platform Introduction Faced with the enormous and ever-growing amounts of data being generated in the world today, software architects need to pay special
More information1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.
1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Integrating Complex Financial Workflows in Oracle Database Xavier Lopez Seamus Hayes Oracle PolarLake, LTD 2 Copyright 2011, Oracle
More informationMySQL Database Scalability
MySQL Database Scalability Nextcloud Conference 2016 TU Berlin Oli Sennhauser Senior MySQL Consultant at FromDual GmbH oli.sennhauser@fromdual.com 1 / 14 About FromDual GmbH Support Consulting remote-dba
More informationFedX: A Federation Layer for Distributed Query Processing on Linked Open Data
FedX: A Federation Layer for Distributed Query Processing on Linked Open Data Andreas Schwarte 1, Peter Haase 1,KatjaHose 2, Ralf Schenkel 2, and Michael Schmidt 1 1 fluid Operations AG, Walldorf, Germany
More informationHANA Performance. Efficient Speed and Scale-out for Real-time BI
HANA Performance Efficient Speed and Scale-out for Real-time BI 1 HANA Performance: Efficient Speed and Scale-out for Real-time BI Introduction SAP HANA enables organizations to optimize their business
More informationCompSci 516 Database Systems
CompSci 516 Database Systems Lecture 20 NoSQL and Column Store Instructor: Sudeepa Roy Duke CS, Fall 2018 CompSci 516: Database Systems 1 Reading Material NOSQL: Scalable SQL and NoSQL Data Stores Rick
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationgstore: Answering SPARQL Queries via Subgraph Matching
gstore: Answering SPARQL Queries via Subgraph Matching Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao,4 Peking University, China; Hong Kong University of Science and Technology, China; University
More information