Shortest paths on large graphs: Systems, Algorithms, Applications


1 Shortest paths on large graphs: Systems, Algorithms, Applications Andrey Gubichev TU München January 2012 Andrey Gubichev Shortest paths on large graphs 1 / 53

2 Outline
  Introduction
  Systems
  Algorithms
  Applications: Semantic Web, Social Search

3 Everything is a graph
  [Figures: Internet Graph, Richardson; Web Graph; Social Network; Wikipedia, Tulip; Proteins, Bordalier Inst]

4-6 RDF: format for graph data
  [Figure: entity-relationship graph centered on Marie Curie. Nodes: Marie Curie, Maria Sklodowska, 1867, 1934, Warsaw, Poland, Pierre Curie, Henri Becquerel, U Paris, Nobel Prize Chemistry, Nobel Prize Physics. Edge labels: bornas, bornon, diedon, bornin, in, marriedto, adviser, alma mater, haswon]
  pay-as-you-go: schema-agnostic, schema-later
  RDF triples form an ER graph
  RDF:
    (id1, name, "Marie Curie")
    (id1, bornon, 1867)
    (id1, bornin, id2)
    (id2, name, "Warsaw")
    (id2, locatedin, id3)
    (id3, name, "Poland")
  (G. Weikum, WSDM 09)
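To make the triple representation concrete, here is a minimal Python sketch that stores the six triples from the slide as tuples and follows two edges to find a birth country. The helper names `objects` and `name_of` are illustrative assumptions, not part of any RDF library.

```python
# Triples copied from the slide; ids and predicate names are as transcribed.
triples = [
    ("id1", "name", "Marie Curie"),
    ("id1", "bornon", 1867),
    ("id1", "bornin", "id2"),
    ("id2", "name", "Warsaw"),
    ("id2", "locatedin", "id3"),
    ("id3", "name", "Poland"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def name_of(node_id):
    """Return the name literal attached to a node id, or None if it has none."""
    names = objects(node_id, "name")
    return names[0] if names else None


# Follow bornin, then locatedin, to reach the country where id1 was born.
town = objects("id1", "bornin")[0]
country = objects(town, "locatedin")[0]
birth_country = name_of(country)
```

Each hop in the graph is one lookup on a (subject, predicate) pair, which is what the joins in the SPARQL examples express declaratively.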

7 RDF: a lot of data out there
  Linked Data Project, linkeddata.org
  Linked Data: extract explicit knowledge (ER-oriented facts) from the world's best information sources (Wikipedia, Web, Web 2.0)

8-9 SPARQL: a query language for RDF
  Select ?c Where {
    ?p isa scientist.
    ?p bornin ?t.
    ?p haswon ?a.
    ?t locatedin ?c.
    ?a Name NobelPrize.
    Filter (?t < 1900)
  }
  SQL-like syntax
  triple patterns
  common variables form joins
  filter predicates (the Filter line is added on slide 9)

10 SPARQL: a query language
  Select Distinct ?c Where {
    ?p ?r1 ?t.
    ?t ?r2 ?c.
    ?c isa Country.
    ?p bornon ?b.
    Filter (?b > 1945)
  }
  SQL-like syntax
  triple patterns
  common variables form joins
  filter predicates
  wildcard joins

11-14 RDF & SPARQL Engines
  Giant triples table (Sesame/OpenRDF, YARS2 (DERI)):
    S     P       O
    id1   Name    Marie Curie
    id1   bornon  1867
    id1   bornin  id2
    id2   Name    Warsaw
    ...
  Clustered property tables (Jena (HP Labs), Oracle RDF MATCH):
    Person
    S     Name     bornon  bornin  ...
    id1   Marie C  1867    id3     ...
    id2   Henri B  1852    id9     ...
    Town
    S     Name     Country
    id3   Warsaw   id11
    ...
  Property table (C-Store (MIT), MonetDB (CWI)):
    one two-column table (S, O) per property, e.g. bornon and Advisor [row values not recoverable from the transcription]
  Why a new engine?
  Three main things in database design:
    1. Performance
    2. Performance
    3. Performance

15-20 Scalable Semantic Web: RDF-3X Engine [T. Neumann et al.: VLDB 08]
  tuning-free system
  architecture: giant triple table
  map literals into ids (dictionary)
  precompute exhaustive indexes for SPO triples: SPO, SOP, OPS, OSP, PSO, POS, SP*, SO*, OS*, PO*, OP*, S*, P*, O*
  very high compression, index-only store
  store the indexes directly in clustered B+ trees
  can choose any order for scan and join
  also store two mapping indexes: literal -> id, id -> literal
  efficient merge joins with order-preservation
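The idea of indexing every permutation of (S, P, O) can be illustrated with a small in-memory Python sketch. It uses sorted lists and binary search in place of compressed clustered B+ trees, and the id-encoded triples are made up for illustration; it is not RDF-3X's actual storage code.

```python
from bisect import bisect_left
from itertools import permutations

# Hypothetical dictionary-encoded triples (subject id, predicate id, object id).
triples = [(1, 10, 2), (1, 11, 3), (2, 10, 3), (4, 11, 3)]

COLUMN = {"S": 0, "P": 1, "O": 2}


def build_indexes(triples):
    """Build one sorted list per permutation of S, P, O (six in total)."""
    indexes = {}
    for order in permutations("SPO"):
        name = "".join(order)
        indexes[name] = sorted(
            tuple(t[COLUMN[c]] for c in order) for t in triples
        )
    return indexes


def prefix_scan(index, prefix):
    """Return all entries of a sorted index that start with `prefix`."""
    start = bisect_left(index, prefix)  # first entry >= prefix
    matches = []
    for entry in index[start:]:
        if entry[:len(prefix)] != prefix:
            break
        matches.append(entry)
    return matches


indexes = build_indexes(triples)
# Pattern "?s 11 3": bind P and O, read subjects in sorted order from POS.
subjects = [s for _, _, s in prefix_scan(indexes["POS"], (11, 3))]
```

Because every pattern can be answered by a range scan on some index, results always arrive sorted on the next unbound position, which is what makes the merge joins mentioned on the slide possible.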

21 RDF-3X Query Optimization [T. Neumann et al.: VLDB 08]
  bottom-up dynamic programming for plan enumeration
  exploit numerous indexes, order-preservation
  cost model based on selectivity estimation

22-23 Evaluation [T. Neumann et al.: SIGMOD 09]
  Queries like: find a Polish scientist with a French advisor, both of whom got some awards
  YAGO knowledge base: 40 million triples
  Billion Triple dataset, Uniprot (845 million): similar results
  Try it out! RDF-3X is freely available.

25 What is missing?
  What kind of queries CAN we answer?
    Find the latitude and longitude of the Eiffel Tower
    Find politicians who are also scientists
  What kind of queries can we NOT answer?
    Find common things between Angela Merkel and Arnold Schwarzenegger
    Find all European-born Nobel prize winners
  Why? They require path traversals over the RDF graph.

26-28 Why is SPARQL not enough?
  Sometimes we need to form join chains of unknown length (e.g., we need the transitive closure of a predicate).
  Example triples:
    Humboldt bornin Berlin.
    Berlin locatedin Germany.
  Example triples:
    Einstein bornin Ulm.
    Ulm locatedin Baden-Württemberg.
    Baden-Württemberg locatedin Germany.
  Were they both born in Germany? Yes. How to figure that out?
    Einstein bornin Ulm locatedin Baden-Württemberg locatedin Germany
    Humboldt bornin Berlin locatedin Germany
  How to find all scientists that were born in Germany?
  SPARQL:
    ?person bornin ?place. ?place locatedin Germany.
    UNION
    ?person bornin ?place. ?place locatedin ?place1. ?place1 locatedin Germany.
    UNION ...
  SPARQL with paths:
    ?person bornin ?place. ?place ??path Germany.
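The unbounded UNION chain above amounts to a reachability test along locatedin edges. A minimal Python sketch over the five example triples follows; the function names `reaches` and `born_in` are illustrative assumptions.

```python
# Example triples from the slides.
triples = [
    ("Humboldt", "bornin", "Berlin"),
    ("Berlin", "locatedin", "Germany"),
    ("Einstein", "bornin", "Ulm"),
    ("Ulm", "locatedin", "Baden-Württemberg"),
    ("Baden-Württemberg", "locatedin", "Germany"),
]


def reaches(start, target, predicate):
    """True if `target` is reachable from `start` via one or more `predicate` edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                if o == target:
                    return True
                seen.add(o)
                stack.append(o)
    return False


def born_in(region):
    """People whose birthplace is `region` or lies inside it at any nesting depth."""
    people = []
    for s, p, place in triples:
        if p == "bornin" and (place == region or reaches(place, region, "locatedin")):
            people.append(s)
    return sorted(people)


people = born_in("Germany")
```

The traversal handles any nesting depth with one loop, whereas plain SPARQL needs one UNION branch per possible chain length.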

29 SPARQL with path variables
  Introduced by K. Anyanwu et al. (WWW 07)
  Example: select ??p ?obj where {?place ??path Germany} (path triple)
  ??p: there exists a path from place to Germany in the RDF graph
  we consider only shortest paths
  we can specify filters (conditions) on ??p
  we can join such path patterns with regular patterns
  Example:
    select ?name where {
      ?m type Mountain.
      ?m hasname ?name.
      ?m ??location Europe.
      filter(containsonly(??location, locatedin))
    }

30 How to execute SPARQL with path variables? [A. Gubichev et al.: WebDB 11]
  We build upon RDF-3X. Two goals:
    Query optimization: how to estimate the cardinality of path triples?
    Physical level: how to perform a path scan efficiently?

32 Can we do better?
  Dijkstra's algorithm is fine, but let's consider approximate algorithms (trade quality for speed)
  Let's change the setting for now: shortest paths on a social network
  Social network:
    a set of people
    a social relationship linking them

33 Problem Statement
  Exact shortest path:
    V: users, E: friend-of relationships
    Graph G(V, E): directed, unweighted, static
    Given u, v ∈ V, find the shortest path from u to v
  Approximate shortest path:
    Graph is disk-resident
    Offline step: do some precomputation, store on disk
    Online step: for u, v ∈ V, quickly find some path from u to v
    Approximation error: (approximate - exact) / exact

34 Different approaches
  Exact SP:
    Dijkstra: very slow
    A*: works well for road networks, slow for online social networks (OSN)
    Hierarchy-based decomposition: works well for road networks, slow for OSN
  Approximate SP:
    Different types of preprocessing: keep distances from all nodes to a small subset of nodes (random, with high degree or centrality)
    Poor results for OSN: average error is 10%
    Finds just the distance, not the path itself

35 Precomputation
  Step 1: Set r = ⌊log |V|⌋
  Step 2: Sample r + 1 sets of nodes (uniformly at random) of sizes 1, 2, 2^2, 2^3, ..., 2^r
  Step 3: For every u ∈ V and for every set S:
    1. Find the closest nodes to u in S (landmarks):
       landmark h ∈ S: dist(u, h) = dist(u, S)
       landmark h ∈ S: dist(h, u) = dist(S, u)
    2. Find the distance from u to h and from h to u

36 Precomputation - WSDM 10 approach [A. Das Sarma et al.: WSDM 10]
  [Figure: node u with its closest landmarks h_1 ∈ S_1, h_2 ∈ S_2, ..., h_r ∈ S_r]
  Sketch in RDF (distances): u 2 h_1; u 3 h_2; ...; u 1 h_r

37 Precomputation - our approach [A. Gubichev et al.: CIKM 10]
  [Figure: node u with paths through intermediate nodes x, y to its closest landmarks h_1 ∈ S_1, h_2 ∈ S_2, ..., h_r ∈ S_r]
  Sketch in RDF (paths): u x h_1; u x y h_2; ...; u h_r

38 Precomputation
  Step 1: Set r = ⌊log |V|⌋
  Step 2: Sample r + 1 sets of nodes (uniformly at random) of sizes 1, 2, 2^2, 2^3, ..., 2^r
  Step 3: For every u ∈ V and for every set S:
    1. Find the closest nodes to u in S (landmarks):
       landmark h ∈ S: dist(u, h) = dist(u, S)
       landmark h ∈ S: dist(h, u) = dist(S, u)
    2. Find the path from u to h and from h to u
    3. Store the paths (RDF): u path h, h path u
  Step 4: Repeat Steps 2-3 k times (we use k = 2).
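The four steps can be sketched in Python as follows. This is an in-memory illustration under several assumptions that are not stated on the slide: the logarithm is taken base 2, the graph is an adjacency dictionary in which every node appears as a key, and the closest landmark for all nodes at once is found with a multi-source BFS. The function names are hypothetical.

```python
import math
import random
from collections import deque


def nearest_seed_paths(adj, seeds):
    """Multi-source BFS: for every reachable node u, the path from its
    closest seed to u, following the edges of `adj`."""
    path = {s: [s] for s in seeds}
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in path:
                path[w] = path[v] + [w]
                queue.append(w)
    return path


def reverse(adj):
    """Return the graph with every edge direction flipped."""
    rev = {v: [] for v in adj}
    for v, neighbours in adj.items():
        for w in neighbours:
            rev.setdefault(w, []).append(v)
    return rev


def precompute(adj, k=2, seed=0):
    """Return (to_lm, from_lm): per node, paths to and from its landmarks."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    r = int(math.log2(len(nodes)))        # Step 1 (base 2 is an assumption)
    radj = reverse(adj)
    to_lm = {u: [] for u in nodes}        # paths u -> landmark
    from_lm = {u: [] for u in nodes}      # paths landmark -> u
    for _ in range(k):                    # Step 4
        for i in range(r + 1):            # Step 2: set sizes 1, 2, ..., 2^r
            seeds = rng.sample(nodes, 2 ** i)
            # Step 3: BFS on the graph gives landmark -> u paths.
            for u, p in nearest_seed_paths(adj, seeds).items():
                from_lm[u].append(p)
            # BFS on the reversed graph, flipped back, gives u -> landmark paths.
            for u, p in nearest_seed_paths(radj, seeds).items():
                to_lm[u].append(p[::-1])
    return to_lm, from_lm
```

Running one BFS per sampled set rather than one search per node keeps the offline cost at roughly k(r + 1) graph traversals in each direction.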

39 Sketch
  The sketch for a node u consists of:
    1. Landmarks h_1, ..., h_kr
    2. Paths from u to the landmarks
    3. Paths from the landmarks to u
  The sketch for u consists of two trees (u is the root)
  We keep sketches for every u ∈ V

40-45 SKETCH algorithm: online part [A. Das Sarma et al.: WSDM 10]
  Input: nodes s, d ∈ V
  1. Load all the distances from s
  2. Load all the distances to d
  3. Find common landmarks
  4. Construct the paths
  5. Select the shortest distance
  Output: distance from s to d

46 SKETCH algorithm with paths [A. Gubichev et al.: CIKM 10]
  Input: nodes s, d ∈ V
  1. Load all the paths from s
  2. Load all the paths to d
  3. Find common landmarks
  4. Construct the paths
  5. Select the shortest path
  Output: path from s to d, e.g. s x y h z d
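A minimal Python sketch of the online step with paths: join the precomputed paths of s and d on common landmarks and keep the shortest concatenation. The function name and the sample sketches are hypothetical; the sample merely echoes the path s x y h z d from the slide.

```python
def sketch_query(paths_from_s, paths_to_d):
    """paths_from_s: precomputed paths s -> landmark.
    paths_to_d: precomputed paths landmark -> d.
    Returns the shortest path through a common landmark, or None."""
    # Keep the shortest stored path to d for each landmark.
    to_d = {}
    for p in paths_to_d:
        landmark = p[0]
        if landmark not in to_d or len(p) < len(to_d[landmark]):
            to_d[landmark] = p
    best = None
    for p in paths_from_s:
        landmark = p[-1]
        if landmark in to_d:
            # Concatenate, without repeating the landmark itself.
            candidate = p + to_d[landmark][1:]
            if best is None or len(candidate) < len(best):
                best = candidate
    return best


# Hypothetical sketches: h is the only landmark shared by s and d.
paths_from_s = [["s", "x", "y", "h"], ["s", "a", "g"]]
paths_to_d = [["h", "z", "d"], ["q", "d"]]
path = sketch_query(paths_from_s, paths_to_d)
```

The result is an upper bound on the true shortest path, which is why the modifications on the following slides try to shorten it.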

47 Datasets
  Slashdot: 77K nodes, undirected
  YouTube: 1.1 million nodes
  Flickr: 1.7 million nodes
  WikiTalk: 2.2 million nodes
  Twitter: 2.4 million nodes
  Orkut: 3 million nodes, undirected
  Sources: Stanford, MPI, Telefonica Research

48 Approximation error of the Sketch algorithm
  Error = (approximate - exact) / exact
  Dataset (#nodes)   Sketch error
  Slashdot (77K)     46%
  YouTube (1.1M)     30%
  Flickr (1.7M)      28%
  WikiTalk (2.2M)    55%
  Twitter (2.4M)     51%
  Orkut (3M)         71%

50-53 First modification
  We find the path, not just the distance!
  Are there cycles? [Figure: a path from s to d that passes through node a twice]
  Construct a shorter path [Figure: the same path with the loop at a removed]
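Cycle elimination can be done in a single pass over the concatenated path. The following Python sketch is one straightforward way to do it; it is not taken from the authors' implementation, and the function name is an assumption.

```python
def remove_cycles(path):
    """Cut out every loop so that each node appears at most once."""
    position = {}   # node -> its index in `result`
    result = []
    for node in path:
        if node in position:
            # Seen before: drop everything after its first occurrence.
            cut = position[node] + 1
            for dropped in result[cut:]:
                del position[dropped]
            result = result[:cut]
        else:
            position[node] = len(result)
            result.append(node)
    return result


shorter = remove_cycles(["s", "x", "a", "y", "a", "d"])
```

The pass only inspects the path itself and needs no extra disk access, which is consistent with the "No time overhead!" remark on the next slide.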

54 Approximation error of the first modification
  No time overhead!
  Dataset (#nodes)   Sketch error   Sketch I error
  Slashdot (77K)     46%            26%
  YouTube (1.1M)     30%            12%
  Flickr (1.7M)      28%            11%
  WikiTalk (2.2M)    55%            31%
  Twitter (2.4M)     51%            38%
  Orkut (3M)         71%            48%

55-57 Second modification
  Are there any hidden connections? [Figure: a path from s to d with a possible direct edge between two of its non-adjacent nodes]
  If yes, construct a shorter path

58 Second modification
  How to check it?
    1. For every node in the path, load the list of friends from the original dataset
    2. For every pair of nodes from the path, check whether they are friends
  The number of nodes in the path is usually small!
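One simple way to exploit such hidden edges is a greedy pass that jumps from each node to the farthest later node on the path that is a direct friend. The Python sketch below assumes a `friends` dictionary mapping each node to the set of its out-neighbours; the greedy strategy is an illustrative choice and may differ from what the authors implemented.

```python
def shortcut(path, friends):
    """Shorten `path` using direct edges between non-adjacent path nodes."""
    result = [path[0]]
    i = 0
    while i < len(path) - 1:
        nxt = i + 1   # default: keep the next path node
        # Look for the farthest later node that path[i] links to directly.
        for j in range(len(path) - 1, i + 1, -1):
            if path[j] in friends.get(path[i], ()):
                nxt = j
                break
        result.append(path[nxt])
        i = nxt
    return result


# Hypothetical friend lists: x has a hidden direct edge to z.
friends = {
    "s": {"x"},
    "x": {"y", "z"},
    "y": {"h"},
    "h": {"z"},
    "z": {"d"},
}
shorter = shortcut(["s", "x", "y", "h", "z", "d"], friends)
```

Only the friend lists of nodes on the path are consulted, so the number of extra lookups is bounded by the path length.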

59 Approximation error of the second modification
  Dataset (#nodes)   Sketch error   Sketch I error   Sketch II error
  Slashdot (77K)     46%            26%              0.6%
  YouTube (1.1M)     30%            12%              0.6%
  Flickr (1.7M)      28%            11%              0.3%
  WikiTalk (2.2M)    55%            31%              0.2%
  Twitter (2.4M)     51%            38%              0.8%
  Orkut (3M)         71%            48%              0.6%

60-65 Tree algorithm
  Paths from a node to landmarks form a tree
  Load paths from s and to d
  Start BFS from s and d
  For every visited node, load a list of friends
  For every pair of visited nodes check:
    1. are they equal? (s3, d1)
    2. are they friends? (s1, d)
  Form a new path and put it into the queue Q
  Don't go too deep: terminate if level_s + level_d > Q.top.length (in the figure: level_s + level_d = 4 > 2)
  [Figure: tree rooted at s with nodes s1, ..., s5 and tree rooted at d with nodes d1, ..., d4]

66 Approximation error of the Tree algorithm
  Dataset    Sketch error   Sketch I error   Sketch II error   Tree error
  Slashdot   46%            26%              0.6%              0
  YouTube    30%            12%              0.6%              0.06%
  Flickr     28%            11%              0.3%              0.04%
  WikiTalk   55%            31%              0.2%              0
  Twitter    51%            38%              0.8%              0.03%
  Orkut      71%            48%              0.6%              0.1%

67 Experimental setup
  Pick 100 nodes (uniformly at random) from the OSN.
  For each node, compute the shortest path tree (Dijkstra)
  The result is {(x, y, dist) | x, y ∈ V, dist = dist(x, y)}
  Group triples by distance and randomly choose 50 triples from every group
  For every chosen triple (x, y, dist): find approximate shortest paths from x to y and compare their lengths with dist
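The sampling and comparison steps can be sketched in Python as below. The function names, the fixed random seed, and the `approx_length` callback are assumptions made for illustration; the error formula is the one defined earlier in the talk, and every exact distance is assumed to be positive.

```python
import random
from collections import defaultdict


def sample_by_distance(triples, per_group=50, seed=0):
    """Group (x, y, dist) triples by dist and draw up to `per_group` from each group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for x, y, dist in triples:
        groups[dist].append((x, y, dist))
    sample = []
    for dist in sorted(groups):
        group = groups[dist]
        sample.extend(rng.sample(group, min(per_group, len(group))))
    return sample


def mean_error(sample, approx_length):
    """Average of (approximate - exact) / exact over the sampled pairs."""
    errors = [(approx_length(x, y) - dist) / dist for x, y, dist in sample]
    return sum(errors) / len(errors)


# Tiny hypothetical input: exact distances from node a.
exact = [("a", "b", 1), ("a", "c", 2), ("a", "d", 2)]
sample = sample_by_distance(exact, per_group=1)
# An approximation that always answers 2 overestimates only the pair (a, b).
error = mean_error(exact, lambda x, y: 2)
```

Sampling per distance group prevents the most common distances from dominating the reported average.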

68 Implementation details
  Datasets in RDF: user1 friend-of user2
  Precomputed paths in RDF: u path h, h path u
  RDF-3X for datasets and precomputed data
  C++
  Laptop: 2.0 GHz Intel Core 2 Duo, 4 GB RAM, 3 MB L2 cache

69 Time
  [Table: per dataset (Flickr 1.7M, WikiTalk 2.2M, Twitter 2.4M, Orkut 3M), the columns Sketch (sec), Sketch II (sec), Tree (sec), Dijkstra (sec), Dijkstra (queue); the numeric values did not survive transcription]

70 Disk space
  Disk space for precomputed data, GB
  [Table: per dataset (Flickr, WikiTalk, Twitter, Orkut), the columns Dataset size, Sketch with distances, Sketch with paths; the numeric values did not survive transcription]

71 Number of shortest paths
  We find several shortest paths:
  [Table: per dataset (Flickr 1.7M, WikiTalk 2.2M, Twitter 2.4M, Orkut 3M), the columns Sketch II, Tree; the numeric values did not survive transcription]

73 Application #1: Semantic Web
  SPARQL v SPARQL + path traversal
  Querying the DB of entire human knowledge (everything that Wikipedia knows)

75 Small World
  Milgram, 1967
  People are given letters and asked to forward them to one friend
  Source: random residents of Omaha; target: a stockbroker in Sharon, MA
  Completed chains averaged 6 hops to reach the target

76-77 Shortest paths on Social Networks
  Shortest paths are interesting...
    per se: what is the distance between you and Angela Merkel?
    for geeks: Erdős number
    as an important primitive for social network analysis (diameter, centrality, etc.)
    social search: John searches for Mary; ranking: 1. Mary A, 2. Mary B, 3. Mary C (M. Potamias et al., CIKM 2009)
  Of course, we can use a one-to-many shortest paths algorithm

78 Acknowledgements
  Srikanta Bedathur, Gerhard Weikum, Josep M. Pujol, Thomas Neumann, Sihem Amer-Yahia

79 Thank you! Questions?

Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling

Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling 2014/04/09 @ WWW 14 Dynamic and Historical Shortest-Path Distance Queries on Large Evolving Networks by Pruned Landmark Labeling Takuya Akiba (U Tokyo) Yoichi Iwata (U Tokyo) Yuichi Yoshida (NII & PFI)

More information

TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing

TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing TriAD: A Distributed Shared-Nothing RDF Engine based on Asynchronous Message Passing Sairam Gurajada, Stephan Seufert, Iris Miliaraki, Martin Theobald Databases & Information Systems Group ADReM Research

More information

D2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine

D2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine D2R2: Disk-oriented Deductive Reasoning in a RISC-style RDF Engine RuleML Nov. 03, 2011 Mohamed Yahya Martin Theobald {myahya,mtb}@mpi-inf.mpg.de Max-Planck Institute for Informatics, Saarbrücken, Germany

More information

Advanced Data Management

Advanced Data Management Advanced Data Management Medha Atre Office: KD-219 atrem@cse.iitk.ac.in Aug 11, 2016 Assignment-1 due on Aug 15 23:59 IST. Submission instructions will be posted by tomorrow, Friday Aug 12 on the course

More information

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs

I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs I/O Efficient Algorithms for Exact Distance Queries on Disk- Resident Dynamic Graphs Yishi Lin, Xiaowei Chen, John C.S. Lui The Chinese University of Hong Kong 9/4/15 EXACT DISTANCE QUERIES ON DYNAMIC

More information

Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme

Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme Approximate Shortest Distance Computing: A Query-Dependent Local Landmark Scheme Miao Qiao, Hong Cheng, Lijun Chang and Jeffrey Xu Yu The Chinese University of Hong Kong {mqiao, hcheng, ljchang, yu}@secuhkeduhk

More information

Reasoning on Web Data Semantics

Reasoning on Web Data Semantics Reasoning on Web Data Semantics Oui. Peut-on préciser l'heure et le lieu? Merci Marie-Christine Rousset Université de Grenoble (UJF) et Institut Universitaire de France Amicalement Marie-Christine 1 Evolution

More information

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs On Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs Sungpack Hong 2, Nicole C. Rodia 1, and Kunle Olukotun 1 1 Pervasive Parallelism Laboratory, Stanford University

More information

Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs

Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs Fast Parallel Detection of Strongly Connected Components (SCC) in Small-World Graphs Sungpack Hong 2, Nicole C. Rodia 1, and Kunle Olukotun 1 1 Pervasive Parallelism Laboratory, Stanford University 2 Oracle

More information

Querying Shortest Distance on Large Graphs

Querying Shortest Distance on Large Graphs .. Miao Qiao, Hong Cheng, Lijun Chang and Jeffrey Xu Yu Department of Systems Engineering & Engineering Management The Chinese University of Hong Kong October 19, 2011 Roadmap Preliminary Related Work

More information

Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction

Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction Efficient Top-k Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling with Application to Network Structure Prediction Takuya Akiba U Tokyo Takanori Hayashi U Tokyo Nozomi Nori Kyoto

More information

Graph Analytics in the Big Data Era

Graph Analytics in the Big Data Era Graph Analytics in the Big Data Era Yongming Luo, dr. George H.L. Fletcher Web Engineering Group What is really hot? 19-11-2013 PAGE 1 An old/new data model graph data Model entities and relations between

More information

An Efficient Approach to Triple Search and Join of HDT Processing Using GPU

An Efficient Approach to Triple Search and Join of HDT Processing Using GPU An Efficient Approach to Triple Search and Join of HDT Processing Using GPU YoonKyung Kim, YoonJoon Lee Computer Science KAIST Daejeon, South Korea e-mail: {ykkim, yjlee}@dbserver.kaist.ac.kr JaeHwan Lee

More information

E6885 Network Science Lecture 10: Graph Database (II)

E6885 Network Science Lecture 10: Graph Database (II) E 6885 Topics in Signal Processing -- Network Science E6885 Network Science Lecture 10: Graph Database (II) Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University November 18th, 2013 Course

More information

PERFORMANCE OF RDF QUERY PROCESSING ON THE INTEL SCC

PERFORMANCE OF RDF QUERY PROCESSING ON THE INTEL SCC MARC Symposium at ONERA'2012 1 PERFORMANCE OF RDF QUERY PROCESSING ON THE INTEL SCC Vasil Slavov, Praveen Rao, Dinesh Barenkala, Srivenu Paturi Department of Computer Science & Electrical Engineering University

More information

Managing and Mining Billion Node Graphs. Haixun Wang Microsoft Research Asia

Managing and Mining Billion Node Graphs. Haixun Wang Microsoft Research Asia Managing and Mining Billion Node Graphs Haixun Wang Microsoft Research Asia Outline Overview Storage Online query processing Offline graph analytics Advanced applications Is it hard to manage graphs? Good

More information

Graph Data Management

Graph Data Management Graph Data Management Analysis and Optimization of Graph Data Frameworks presented by Fynn Leitow Overview 1) Introduction a) Motivation b) Application for big data 2) Choice of algorithms 3) Choice of

More information

Advanced Data Management

Advanced Data Management Advanced Data Management Medha Atre Office: KD-29 atrem@cse.iitk.ac.in Aug 8, 206 Project Groups Groups for the course project are due on August 22, 206 8:00 IST. Instructions on how to submit project

More information

Scalable Reduction of Large Datasets to Interesting Subsets

Scalable Reduction of Large Datasets to Interesting Subsets Scalable Reduction of Large Datasets to Interesting Subsets Gregory Todd Williams, Jesse Weaver, Medha Atre, and James A. Hendler Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy,

More information

Label Constrained Shortest Path Estimation on Large Graphs

Label Constrained Shortest Path Estimation on Large Graphs Label Constrained Shortest Path Estimation on Large Graphs Ankita Likhyani IIIT-D-MTech-CS-DE-13 July 8, 2013 Indraprastha Institute of Information Technology New Delhi Thesis Committee Dr. Srikanta Bedathur

More information

An Extensible Framework for Query Optimization on TripleT-Based RDF Stores

An Extensible Framework for Query Optimization on TripleT-Based RDF Stores An Extensible Framework for Query Optimization on TripleT-Based RDF Stores Bart G. J. Wolff Eindhoven University of Technology b.g.j.wolff@alumnus.tue.nl George H. L. Fletcher Eindhoven University of Technology

More information

Reach for A : an Efficient Point-to-Point Shortest Path Algorithm

Reach for A : an Efficient Point-to-Point Shortest Path Algorithm Reach for A : an Efficient Point-to-Point Shortest Path Algorithm Andrew V. Goldberg Microsoft Research Silicon Valley www.research.microsoft.com/ goldberg/ Joint with Haim Kaplan and Renato Werneck Einstein

More information

Benchmarking Database Representations of RDF/S Stores

Benchmarking Database Representations of RDF/S Stores Benchmarking Database Representations of RDF/S Stores Yannis Theoharis 1, Vassilis Christophides 1, Grigoris Karvounarakis 2 1 Computer Science Department, University of Crete and Institute of Computer

More information

Enabling fine-grained HTTP caching of SPARQL query results

Enabling fine-grained HTTP caching of SPARQL query results Enabling fine-grained HTTP caching of SPARQL query results Gregory Todd Williams willig4@cs.rpi.edu @kasei 1 Jesse Weaver weavej3@cs.rpi.edu @jrweave 1 Overview Motivation for (HTTP) caching SPARQL Related

More information

Databases & Information Retrieval

Databases & Information Retrieval Databases & Information Retrieval Maya Ramanath (Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek,

More information

Data-Intensive Computing with MapReduce

Data-Intensive Computing with MapReduce Data-Intensive Computing with MapReduce Session 5: Graph Processing Jimmy Lin University of Maryland Thursday, February 21, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

More information

CSE 100: GRAPH ALGORITHMS

CSE 100: GRAPH ALGORITHMS CSE 100: GRAPH ALGORITHMS Dijkstra s Algorithm: Questions Initialize the graph: Give all vertices a dist of INFINITY, set all done flags to false Start at s; give s dist = 0 and set prev field to -1 Enqueue

More information
