Triple Stores in a Nutshell
Franjo Bratić, Alfred Wertner

Overview
- What are the essential characteristics of a triple store?
  - Short introduction, examples and background information
- The agony of choice: what is on the market?
  - Which one fits my needs? A few examples
- Benchmark
  - Example
- Live demo with AllegroGraph
  - Import data
  - Use the Java client API and run some queries

Motivation
- RDF is good at modeling assertions
- RDF consists of assertions, also known as triples
- Application developers need tools that can manage RDF data
  - Import/export
  - Query
  - Update
Source: http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html

Triple Stores: Essentials
- Triple stores are tools for RDF data management
- Essential characteristics:
  - Persist RDF data
    - Native storage design (graph database)
    - Use a relational database
  - Query and update the graph
    - Support SPARQL

Persist RDF Data: Native Store
- Designed for storing graphs
- Figure: block diagram of a native store implementation
Source: http://www.franz.com/agraph/support/documentation/current/agraph-introduction.html

Persist RDF Data: Quads
- A quad extends a triple with context information
- Fast retrieval of triples
- Supported by many triple stores
- Is not part of RDF!
- Example: get everything about Chuck's home page

Subject      | Predicate  | Object       | Context
Ground Chuck | Type       | Human        | Chuck's home page
Angel        | petof      | Ground Chuck | Chuck's home page
petof        | inverseof  | haspet       | English grammar
Dog          | subclassof | Mammal       | science

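The quad idea can be sketched in a few lines of Python. This is only an illustration, not how any particular triple store implements contexts: the `QuadStore` class and its method names are hypothetical, and the data is taken from the table on this slide. The sketch keeps an extra index keyed by context, which is what makes "get everything about Chuck's home page" a direct lookup instead of a full scan.

```python
from collections import defaultdict


class QuadStore:
    """Hypothetical in-memory quad store: triples plus a context (fourth field)."""

    def __init__(self):
        self.quads = set()
        # Extra index: context -> set of (subject, predicate, object)
        self.by_context = defaultdict(set)

    def add(self, subject, predicate, obj, context):
        self.quads.add((subject, predicate, obj, context))
        self.by_context[context].add((subject, predicate, obj))

    def triples_in(self, context):
        """Return all triples stored under the given context, sorted."""
        return sorted(self.by_context.get(context, ()))


store = QuadStore()
store.add("Ground Chuck", "Type", "Human", "Chuck's home page")
store.add("Angel", "petof", "Ground Chuck", "Chuck's home page")
store.add("petof", "inverseof", "haspet", "English grammar")
store.add("Dog", "subclassof", "Mammal", "science")

# Everything asserted in the context "Chuck's home page"
chuck = store.triples_in("Chuck's home page")
for triple in chuck:
    print(triple)
```

Real stores typically maintain several such indexes (for example by subject or by predicate) so that any query pattern can be answered without scanning every quad.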
Persist RDF Data: RDBMS
- Stores triples in a relational database
- Can you think of a simple way to achieve that?

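One simple answer to the question above is a single three-column table. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name `triples`, the index name `idx_pos` and the helper `match` are assumptions made for illustration, and production RDBMS-backed stores usually use more elaborate schemas (for example, dictionary-encoded node IDs).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per triple; the primary key doubles as a subject-first index.
conn.execute(
    """CREATE TABLE triples (
           subject   TEXT NOT NULL,
           predicate TEXT NOT NULL,
           object    TEXT NOT NULL,
           PRIMARY KEY (subject, predicate, object)
       )"""
)
# Second index so lookups by predicate/object do not scan the whole table.
conn.execute("CREATE INDEX idx_pos ON triples (predicate, object, subject)")

conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Ground Chuck", "Type", "Human"),
        ("Angel", "petof", "Ground Chuck"),
        ("Dog", "subclassof", "Mammal"),
    ],
)


def match(conn, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", s), ("predicate", p), ("object", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = (
        "SELECT subject, predicate, object FROM triples"
        + where
        + " ORDER BY subject, predicate, object"
    )
    return conn.execute(sql, params).fetchall()


pets = match(conn, p="petof")      # who is a pet of whom?
everything = match(conn)           # no constraints: all triples
print(pets)
```

A triple pattern with wildcards maps directly onto a `WHERE` clause, which is roughly what a SPARQL-to-SQL layer has to generate for each pattern in a query.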
Query and Update the Graph: SPARQL
- SPARQL support: SPARQL Protocol and SPARQL Query Language
- SPARQL Protocol
  - Query and update operations based on HTTP
  - Between client and SPARQL endpoint
- SPARQL Query Language
  - Queries: SELECT, ASK, DESCRIBE, CONSTRUCT
  - Updates: INSERT, DELETE

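To make the split between protocol and query language concrete, here is a hedged sketch that only builds the HTTP requests a client would send and does not contact any server. The endpoint URL and the `example.org` IRIs are hypothetical, and the helper names are made up for this example. It assumes the SPARQL 1.1 Protocol conventions: a query sent via GET in the `query` parameter, and an update sent via POST with the `application/sparql-update` content type. Some stores expose queries and updates on different endpoint URLs.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the address of a real SPARQL endpoint.
ENDPOINT = "http://localhost:8080/sparql"


def build_query_request(endpoint, query):
    """SPARQL query via HTTP GET: the query text goes into the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


def build_update_request(endpoint, update):
    """SPARQL update via HTTP POST with the update text as the request body."""
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


select_req = build_query_request(
    ENDPOINT, "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
)
insert_req = build_update_request(
    ENDPOINT,
    "INSERT DATA { <http://example.org/Angel> "
    "<http://example.org/petof> <http://example.org/GroundChuck> }",
)

print(select_req.get_method(), select_req.full_url)
print(insert_req.get_method(), insert_req.get_header("Content-type"))
```

Passing either request to `urllib.request.urlopen` would actually send it; that step is left out so the sketch runs without a live endpoint.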
Triple Stores

The Agony of Choice
- Are there differences?
- Is one of them the right one?
- How to choose one for the project?
  - Requirements / criteria?
  - Environment of use?
  - Performance?
  - Costs?
  - ...

Set Some Criteria
- Scalability
  - Persistent stores are better than in-memory stores
- Interoperability and portability
  - Programming language!!!
  - Commitment to using the entire stack of a store
- Optimization
  - Native stores vs. 3rd-party stores
- License, support, community, ...
- Only a few are left!

AllegroGraph v4.9
- Load, store and query RDF data
- Includes an implementation of Prolog
- Runs natively on 64-bit Linux (x86-64)
- Interfaces: Java, Python, Ruby, Perl, C#, Clojure, Common Lisp
- Tools: AGWebView, Gruff, ...
- License: free for fewer than 50 million triples

AllegroGraph v4.9
Figure: http://www.franz.com/agraph/allegrograph/ag_client-server_arch_4.2.2.png

OpenLink Virtuoso v6.2
- High-performance object-relational SQL database
- Written in C
- Distributions for Unix and Windows
- Access through Jena and Sesame
- Tools: ISQL, graphical Conductor
- License: GPL v2 and commercial

OpenLink Virtuoso v6.2
Figure: http://virtuoso.openlinksw.com/images/varch625.jpg

Jena
- Java-based open source framework
- Represents RDF graphs as native models:
  - In-memory
  - Other data sources (file, database)
- Framework includes:
  - RDF API
  - Reading and writing RDF in RDF/XML, N3 and N-Triples
  - OWL API
  - In-memory and persistent storage
  - Rule-based inference engine
  - Query engine implementing the SPARQL specification

Jena TDB
- High-performance, pure-Java, non-SQL storage subsystem
- Persistent graph storage layer for Jena
- Works with the Jena SPARQL query engine (ARQ)
- A number of extensions (e.g. property functions, aggregates, arbitrary-length property paths)
- Custom implementation of B+ trees
- License: BSD license

Jena SDB
- Basically a Java loader
- Multiple stores supported, e.g. MySQL, PostgreSQL, Oracle, DB2, Apache Derby, ...
- Provides scalable storage and querying of RDF datasets using conventional SQL databases
- Database tools for load balancing, security, clustering, backup and administration can all be used to manage the installation
- Designed specifically to support SPARQL

Sesame
- Framework for processing RDF data: parsing, storing, inference and querying
- Runs on top of a variety of storage systems: relational databases, in-memory, file systems, keyword indexers, ...
- Wide range of tools and access methods: HTTP, SOAP, RMI
- Supports 100% of SPARQL (since 2008)
- Supports the main RDF file formats: RDF/XML, Turtle, N-Triples, TriG, TriX, ...

Sesame
- Runs as a Java servlet application in Apache Tomcat
- Communicates over HTTP
Figure: http://www.openrdf.org/doc/sesame/users/figures/sesame-server.png

Sesame
Figure: Sesame's overall architecture, http://www.openrdf.org/doc/sesame/users/figures/sesame-arch.png

Benchmark
- What data should be used?
  - Lehigh University Benchmark (LUBM): 14 test queries
  - Berlin SPARQL Benchmark (BSBM): 12 test queries
  - Real-world data, e.g. DBpedia, WordNet, ...
- Who is testing?
  - No central institution
  - Tests are (mostly) run only by the store's creators, so results may be manipulated
- Testing architecture?

Benchmark
- Not considered in almost all benchmarks:
  - RDFS reasoning
  - SPARQL 1.1
  - Heavy load
  - Multiple queries in parallel
- Conclusion of every benchmark, known in advance: NO store wins in every field!!!

Benchmark Example
- Yet Another Triple Store Benchmark: http://mt.inf.tu-dresden.de/forschung/topics/bm/
- Machine hardware
  - CPU: Intel Xeon X5660 @ 2.80 GHz x 4
  - RAM: 16 GB
  - Hard disk: 1 x 34 GB, 1 x 42 GB
- Software
  - OS: Ubuntu 12.04 LTS, 64-bit
  - JRE: JDK 1.7.0_04
  - Apache Tomcat 7.0.28

Benchmark Example: Stores
- Fuseki (Jena TDB SPARQL server) ver. 0.2.3
  - TDB loader of Jena TDB 0.9.0
- NanoSPARQLServer of bigdata ver. 1.2.0
  - Deployed on a Tomcat server
- OWLIM Lite ver. 5.0.5001
  - Via Sesame 2.6.5, deployed on a Tomcat server
- OpenLink Virtuoso ver. 6.01.3127

Benchmark Example: Dataset

                        | NYTimes | Jamendo | Movie DB | Yago 2 Core
N-Triple data size (MB) | 56.2    | 151.0   | 891.6    | 5,427.2
Triples (millions)      | 0.35    | 1.05    | 6.15     | 35.43
Instances (thousands)   | 13.2    | 290.4   | 665.4    | 2,648.4
Classes                 | 19      | 21      | 53       | 292,861
Properties              | 69      | 47      | 222      | 93

Benchmark Example: Queries
- Queries 1-6: generic queries, the same for each dataset
- Queries 7-13: SPARQL 1.1 queries specialized for each dataset
- Queries 14 and 15: SPARQL Update queries that delete and insert some data in the graph

Load Time Result
Figure: http://mt.inf.tu-dresden.de/forschung/topics/bm/loading.pdf

Memory Requirement
Figure: http://mt.inf.tu-dresden.de/forschung/topics/bm/memory.pdf

http://mt.inf.tu-dresden.de/forschung/topics/bm/queries_no_inf.pdf

Triple Store DEMO!!!