OSDBQ: Ontology Supported RDBMS Querying

Cihan Aksoy 1, Erdem Alparslan 1, Selçuk Bozdağ 2, İhsan Çulhacı 3

1 The Scientific and Technological Research Council of Turkey, Gebze/Kocaeli, Turkey
2 Komtaş Information Management Inc., Ostim/Ankara, Turkey
3 Turkish Court of Accounts, Söğütözü/Ankara, Turkey
{caksoy, ealparslan}@uekae.tubitak.gov.tr

Abstract. Data handling and retrieval are essential once the data size exceeds a certain amount. Storing transactional data in relational form and querying it with Structured Query Language (SQL) is often preferred because of its tabular structure. Data may also be stored in an ontological form if it includes numerous semantic relations; this form is more suitable for inferring information from relations. When transactional data contain many semantic relations, as in our problem, it is not easy to decide on the method of storing and querying. We introduce a new querying mechanism, Ontology Supported RDBMS Querying (OSDBQ), which combines the strengths of both storing and querying methods. This paper describes OSDBQ and presents a comparison between three querying methods. Results show that most of the time OSDBQ returns better results than the others.

Keywords: Ontology, relational database management systems, inference, transactional data querying, semantic matching.

1 Introduction

The semantic web introduces a web of data that enables machines to interpret the meaning of information on the web [1]. This approach was required because the web turned into an information dump in the last decade, so that one cannot retrieve the information that corresponds to given search criteria. The technology aims at semantically tagging resources in order to cope with dirty information. One of the most important parts of the semantic web is the ontology layer. An ontology is a hierarchy of concepts representing the meaning of an information field [2].
It is possible to obtain ontological data from these concepts by instantiating them. The data therefore carry the same relationships among themselves as the concepts do. This allows us to infer hidden or deep information by following these relationships. On the other hand, the data size increases by a factor of two to three because the data include the relationships. There are several ontology representations, such as OWL [3], RDF [4] and N-Triples [5]. A widely known query language for RDF is SPARQL [8].

Commonly used relational tables are far from the notion of meaning; they only keep the data in tabular form. Data are retrieved from these tables by querying with SQL. Since they focus only on data, they are smaller than an ontological data store that holds the same amount of data. That is why querying a relational table returns results faster. Relational tables are designed for tabular data, whereas ontologies are well suited to hierarchical data in which semantic relationships exist.

Transactional data, which consist of sales, deliveries, invoices, claims and other monetary and non-monetary interactions, may contain many relationships. They are usually kept in tabular form. Modeling transactional data is challenging when the data size is larger than a certain amount. We should consider querying performance, data size and the extraction of hidden or deep information.

In this paper, we propose a new querying method that compensates for the shortcomings of relational querying by benefitting from the strengths of ontologies. Our approach keeps transactional data in tabular form and queries them with SQL; while querying, it infers supplementary information from ontologies and adds the obtained information to parameterized SQL queries.

The rest of the paper is organized as follows. In Section 2, we give details about the data models and querying methods used, introduce the new querying method and explain its working principle. In Section 3, we apply these methods to sample data and show the results. Finally, in Section 4, we discuss the limitations of our approach and explain what can be done to address them in future work.

2 Applied Methods

2.1 Querying Ontological Data via SPARQL

As a first data model, we used BSBM Tools [10], a data generator provided by the Berlin SPARQL Benchmark (BSBM) [11], to generate data in N-Triples format as our ontological data. Ontological data include not only the concepts that belong to a certain domain but also the instances of these concepts. These instances correspond to the tuples of a tabular data model. Moreover, the semantic relations between concepts and between instances are found in this kind of data.
That is why ontological data occupy more disk space than relational data. In order to have a querying interface, we loaded the generated data into a TDB store [6] using the tdbloader component of the Jena API [7]. We observed that the data size on disk decreases slightly owing to the efficient storage mechanism of the TDB store. After storing the data, we sent SPARQL queries from a programmatic interface through the Jena API. We also used Jena's reasoner, because the SPARQL queries require inference. This is well suited to ontological data, since they hold relationships. The indexing and caching capabilities of the TDB store shorten the query time on each repetition of the same query, as seen in the results.
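
To illustrate what the reasoner contributes in this first method, the following is a minimal sketch in plain Python rather than Jena or SPARQL. It assumes a tiny hand-written triple list (the `ex:` identifiers are hypothetical, not BSBM data) and implements only the transitive closure of rdfs:subClassOf, which is far less than a real reasoner does.

```python
from collections import defaultdict

SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical miniature graph; identifiers are illustrative, not BSBM data.
TRIPLES = [
    ("ex:Laptop", SUBCLASS_OF, "ex:Computer"),
    ("ex:Computer", SUBCLASS_OF, "ex:Electronics"),
    ("ex:product1", TYPE, "ex:Laptop"),
    ("ex:product2", TYPE, "ex:Computer"),
    ("ex:product3", TYPE, "ex:Furniture"),
]


def superclasses(triples, cls):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)
    seen, stack = {cls}, [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(triples, cls, infer=True):
    """Return subjects typed as cls, optionally following the class hierarchy."""
    result = set()
    for s, p, o in triples:
        if p != TYPE:
            continue
        types = superclasses(triples, o) if infer else {o}
        if cls in types:
            result.add(s)
    return result
```

Without inference, asking for instances of `ex:Electronics` finds nothing in this toy graph, because no product is typed with that class directly; with inference, the products typed as its subclasses are returned.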

2.2 Querying Relational Data via SQL

To prepare a relational database as the environment of the second method, we again used BSBM Tools, this time to generate an SQL dump containing all commands necessary to form the database tables. We executed these commands in a MySQL database, then extracted the data and loaded them into PostgreSQL. As we describe for the dataset in Section 3.2, certain characteristics are identical in all types of generated data for the same scale factor. For example, the relationships inside the data, the number of instances and the number of concepts are the same in all types of generated data, whereas the data values may differ.

After loading the data into the database, we noticed that there are many relation tables, so that the data represent the same relationships that are found in the ontological data. Instead of using the table relationships directly, we applied the approach of Das et al., Supporting Ontology-based Semantic Matching in RDBMS [9], by moving these relationships into separate tables that represent the ontology. Therefore, as shown in Figure 1, the database can be viewed as consisting of two main table groups: one holds the ontologies while the other holds the data.

Fig. 1. Das et al.'s architecture.

In this approach, as mentioned before, there are two groups of tables. The first group, called system defined tables, stores the ontological structure and semantic relations. We can infer semantic relations between individuals from the system defined tables. The second group, called user tables, stores the bulk individuals or tuples in tabular form. After semantic rules are inferred from the system defined tables, the resulting rich SQL statement can be executed on the user tables. In this way, semantically related data can be queried with an RDBMS without using an inference engine. After generating the database, we executed the same queries from the same interface, changing only the query language.
In this method, we transformed the SPARQL queries into SQL queries. Moreover, we inferred relationships from the ontology-related tables by using stored procedures, as explained in Das et al.'s approach. For example, the operator ONT_RELATED returns all parents attached hierarchically to a certain concept or instance. The goal of this method is to represent the ontology in a relational database in order to benefit from the performance of this kind of database.
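
The idea of keeping the hierarchy in its own table and resolving it inside the database can be sketched as follows. This is not Das et al.'s implementation and not the ONT_RELATED operator itself: it is a minimal illustration that assumes SQLite (through Python's sqlite3 module) instead of PostgreSQL, a recursive common table expression instead of stored procedures, and a hypothetical two-table schema (`type_hierarchy`, `product`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Ontology table: one row per direct parent link (hypothetical schema).
    CREATE TABLE type_hierarchy (child TEXT, parent TEXT);
    INSERT INTO type_hierarchy VALUES
        ('Laptop', 'Computer'),
        ('Computer', 'Electronics'),
        ('Chair', 'Furniture');
    -- User table: the bulk data (hypothetical schema).
    CREATE TABLE product (id INTEGER PRIMARY KEY, label TEXT, type TEXT);
    INSERT INTO product (label, type) VALUES
        ('p1', 'Laptop'), ('p2', 'Computer'), ('p3', 'Chair');
""")


def ancestors(conn, term):
    """Return every direct or indirect parent of term."""
    rows = conn.execute("""
        WITH RECURSIVE up(term) AS (
            SELECT parent FROM type_hierarchy WHERE child = ?
            UNION
            SELECT h.parent FROM type_hierarchy h JOIN up ON h.child = up.term
        )
        SELECT term FROM up
    """, (term,)).fetchall()
    return {r[0] for r in rows}


def products_of_type(conn, wanted):
    """Return labels of products whose type is wanted or a descendant of it."""
    rows = conn.execute("""
        WITH RECURSIVE down(term) AS (
            SELECT ?
            UNION
            SELECT h.child FROM type_hierarchy h JOIN down ON h.parent = down.term
        )
        SELECT label FROM product WHERE type IN (SELECT term FROM down)
        ORDER BY label
    """, (wanted,)).fetchall()
    return [r[0] for r in rows]
```

In this sketch both the hierarchy traversal and the data lookup run inside the database in a single statement, which mirrors the point of the second method: no separate inference engine is involved.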

2.3 Ontology Supported Relational Data Querying

As a third method, we propose to keep the semantic relations between individuals separate from the tabular transactional data. As mentioned before, the first approach infers and queries both the semantic relations and the transactional data from an RDF store, whereas the second approach queries both the semantic relations and the transactional bulk data from an RDBMS data store. In this approach, we propose to infer the semantic relations from the RDF store and then to query the transactional bulk data from a conventional RDBMS. In other words, we realize the second approach by replacing the system defined tables with an RDF store.

Fig. 2. OSDBQ architecture.

To realize this approach, we used the same RDF store that we generated for the first method. Unlike in the first method, this time the RDF store is used for querying the domain ontology and inferring semantic relations, not for querying the transactional bulk data. The related ontology file obtained from the RDF store is loaded into memory in order to avoid the time lost in opening files while inferring. The necessary semantic relations are inferred with the Jena API's reasoner and then passed to the second part of this approach. The second part, which queries the bulk transactional data, takes the inference results provided by Jena as parameters and sends the parameterized SQL query to the RDBMS data store. Transactional data can therefore be queried in an ontological manner and with RDBMS performance, as seen in Figure 2.

Figure 3 depicts the basic flow of our proposed Ontology Supported Relational Database Querying architecture. A complex user query, which may require both semantic inference and transactional data querying, is given to the system through user interaction. If the complex query needs inference, the system loads the required ontology RDF files into memory and performs the semantic inferences by using ontology objects.
These inferences essentially prepare rich SQL parameters for transactional data querying. In other words, some of the parameters used in the WHERE clauses of the transactional SQL queries are prepared by the inference engine loaded into memory. The system is therefore able to send SQL queries over the transactional data using the inferred parameters obtained from the inference engine running in memory.

Fig. 3. Flowchart of OSDBQ architecture. [Flowchart nodes: Complex user query; Needs inference? (Yes / No); Load ontology into the memory; Infer additional query parameters on the fly; SQL querying on transactional bulk data; Return rich query results.]

3 Application to Sample Data

For organizations handling huge amounts of data, it is very important to decide properly how the data will be stored and retrieved. The size of the data is not the only point to consider; the characteristics of the data are also taken into account when making the decision, since data may be designed as hierarchical, relational, etc. To choose the right architecture, organizations need comparison and benchmarking studies. In this section, we first give some information about the environment and utilities that we used during the tests, second we describe the dataset on which we applied the proposed methods, third we explain the queries, and finally we present the comparison results of the three methods.

3.1 Experimental Setup

We ran the experiments on an HP Workstation (processor: 2 x Intel Pentium 3 Xeon, 2833 MHz; memory: 8GB DDR2 667; hard disk: 320GB 7200Rpm SATA2) with the Ubuntu 10.04 LTS 64-bit operating system. The following utilities were also used:

- Jena TDB Version 0.8.9 as RDF storage and query component
- PostgreSQL Version 1.12.2 as database

To measure the performance of these methods, we prepared datasets of two different sizes for each method so that the results are more consistent. One dataset includes 10000 products and related tables, whereas the other has 100000 products. For the same reason, we repeated the execution of each query 4 times to limit the influence of effects that can slightly change the measured result, such as caching mechanisms in data stores and in databases.
All methods were executed on the same machine in order to avoid network latency. The results are recorded in milliseconds.
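
The flow of Section 2.3 and the measurement procedure above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain dictionary (`PARENT_OF`) stands in for the ontology loaded into memory, a simple fixed-point loop stands in for Jena's reasoner, SQLite stands in for PostgreSQL, and the table, type names and helper functions are all hypothetical.

```python
import sqlite3
import time

# Hypothetical in-memory ontology: child type -> parent type.
PARENT_OF = {"Laptop": "Computer", "Tablet": "Computer", "Computer": "Electronics"}


def infer_subtypes(parent_of, wanted):
    """Return wanted plus all of its direct and indirect subtypes."""
    found = {wanted}
    changed = True
    while changed:
        changed = False
        for child, parent in parent_of.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def osdbq_query(conn, parent_of, wanted):
    """Infer type parameters in memory, then run one parameterized SQL query."""
    types = sorted(infer_subtypes(parent_of, wanted))
    placeholders = ", ".join("?" for _ in types)
    sql = f"SELECT label FROM product WHERE type IN ({placeholders}) ORDER BY label"
    return [row[0] for row in conn.execute(sql, types)]


def timed_runs(fn, repeats=4):
    """Run fn repeatedly and return the elapsed time of each run in milliseconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# Hypothetical user table holding the bulk data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (label TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("p1", "Laptop"), ("p2", "Tablet"), ("p3", "Chair"), ("p4", "Computer")],
)

labels = osdbq_query(conn, PARENT_OF, "Computer")
timings_ms = timed_runs(lambda: osdbq_query(conn, PARENT_OF, "Computer"))
```

The inferred types are passed as bound parameters rather than concatenated into the SQL text, so only the number of placeholders in the WHERE clause varies with the inference result. Timing each of the four repetitions separately keeps the cold first run distinguishable from the later, cache-assisted runs.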

3.2 Dataset

The dataset of the Berlin SPARQL Benchmark [11] is used because it is relevant to transactional data. It is built around an e-commerce use case in which a set of products is offered by different vendors and different consumers have posted reviews about the products. It is possible to generate an arbitrary amount of data, with the number of products as the scale factor. The data generation is deterministic so that different representations of the same dataset can be created.

The dataset is composed of instances of these classes: Product, ProductType, ProductFeature, Producer, Vendor, Offer, Review, Reviewer and ReviewingSite. All products have between 3 and 5 textual properties. Each property consists of 5 to 15 words that are randomly selected from a dictionary. Products also contain between 3 and 5 numeric properties whose values range between 1 and 2000. All products have a product type from the type hierarchy. The depth and width of the product type hierarchy depend on the number of products. This hierarchy is included in the dataset even if the data store or database doesn't support RDFS inference. All products have a variable number of product features depending on their position in the product hierarchy. All products are offered by vendors. Offers contain the price and the number of days for delivery, and they are valid for a certain date interval. Reviews are published by reviewers. Reviewers have a name, a mailbox checksum and a country of residence. Reviews have a title and a review text of between 50 and 300 words. They also have four random ratings.

Table 1 shows the number of instances of each class in the BSBM dataset depending on our choice of product number. We refer to the dataset generated with 10000 products as the small dataset and to the one generated with 100000 products as the big dataset. Since an ontology holds data as triples, we also give the number of triples for each dataset.

Table 1. Number of instances in BSBM datasets for different scales.

Data Type                    Small dataset   Big dataset
Number of Product Feature            10519         47884
Number of Product Type                 329          2011
Number of Producer                     206          1988
Number of Vendor                       105           995
Number of Offer                     200000       2000000
Number of Review                    100000       1000000
Total Number of Instances           311159       3052878
Total Number of Triples            3565060      35270984

3.3 Queries

We used 4 queries of sufficient depth to compare the methods. We pay attention to depth because we are looking for an ideal data storing and querying mechanism for transactional data. As mentioned before, the dataset is built around an e-commerce use case, and the queries correspond to the search and navigation pattern of a consumer looking for a product.

In the first query, the consumer searches for products that have a specific type and features. In the second, the consumer asks for products that belong to certain types and have several features but do not have a specific other feature. In the third, the consumer looks for products that belong to certain types and match either one set of features or another. Inference is needed for these three queries since each product type hierarchically belongs to another product type, as described in Section 3.2; thus products that do not match a given type directly may still be returned as results. In the last query, a vendor wants to find out which product categories get the most attention from people in a certain country. It requires a similar inference: a product category that attracts attention also increases the popularity of its parent product category.

3.4 Results and Interpretation

After preparing the environment, we started to execute the queries. First, we ran the queries on the small dataset, which consists of 10000 products.

Table 2. Results of methods for 10000 products (ms).

Query       Test       1st method   2nd method   3rd method
1st query   1st test         3009         1169         1881
            2nd test          269           35           12
            3rd test          179           34           13
            4th test          117           31           12
2nd query   1st test         1075          772         1675
            2nd test          146           58           14
            3rd test          147           50           13
            4th test           72           35           12
3rd query   1st test         1637          104          496
            2nd test          150           46           64
            3rd test          105           48           62
            4th test          115           48           61
4th query   1st test         2304         3153         1912
            2nd test          960           28           28
            3rd test          549           30           18
            4th test          515           29           17

In Table 2, each column represents one of the data storing and querying methods described in the previous section. The rows show the repeated executions of each query. Each query was performed four times. The times generally improved with each repetition of the same query, and the results are sufficiently consistent after the first test. We can deduce that the 1st method, in which ontologies are queried, gives worse results than the others. However, the 3rd method, in which ontologies are used partially alongside tabular data, mostly returns better results than the 2nd method, in which only tabular data are queried. This suggests that supporting relational databases with ontologies and querying them by using semantic reasoners may increase performance.

Table 3. Results of methods for 100000 products (ms).

Query       Test       1st method   2nd method   3rd method
1st query   1st test        77034        12728        10154
            2nd test          591          286          298
            3rd test          484          285          285
            4th test          468          279          286
2nd query   1st test          514         4856        12631
            2nd test          448          568          211
            3rd test          441          565          212
            4th test          443          559          211
3rd query   1st test         1401         1434         3233
            2nd test         1367          912          756
            3rd test         1342          912          754
            4th test         1323          905          749

To check the consistency of the results obtained on our small dataset, we performed the same process with 10 times more data. In Table 3, where columns and rows have the same meaning as in Table 2, the results show that the 3rd method usually returns the best results among the three methods. On the other hand, although the 3rd method gives the best results, it may not always be applicable, as seen in our experiment. Since the ontology required for executing the 4th query was larger than the memory, we could not run the last query. This shows the scalability problem of the 3rd method.
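
The comparison above can be made reproducible with a few lines of arithmetic. The sketch below copies the Table 2 timings and, for each query, averages tests 2 to 4 per method, leaving out the cold first run. Skipping the first run and using the mean are choices made for this illustration, not a procedure stated in the paper, and the helper names are hypothetical.

```python
from statistics import mean

# Times in ms copied from Table 2 (10000 products). Each tuple is one test;
# its three values are the 1st, 2nd and 3rd method.
TABLE_2 = {
    "1st query": [(3009, 1169, 1881), (269, 35, 12), (179, 34, 13), (117, 31, 12)],
    "2nd query": [(1075, 772, 1675), (146, 58, 14), (147, 50, 13), (72, 35, 12)],
    "3rd query": [(1637, 104, 496), (150, 46, 64), (105, 48, 62), (115, 48, 61)],
    "4th query": [(2304, 3153, 1912), (960, 28, 28), (549, 30, 18), (515, 29, 17)],
}


def warm_means(tests):
    """Mean time per method over tests 2 to 4, skipping the cold first run."""
    warm = tests[1:]
    return tuple(mean(column) for column in zip(*warm))


def fastest_method(tests):
    """Return the 1-based number of the method with the lowest warm mean."""
    means = warm_means(tests)
    return means.index(min(means)) + 1


fastest = {query: fastest_method(tests) for query, tests in TABLE_2.items()}
```

On the Table 2 figures, this reports the 3rd method as fastest for the 1st, 2nd and 4th queries and the 2nd method as fastest for the 3rd query, which is consistent with the statement that the 3rd method is better most of the time rather than always.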

4 Conclusion

In this paper, a new data storing and querying mechanism, Ontology Supported RDBMS Querying (OSDBQ), is introduced and compared with the two best-known data querying mechanisms. SQL is ideal for querying tabular data. SPARQL is preferred for querying and inferring over ontological data in which relations exist. The OSDBQ approach is well suited to querying tabular data in which relations exist as in ontological data. Its performance relies not only on the strengths of the other two methods but also on the fact that the necessary inferences are performed with ontologies held in memory. However, our approach is limited by what it is based on: large ontologies may not fit into memory.

Since the size of the data increases when all relationships are included in the ontological data, the results of the 1st method are worse than the others in most tests. Most of the time the 3rd method returned better results than the 2nd method. We therefore conclude that it should be used as long as the ontologies fit into memory.

This paper aims to develop a new querying method for handling semantically related transactional datasets. The new OSDBQ method may be applied to huge datasets and the large ontologies behind them. Our future work will address the handling of large ontologies in memory. We will try to predict the necessary parts of the ontologies and bring them into memory partially. A huge transactional and semantically relational audit dataset of the Turkish Court of Accounts will be adapted to the OSDBQ framework for analytical purposes. In this way, we will try to overcome the size problem of large ontologies. Ontology merging may be another challenge for improving performance by facilitating the prediction mentioned above.

References

1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American Magazine. (2001)
2. Guarino, N.: Formal Ontology and Information Systems. In: 1st International Conference on Formal Ontology in Information Systems [FOIS], pp. 2-5. Torino (1998)
3. McGuinness, D.L., Harmelen, F.: Owl Web Ontology Language, http://www.w3.org/tr/owl-features/
4. RDF, http://www.w3.org/rdf/
5. N-Triples, http://www.w3.org/2001/sw/rdfcore/ntriples/
6. TDB, http://openjena.org/wiki/tdb
7. Jena: A Semantic Web Framework for Java, http://jena.sourceforge.net/
8. SPARQL Query Language for RDF, http://www.w3.org/tr/rdf-sparql-query/
9. Das, S., Chong, E., Eadon, G., Srinivasan, J.: Supporting Ontology-based Semantic Matching in RDBMS. VLDB, pp. 1054-1065, Toronto (2004)
10. Schultz, A.: BSBM Tools, http://sourceforge.net/projects/bsbmtools/
11. Bizer, C., Schultz, A.: The Berlin Sparql Benchmark. In: Int. J. Semantic Web. Inf. Syst., pp. 1-24. (2009)