An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Similar documents
Assisted Policy Management for SPARQL Endpoints Access Control

Linked data from your pocket: The Android RDFContentProvider

QAKiS: an Open Domain QA System based on Relational Patterns

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

CORON: A Framework for Levelwise Itemset Mining Algorithms

YAM++ : A multi-strategy based approach for Ontology matching task

Structuring the First Steps of Requirements Elicitation

Malware models for network and service management

Setup of epiphytic assistance systems with SEPIA

Tacked Link List - An Improved Linked List for Advance Resource Reservation

Change Detection System for the Maintenance of Automated Testing

Comparator: A Tool for Quantifying Behavioural Compatibility

BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

Sewelis: Exploring and Editing an RDF Base in an Expressive and Interactive Way

Natural Language Based User Interface for On-Demand Service Composition

Syrtis: New Perspectives for Semantic Web Adoption

Multimedia CTI Services for Telecommunication Systems

Every 3-connected, essentially 11-connected line graph is hamiltonian

DANCer: Dynamic Attributed Network with Community Structure Generator

RDF/SPARQL Design Pattern for Contextual Metadata

Service Reconfiguration in the DANAH Assistive System

How to simulate a volume-controlled flooding with mathematical morphology operators?

Representation of Finite Games as Network Congestion Games

LaHC at CLEF 2015 SBS Lab

A Voronoi-Based Hybrid Meshing Method

Comparison of spatial indexes

Teaching Encapsulation and Modularity in Object-Oriented Languages with Access Graphs

YANG-Based Configuration Modeling - The SecSIP IPS Case Study

An Experimental Assessment of the 2D Visibility Complex

Real-Time and Resilient Intrusion Detection: A Flow-Based Approach

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

Traffic Grooming in Bidirectional WDM Ring Networks

UsiXML Extension for Awareness Support

Relabeling nodes according to the structure of the graph

Taking Benefit from the User Density in Large Cities for Delivering SMS

A Practical Evaluation Method of Network Traffic Load for Capacity Planning

MUTE: A Peer-to-Peer Web-based Real-time Collaborative Editor

Deformetrica: a software for statistical analysis of anatomical shapes

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

The New Territory of Lightweight Security in a Cloud Computing Environment

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Multi-atlas labeling with population-specific template and non-local patch-based label fusion

Mokka, main guidelines and future

Navigating and Exploring RDF Data using Formal Concept Analysis

Boosting QAKiS with multimedia answer visualization

Linux: Understanding Process-Level Power Consumption

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

Moveability and Collision Analysis for Fully-Parallel Manipulators

Efficient implementation of interval matrix multiplication

Improving Collaborations in Neuroscientist Community

Regularization parameter estimation for non-negative hyperspectral image deconvolution:supplementary material

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

Implementing an Automatic Functional Test Pattern Generation for Mixed-Signal Boards in a Maintenance Context

From Object-Oriented Programming to Service-Oriented Computing: How to Improve Interoperability by Preserving Subtyping

Kernel perfect and critical kernel imperfect digraphs structure

Detecting Anomalies in Netflow Record Time Series by Using a Kernel Function

The optimal routing of augmented cubes.

Decentralised and Privacy-Aware Learning of Traversal Time Models

X-Kaapi C programming interface

Computing and maximizing the exact reliability of wireless backhaul networks

Type Feedback for Bytecode Interpreters

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

HySCaS: Hybrid Stereoscopic Calibration Software

QuickRanking: Fast Algorithm For Sorting And Ranking Data

SIM-Mee - Mobilizing your social network

Light field video dataset captured by a R8 Raytrix camera (with disparity maps)

Graphe-Based Rules For XML Data Conversion to OWL Ontology

Geometric Search in TGTP

The Connectivity Order of Links

Prototype Selection Methods for On-line HWR

ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler

THE COVERING OF ANCHORED RECTANGLES UP TO FIVE POINTS

A case-based reasoning approach for unknown class invoice processing

Catalogue of architectural patterns characterized by constraint components, Version 1.0

A case-based reasoning approach for invoice structure extraction

XBenchMatch: a Benchmark for XML Schema Matching Tools

Mapping classifications and linking related classes through SciGator, a DDC-based browsing library interface

lambda-min Decoding Algorithm of Regular and Irregular LDPC Codes

Periodic meshes for the CGAL library

Fuzzy sensor for the perception of colour

Real-Time Collision Detection for Dynamic Virtual Environments

Aligning Legivoc Legal Vocabularies by Crowdsourcing

Modularity for Java and How OSGi Can Help

Very Tight Coupling between LTE and WiFi: a Practical Analysis

Application of RMAN Backup Technology in the Agricultural Products Wholesale Market System

Monitoring Air Quality in Korea s Metropolises on Ultra-High Resolution Wall-Sized Displays

Validating Ontologies against OWL 2 Profiles with the SPARQL Template Transformation Language

Leveraging ambient applications interactions with their environment to improve services selection relevancy

On Code Coverage of Extended FSM Based Test Suites: An Initial Assessment

NP versus PSPACE. Frank Vega. To cite this version: HAL Id: hal

Generative Programming from a Domain-Specific Language Viewpoint

FIT IoT-LAB: The Largest IoT Open Experimental Testbed

Application of Artificial Neural Network to Predict Static Loads on an Aircraft Rib

Simulations of VANET Scenarios with OPNET and SUMO

Stream Ciphers: A Practical Solution for Efficient Homomorphic-Ciphertext Compression

Spectral Active Clustering of Remote Sensing Images

A typology of adaptation patterns for expressing adaptive navigation in Adaptive Hypermedia

Transcription:

An FCA Framework for Knowledge Discovery in SPARQL Query Answers Melisachew Wudage Chekol, Amedeo Napoli To cite this version: Melisachew Wudage Chekol, Amedeo Napoli. An FCA Framework for Knowledge Discovery in SPARQL Query Answers. The 12th International Semantic Web Conference, Oct 2013, Sydney, Australia. 2013. <hal-00881080> HAL Id: hal-00881080 https://hal.inria.fr/hal-00881080 Submitted on 7 Nov 2013 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

An FCA Framework for Knowledge Discovery in SPARQL Query Answers Melisachew Wudage Chekol and Amedeo Napoli LORIA (INRIA, CNRS, and Université de Lorraine), France {melisachew.chekol,amedeo.napoli}@inria.fr Abstract. Formal concept analysis (FCA) is used for knowledge discovery within data. In FCA, concept lattices are very good tools for classification and organization of data. Hence, they can also be used to visualize the answers of a SPARQL query instead of the usual answer formats such as: RDF/XML, JSON, CSV, and HTML. Consequently, in this work, we apply FCA to reveal and visualize hidden relations within SPARQL query answers by means of concept lattices. 1 Introduction Recently, the amount of semantically rich data available on the Web has grown considerably. Since the conception of linked data publishing principles, over 295 linked (open) datasets (LOD) have been produced 1. A reasonable number of these datasets provide endpoints for accessing (querying) their contents. Querying is mainly done through the W3C recommended query language SPARQL. The answers of SPARQL queries have often the following formats: RDF/XML, JSON, CSV, text, TSV, Java Script, XML, Spreadsheet, Ntriples, and HTML. It might be interesting to analyse, mine, and then visualize hidden relations within the answers. These tasks can be carried out using Formal Concept Analysis (FCA). FCA is used for knowledge discovery within data represented by means of objects and their attributes [3]. Concept lattices can reveal hidden relations within data and can be used for organizing, classifying, and even mining data. A survey of the benefits of FCA to semantic web (SW) and vice versa has been proposed in [6] (in particular ontology completion [1]). Additionally, studies in [2] and [4] are based on FCA for managing SW data. The former provides an entry point to linked data using questions in a way that can be navigated. It gives a transformation of an RDF graph into a formal context where the subject of an RDF triple becomes the object, a composition of the predicate and object of the triple becomes an attribute. The latter obliges the user to specify variables corresponding to objects and attributes of a context. These variables are used to create a SPARQL query which is used to extract content from linked data in order to build the formal context. Following this line, we propose a way to 1 http://linkeddata.org/

1 Keyword 2 SELECT Query 3 Object and Attribute CONSTRUCT Query CONSTRUCT Query SELECT Query Linked data Formal Context Concept Lattice Fig. 1: Architecture of SPARQL answers organization. organize Semantic Web data, and more precisely, the organization of SPARQL query answers by means of concept lattices. As a result, the user is able to visualize, navigate and classify the answers w.r.t. their context. For that, we propose the architecture depicted in Figure 1, based on three components which are discussed below: 1. Keyword search: In this component, a keyword (for instance, 14 juillet ) is used to search (to find information regarding this word) a specified dataset. To do so, a URI produced from the keyword is sent to the dataset to check its existence. When this is the case, a CONSTRUCT query containing the keyword is directed to the endpoint of the dataset. The answers are collected and organized to create a formal context as explained in the next section. 2. SELECT Query: This component builds a formal context out of the answers of a SPARQL query. SPARQL queries are converted into CONSTRUCT queries to form RDF graphs from the answers. Again, this will be illustrated in the next section. 3. Variables corresponding to objects and attributes of a formal context: This component enables the user to precisely specify the objects and attributes of the formal context. Out of which a SPARQL query is formed and sent to a chosen SPARQL endpoint. The answers of the query are collected to build a formal context. From that, a concept lattice is constructed. This is the approach considered by the authors in [4]. An example is proposed in the next section. In each case, the objective is to build a formal context and then to build the associated concept lattice. 2 Proposal A formal context represents data using objects and their attributes. Formally, it is a triple K = (G,M,I) where G is a set of objects, M is a set of attributes,

and I G M is a binary relation. A derivation operator ( ) is used to compute formal concepts of a context. Given a set of objects A, a derivation operator computes the maximal set of attributes shared by objects in A and is denoted by A (this is done dually with set of attributes B). A formal concept is a pair (A,B) where A = B and B = A. A set of formal concepts ordered with the set inclusion relation form a concept lattice [3]. Definition 1 (RDF as a Formal Context). Given an RDF graph G and a transformation function σ, a formal context is obtained from G as follows: C If o 1,rdf : type,c G, then σ( o 1,rdf : type,c ) = o 1 x R. R. If o 1,R,o 2 G, then σ( o 1,R,o 2 ) = o 1 x x We consider a core fragment of RDFS called ρdf [5] which contains the minimal vocabulary, ρdf = {sp,sc,type,dom,range}, where sp denotes the subproperty relation, sc is subclass, and dom stands for domain. This fragment was proven to be minimal and well-behaved in [5]. Its semantics corresponds to that of full RDFS. Triples containing schema information are transformed as: 1. If C 1,rdfs : subclassof,c 2 G, then σ( C 1,rdfs : subclassof,c 2 ) = o G. (o,c 1 ) I (o,c 2 ) I. 2. If R 1,rdfs : subpropertyof,r 2 G,thenσ( R 1,rdfs : subpropertyof,r 2 ) = o 1,o 2 G. (o 1, R 1. ) I (o 1, R 2. ) I. 3. If R,rdfs : domain,c G, then σ( R,rdfs : domain,c ) = o 1 G. (o 1, R. ) I (o 1,C) I. 4. If R,rdfs : range,c G, then σ( R,rdfs : range,c ) = o 2 G. (o 1, R. ) I (o 2,C) I. The users above are able to build a formal context from an RDF graph or a set of SPARQL query answers. Then, there are several algorithms that can compute the concept lattice associated with a formal context and that can be used in our framework. Example: consider a SPARQL query that selects film titles (as objects of the formal context) and genres (as attributes) from DBpedia to populate a formal context. Consequently, this formal context is shown in Figure 2. A possible concept lattice obtained from the formal context associated with the query answers is depicted in Figure 3. Now we have a classification of the results of the query w.r.t. a given topic or constraint. Additionally, this is exactly the same thing as if we were querying the Web with Google and here we have a classification of the answers w.r.t. a user constraints. 3 Conclusion SPARQL query answers are provided in different formats (RDF/XML, CSV, JSON, TTL, and others), which do not reveal hidden semantics in the answers. o 2

Fig.2: A part of the formal context associated with the query. Fig. 3: Concept lattice of DBpedia movies and genres. Concept lattices are useful in this regard. In this work, we used concept lattices to hierarchically organize and analyse the content of query answers. This is an ongoing work and we are currently implementing the procedure. We should investigate how well it scales, given the size of SPARQL query answers over linked data. Overall, this work shows some of the benefits of FCA that can be provided to the semantic web. References 1. Baader, F., Ganter, B., Sertkaya, B., Sattler, U.: Completing description logic knowledge bases using formal concept analysis. In: Proc. of IJCAI. vol. 7, pp. 230 235 (2007) 2. d Aquin, M., Motta, E.: Extracting relevant questions to an RDF dataset using formal concept analysis. In: Proceedings of the sixth international conference on Knowledge capture. pp. 121 128. ACM (2011) 3. Ganter, B., Wille, R.: Formal Concept Analysis. Springer, Berlin (1999) 4. Kirchberg, M., Leonardi, E., Tan, Y.S., Link, S., Ko, R.K., Lee, B.S.: Formal concept discovery in semantic web data. In: ICFCA. pp. 164 179. Springer-Verlag (2012) 5. Muñoz, S., Pérez, J., Gutierrez, C.: Minimal deductive systems for RDF. In: The Semantic Web: Research and Applications, pp. 53 67. Springer (2007) 6. Sertkaya, B.: A survey on how description logic ontologies benefit from FCA. In: CLA. vol. 672, pp. 2 21 (2010)