Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

Similar documents
Sewelis: Exploring and Editing an RDF Base in an Expressive and Interactive Way

QAKiS: an Open Domain QA System based on Relational Patterns

Assisted Policy Management for SPARQL Endpoints Access Control

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Boosting QAKiS with multimedia answer visualization

Linked data from your pocket: The Android RDFContentProvider

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

Tacked Link List - An Improved Linked List for Advance Resource Reservation

YAM++ : A multi-strategy based approach for Ontology matching task

LaHC at CLEF 2015 SBS Lab

How to simulate a volume-controlled flooding with mathematical morphology operators?

BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

Multimedia CTI Services for Telecommunication Systems

Syrtis: New Perspectives for Semantic Web Adoption

Change Detection System for the Maintenance of Automated Testing

Service Reconfiguration in the DANAH Assistive System

UsiXML Extension for Awareness Support

Teaching Encapsulation and Modularity in Object-Oriented Languages with Access Graphs

Relabeling nodes according to the structure of the graph

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

The Connectivity Order of Links

An Efficient Numerical Inverse Scattering Algorithm for Generalized Zakharov-Shabat Equations with Two Potential Functions

Every 3-connected, essentially 11-connected line graph is hamiltonian

Comparison of spatial indexes

Comparator: A Tool for Quantifying Behavioural Compatibility

KeyGlasses : Semi-transparent keys to optimize text input on virtual keyboard

HySCaS: Hybrid Stereoscopic Calibration Software

Taking Benefit from the User Density in Large Cities for Delivering SMS

Deformetrica: a software for statistical analysis of anatomical shapes

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

A Voronoi-Based Hybrid Meshing Method

Representation of Finite Games as Network Congestion Games

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

RDF/SPARQL Design Pattern for Contextual Metadata

Setup of epiphytic assistance systems with SEPIA

FIT IoT-LAB: The Largest IoT Open Experimental Testbed

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

Very Tight Coupling between LTE and WiFi: a Practical Analysis

Malware models for network and service management

Application of RMAN Backup Technology in the Agricultural Products Wholesale Market System

Moveability and Collision Analysis for Fully-Parallel Manipulators

A Practical Evaluation Method of Network Traffic Load for Capacity Planning

Using a Medical Thesaurus to Predict Query Difficulty

Real-time FEM based control of soft surgical robots

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Mokka, main guidelines and future

Graphe-Based Rules For XML Data Conversion to OWL Ontology

Linux: Understanding Process-Level Power Consumption

ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler

The New Territory of Lightweight Security in a Cloud Computing Environment

XBenchMatch: a Benchmark for XML Schema Matching Tools

Multi-atlas labeling with population-specific template and non-local patch-based label fusion

Improving Collaborations in Neuroscientist Community

Integration of an on-line handwriting recognition system in a smart phone device

GDS Resource Record: Generalization of the Delegation Signer Model

QuickRanking: Fast Algorithm For Sorting And Ranking Data

Natural Language Based User Interface for On-Demand Service Composition

Computing and maximizing the exact reliability of wireless backhaul networks

Technical Overview of F-Interop

MUTE: A Peer-to-Peer Web-based Real-time Collaborative Editor

Hardware Acceleration for Measurements in 100 Gb/s Networks

Kernel perfect and critical kernel imperfect digraphs structure

Traffic Grooming in Bidirectional WDM Ring Networks

Privacy-preserving carpooling

Leveraging ambient applications interactions with their environment to improve services selection relevancy

Structuring the First Steps of Requirements Elicitation

DANCer: Dynamic Attributed Network with Community Structure Generator

Zigbee Wireless Sensor Network Nodes Deployment Strategy for Digital Agricultural Data Acquisition

An Experimental Assessment of the 2D Visibility Complex

Simulations of VANET Scenarios with OPNET and SUMO

Catalogue of architectural patterns characterized by constraint components, Version 1.0

Real-Time and Resilient Intrusion Detection: A Flow-Based Approach

NP versus PSPACE. Frank Vega. To cite this version: HAL Id: hal

Agile Browsing of a Document Collection with Dynamic Taxonomies

X-Kaapi C programming interface

YANG-Based Configuration Modeling - The SecSIP IPS Case Study

Fuzzy sensor for the perception of colour

Detecting Anomalies in Netflow Record Time Series by Using a Kernel Function

A 64-Kbytes ITTAGE indirect branch predictor

Mapping classifications and linking related classes through SciGator, a DDC-based browsing library interface

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

Light field video dataset captured by a R8 Raytrix camera (with disparity maps)

A Short SPAN+AVISPA Tutorial

A New Method for Mining High Average Utility Itemsets

Periodic meshes for the CGAL library

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

XML Document Classification using SVM

Full Text Search Engine as Scalable k-nearest Neighbor Recommendation System

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

Primitive roots of bi-periodic infinite pictures

SDLS: a Matlab package for solving conic least-squares problems

Comparison of radiosity and ray-tracing methods for coupled rooms

SIM-Mee - Mobilizing your social network

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications

Formal modelling of ontologies within Event-B

Branch-and-price algorithms for the Bi-Objective Vehicle Routing Problem with Time Windows

Aligning Legivoc Legal Vocabularies by Crowdsourcing

Decentralised and Privacy-Aware Learning of Traversal Time Models

Stream Ciphers: A Practical Solution for Efficient Homomorphic-Ciphertext Compression

Transcription:

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints Joris Guyonvarc H, Sébastien Ferré To cite this version: Joris Guyonvarc H, Sébastien Ferré. Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints. Elena Cabrio, Philipp Cimiano, Vanessa Lopez, Axel-Cyrille Ngonga Ngomo, Christina Unger, and Sebastian Walter. Work. Multilingual Question Answering over Linked Data (QALD-3), Sep 2013, Valencia, Spain. 2013, <http://www.clef2013.org/index.php?page=pages/proceedings.php>. <hal-00943528> HAL Id: hal-00943528 https://hal.inria.fr/hal-00943528 Submitted on 7 Feb 2014 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints Joris Guyonvarch and Sébastien Ferré IRISA, Université de Rennes 1 Campus de Beaulieu, 35042 Rennes cedex, France joris.guyonvarch@gmail.com, ferre@irisa.fr Abstract. This paper overviews the participation of Scalewelis in the QALD-3 open challenge. Scalewelis is a Faceted Search system. Faceted Search systems refine the result set at each navigation step. In Scalewelis, refinements are syntactic operations that modify the user query. Scalewelis uses the Semantic Web standards (URI, RDF, SPARQL) and connects to SPARQL endpoints. 1 Introduction We participated with Scalewelis in the question answering task of the QALD-3 open challenge for DBpedia. This task was about answering a hundred questions formulated in several languages. The answer could be given either as a SPARQL query or directly as a list of results. SPARQL is a very expressive language that is based on graph patterns and relational algebra but it requires an important backgroundforthosewhowanttouseit.wediscusstwoalternativestosparql that are Question Answering and Faceted Search. Question Answering (QA) gives answers to natural questions or keywords rather than from formal queries. For example, Querix [6] asks for users to write full English questions and, then, there is a dialog between Querix and users that clarifies ambiguities. However, question answering does not prevent dead ends, i.e. empty results. Faceted Search guides users with feedback. The result set is incrementally refined by the selection of facets. A facet selection always leads to non-empty results, so that there are no dead ends. For example, VisiNav [4] is a set-based Faceted Search system that applies operations to the result set. gfacet [5] is a graph-based Faceted Search system that applies operations to the result graph. Sewelis [2] is a query-based Faceted Search system that applies operations to the query. Scalewelis 1 is inspired from Sewelis. Sewelis manages its own data structures and does not scale to large datasets such as DBpedia. On the contrary, Scalewelis connects to SPARQL endpoints and uses partial result sets in order to scale to large datasets. 1 Scalewelis has been implemented as a Web application and is available at: http://lisfs2008.irisa.fr/scalewelis/

SELECT DISTINCT?class WHERE { [] a?class. } Fig. 1. Computation of classes at the initial step. Query 1: Computation of the partial results, limited to 1,000 entities SELECT DISTINCT?result WHERE { <Pattern> } LIMIT 1000 Query 2: Computation of class facets from the partial results SELECT DISTINCT?class WHERE {?result a?class } Query 3: Computation of property facets from the partial results SELECT DISTINCT?prop WHERE {?result?prop [] } Query 4: Computation of inverse property facets from the partial results SELECT DISTINCT?invProp WHERE { []?invprop?result } Fig. 2. Computation of partial results and facets. 2 Approach Query-based Faceted Search allows for more expressive search than traditional set-based Faceted Search. In Sewelis and Scalewelis, facets are query elements and selecting a facet adds it to the search query. Only valid facets are provided, so that navigation is safe: there are no dead ends. Facets are computed from the results of the previous query. Scalewelis uses only a subset of those results. In experiments on datasets of up to 15 millions triples, automatically generated with the Berlin SPARQL Benchmark [1], the partial computation of facets required far less time than the complete computation. Moreover, it nonetheless gave all the facets in most cases [3]. At the initial step, Scalewelis retrieves all classes in the dataset (Figure 1). Properties can be computed in a similar manner but are not in the current implementation for efficiency reasons. In the next steps, partial results and facets are computed with four successive SPARQL queries (Figure 2). Query 1 retrieves the first 1,000 results matching <Pattern>, which is the SPARQL graph pattern translated from the user query. Queries 1, 2, and 3 incorporate those results in

order to retrieve the facets for the current query, using the VALUES construct of SPARQL. Query 2 retrieves the types of the results, i.e. classes. Query 3 retrieves the properties linking from the results. Query 4 retrieves the property linking to the results, i.e. inverse properties. Finally, the selection of a facet by the user modifies the query so that it is more constrained but is ensured to return results. 3 Resources Regarding the QALD-3 open challenge, Scalewelis was connected to the standard DBpedia SPARQL endpoint because response times of the provided endpoint were too long for effective use. The process is intrinsically interactive and we acted as the user. We answered 70 questions out of 99, requiring a few minutes per question (human and machine time). Two kinds of questions were not addressed. Firstly, questions involving quantities (such as Which German cities have more than 250000 inhabitants?) because filters and aggregations are not yet implemented in Scalewelis. Secondly, questions for which facets were not found by the user. In general, it was unclear whether those facets were not listed (due to partial results) or whether the user was not knowledgeable enough to identify the right facet (for example Give me all B-sides of the Ramones.). One of the question in the challenge is Which films starring Clint Eastwood did he direct himself?. We explain how we answered this question with Scalewelis. At the initialization step, the user filtered the list of classes with the word film. The selection of the class a dbo:film led to the query What is a dbo:film?. At this step, there are property facets and the selection of has dbo:starring and has dbo:director led to the query What is a dbo:film and has dbo:starring a thing and has dbo:director a thing?. Finally, the selection of the result Clint Eastwood at the two undefined element represented by a thing led to the final user query: What is a dbo:film and has dbo:starring dbr:clint Eastwood and has dbo:director dbr:clint Eastwood?. 4 Results Scalewelis ranked third out of six candidates with 32 correct answers plus 1 partial answer. Scalewelis was theoretically able to correctly answer the 70 questions the user chose to answer, but he failed to answer 37 of them because incorrect facets were chosen. 1. Firstly, English is not the mother tongue of the user so that incorrect facets were used because of misunderstandings. 2. Secondly, the user had sometimes the choice between two facets differing only by their namespace, for example DBpedia ontology and infobox properties. 3. Thirdly, the standard DBpedia SPARQL endpoint to which Scalewelis was connected sometimes gave different results compared to the provided SPARQL endpoint.

5 Discussion Scalewelis can be improved in several ways. First, there are missing functionalities from SPARQL such as comparisons or aggregations. Second, the partial computation of results and facets entails a bias in the search, because the first results returned by SPARQL may not form a representative sample of all results. Third, Scalewelis could take advantage of the DBpedia ontology to further improve scalability, and to display facets as class/property hierarchies. References 1. Bizer, C., Schultz, A.: The Berlin SPARQL Benchmark. International Journal on Semantic Web and Information Systems (IJSWIS) 5(2), 1 24 (2009) 2. Ferré, S., Hermann, A.: Reconciling faceted search and query languages for the semantic web. IJMSO 7(1), 37 54 (2012) 3. Guyonvarch, J., Ferré, S., Ducassé, M.: Scalable query-based faceted search on top of SPARQL endpoints for guided and expressive semantic search (2013), research report, IRISA 4. Harth, A.: VisiNav: A system for visual search and navigation on web data. J. Web Sem. 8(4), 348 354 (2010) 5. Heim, P., Ertl, T., Ziegler, J.: Facet Graphs: Complex semantic querying made easy. In: et al, L.A. (ed.) ESWC (1). pp. 288 302. Lecture Notes in Computer Science, Springer (2010) 6. Kaufmann, E., Bernstein, A.: Evaluating the usability of natural language query languages and interfaces to semantic web knowledge bases. J. Web Sem. 8(4), 377 393 (2010)