Keyword Search over RDF Graphs. Elisa Menendez

Elisa Menendez emenendez@inf.puc-rio.br

Summary
- Motivation
- Keyword Search over RDF Process
- Challenges
- Example
- QUIOW System
- Next Steps

Motivation

Motivation
- Keyword search is an easy way to retrieve information, whether in Web pages or in databases

Motivation
- Keyword search is useful for databases
- It eliminates the overhead of writing structured queries
- The user does not need to know the database schema

    SELECT movie.title
    FROM movie, director
    WHERE movie.director = director.id
      AND director.name = 'Tim Burton'

Motivation
- From the Semantic Web and Linked Data, RDF graphs emerged as a new kind of database that models data as triples

    PREFIX mo: <http://www.movieontology.org/>
    PREFIX nflx: <http://netflix.com/>

    nflx:beetlejuice  mo:directedby  nflx:tim_burton
    (subject)         (predicate)    (object)

Motivation
- A collection of RDF triples forms an RDF graph

    nflx:beetlejuice  mo:title        "Beetlejuice"
    nflx:beetlejuice  mo:releaseyear  1988
    nflx:beetlejuice  mo:synopsis     "A couple of ghosts contract ..."
    nflx:beetlejuice  mo:directedby   nflx:tim_burton
    nflx:beetlejuice  rdf:type        mo:movie
    nflx:tim_burton   rdf:type        mo:director
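As a minimal sketch of the idea that a collection of triples forms a graph, the triples from this slide can be held as plain tuples and grouped by subject. The predicate `rdf:type` is an assumption (the transcript shows only `rdf`), and `build_graph` is a hypothetical helper, not part of any RDF library.

```python
from collections import defaultdict

# Triples from the slide. "rdf:type" is assumed where the transcript shows "rdf".
triples = [
    ("nflx:beetlejuice", "mo:title", "Beetlejuice"),
    ("nflx:beetlejuice", "mo:releaseyear", 1988),
    ("nflx:beetlejuice", "rdf:type", "mo:movie"),
    ("nflx:beetlejuice", "mo:directedby", "nflx:tim_burton"),
    ("nflx:tim_burton", "rdf:type", "mo:director"),
]


def build_graph(triples):
    """Group triples by subject: subject -> list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for subject, edges in graph.items():
    print(subject, edges)
```

Each subject becomes a node and each (predicate, object) pair an outgoing labeled edge, which is all the later matching and connecting steps need.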

Motivation
- RDF graphs are a type of database
- LOD Cloud 2017: 1139 datasets

Motivation
- RDF graphs are interesting sources of knowledge
  - Give more flexibility to describe resources
  - Use W3C standardized graph formats
  - Use popular ontologies
  - Keep increasing

Motivation
- RDF graphs are usually queried with SPARQL queries

    SELECT ?title
    WHERE {
      ?movie rdf:type :Movie .
      ?movie mo:directedby ?director .
      ?director "Tim Burton" .
      ?movie mo:title ?title
    }

=> Keyword search is also useful for RDF graphs

Motivation
- Advantages of RDF graphs for keyword search
  - Well-known graph theories
  - Data and metadata all together, and easy to access with SPARQL
  - Integration of multiple sources, and easy to access with SPARQL

Keyword Search Process
- Keywords -> query formulation -> Query
- Documents -> indexing -> Indexed Documents
- Query + Indexed Documents -> Matching + Ranking -> Retrieved Documents
- Retrieved Documents -> feedback -> query formulation

Keyword Search over RDF Process
- Keywords -> query formulation -> Query
- Graph -> indexing -> Indexed Triples
- Query + Indexed Triples -> Matching + Ranking + Connecting -> Retrieved Subgraphs
- Retrieved Subgraphs -> feedback -> query formulation

Main Challenges
- Find pieces of information in the graph
  - Indexing and Matching problems
- Connect the pieces of information
  - NP-Complete problem
- Ranking subgraphs

Problem definition
- Given an RDF graph G and a set of keywords K
- An answer for K is a minimal subgraph of G that covers all keywords in K
- Goal:
  - Find all possible answers for K
  - Rank them by relevance

Example: Graph G
- Resources: R1, R2, R3, R4, R5
- Classes: :Actor, :Movie
- Literals: "Denzel Washington", "South Dakota", "Dakota Fanning", "Washington"

Index literal values

Associate each literal with its corresponding resource

Simple indexing example

    Word         Resource (Property)
    Dakota       R2 (name), R3 (place)
    Denzel       R1 (name)
    Fanning      R2 (name)
    South        R3 (place)
    Washington   R1 (name), R5 (place)
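The word-to-resource index in this example can be sketched as a small inverted index. This is only an illustration under simple assumptions (lowercased, whitespace-style tokenization); `build_index` is a hypothetical helper, and the actual system described later relies on Oracle Text rather than anything like this.

```python
import re
from collections import defaultdict

# (resource, property, literal) rows taken from the indexing example.
literals = [
    ("R1", "name", "Denzel Washington"),
    ("R2", "name", "Dakota Fanning"),
    ("R3", "place", "South Dakota"),
    ("R5", "place", "Washington"),
]


def build_index(literals):
    """Map each lowercased word to the set of (resource, property) pairs containing it."""
    index = defaultdict(set)
    for resource, prop, text in literals:
        for word in re.findall(r"\w+", text.lower()):
            index[word].add((resource, prop))
    return index


index = build_index(literals)
print(sorted(index["dakota"]))      # [('R2', 'name'), ('R3', 'place')]
print(sorted(index["washington"]))  # [('R1', 'name'), ('R5', 'place')]
```

Keeping the property next to the resource lets the matching step tell a person named Washington apart from a place named Washington.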

Example: K = { Dakota, Washington }
- Dakota matches R2 (name) and R3 (place)
- Washington matches R1 (name) and R5 (place)
- Combinations = { (R2, R5), (R1, R3), (R3, R5), (R1, R2) }
  - (R2, R5): A movie in Washington with Dakota Fanning
  - (R1, R3): A movie in South Dakota with Denzel Washington
  - (R3, R5): A movie in South Dakota and a movie in Washington
  - (R1, R2): Actor Denzel Washington and actress Dakota Fanning
  - (R1, R2): A movie with Denzel Washington and Dakota Fanning
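The four combinations in this example are the Cartesian product of the per-keyword matches. A minimal sketch, assuming the matches are already available as a dictionary (the name `keyword_combinations` is hypothetical):

```python
from itertools import product

# Matches per keyword, as found in the example.
matches = {
    "Dakota": ["R2", "R3"],
    "Washington": ["R1", "R5"],
}


def keyword_combinations(matches):
    """Return every way of picking one matching resource per keyword."""
    keywords = list(matches)
    return [
        dict(zip(keywords, combo))
        for combo in product(*(matches[k] for k in keywords))
    ]


combos = keyword_combinations(matches)
for combo in combos:
    print(combo)
```

Each combination is one candidate interpretation of the query; the resources in it still have to be connected through the graph to form an answer.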

Problem definition
- Given an RDF graph G and a set of keywords K
- An answer for K is a minimal subgraph of G that covers all keywords in K
- Goal:
  - Find all possible answers for K <= Not reasonable!
  - Rank them by relevance
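One way to see why enumerating every answer may be impractical, offered as an illustration rather than the speaker's stated argument: if keyword k_i matches m_i resources (m_i is notation introduced here), the number of combinations to connect is the product of the match counts, before even counting the alternative subgraphs that can connect each combination.

```latex
\[
  |\text{Combinations}| \;=\; \prod_{i=1}^{|K|} m_i ,
  \qquad \text{e.g. } m_{\text{Dakota}} \cdot m_{\text{Washington}} = 2 \cdot 2 = 4 .
\]
```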

QUIOW: A Keyword Search over RDF System

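QUIOW architecture (diagram)
- Storage: Oracle 12c, with the RDB mapped to RDF through R2RML
- Auxiliary tables: Join, Class, Prop, Value
- Components: Parser, Matching, Greedy, Steiner Tree, Executor, Presentation
- Data passed between components: keywords, cores, graph, SPARQL query, results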

QUIOW
- QUIOW approaches
  - Find pieces of information in the graph
    - Auxiliary tables for data and metadata
    - Indexing and Matching with Oracle Text
    - Scoring cores based on
      - The keywords they match
      - The type of resource: Class > Property > Value
      - Some basic statistics
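A rough sketch of a score that respects the ordering Class > Property > Value and rewards keyword coverage. The numeric weights and the formula are assumptions made for illustration; only the ordering comes from the slide, and this is not QUIOW's actual scoring function.

```python
# Hypothetical weights: only the ordering Class > Property > Value is from the slide.
TYPE_WEIGHT = {"class": 3.0, "property": 2.0, "value": 1.0}


def score_match(kind, matched_keywords, query_keywords):
    """Score = type weight times the fraction of query keywords matched."""
    query = set(query_keywords)
    coverage = len(query & set(matched_keywords)) / len(query)
    return TYPE_WEIGHT[kind] * coverage


query = ["movie", "burton"]
candidates = [
    ("value", ["burton"]),
    ("class", ["movie"]),
    ("property", ["movie"]),
]
ranked = sorted(
    candidates,
    key=lambda m: score_match(m[0], m[1], query),
    reverse=True,
)
for kind, words in ranked:
    print(kind, words, score_match(kind, words, query))
```

With equal coverage, a class match outranks a property match, which outranks a value match.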

QUIOW
- QUIOW approaches
  - Find pieces of information in the graph
    - Greedy algorithm to select minimal cores
  - Connect the pieces of information
    - Steiner Tree approximation for the nucleuses
    - Synthesize a SPARQL query to represent the Steiner Tree
    - Rank the results based on the resources' scores
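The two steps named here, greedy selection and a Steiner tree approximation, can be sketched generically. The code below uses a textbook greedy set cover and a shortest-path heuristic on an unweighted graph; both are stand-ins chosen for illustration, since the slides do not specify QUIOW's exact algorithms. The adjacency list `adj` is a hypothetical graph in which movie R4 links to R1, R2 and R5.

```python
from collections import deque


def greedy_cover(keywords, candidates):
    """Greedy set cover: repeatedly pick the candidate covering the most uncovered keywords."""
    uncovered = set(keywords)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda c: len(candidates[c] & uncovered))
        if not candidates[best] & uncovered:
            break  # the remaining keywords match nothing
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


def nearest_path(adj, tree, targets):
    """Multi-source BFS from the tree nodes to the closest target; returns the path or None."""
    parent = {node: None for node in tree}
    queue = deque(tree)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def steiner_tree(adj, terminals):
    """Shortest-path heuristic: grow a tree by attaching the nearest remaining terminal."""
    terminals = list(terminals)
    tree = {terminals[0]}
    remaining = set(terminals[1:])
    edges = set()
    while remaining:
        path = nearest_path(adj, tree, remaining)
        if path is None:
            return None  # the terminals are not connected
        edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree.update(path)
        remaining -= set(path)
    return edges


# Keyword matches from the example, and a hypothetical adjacency list.
candidates = {"R1": {"washington"}, "R2": {"dakota"},
              "R3": {"dakota"}, "R5": {"washington"}}
adj = {"R1": ["R4"], "R2": ["R4"], "R4": ["R1", "R2", "R5"], "R5": ["R4"]}

selected = greedy_cover(["dakota", "washington"], candidates)
tree_edges = steiner_tree(adj, selected)
print(selected, tree_edges)
```

The resulting edge set is the kind of connected subgraph that a SPARQL query could then be synthesized from, one triple pattern per edge.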

QUIOW
- García, G. M., Izquierdo, Y. T., Menendez, E. S., Dartayre, F., & Casanova, M. A. RDF Keyword-based Query Technology Meets a Real-World Dataset. In 20th International Conference on Extending Database Technology (EDBT), 2017.
- Evaluation
  - Petrobras Petroleum Dataset
  - Mondial: 64% of the 50 queries
  - IMDb: 72% of the 50 queries

Next Steps

Next Steps
- QUIOW problems
  - The scores of cores are mostly based on keyword terms
  - It is basically a Boolean Model
  - Not suitable for ambiguous data as in IMDb and Mondial
- How can we improve the quality of QUIOW results?
  - More sophisticated Information Retrieval models

Next Steps
- Which IR models can we adapt to RDF graphs?
  - Vector Space Model?
  - Language Model?
  - PageRank?
  - User profiling?
- Which of them improves the quality of QUIOW results?

Next Steps
- Which of them improves the quality of QUIOW results?
  - IMDb and Mondial Benchmarks
  - DBpedia
  - Adapt other benchmarks to RDF
    - Social Book Search
    - Last.fm

Thank you! Elisa Menendez emenendez@inf.puc-rio.br