An Entity Name Systems (ENS) for the [Semantic] Web

Similar documents
OKKAM-based instance level integration

An Entity Name System (ENS) for the Semantic Web

Multi-agent and Semantic Web Systems: Linked Open Data

Preserving Linked Data on the Semantic Web by the application of Link Integrity techniques from Hypermedia

Porting Social Media Contributions with SIOC

Toward a Knowledge-Based Solution for Information Discovery in Complex and Dynamic Domains

A General Approach to Query the Web of Data

DYNAMIC FOAF MANAGEMENT METHOD FOR SOCIAL NETWORKS IN THE SOCIAL WEB ENVIRONMENT

1. CONCEPTUAL MODEL 1.1 DOMAIN MODEL 1.2 UML DIAGRAM

Dynamic Semantics for the Internet of Things. Payam Barnaghi Institute for Communication Systems (ICS) University of Surrey Guildford, United Kingdom

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Source Code Search System Using The Knowledge Framework of The Semantic Web. The Graduate School of Science and Technology Kobe University

What you have learned so far. Interoperability. Ontology heterogeneity. Being serious about the semantic web

SEMANTIC WEB POWERED PORTAL INFRASTRUCTURE

August 2012 Daejeon, South Korea

Resilient Linked Data. Dave Reynolds, Epimorphics

Semantic Web Mining and its application in Human Resource Management

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Semantic Annotation and Linking of Medical Educational Resources

A Digital Library Framework for Reusing e-learning Video Documents

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

Semantic Web Systems Linked Open Data Jacques Fleuriot School of Informatics

Programming the Semantic Web

Pedigree Management and Assessment Framework (PMAF) Demonstration

Metadata Standards & Applications. 7. Approaches to Models of Metadata Creation, Storage, and Retrieval

Mapping between Digital Identity Ontologies through SISM

UNIT-V WEB MINING. 3/18/2012 Prof. Asha Ambhaikar, RCET Bhilai.

Semantic Web Technology Evaluation Ontology (SWETO): A Test Bed for Evaluating Tools and Benchmarking Applications

Ontology Servers and Metadata Vocabulary Repositories

JENA: A Java API for Ontology Management

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

For each use case, the business need, usage scenario and derived requirements are stated. 1.1 USE CASE 1: EXPLORE AND SEARCH FOR SEMANTIC ASSESTS

Maskkot An Entity-centric Annotation Platform

Semantic Web Technology Evaluation Ontology (SWETO): A test bed for evaluating tools and benchmarking semantic applications

The NEPOMUK project. Dr. Ansgar Bernardi DFKI GmbH Kaiserslautern, Germany

Social Networks and Data Portability using Semantic Web technologies

Exploring and Using the Semantic Web

Semantic Annotation, Search and Analysis

Jumpstarting the Semantic Web

20489: Developing Microsoft SharePoint Server 2013 Advanced Solutions

Towards semantic asset management and Core Vocabularies for e-government. Makx Dekkers Stijn Goedertier

Tags, Categories and Keywords

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

The International Journal of Digital Curation Volume 8, Issue

Opus: University of Bath Online Publication Store

3 Publishing Technique

An Approach to Evaluate and Enhance the Retrieval of Web Services Based on Semantic Information

CONTEXT-SENSITIVE VISUAL RESOURCE BROWSER

MASKKOT A Tool for Annotating Entities through OKKAM Service

Semantic Web. Tahani Aljehani

URI Disambiguation in the Context of Linked Data

Everyday Activity. Course Content. Objectives of Lecture 13 Search Engine

Utilizing, creating and publishing Linked Open Data with the Thesaurus Management Tool PoolParty

Sig.ma: live views on the Web of Data

State-of-the-Art for Entity-Centric Repository and Authoring Environment for Multimedia

Wide Area Query Systems The Hydra of Databases

Web Ontology for Software Package Management

Weaving SIOC into the Web of Linked Data

INTELLIGENT SYSTEMS OVER THE INTERNET

Extended Identity for Social Networks

Interlinking Multimedia Principles and Requirements

Hyperdata: Update APIs for RDF Data Sources (Vision Paper)

DOOR Digital Open Object Repository User Manual v1.0 July 23, 2006

Springer Science+ Business, LLC

Natural Language Processing with PoolParty

How We Build RIC-O: Principles

Web Architecture Part 3

Representing Linked Data as Virtual File Systems

Medici for Digital Cultural Heritage Libraries. George Tsouloupas, PhD The LinkSCEEM Project

The Data Web and Linked Data.

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Deep Web Crawling and Mining for Building Advanced Search Application

Information Retrieval (IR) through Semantic Web (SW): An Overview

3. Technical and administrative metadata standards. Metadata Standards and Applications

Visual Paradigm Doc. Composer Writer s Guide

From Open Data to Data- Intensive Science through CERIF

The European Commission s science and knowledge service. Joint Research Centre

Defining and Executing Assessment Tests on Linked Data for Statistical Analysis

The Semantic Institution: An Agenda for Publishing Authoritative Scholarly Facts. Leslie Carr

Semantic Interoperability. Being serious about the Semantic Web

The Butterfly Effect. A proposal for distribution and management for butterfly data programs. Dave Waetjen SESYNC Butterfly Workshop May 10, 2012

Just in time and relevant knowledge thanks to recommender systems and Semantic Web.

Accessing information about Linked Data vocabularies with vocab.cc

An Approach to Enhancing Workflows Provenance by Leveraging Web 2.0 to Increase Information Sharing, Collaboration and Reuse

Data is the new Oil (Ann Winblad)

Semantic Web Technologies

Semantic Adaptation Approach for Adaptive Web-Based Systems

ISO CTS2 and Value Set Binding. Harold Solbrig Mayo Clinic

Protégé-2000: A Flexible and Extensible Ontology-Editing Environment

Semantic agents for location-aware service provisioning in mobile networks

Linked Data and RDF. COMP60421 Sean Bechhofer

Linking Distributed Data across the Web

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Reducing Consumer Uncertainty

Chapter 4 Research Prototype

W3C Provenance Incubator Group: An Overview. Thanks to Contributing Group Members

Using Linked Data to Reduce Learning Latency for e-book Readers

An ontology of resources for Linked Data

PSI Registries. Sam Oh, Sungkyunkwan University Montreal, Canada. WG3, Montreal, Sam Oh

Towards the Semantic Desktop. Dr. Øyvind Hanssen University Library of Tromsø

Transcription:

An Entity Name Systems (ENS) for the [Semantic] Web Paolo Bouquet University of Trento (Italy) Coordinator of the FP7 OKKAM IP LDOW @ WWW2008 Beijing, 22 April 2008

An ordinary day on the [Semantic] Web News about the 2008 Olympics Revyu.com reviews on Beijing Metadata about WWW2008 Pictures and tags about Beijing Updated social network after WWW2008 Videos and tags from WWW2008

Lots of new linked data about Beijing? Not quite [see the idea of information islands from Falcons] The reference to Beijing is somehow hidden behind: Different names (e.g Beijing vs. Peking) in text documents Different URIs are used in different RDF files Different metadata schemas / vocabularies Different keys in databases

So what can t we (easily) do? Straight integration of RDF content via simple graph merging Reasoning requires mapping beforehand Linking multimedia (and Web2.0) content to RDF content Getting the best from business intelligence / Web mining apps Multimedia search

What can we do about it? OKKAM aims at developing an Entity Name System for the [Semantic] Web which can make sure that the same entity (individuals, like person, location, organization, event, product, ) is referred to through the same URI across: any type of content / format any application any domain all over the Web and beyond

How does it work? [How we are trying to do it in the EU-funded project OKKAM] http://fp7.okkam.org Global distributed storage of URIs of billions of URIs Supports entity matching for finding an entity s URI (based on simple profiles + links to external resources) Provides simple APIs for any human/application which needs to find the URI of an entity Makes available services for the automatic annotation of different content types with global URIs Offers secure and trustworthy methods for access control

Global and decentralized Replicated public nodes for the Web Local corporate nodes for non public data (+ cache)

Entity representation schema (ERS): the key concepts The ENS repository stores existing URIs + a representation of the corresponding real world entity This representation is not meant as a source of info about the entity, it is only used to maximize the chance of getting matching right (like a phone directory) In OKKAM, an entity representation has 4 main elements: A OKKAM URI for the entity An entity profile A collection of metadata A list of alternative URIs (including the preferred URI, if any)

ERS: Entity profiles Three main elements: 1. A semantic type (but we support only a small number 8 to 10 very high level categories, the rest must be found out there on the Web ) 2. A collection of name/value pairs (but very few, those which are most likely or most used to make sure that we got the right URI) [We don t assume any predefined vocabulary for attributes (though we may suggest a few ones for improving matching)] 3. A collection of typed links to external resources (RDF stores, HTML pages, PDF files, multimedia resources, ) which refer to that entity

ERS: Entity metadata Four main elements: 1. General metadata (e.g. creation time) 2. Statistics metadata (e.g. last modified, # of time retrieved, # of time selected, time last selected) 3. Provenance metadata (e.g. source, agent) 4. Access control metadata (e.g. owner, authority, subordination) [Metadata are available also for every single name/value pair of an entity profile]

ERS: alternative URIs A collection of alternative URIs (aliases, synonyms) for the same real world entity One of them can be marked as preferred and can be always returned to users/application instead of the internal ENS URI Dereferincing alternative URIs may provide background knowledge for advanced entity matching methods

Entity matching Obviously related to well-known problems: record linkage, deduplication, entity resolution, disambiguation, The ENS basic use case is as simple as follows: An application needs to find the URI for an entity From local information a look-up query is composed (mainly simple keywords or name/value pairs) The ENS tries to find the entities in the ENS repository which better matches the query A ranked list of results is returned (ranking is based both on similarity measures and statistical information on social use of the ENS)

ENS-enabled tools Content creation tools which are extended to interact with the ENS Our first prototypes: 1. Foaf-O-matic: create your FOAF profile with pre-existing URIs 2. Okkam4P: a Protégé 3.3.x plugin for creating individuals/instances with preexisting URIs 3. O4MSW: a MS Word plugin for annotating MS Word files with pre-existing URIs

ENS and Linked Data Complementary, in principles not competitors (though the ENS is mainly about reusing URIs in content creation, Linked Data is about linking data about a resource) The Linked Data content is a fantastic source of entities and name/value pairs for building entity profiles in the ENS Lots of methods and tools used for URI disambiguation can be shared and reused The ENS can be used by Linked Data tools to look-up for URIs in a single simple service (through APIs) The extension to non-rdf content may allow linking RDF data with unstructured data on the Web

However The ENS is based on the idea that, in general, having multiple URIs for the same thing is a bug, not a feature (is good for browsing, not for information integration and reasoning) Ex post vs. Ex ante approach: using billions of distributed owl:sameas statements will become impractical. Hopefully, the use of a single URI for the same entity may simplify the global graph of the Semantic Web The practice of using owl:sameas for interlinking heterogeneous URIs is semantically disputable URIs should not encode an identity, so it should make no difference which URI is used for an entity (provided it is unique and standard) The Linked Data methods do not support well the creation of new URIs

An extraordinary day on the [Semantic] Web http://www.okkam.org/entity/ ok78dfda18-2c96-45a5-a7e5-9093ed919424 http://www.okkam.org/entity/ ok78dfda18-2c96-45a5-a7e5-9093ed919424 http://www.okkam.org/entity/ ok78dfda18-2c96-45a5-a7e5-9093ed919424 http://www.okkam.org/entity/ ok78dfda18-2c96-45a5-a7e5-9093ed919424 http://www.okkam.org/entity/ ok78dfda18-2c96-45a5-a7e5-9093ed919424 http://www.okkam.org/entity/ ok78dfda18-2c96-45a5-a7e5-9093ed919424