Prof. Dr. Christian Bizer

Similar documents
Prof. Dr. Christian Bizer

global public Dataspace

WebDB 2010 June 6 th, 2010, Indianapolis, USA. Christian Bizer. Freie Universität Berlin. Christian Bizer: The Web of Linked Data (6/6/2010)

The Emerging Web of Linked Data

SRI International, Artificial Intelligence Center Menlo Park, USA, 24 July 2009

Architecture and Applications

Linked Data Evolving the Web into a Global Data Space

The Web of Linked Data

Knowledge Representation in Social Context. CS227 Spring 2011

Prof. Dr. Chris Bizer

Fusing the Web of Data

4 th Linked Data on the Web Workshop (LDOW 2011)

The R2R Framework: Christian Bizer, Andreas Schultz. 1 st International Workshop on Consuming Linked Data (COLD2010) Freie Universität Berlin

The Data Web and Linked Data.

The Emerging Web of Linked Data

Introducing Linked Data

DBpedia Extracting structured data from Wikipedia

Semantic Web and Linked Data

How to Publish Linked Data on the Web - Proposal for a Half-day Tutorial at ISWC2008

16th International World Wide Web Conference Developers Track, May 11, DBpedia. Querying Wikipedia like a Database

Exploiting Semantics Where We Find Them

The Linking Open Data Project Bootstrapping the Web of Data

W3C Workshop on RDF Access to Relational Databases October, 2007 Boston, MA, USA D2RQ. Lessons Learned

Linked Data. Department of Software Enginnering Faculty of Information Technology Czech Technical University in Prague Ivo Lašek, 2011

Turning the Web into a Database

Linking Distributed Data across the Web

a paradigm for the Semantic Web Linked Data Angelica Lo Duca IIT-CNR Linked Open Data:

Linked Open Europeana: Semantic Leveraging of European Cultural Heritage

Porting Social Media Contributions with SIOC

Exploring and Using the Semantic Web

Semantic e-science. Bibliographic Cloud

Ontology Matching and the Semantic Web

Linking Spatial Data from the Web

Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web

Linked Open Data: a short introduction

A General Approach to Query the Web of Data

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Linked Data Practices for the Geospatial Community

Semantic Web Fundamentals

Lecture 7: Linked Data

Linked data and its role in the semantic web. Dave Reynolds, Epimorphics

August 2012 Daejeon, South Korea

Data Integration and Structured Search

SPARQL เอกสารหล ก ใน มคอ.3

COMP6217 Social Networking Technologies Web evolution and the Social Semantic Web. Dr Thanassis Tiropanis

Sindice Widgets: Lightweight embedding of Semantic Web capabilities into existing user applications.

Re-using Cool URIs: Entity Reconciliation Against LOD Hubs

Introduction. October 5, Petr Křemen Introduction October 5, / 31

ITARC Stockholm Olle Olsson World Wide Web Consortium (W3C) Swedish Institute of Computer Science (SICS)

ITARC Stockholm Olle Olsson World Wide Web Consortium (W3C) Swedish Institute of Computer Science (SICS)

The role of vocabularies for estimating carbon footprint for food recipies using Linked Open Data

Web of Hypertext (RDFa, Microformats) and Web of Data

Linked Open Data. University of Rome "Tor Vergata" ART: Artificial Intelligence Tor Vergata. "The Semantic Web done right" Tim Berners-Lee

Silk Server Adding missing Links while consuming Linked Data

From Online Community Data to RDF

Usage of Linked Data Introduction and Application Scenarios. Presented by: Barry Norton

Linked Data and RDF. COMP60421 Sean Bechhofer

LINKING WEB DATA WEB:

a paradigm for the Introduction to Semantic Web Semantic Web Angelica Lo Duca IIT-CNR Linked Open Data:

Weaving SIOC into the Web of Linked Data

Semantiska webben DFS/Gbg

The Politics of Vocabulary Control

Design & Manage Persistent URIs

Proposal for Implementing Linked Open Data on Libraries Catalogue

Running head: LINKED OPEN DATA IMPLEMENTATION REPORT 1

Mapping between Digital Identity Ontologies through SISM

Accessing information about Linked Data vocabularies with vocab.cc

Adoption of the Linked Data Best Practices in Different Topical Domains

Quality-Driven Information Filtering in the Context of Web-Based Information Systems

case study The Asset Description Metadata Schema (ADMS) A common vocabulary to publish semantic interoperability assets on the Web July 2011

R2RML by Assertion: A Semi-Automatic Tool for Generating Customised R2RML Mappings

WebGUI & the Semantic Web. William McKee WebGUI Users Conference 2009

Semantic Web Fundamentals

Financial Dataspaces: Challenges, Approaches and Trends

How can Open Data and Linked Data help your organization and its Big Data? Mark Harrison

Linked Data Overview and Usage in Social Networks

D3.2 Evaluation Report of P&C Library

COMP9321 Web Application Engineering

Advances in Data Management - Web Data Integration A.Poulovassilis

ProLD: Propagate Linked Data

Intelligent Information Management

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent

Semantics. Matthew J. Graham CACR. Methods of Computational Science Caltech, 2011 May 10. matthew graham

Semantic Web Test

CONTENTDM METADATA INTO LINKED DATA

The P2 Registry

W3C Workshop on the Future of Social Networking, January 2009, Barcelona

> Semantic Web Use Cases and Case Studies

Linked Data in the Clouds : a Sindice.com perspective

1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.

Semantic Adaptation Approach for Adaptive Web-Based Systems

Social Networks and Data Portability using Semantic Web technologies

Why You Should Care About Linked Data and Open Data Linked Open Data (LOD) in Libraries

Using the Heterogeneous Database and Linked Data Technologies with Case Study of Thai Local Government Planning Database

Towards the Semantic Desktop. Dr. Øyvind Hanssen University Library of Tromsø

DBpedia-An Advancement Towards Content Extraction From Wikipedia

Enrichment, Reconciliation and Publication of Linked Data with the BIBFRAME model. Tiziana Possemato Casalini Libri

Linked Data in Archives

THE GETTY VOCABULARIES TECHNICAL UPDATE

Schema org/microdata Exposing Y our Your Data the Web (The Easy Way) Linked Data vs Schema.org: A Town Hall Debate about the Future of Information

Transcription:

28th British National Conference on Databases (BNCOD2011) July 12 th, 2011, Manchester, UK Evolving the Web into a Global Data Space Prof. Dr. Christian Bizer Freie Universität ität Berlin Germany

Outline 1. Linked Data What is the vision and goal? 2. Topology of the Web of Data What data is out there? 3. Global Data Integration How to split the integration effort? 4. Challenges and Opportunities

Web APIs expose proprietary Interfaces Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, Christian CC-BY Bizer: Evolving the Web into a Global Data Space BNCOD 2011, Manchester (12/7/2011)

Mashups are based on a fixed set of data sources Mashup Up Web API Web API Web API Web API Adding a new data source requires manual effort. No single global data space. A B C D

Alternative Approach: Linked Data Extend the Web with a single global l data space. 1. by using RDF to publish structured data on the Web 2. by setting links between data items within different data sources. RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF RDF link RDF links RDF links RDF links A B C D E

Linked Data Principles Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web. 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful RDF information. 4. Include RDF statements that link to other URIs so that they can discover related things. Tim Berners-Lee, http://www.w3.org/designissues/linkeddata.html, 2006

The Basis: RDF Data Model pd:cygri rdf:type foaf:person foaf:namef Richard Cyganiak foaf:based_near dbpedia:berlin Flexible, schema-less graph data model.

Entities are identified with HTTP URIs pd:cygri rdf:type foaf:person foaf:namef Richard Cyganiak foaf:based_near dbpedia:berlin HTTP URIs take the role of global primary keys. pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri h i d /f f df# i dbpedia:berlin = http://dbpedia.org/resource/berlin

Resolving URIs over the Web pd:cygri rdf:type foaf:person foaf:namef Richard Cyganiak foaf:based_near dbpedia:berlin dp:population 3.405.259 skos:subject dp:cities_in_germany i The HTTP protocol brings together identification and retrieval again.

Following Links deeper into the Web pd:cygri rdf:type foaf:person foaf:namef Richard Cyganiak foaf:based_near dbpedia:berlin dp:population 3.405.259 dbpedia:hamburgdi b skos:subject skos:subject dp:cities_in_germany i dbpedia:muenchen skos:subject

The Disco Hyperdata Browser

Properties of the Web of Linked Data Global, distributed data space build on a simple set of standards RDF, URIs, HTTP Entities are connected by links creating a global data graph that spans data sources and enables the discovery of new data sources at run-time Provides for data-coexistence Everyone can publish data to the Web of Linked Data Everyone can express their personal view on things

2. Topology of the Web of Data What data is out there?

W3C Linking Open Data Project Grassroots community effort to publish existing open license datasets as Linked Data on the Web interlink things between different data sources.

LOD Datasets on the Web: May 2007 Over 500 million RDF triples Around 120,000 RDF links between data sources

LOD Datasets on the Web: September 2008

LOD Datasets on the Web: September 2010 Over 26,9 billion RDF triples Over 436 million RDF links between data sources

Growth of the Web of Linked Data Year Datasets Triples Growth 2007 12 500.000.000000 000 2008 45 2.000.000.000 300% 2009 95 6.726.000.000 236% 2010 203 26.930.509.703 300%

Uptake in the Government Domain The EU is starting to publish Linked Data (LOD2, LATC) Various other national efforts W3C egovernment Interest Group

Uptake in the Libraries Community Institutions publishing Linked Data Library of Congress (subject headings) German National Library (PND dataset and subject headings) Swedish National Library (Libris - catalog) Hungarian National Library (OPAC and Digital Library) Europeana Digital it Library just released data about 4 million artifacts t Growth of Library Linked Data (2009-2010): 1000% W3C Library Linked Data Incubator Group Goals: 1. Integrate Library Catalogs on global scale. 2. Interconnect resources between repositories (by topic, by location, by historical period, by...).

Uptake in Life Sciences W3C Linking Open Drug Data Effort Bio2RDF Project Allen Brain Atlas Goal: Smoothly integrate internal and external data.

Uptake in the Media Industry Publish data as RDF/XML or RDFa Goal: Drive traffic to websites via search engines

LOD Cloud Data Catalog on CKAN http://www.ckan.net/group/lodcloud Basic Statistics (Nov 2010) Domain Data Sets Triples Percent RDF Links Percent Cross domain 20 1,999,085,950 7.42 29,105,638 7.36 Geographic 16 5,904,980,833 980 833 21.93 16,589,086 419 4.19 Government 25 11,613,525,437 43.12 17,658,869 4.46 Media 26 2,453,898,811 9.11 50,374,304 12.74 Libraries 67 2,237,435,732 8.31 77,951,898 19.71 Life sciences 42 2,664,119,184 9.89 200,417,873 50.67 User Content 7 57,463,756 0.21 3,402,228 0.86 203 26,930,509,703 395,499,896 More statistics http://www4.wiwiss.fu-berlin.de/lodcloud/state/

Linked Data Applications What can I do with this? Linked Data Browsers Linked Data Mashups Search Engines Thing Thing Thing Thing Thing Thing Thing Thing Thing Thing typed links typed links typed links typed links A B C D E

Linked Data Browsers Provide for navigating between data sources and for exploring the data space. Tabulator Browser (MIT, USA) Marbles (FU Berlin, DE) OpenLink p RDF Browser (OpenLink, UK) Zitgist RDF Browser (Zitgist, USA) Disco Hyperdata Browser (FU Berlin, DE) Fenfire (DERI, Irland)

Web of Data Search Engines Crawl the data space and provide best-effort t query answers over crawled data. Falcons (IWS, China) Sig.ma (DERI, Ireland) Swoogle (UMBC, USA) VisiNav (DERI, Ireland) Watson (Open University, it UK)

What are the big players doing?

Open Graph Protocol allows site owners to determine how entities are described in Facebook relies on RDFa for encoding structured data in HTML pages available since April 2010

Schema.org allows site owners to provide data to enrich search results. relies on Microdata for encoding structured data in HTML pages available since June 2011

Search Engines turn into Answering Engines Data snippets within search results Answer to a fact query

Publishing structured data becomes a SEO topic usage g of RDFa has increased 510% between March, 2009 and October, 2010 430 million webpages contain RDFa Source: Yahoo http://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/ deployment across the web/

Topology of the Web of Data

The Web of Data provides equal opportunities Everybody can crawl the data. different from earlier approaches like Google Base and Google Fusion Tables just as on the classic Web

3. Global Data Integration Applications hate heterogeneity! The wild wild west My little world

The Dataspace Vision Alternative to classic data integration systems in order to cope with growing number of data sources. Properties of dataspaces no upfront investment into a global schema rely on pay-as-you-go data integration ti give best effort answers to queries Franklin, M., Halevy, A., and Maier, D.: From Databases to Dataspaces A new Abstraction for Information Management, SIGMOD Rec. 2005. Madhavan, J., et al.: Web-scale Data Integration: You Can Only Afford to Pay As You Go, CIDR 2007

Linked Data relies on the Pay-as-You-Go Idea for Identity Management for Schema/Vocabulary Management

Publish Identity Links on the Web Identity Link <http://www4.wiwiss.fu-berlin.de/is-group/resource/persons/person4> owl:sameas <http://dblp.l3s.de/d2r/resource/authors/christian_bizer>. You publish links pointing at other data sources Somebody else publishes links pointing at your data source Every link is an integration hint (semantic clue)

Effort Distribution between Publishers and Consumer Consumer data mines identity links Effort Distribution Publishers or third parties provides identity links

Vocabularies on the Web of Data Everyone can use whatever vocabularies she likes to publish data on the Web. Or invest effort into reusing Common Vocabularies Friend-of-a-Friend for describing people and their social network SKOS for representing topic taxonomies GoodRelations provides terms for describing products and business entities SIOC for describing forums and blogs Organization Ontology for describing the structure of organizations Music Ontology for describing artists, albums, and performances Review Vocabulary provides terms for representing reviews Many data source use mixture of common and proprietary vocabulary terms.

Publish Vocabulary Links on the Web Vocabulary Link <http://xmlns.com/foaf/0.1/person> owl:equivalentclass <http://dbpedia.org/ontology/person> di / t /P. Simple Mappings: RDFS, OWL rdfs:subclassof, rdfs:subpropertyof owl:equivalentclass, owl:equivalentproperty Complex Mappings: R2R provides value transformation ti functions structural transformations Every vocabulary link is an integration hint.

Existing Vocabulary Links Source: Linked Open Vocabularies http://labs.mondeca.com/dataset/lov

Effort Distribution between Publisher and Consumer Consumer defines or data mines mappings Effort Distribution Publisher reuses vocabularies Publisher or third party publishes mappings

Somebody-Pays-As-You-Go The overall data integration effort is split between data publishers, third parties, and the data consumer. Data Publisher publishes data as RDF sets identity links reuses terms or publishes mappings Fix Overall Data Integration Effort Third Parties set identity links pointing at your data publish mappings to the Web Publisher s Effort Third Party Effort Data Consumer has to do the rest using manual effort as well as record linkage and schema matching techniques Consumer s Effort

4. Challenges and Opportunities

Things that are already there Data Publication Tools D2R Server, Pubby, Triplify, Talis Plattform Data Access Tools LDspider, SQUIN, Linked Data Client Libraries RDF Stores RDF stores are getting better Example: Virtuoso 2 to 4 times slower than RDBMS on TCP-H inrdf See also: Berlin SPARQL Benchmark Identity Resolution Tools SILK, LIMES Data Translation Tools R2R Mapping Framework

Things that are still missing 1. More research on data space profiling is needed. What is in the data space and how does the content change over time? 2. More research on machine learning mappings and identity resolution heuristics within the Web context. Identity links make it easier to learn vocabulary links. Vocabulary links make it easier to learn identity links. 3. More research on data quality assessment and SPAM detection is needed. 4. More research on pay-as-you-go data integration is needed. How do human, community and machine contributions play together over time?

Querying the Web like a Single Global Database prototypes exist lots of opportunities for vertical applications Product Search, Job Search, Event Search,

Opportunities 1. Analytical Applications Web-scale OLAP Global Data Mining 2. Linked Data as Data Integration Technology within organizations. Example: BBC for networks of companies. Example: W3C Linking Open Drug Data

Hands-on: How to play around with the data? Download the Billion Triples Challenge Dataset 2 billion triples (20GB gzipped) crawled from the public Web of Linked Data in May/June 2011 http://challenge.semanticweb.org/ Download the Sindice Dump 12 billion triples (164GB gzipped, ~1,16TB uncompressed) crawled from the public Web of Linked Data and includes RDFa, Microformat, and wrapped API data http://data.sindice.com/trec2011/download.html If you do something interesting ti with the data submit your results to the Semantic Web Challenge until October 1 st http://challenge.semanticweb.org/ h ti /

Thanks! References Textbook: Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. http://linkeddatabook.com/ Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data The Story So Far http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf