Semantic Web: Extracting and Mining Structured Data from Unstructured Content

1 Semantic Web: Extracting and Mining Structured Data from Unstructured Content. Web Science Lecture. Besnik Fetahu, L3S Research Center, Leibniz Universität Hannover, May 20, 2014

2 1 Introduction

4 Introduction: Large amounts of data. Heterogeneity of information: provenance, quality, content, representation, language, etc. Unstructured vs. structured data. … and …: entities, topics, relations. Use cases: translation, semantic search, etc.

6 The Semantic Web vision: The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term Semantic Web refers to W3C's vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.

7 Main Components: Formats: Turtle, N3, etc. Syntax: XML Schema. Models: RDF. Taxonomies: RDFS. Ontologies: OWL. Query languages: SPARQL. Interchange formats: RIF.

9 Data Formats and Models: XML data format; RDF data representation (subject, predicate, object).
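
A minimal sketch of the (subject, predicate, object) model, assuming a plain in-memory Python set of triples rather than a real RDF library; the ex: names are hypothetical placeholders. A pattern with None in any position acts as a wildcard, which is the basic lookup that triple stores and query languages build on.

```python
from typing import List, NamedTuple, Optional, Set


class Triple(NamedTuple):
    subj: str
    pred: str
    obj: str


# Hypothetical example data; ex: is a placeholder namespace prefix.
graph: Set[Triple] = {
    Triple("ex:Hannover", "rdf:type", "ex:City"),
    Triple("ex:Hannover", "ex:locatedIn", "ex:Germany"),
    Triple("ex:L3S", "ex:basedIn", "ex:Hannover"),
}


def match(graph: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching a pattern; None in any position is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subj == s)
        and (p is None or t.pred == p)
        and (o is None or t.obj == o)
    )


# Everything stated about ex:Hannover as a subject:
for t in match(graph, s="ex:Hannover"):
    print(t.subj, t.pred, t.obj)
```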

10 Ontologies define the following concepts: entities, relations, domains, rules, axioms.

11 Knowledge Representation: Differences in OWL ontologies
OWL-Lite (OWL-Lite ⊂ OWL-DL): supports users primarily needing a classification hierarchy and simple constraints. It supports cardinality constraints, but only permits cardinality values of 0 or 1.
OWL-DL (OWL-DL ⊂ OWL-Full): supports maximum expressiveness while retaining computational completeness and decidability. OWL-DL includes all OWL language constructs, but they can be used only under certain restrictions.
OWL-Full: meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.

12 Knowledge Representation and Schemas
RDF Schema (RDFS): classes: rdfs:Class; properties: rdf:Property, rdfs:subClassOf; domains: rdfs:domain
Web Ontology Language (OWL; OWL-Lite, OWL-DL): classes: owl:Class; properties: owl:equivalentClass, owl:sameAs
Friend of a Friend (FOAF) ontology: classes: foaf:Agent, foaf:Document, foaf:Organization, foaf:Person
Simple Knowledge Organization System (SKOS) ontology: classes: skos:Concept, skos:Collection; properties: skos:related, skos:broader, skos:narrower

15 Knowledge Representation: RDFS example; hierarchical class modelling; OWL ontology example
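
Hierarchical class modelling in RDFS means that types propagate upward along rdfs:subClassOf. The sketch below is a simplification that assumes the hierarchy is given as (subclass, superclass) pairs over hypothetical ex: classes; it computes the transitive closure of rdfs:subClassOf (RDFS entailment rule rdfs11) and then adds the implied rdf:type statements (rule rdfs9).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hierarchy: each pair stands for (C, rdfs:subClassOf, D).
SUBCLASS_OF = [
    ("ex:Professor", "ex:Researcher"),
    ("ex:Researcher", "foaf:Person"),
    ("foaf:Person", "foaf:Agent"),
]


def superclasses(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Transitive closure of rdfs:subClassOf (rule rdfs11), via depth-first search."""
    direct: Dict[str, Set[str]] = defaultdict(set)
    for sub, sup in pairs:
        direct[sub].add(sup)
    closure: Dict[str, Set[str]] = {}
    for cls in list(direct):
        seen: Set[str] = set()
        stack = list(direct[cls])
        while stack:
            c = stack.pop()
            if c not in seen:  # the seen set also protects against cycles
                seen.add(c)
                stack.extend(direct.get(c, ()))
        closure[cls] = seen
    return closure


def infer_types(asserted: Dict[str, Set[str]],
                pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Propagate rdf:type assertions up the class hierarchy (rule rdfs9)."""
    closure = superclasses(pairs)
    return {
        ind: set(ts).union(*(closure.get(t, set()) for t in ts))
        for ind, ts in asserted.items()
    }


types = infer_types({"ex:alice": {"ex:Professor"}}, SUBCLASS_OF)
print(sorted(types["ex:alice"]))
```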

16 Knowledge Representation: Ontologies vs. Taxonomies

17 Knowledge Representation: ABox vs. TBox
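
The TBox holds terminological axioms about the vocabulary (class hierarchies, property domains and ranges), while the ABox holds assertions about individuals. As an illustration, assuming tuple-encoded triples and hypothetical ex: terms, the sketch keeps the two boxes separate and applies the RDFS rdfs:domain and rdfs:range rules (rdfs2, rdfs3) to derive new ABox facts from TBox axioms. Note that in RDFS these declarations license inferences; they are not integrity constraints that reject data.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# TBox: terminological axioms about the (hypothetical) vocabulary.
TBOX: Set[Triple] = {
    ("ex:worksAt", "rdfs:domain", "foaf:Person"),
    ("ex:worksAt", "rdfs:range", "foaf:Organization"),
}

# ABox: assertions about individuals.
ABOX: Set[Triple] = {
    ("ex:alice", "ex:worksAt", "ex:L3S"),
}


def apply_domain_range(tbox: Set[Triple], abox: Set[Triple]) -> Set[Triple]:
    """Return the ABox plus rdf:type facts implied by rdfs:domain (rdfs2) and rdfs:range (rdfs3)."""
    domains = {(s, o) for s, p, o in tbox if p == "rdfs:domain"}
    ranges = {(s, o) for s, p, o in tbox if p == "rdfs:range"}
    inferred: Set[Triple] = set()
    for s, p, o in abox:
        # The subject of a property gets the property's domain as a type ...
        inferred |= {(s, "rdf:type", cls) for prop, cls in domains if prop == p}
        # ... and the object gets the property's range as a type.
        inferred |= {(o, "rdf:type", cls) for prop, cls in ranges if prop == p}
    return abox | inferred


for triple in sorted(apply_domain_range(TBOX, ABOX)):
    print(triple)
```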

18 Linked Data: RDF data published as triples (subject, predicate, object). SPARQL: the standard query language over RDF data.
Linked Data principles:
1 Use URIs as names for things
2 Use dereferenceable URIs
3 Provide information about things using standards: RDF, SPARQL
4 Interlink with other things
Billions of triples. Interlink all data into one gigantic graph: LOD cloud, schema.org, ... Microformats and RDFa for annotating web pages.
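
SPARQL queries are built from basic graph patterns: triple patterns whose variables must bind to the same value across all patterns. The following is a toy evaluator under that assumption, using nested-loop joins over a Python list; it is not a SPARQL engine (no parsing, FILTER, or OPTIONAL), and the ex: data is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound to a different value is a mismatch.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query(graph: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Evaluate a conjunctive basic graph pattern with nested-loop joins."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Hypothetical data.
graph = [
    ("ex:LUH", "ex:city", "ex:Hannover"),
    ("ex:Hannover", "ex:country", "ex:Germany"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
]

# Roughly: SELECT ?uni ?city WHERE { ?uni ex:city ?city . ?city ex:country ex:Germany }
print(query(graph, [("?uni", "ex:city", "?city"), ("?city", "ex:country", "ex:Germany")]))
```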

19 Everything done? Only a small fraction of data is actually structured. Defining schemas, taxonomies, and ontologies manually and explicitly is cumbersome. A large proportion of data is unstructured or semi-structured. Can we automatically extract and model such content?

25
1 Semi-structured data: Wikipedia, WordNet
2 Social streams: Twitter
3 News corpora: NYT Collection, Reuters, Wall Street Journal (WSJ)
4 Web pages: Common Crawl, ClueWeb
5 …: LOD cloud

27 Very large corpora of unstructured text. Heterogeneity: languages, quality, domains. Rich underlying structure in unstructured text. Natural Language Processing (NLP): part-of-speech tagging (POS), named entity recognition (NER), coreference resolution (Co-Ref), dependency parsing (DP), etc. Use NLP output for information extraction (IE) based on syntactic, semantic, and lexical patterns. Query- and entity-based summarisation.

28 Autonomous understanding of text by machines: construct a belief based on the underlying corpus. OpenIE: a domain-independent IE paradigm for extracting relations, classes, and entities. TextRunner (Etzioni et al. 2008): a self-supervised approach to OpenIE. Each relation is represented as a triple (subject, predicate, object). Understanding and semantics of extracted triples are still primitive. Etzioni O., Banko M., Cafarella M. J. AAAI 2007

31 TextRunner:
1 Self-supervised learner
2 Single-pass extractor
3 Redundancy-based assessor
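
The redundancy-based assessor rests on the intuition that a tuple extracted from many distinct sentences is more likely to be correct. TextRunner itself assigns probabilities using a probabilistic model of redundancy; the sketch below replaces that model with plain support counts after light string normalisation. The min_support threshold and the sample extractions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Extraction = Tuple[str, str, str]


def normalise(triple: Extraction) -> Extraction:
    """Lower-case and collapse whitespace so trivially different strings merge."""
    a, b, c = (" ".join(part.lower().split()) for part in triple)
    return (a, b, c)


def assess(extractions: Iterable[Tuple[str, Extraction]],
           min_support: int = 2) -> Dict[Extraction, int]:
    """Keep tuples supported by at least `min_support` distinct sentences."""
    support: Dict[Extraction, Set[str]] = defaultdict(set)
    for sentence_id, triple in extractions:
        support[normalise(triple)].add(sentence_id)
    return {t: len(ids) for t, ids in support.items() if len(ids) >= min_support}


# Hypothetical output of the single-pass extractor: (sentence id, tuple).
extractions = [
    ("s1", ("Michael Webb", "appeared on", "Oprah")),
    ("s2", ("michael webb", "appeared  on", "Oprah")),
    ("s2", ("Michael Webb", "appeared on", "Oprah")),  # same sentence counts once
    ("s3", ("Oprah", "hosted", "Michael Webb")),
]
print(assess(extractions))
```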

33 Dependency parsing (DP) of text chunks for relation extraction. Syntactic patterns for relation extraction, e.g., "Michael Webb appeared on Oprah..." yields (Michael Webb; appear on; Oprah). Schmitz et al. 2007

35 DP of text chunks for relation extraction. Syntactic patterns for relation extraction. Semantic and lexical patterns for relation extraction. ReVerb: a two-step, relation-first (rather than arguments-first) approach:
1 identify relations
2 identify arguments
Fader et al. 2011
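
The relation-first idea can be illustrated with a simplified form of ReVerb's syntactic constraint, which requires a relation phrase to match V | VP | VW*P over part-of-speech classes (V: verbs; W: nouns, adjectives, adverbs, pronouns, determiners; P: prepositions, particles, infinitive markers). This sketch assumes the sentence is already tagged with Penn Treebank tags, maps particles to P rather than attaching them to V, takes the leftmost match, and then picks the nearest noun run on each side as arguments. ReVerb's lexical constraint, lemmatisation, and confidence classifier are omitted, so the relation keeps its surface form ("appeared on").

```python
import re
from typing import List, Optional, Tuple


def tag_class(tag: str) -> str:
    """Map a Penn Treebank POS tag to a simplified class: V, W, P, or O."""
    if tag.startswith("VB"):
        return "V"
    if tag in {"IN", "TO", "RP"}:
        return "P"
    if tag.startswith(("NN", "JJ", "RB", "PRP")) or tag == "DT":
        return "W"
    return "O"


# V | VP | VW*P, with adjacent verbs merged into one phrase.
RELATION = re.compile(r"V+(?:W*P)?")


def is_noun(tag: str) -> bool:
    return tag.startswith("NN") or tag == "PRP"


def extract(tokens: List[Tuple[str, str]]) -> Optional[Tuple[str, str, str]]:
    """Step 1: find a relation phrase; step 2: find its nearest noun arguments."""
    classes = "".join(tag_class(tag) for _, tag in tokens)  # one char per token
    m = RELATION.search(classes)
    if m is None:
        return None
    start, end = m.span()
    i = start  # left argument: maximal noun run ending just before the relation
    while i > 0 and is_noun(tokens[i - 1][1]):
        i -= 1
    j = end  # right argument: maximal noun run after it, skipping one determiner
    if j < len(tokens) and tokens[j][1] == "DT":
        j += 1
    k = j
    while k < len(tokens) and is_noun(tokens[k][1]):
        k += 1
    if i == start or k == j:
        return None
    words = [w for w, _ in tokens]
    return " ".join(words[i:start]), " ".join(words[start:end]), " ".join(words[j:k])


sentence = [("Michael", "NNP"), ("Webb", "NNP"), ("appeared", "VBD"),
            ("on", "IN"), ("Oprah", "NNP")]
print(extract(sentence))
```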

37 ClausIE (Del Corro et al., 2013): a clause-based approach to relation extraction. Automated, less restrictive, and with improved recall.

41 Textual content has a rich underlying syntactic and semantic structure. Frequently extracted syntactic and semantic information: POS, Co-Ref, and NER. Stanford CoreNLP: named entity recognition with specific entity types (Person, Organisation, Place, Date). NED: named entity disambiguation, which links surface forms to entities from knowledge bases:
1 DBpedia Spotlight
2 Wikiminer
3 AIDA ...
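
A toy illustration of NED: each surface form has a set of candidate entities, and a candidate is chosen by mixing a popularity prior with bag-of-words overlap between the mention's context and words associated with the entity. The dictionary, the context words, and the mixing weight alpha are all hypothetical; systems such as DBpedia Spotlight and AIDA use considerably richer models than this.

```python
from typing import Dict, List, Set

# Hypothetical surface-form dictionary: candidate entities with a prior
# (how often the surface form refers to each entity).
CANDIDATES: Dict[str, Dict[str, float]] = {
    "Hannover": {"ex:Hannover_(city)": 0.9, "ex:Hannover_96": 0.1},
}

# Hypothetical words associated with each entity (e.g. from its description).
ENTITY_CONTEXT: Dict[str, Set[str]] = {
    "ex:Hannover_(city)": {"city", "lower", "saxony", "university"},
    "ex:Hannover_96": {"football", "club", "bundesliga", "match"},
}


def disambiguate(surface: str, context: List[str], alpha: float = 0.5) -> str:
    """Pick the candidate maximising alpha * prior + (1 - alpha) * context overlap."""
    words = {w.lower() for w in context}

    def score(entity: str) -> float:
        ctx = ENTITY_CONTEXT.get(entity, set())
        overlap = len(words & ctx) / len(ctx) if ctx else 0.0
        return alpha * CANDIDATES[surface][entity] + (1 - alpha) * overlap

    return max(CANDIDATES[surface], key=score)


print(disambiguate("Hannover", "the football club won the Bundesliga match".split()))
```

With no informative context the prior dominates, so the mention resolves to the more popular candidate; a strongly matching context can override the prior.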

46 Prominent knowledge base examples:
1 WordNet knowledge base
2 Wikipedia encyclopaedia
3 DBpedia knowledge base
4 YAGO knowledge base

47 … and Interlinking: semantic relatedness of entities; exploiting existing knowledge base structures; latent relationships via semantic relations.
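
Semantic relatedness can exploit knowledge base structure directly. One link-based measure, due to Milne and Witten, compares the sets of articles that link to two entities; it is used here purely as an illustration, since the slide does not name a specific measure. With A and B the in-link sets and W the total number of articles, relatedness is 1 - (log max(|A|,|B|) - log |A∩B|) / (log |W| - log min(|A|,|B|)); this sketch clips the result to [0, 1], and the page identifiers in the usage line are hypothetical.

```python
import math
from typing import Set


def link_relatedness(in_a: Set[str], in_b: Set[str], total_articles: int) -> float:
    """Milne-Witten style relatedness from shared incoming links, clipped to [0, 1]."""
    common = len(in_a & in_b)
    if common == 0:
        return 0.0  # no shared in-links: treat as unrelated
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    if big > total_articles:
        raise ValueError("in-link sets cannot be larger than the collection")
    if small == total_articles:
        return 1.0  # every article links to both entities
    distance = (math.log(big) - math.log(common)) / (
        math.log(total_articles) - math.log(small))
    return max(0.0, 1.0 - distance)


print(link_relatedness({"p1", "p2", "p3", "p4"}, {"p3", "p4", "p5", "p6"}, 1000))
```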

48 Search through structured data in the form of triples. Weight different predicates differently. Map user keyword queries to matching entities. Blanco et al. 2011
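
Weighting predicates differently can be illustrated with a simple field-weighted term-frequency score: each predicate of an entity is treated as a text field, and query-term matches in that field count with the predicate's weight. This is a deliberate simplification; Blanco et al. use BM25F, which additionally normalises field lengths and weights terms by rarity. The weights, the ex: predicates, and the entities are hypothetical.

```python
from typing import Dict, List

# Hypothetical predicate weights: important, neutral, and unimportant fields.
PREDICATE_WEIGHTS: Dict[str, float] = {
    "rdfs:label": 3.0,
    "ex:abstract": 1.0,
    "ex:comment": 0.2,
}
DEFAULT_WEIGHT = 0.5  # for predicates not listed above

Entity = Dict[str, List[str]]  # predicate -> literal values


def score(entity: Entity, query: str) -> float:
    """Sum of query-term frequencies per predicate, scaled by that predicate's weight."""
    terms = query.lower().split()
    total = 0.0
    for predicate, values in entity.items():
        weight = PREDICATE_WEIGHTS.get(predicate, DEFAULT_WEIGHT)
        tokens = " ".join(values).lower().split()
        total += weight * sum(tokens.count(t) for t in terms)
    return total


def search(entities: Dict[str, Entity], query: str) -> List[str]:
    """Rank entities by score, dropping those with no match at all."""
    scores = {e: score(desc, query) for e, desc in entities.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [e for e in ranked if scores[e] > 0]


entities = {
    "ex:Hannover": {"rdfs:label": ["Hannover"],
                    "ex:abstract": ["Hannover is the capital of Lower Saxony"]},
    "ex:L3S": {"rdfs:label": ["L3S Research Center"],
               "ex:abstract": ["A research center in Hannover"]},
}
print(search(entities, "hannover"))
```

A match in the label outweighs a match in the abstract, so the entity named by the query ranks above an entity that merely mentions it.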

49 Zaveri et al. 2012

51 Large volumes of unstructured, high-quality data. High applicability of IE techniques for structuring unstructured data. Availability of encyclopaedias in the form of knowledge bases. Wide range of applications in the Semantic Web. Further expansion of knowledge bases with facts about the real world from unstructured text beyond Wikipedia infoboxes. Quality aspects of data.

53 References
F. Suchanek, G. Kasneci, and G. Weikum. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. In Proceedings of the 16th WWW.
C. Wagner, P. Singer, M. Strohmaier, and B. Huberman. Semantic Stability in Social Tagging Streams. CoRR.
D. Kontokostas, P. Westphal, S. Auer, S. Hellmann, J. Lehmann, R. Cornelissen, and A. Zaveri. Test-driven Evaluation of Linked Data Quality. In Proceedings of the 23rd WWW.
D. Herzig, P. Mika, R. Blanco, and T. Tran. Federated Entity Search Using On-the-Fly Consolidation. In Proceedings of the ISWC.
A. Palmero Aprosio, C. Giuliano, and A. Lavelli. Automatic Expansion of DBpedia Exploiting Wikipedia Cross-Language Information. In Proceedings of the 11th ESWC.

54 References (continued)
Fabian Suchanek and Gerhard Weikum. Knowledge harvesting in the big-data era. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD '13).
Gerhard Weikum and Martin Theobald. From information to knowledge: harvesting entities and relationships from web sources. In Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS '10).
Roi Blanco, Peter Mika, and Sebastiano Vigna. Effective and efficient entity search in RDF data. In Proceedings of the 10th International Conference on The Semantic Web (ISWC '11).
Jeffrey Pound, Peter Mika, and Hugo Zaragoza. Ad-hoc object retrieval in the web of data. In Proceedings of the 19th International Conference on World Wide Web (WWW '10).
B. P. Nunes, S. Dietze, M. A. Casanova, R. Kawase, B. Fetahu, and W. Nejdl. Combining a co-occurrence-based and a semantic measure for entity linking. In Proceedings of the 10th Extended Semantic Web Conference (ESWC '13), 2013.
Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, and Sören Auer. Quality Assessment Methodologies for Linked Open Data. Semantic Web Journal, 2014.

55 References (continued)
Aldo Gangemi. A Comparison of Knowledge Extraction Tools for the Semantic Web. In Proceedings of the 10th Extended Semantic Web Conference (ESWC '13), 2013.
Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia Spotlight: shedding light on the web of documents. In Proceedings of the 7th International Conference on Semantic Systems.
Mohamed Amir Yosef, Johannes Hoffart, Ilaria Bordino, Marc Spaniol, and Gerhard Weikum. AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables. PVLDB 4(12), 2011.
Isabelle Augenstein, Sebastian Padó, and Sebastian Rudolph. LODifier: generating linked data from unstructured text. In Proceedings of the 9th International Conference on The Semantic Web: Research and Applications (ESWC '12).
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: a nucleus for a web of open data. In Proceedings of the 6th International The Semantic Web and 2nd Asian Conference on Asian Semantic Web Conference (ISWC '07/ASWC '07).
Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web (WWW '07).
Oren Etzioni, Michele Banko, Stephen Soderland, and Daniel S. Weld. Open information extraction from the web. Commun. ACM 51, 12 (December 2008).

56 References (continued)
Mausam, Michael Schmitz, Robert Bart, Stephen Soderland, and Oren Etzioni. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL '12).
Raymond J. Mooney and Razvan Bunescu. Mining knowledge from text using information extraction. SIGKDD Explor. Newsl.
Chang Wang, James Fan, Aditya Kalyanpur, and David Gondek. Relation extraction with relation topics. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11).
Robert Isele, Anja Jentzsch, and Christian Bizer. Silk Server: Adding missing Links while consuming Linked Data. COLD.
Oren Etzioni. Machine reading at web scale. In Proceedings of the 2008 International Conference on Web Search and Data Mining (WSDM '08).
Luciano Del Corro and Rainer Gemulla. ClausIE: clause-based open information extraction. In Proceedings of the 22nd International Conference on World Wide Web (WWW '13).
Rudi Studer, V. Richard Benjamins, and Dieter Fensel. Knowledge engineering: Principles and methods. Data & Knowledge Engineering, 25(1-2), 1998.
Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data: The Story So Far. International Journal on Semantic Web and Information Systems 5(3):1-22, 2009.

57 Thank you! Questions?


A service based on Linked Data to classify Web resources using a Knowledge Organisation System

A service based on Linked Data to classify Web resources using a Knowledge Organisation System A service based on Linked Data to classify Web resources using a Knowledge Organisation System A implementation to classify Open Educational Resources Janneth Chicaiza, Nelson Piedra and Jorge López Universidad

More information

Semantiska webben DFS/Gbg

Semantiska webben DFS/Gbg 1 Semantiska webben 2010 DFS/Gbg 100112 Olle Olsson World Wide Web Consortium (W3C) Swedish Institute of Computer Science (SICS) With thanks to Ivan for many slides 2 Trends and forces: Technology Internet

More information

Shortipedia. Aggregating and Curating Semantic Web Data. {varunr

Shortipedia. Aggregating and Curating Semantic Web Data. {varunr Shortipedia Aggregating and Curating Semantic Web Data Denny Vrandečić 1,2, Varun Ratnakar 2, Markus Krötzsch 3, Yolanda Gil 2 1 Institute AIFB, KIT Karlsruhe Institute of Technology, Karlsruhe, Germany

More information

Knowledge Base Population and Visualization Using an Ontology based on Semantic Roles

Knowledge Base Population and Visualization Using an Ontology based on Semantic Roles Knowledge Base Population and Visualization Using an Ontology based on Semantic Roles Maryam Siahbani, Ravikiran Vadlapudi, Max Whitney, and Anoop Sarkar Simon Fraser University, School of Computing Science

More information

A Robust Number Parser based on Conditional Random Fields

A Robust Number Parser based on Conditional Random Fields A Robust Number Parser based on Conditional Random Fields Heiko Paulheim Data and Web Science Group, University of Mannheim, Germany Abstract. When processing information from unstructured sources, numbers

More information

Prof. Dr. Christian Bizer

Prof. Dr. Christian Bizer 28th British National Conference on Databases (BNCOD2011) July 12 th, 2011, Manchester, UK Evolving the Web into a Global Data Space Prof. Dr. Christian Bizer Freie Universität ität Berlin Germany Outline

More information

Semantic Integration with Apache Jena and Apache Stanbol

Semantic Integration with Apache Jena and Apache Stanbol Semantic Integration with Apache Jena and Apache Stanbol All Things Open Raleigh, NC Oct. 22, 2014 Overview Theory (~10 mins) Application Examples (~10 mins) Technical Details (~25 mins) What do we mean

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2017 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 5 http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2465 1 Semantic

More information

BUILDING THE SEMANTIC WEB

BUILDING THE SEMANTIC WEB BUILDING THE SEMANTIC WEB You might have come across the term Semantic Web Applications often, during talks about the future of Web apps. Check out what this is all about There are two aspects to the possible

More information

Semantic Cloud Generation based on Linked Data for Efficient Semantic Annotation

Semantic Cloud Generation based on Linked Data for Efficient Semantic Annotation Semantic Cloud Generation based on Linked Data for Efficient Semantic Annotation - Korea-Germany Joint Workshop for LOD2 2011 - Han-Gyu Ko Dept. of Computer Science, KAIST Korea Advanced Institute of Science

More information

Comparative Study of RDB to RDF Mapping using D2RQ and R2RML Mapping Languages

Comparative Study of RDB to RDF Mapping using D2RQ and R2RML Mapping Languages International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 10, Number 1 (2018), pp. 23-36 International Research Publication House http://www.irphouse.com Comparative Study of

More information

Semi-Automatic Quality Assessment of Linked Data without Requiring Ontology

Semi-Automatic Quality Assessment of Linked Data without Requiring Ontology Semi-Automatic Quality Assessment of Linked Data without Requiring Ontology Saemi Jang, Megawati, Jiyeon Choi, and Mun Yong Yi Department of Knowledge Service Engineering, KAIST {sammy1221,megawati,jeeyeon51,munyi}@kaist.ac.kr

More information

Linking Thesauri to the Linked Open Data Cloud for Improved Media Retrieval

Linking Thesauri to the Linked Open Data Cloud for Improved Media Retrieval biblio.ugent.be The UGent Institutional Repository is the electronic archiving and dissemination platform for all UGent research publications. Ghent University has implemented a mandate stipulating that

More information

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent Semantic Technologies and CDISC Standards Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent Part I Introduction to Semantic Technology Resource Description Framework

More information

Building Blocks of Linked Data

Building Blocks of Linked Data Building Blocks of Linked Data Technological foundations Identifiers: URIs Data Model: RDF Terminology and Semantics: RDFS, OWL 23,019,148 People s Republic of China 20,693,000 population located in capital

More information

DRX: A LOD browser and dataset interlinking recommendation tool

DRX: A LOD browser and dataset interlinking recommendation tool Undefined 1 (2009) 1 5 1 IOS Press DRX: A LOD browser and dataset interlinking recommendation tool Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open

More information

Advances in Data Management - Web Data Integration A.Poulovassilis

Advances in Data Management - Web Data Integration A.Poulovassilis Advances in Data Management - Web Data Integration A.Poulovassilis 1 1 Integrating Deep Web Data Traditionally, the web has made available vast amounts of information in unstructured form (i.e. text).

More information

Tools and Infrastructure for Supporting Enterprise Knowledge Graphs

Tools and Infrastructure for Supporting Enterprise Knowledge Graphs Tools and Infrastructure for Supporting Enterprise Knowledge Graphs Sumit Bhatia, Nidhi Rajshree, Anshu Jain, and Nitish Aggarwal IBM Research sumitbhatia@in.ibm.com, {nidhi.rajshree,anshu.n.jain}@us.ibm.com,nitish.aggarwal@ibm.com

More information

An Implementation of LOD Instance Development System using Schema- Instance Layer Separation

An Implementation of LOD Instance Development System using Schema- Instance Layer Separation An Implementation of LOD Instance Development System using Schema- Instance Layer Separation Heekyung Moon*, Zhanfang Zhao**, Jintak Choi*** and Sungkook Han* * Department of Computer Engimeering, College

More information

Revisiting Blank Nodes in RDF to Avoid the Semantic Mismatch with SPARQL

Revisiting Blank Nodes in RDF to Avoid the Semantic Mismatch with SPARQL Revisiting Blank Nodes in RDF to Avoid the Semantic Mismatch with SPARQL Marcelo Arenas 1, Mariano Consens 2, and Alejandro Mallea 1,3 1 Pontificia Universidad Católica de Chile 2 University of Toronto

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION Most of today s Web content is intended for the use of humans rather than machines. While searching documents on the Web using computers, human interpretation is required before

More information

Towards Improving the Quality of Knowledge Graphs with Data-driven Ontology Patterns and SHACL

Towards Improving the Quality of Knowledge Graphs with Data-driven Ontology Patterns and SHACL Towards Improving the Quality of Knowledge Graphs with Data-driven Ontology Patterns and SHACL Blerina Spahiu, Andrea Maurino, Matteo Palmonari University of Milano-Bicocca {blerina.spahiu andrea.maurino

More information

Discovering Semantic Relations from the Web and Organizing them with PATTY

Discovering Semantic Relations from the Web and Organizing them with PATTY Discovering Semantic Relations from the Web and Organizing them with PATTY Ndapandula Nakashole, Gerhard Weikum, Fabian Suchanek Max Planck Institute for Informatics, Saarbruecken, Germany {nnakasho,weikum,suchanek}@mpi-inf.mpg.de

More information

Accessing information about Linked Data vocabularies with vocab.cc

Accessing information about Linked Data vocabularies with vocab.cc Accessing information about Linked Data vocabularies with vocab.cc Steffen Stadtmüller 1, Andreas Harth 1, and Marko Grobelnik 2 1 Institute AIFB, Karlsruhe Institute of Technology (KIT), Germany {steffen.stadtmueller,andreas.harth}@kit.edu

More information

Weaving SIOC into the Web of Linked Data

Weaving SIOC into the Web of Linked Data Weaving SIOC into the Web of Linked Data Uldis Bojārs uldis.bojars@deri.org Richard Cyganiak richard@cyganiak.de ABSTRACT Social media sites can act as a rich source of large amounts of data by letting

More information

Leveraging Community-built Knowledge for Type Coercion in Question Answering

Leveraging Community-built Knowledge for Type Coercion in Question Answering Leveraging Community-built Knowledge for Type Coercion in Question Answering Aditya Kalyanpur, J William Murdock, James Fan and Chris Welty IBM Research, 19 Skyline Drive, Hawthorne NY 10532 {adityakal,

More information

Using Linked Data to Reduce Learning Latency for e-book Readers

Using Linked Data to Reduce Learning Latency for e-book Readers Using Linked Data to Reduce Learning Latency for e-book Readers Julien Robinson, Johann Stan, and Myriam Ribière Alcatel-Lucent Bell Labs France, 91620 Nozay, France, Julien.Robinson@alcatel-lucent.com

More information

Knowledge Based Systems Text Analysis

Knowledge Based Systems Text Analysis Knowledge Based Systems Text Analysis Dr. Shubhangi D.C 1, Ravikiran Mitte 2 1 H.O.D, 2 P.G.Student 1,2 Department of Computer Science and Engineering, PG Center Regional Office VTU, Kalaburagi, Karnataka

More information

Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees

Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees Alan Akbik, Oresti Konomi and Michail Melnikov Technische Univeristät Berlin Databases and Information

More information

Exploiting Semantics Where We Find Them

Exploiting Semantics Where We Find Them Vrije Universiteit Amsterdam 19/06/2018 Exploiting Semantics Where We Find Them A Bottom-up Approach to the Semantic Web Prof. Dr. Christian Bizer Bizer: Exploiting Semantics Where We Find Them. VU Amsterdam,

More information

Semantic Web. Tahani Aljehani

Semantic Web. Tahani Aljehani Semantic Web Tahani Aljehani Motivation: Example 1 You are interested in SOAP Web architecture Use your favorite search engine to find the articles about SOAP Keywords-based search You'll get lots of information,

More information

arxiv: v1 [cs.ir] 28 Aug 2017

arxiv: v1 [cs.ir] 28 Aug 2017 On Type-Aware Entity Retrieval arxiv:178.8291v1 [cs.ir] 28 Aug 217 ABSTRACT Darío Garigliotti University of Stavanger dario.garigliotti@uis.no Today, the practice of returning entities from a knowledge

More information

a paradigm for the Semantic Web Linked Data Angelica Lo Duca IIT-CNR Linked Open Data:

a paradigm for the Semantic Web Linked Data Angelica Lo Duca IIT-CNR Linked Open Data: Linked Data Angelica Lo Duca IIT-CNR angelica.loduca@iit.cnr.it Linked Open Data: a paradigm for the Semantic Web Linked Data are a series of best practices to connect structured data through the Web.

More information

SPARQL Protocol And RDF Query Language

SPARQL Protocol And RDF Query Language SPARQL Protocol And RDF Query Language John Julian Carstens March 15, 2012 1 Introduction Beyond doubt, the world wide web has become central to the business reality of companies and to the personal reality

More information

Linked data and its role in the semantic web. Dave Reynolds, Epimorphics

Linked data and its role in the semantic web. Dave Reynolds, Epimorphics Linked data and its role in the semantic web Dave Reynolds, Epimorphics Ltd @der42 Roadmap What is linked data? Modelling Strengths and weaknesses Examples Access other topics image: Leo Oosterloo @ flickr.com

More information

Semantic Annotation and Linking of Medical Educational Resources

Semantic Annotation and Linking of Medical Educational Resources 5 th European IFMBE MBEC, Budapest, September 14-18, 2011 Semantic Annotation and Linking of Medical Educational Resources N. Dovrolis 1, T. Stefanut 2, S. Dietze 3, H.Q. Yu 3, C. Valentine 3 & E. Kaldoudi

More information

Introducing Linked Data

Introducing Linked Data Introducing Linked Data (Part of this work was funded by PlanetData NoE FP7/2007-2013) Irini Fundulaki 1 1 Institute of Computer Science FORTH & W3C Greece Office Manager EICOS : 4th Meeting, Athens, Greece

More information

Papers for comprehensive viva-voce

Papers for comprehensive viva-voce Papers for comprehensive viva-voce Priya Radhakrishnan Advisor : Dr. Vasudeva Varma Search and Information Extraction Lab, International Institute of Information Technology, Gachibowli, Hyderabad, India

More information