MASWS Natural Language and the Semantic Web

Similar documents
Populating the Semantic Web with Historical Text

Pulling Together, or

Linguaggi Logiche e Tecnologie per la Gestione Semantica dei testi

Text Mining for Software Engineering

Draft Dissertation: Constructing and Populating Ontologies from Text Documents and Database Fields

SWAD-Europe Deliverable 8.1 Core RDF Vocabularies for Thesauri

Populating the Semantic Web Combining Text and Relational Databases as RDF Graphs

STS Infrastructural considerations. Christian Chiarcos

Stanbol Enhancer. Use Custom Vocabularies with the. Rupert Westenthaler, Salzburg Research, Austria. 07.

Vocabulary Alignment for archaeological Knowledge Organization Systems

Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back

Introduction and background

Multi-agent and Semantic Web Systems: Linked Open Data

Digital Public Space: Publishing Datasets

Vocabulary and Semantics in the Virtual Observatory

An Entity Name Systems (ENS) for the [Semantic] Web

GenTax: A Generic Methodology for Deriving OWL and RDF-S Ontologies from Hierarchical Classifications, Thesauri, and Inconsistent Taxonomies

Mapping between Digital Identity Ontologies through SISM

Knowledge Engineering with Semantic Web Technologies

Mustafa Jarrar: Lecture Notes on RDF Schema Birzeit University, Version 3. RDFS RDF Schema. Mustafa Jarrar. Birzeit University

Adding formal semantics to the Web

Semantic Web Test

Linked data and its role in the semantic web. Dave Reynolds, Epimorphics

0.1 Knowledge Organization Systems for Semantic Web

Ontology Development. Qing He

Wondering about either OWL ontologies or SKOS vocabularies? You need both!

Semantic Web. MPRI : Web Data Management. Antoine Amarilli Friday, January 11th 1/29

Multi-agent and Semantic Web Systems: RDF Data Structures

Information Extraction Techniques in Terrorism Surveillance

Semantic Web Systems Ontologies Jacques Fleuriot School of Informatics

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent

Scaling the Semantic Wall with AllegroGraph and TopBraid Composer. A Joint Webinar by TopQuadrant and Franz

Semantic Web Systems Introduction Jacques Fleuriot School of Informatics

Knowledge Representations. How else can we represent knowledge in addition to formal logic?

Building Blocks of Linked Data

Reducing Consumer Uncertainty

Customisable Curation Workflows in Argo

Semantiska webben DFS/Gbg

Today: RDF syntax. + conjunctive queries for OWL. KR4SW Winter 2010 Pascal Hitzler 3

Semantic Web and Natural Language Processing

Logic and Reasoning in the Semantic Web (part I RDF/RDFS)

Forward Chaining Reasoning Tool for Rya

Information Retrieval CS Lecture 01. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Linked Open Data: a short introduction

Prof. Dr. Christian Bizer

Outline RDF. RDF Schema (RDFS) RDF Storing. Semantic Web and Metadata What is RDF and what is not? Why use RDF? RDF Elements

LAB 3: Text processing + Apache OpenNLP

Text mining tools for semantically enriching the scientific literature

Ontology Summit2007 Survey Response Analysis. Ken Baclawski Northeastern University

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Publishing Vocabularies on the Web. Guus Schreiber Antoine Isaac Vrije Universiteit Amsterdam

Proposal for Implementing Linked Open Data on Libraries Catalogue

Visual Concept Detection and Linked Open Data at the TIB AV- Portal. Felix Saurbier, Matthias Springstein Hamburg, November 6 SWIB 2017

Deep integration of Python with Semantic Web technologies

Making BioPAX SPARQL

Semantic Integration with Apache Jena and Apache Stanbol

5 RDF and Inferencing

Having Triplets Holding Cultural Data as RDF

Semantic Annotation, Search and Analysis

Introduction to Text Mining. Hongning Wang

Semantic Image Retrieval Based on Ontology and SPARQL Query

Annotating Spatio-Temporal Information in Documents

Semantic Days 2011 Tutorial Semantic Web Technologies

Bringing the Semantic Web closer to reality PostgreSQL as RDF Graph Database

Lessons Learned from a Greenhorn Ontologist

Unit 2 RDF Formal Semantics in Detail

Contents. G52IWS: The Semantic Web. The Semantic Web. Semantic web elements. Semantic Web technologies. Semantic Web Services

Knowledge Representation VII - IKT507. SPARQL stands for SPARQL Protocol And RDF Query Language

Historical Text Mining:

Solving problem of semantic terminology in digital library

BUILDING THE SEMANTIC WEB

The AGROVOC Concept Scheme - A Walkthrough

Understanding Billions of Triples with Usage Summaries

A Semantic MediaWiki-Empowered Terminology Registry

Open PermID Release Notes R1.9

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) CONTEXT SENSITIVE TEXT SUMMARIZATION USING HIERARCHICAL CLUSTERING ALGORITHM

The R2R Framework: Christian Bizer, Andreas Schultz. 1 st International Workshop on Consuming Linked Data (COLD2010) Freie Universität Berlin

Best Practices for World-Class Search

An Approach To Web Content Mining

INFORMATION EXTRACTION

Accessing information about Linked Data vocabularies with vocab.cc

Metadata Standards and Applications. 6. Vocabularies: Attributes and Values

Jumpstarting the Semantic Web

Formalising the Semantic Web. (These slides have been written by Axel Polleres, WU Vienna)

Final Project Discussion. Adam Meyers Montclair State University

Linked Open Data Cloud. John P. McCrae, Thierry Declerck

Helmi Ben Hmida Hannover University, Germany

Knowledge Representation for the Semantic Web

A tool for Cross-Language Pair Annotations: CLPA

Opus: University of Bath Online Publication Store

Data is the new Oil (Ann Winblad)

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ???

LexGrid Philosophy, Model and Interfaces Harold R Solbrig Division of Biomedical Statistics and Informatics Mayo Clinic

Machine Learning in GATE

THE GETTY VOCABULARIES TECHNICAL UPDATE

Semantic Similarity Assessment to Browse Resources exposed as Linked Data: an Application to Habitat and Species Datasets R. Albertoni, M.

Introduction to Information Extraction (IE) and ANNIE

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

BD003: Introduction to NLP Part 2 Information Extraction

Europeana update: aspects of the data

Transcription:

MASWS Natural Language and the Semantic Web Kate Byrne School of Informatics 14 February 2011 1 / 27

Populating the Semantic Web why and how? text to RDF Organising the Triples schema design open issues and problems References 2 / 27

Populating the Semantic Web The Semantic Web needs data why and how? Semantic Web first proposed in 1999 Why hasn t it taken off the way WWW did? not enough data I ll delay adding my data queries aren t useful 3 / 27

How to get data Populating the Semantic Web why and how? How do we populate the Semantic Web? create convert structured data unstructured and semi structured 4 / 27

Populating the Semantic Web why and how? Semi-structured content is everywhere 5 / 27

Populating the Semantic Web Why convert unstructured text? why and how? Extracting RDF triples from free text is not easy So why is it worth doing? there was knowledge before WWW (!) high quality material, often historical big digitisation push in 1990s (and again in 2011?) but still challenging to search text effectively natural language is our preferred way of communicating Can the web learn to talk in natural language...? 6 / 27

Populating the Semantic Web why and how? 7 / 27

Populating the Semantic Web Natural Language Processing txt2rdf Extraction of semantic content from natural language text Systems now available Open Calais, Powerset, AlchemyAPI To explain the process: part of my PhD work Kate likes chocolate. Think of RDF triples as declarative sentences: RDF nodes = nouns; RDF arcs = verbs :kate :likes :chocolate prefix : <http://www.ltg.ed.ac.uk/> 8 / 27

Populating the Semantic Web txt2rdf Converting text to RDF fancy NLP processing and RDFisation 9 / 27

Populating the Semantic Web Example RCAHMS text site 20 txt2rdf 10 / 27

Populating the Semantic Web The txt2rdf process in a nutshell txt2rdf First find the important nouns, or named entities : Example Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946. Then look for relationships between them: Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946. 11 / 27

sfsjksjwjvssjkljljs sd lajoen s jjs kjdlk lksjlkj sks oihhg sk jjlkjlj jljbjl skj ekw My txt2rdf pipeline Populating the Semantic Web txt2rdf Text documents Pre processing Named Entity Recognition tokenise sentence and para split POS tag multi word tokens and features list of NEs and classes trained NER model generate triples remove unwanted relations attach siteids trained RE model set of NE pairs and features list of relations and classes Graph of triples RDF translation Relation Extraction 12 / 27

Populating the Semantic Web Dealing with EVENTs txt2rdf Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946. FIND event is n-ary relation: eventrel(eventagent, eventagentrole, eventdate, eventpatient, eventplace) find123( Mr. Peter Moar,, May 1946, polished stone knives, Hill of Shurton ) Split event relation into RDF triples :find123 :hasagent :petermoar. :find123 :hasdate :may1946. :find123 :haspatient :polishedstoneknives. :find123 :haslocation :hillofshurton. 13 / 27

Populating the Semantic Web Archaeological events txt2rdf Example was from RCAHMS data: http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/ Extracting events allows structured searches Question: What do we do with NEs that are not in relations? For more information Byrne and Klein (2009) Automatic Extraction of Archaeological Events from Text. 14 / 27

Populating the Semantic Web Available NLP systems txt2rdf OpenCalais: http://viewer.opencalais.com/ Powerset: now part of Microsoft s Bing AlchemyAPI: http://www.alchemyapi.com/api/demo.html Trained on lots of (English) text, mostly news articles Many general-purpose NLP tools available online, eg nltk (http://www.nltk.org/) Edinburgh LTG software (http://www.ltg.ed.ac.uk/software) NaCTeM (http://nactem.ac.uk/) 15 / 27

Organising the Triples schema design Populating the Semantic Web why and how? text to RDF Organising the Triples schema design open issues and problems References 16 / 27

Organising the Triples Do we have a data schema? schema design How do we populate the Semantic Web? create convert structured data unstructured and semi structured 16 / 27

Organising the Triples schema design A schema for free text So far we ve looked at instance data (ABox statements) Where is framework to slot instances into? Named Entity categories: PERSON, PLACE, DATE, etc NE categories become RDFS classes: Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946. Entities are typed by their NE class :petermoar rdf:type :person. :found rdf:type :event. :polishedstoneknives rdf:type :artefact. :may1946 rdf:type :date. 17 / 27

Organising the Triples schema design RDFS class hierarchy NE categories usually flat list, not hierarchical My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT, PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT Contrast with structured data Hand-designed hierarchy (with rdfs:subclassof ): http://www.ltg.ed.ac.uk/tether/ siteid agent loc time event classn org person role date period sitetype objtype place sitename address grid survey excavation find visit description creation alteration 18 / 27

Property labels Organising the Triples schema design Can we extract RDF predicate set automatically from text? Eg clustering commonly occurring verbs But schema design only has to be done once Property set and class hierarchy are related My set: hasevent, hasagent, hasagentrole, hasperiod, haspatient, haslocation, hasclassn, hasobject, partof Plus standard ones like owl:sameas, rdfs:seealso, skos:broader Flat list (no rdfs:subpropertyof ) 19 / 27

Organising the Triples schema design Existing schemas and vocabularies Lots of vocabularies to choose from http://esw.w3.org/topic/vocabularymarket RDFS and OWL are fundamental SKOS translate existing domain thesauri http://www.w3.org/2004/02/skos/ Reuse where possible; invent local scheme where not Incentive to reuse is strong 20 / 27

Organising the Triples Issues around schema design open issues and problems Schema granularity more categories to choose from NLP less accurate so class hierarchy quite coarse and flat Schema discovery tension between simplicity and query power how will software agents understand your schema? 21 / 27

Organising the Triples Issues around NLP accuracy open issues and problems Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc) NLP is not 100% accurate! how many false statements can Semantic Web stand? the computer said it, so it must be true Even given canonical URI :petermoar for our Peter Moar...... can we distinguish him from others with same name? 22 / 27

Grounding local data Organising the Triples open issues and problems Grounding: tie unique canonical URIs to authoritative reference Is http://www.ltg.ed.ac.uk/tether/loc/place#edinburgh same as http://www.geonames.org/2650225/edinburgh.html? Grounding during NLP processing impractical Use owl:sameas? redundant triples, more complex queries Who maintains the authoritative lists of URIs? Specialist thesauri, eg archaeological classification terms like Stone setting 23 / 27

Organising the Triples Grounding specialist terminology open issues and problems event:excavation date:1982 :hasperiod rdf:type "An arrangement of two or more standing stones" :hasevent event:excavation20w158 siteid:site20 :haslocation "stone setting" sitetype:religious+ritual+and+funerary rdfs:label :hasclassn skos:broader skos:scopenote sitetype:stone+setting sitetype:standing+stone sitetype:stone+circle sitetype:stone+row skos:related rdf:type sitetype:stone+settings20w179 rdfs:subclassof sitetype: sitename:sands+of+breckon :haslocation :haslocation address:hp50nw+11.01+hp+5304+0519 address:breckon Aim: ground place names, people, organisations, etc. 24 / 27

Linked Data Organising the Triples open issues and problems 25 / 27

Organising the Triples open issues and problems Grounding turns data into Linked Data grounding local URIs against "authority" nodes is the next big challenge! 26 / 27

References to follow up References NLP software for Semantic Web applications OpenCalais: http://opencalais.com/ AlchemyAPI: http://www.alchemyapi.com/api/demo.html VocabularyMarket: http://esw.w3.org/topic/vocabularymarket Case study Byrne and Klein (2009) Automatic Extraction of Archaeological Events from Text. CAA2009, Williamsburg, VA. General references for NLP Natural Language Processing with Python, Bird, Klein and Loper (http://www.nltk.org/book) Speech and Language Processing, Jurafsky and Martin (http://www.cs.colorado.edu/ martin/slp.html) 27 / 27