Automatic Keyword Generation. John Shepherdson, Technical Services Director, UK Data Archive, University of Essex

1 Automatic Keyword Generation
John Shepherdson, Technical Services Director, University of Essex
DevCon1, University of Essex, 12 April 2013

2 Abstract
HASSET (Humanities and Social Science Electronic Thesaurus) is a subject thesaurus that the UK Data Archive has developed over more than 20 years. It provides a standard set of keywords that can be used to tag items whose content relates to the humanities and social sciences. In the past, tagging was carried out by human experts, but we have recently developed a prototype system that automatically generates a list of keywords from an arbitrary piece of text.
- Learn about our experiences of generating keywords automatically, and how we evaluated the results
- See a demo of automated keyword generation
Although we have used HASSET keywords, the approach applies equally to other keyword sets.

3 Overview
- Brief background: the UK Data Archive and the UK Data Service
- Cataloguing practices and thesaurus tools
- SKOS-HASSET: RDF version of the thesaurus
- Automatic generation of keywords
- Technologies used
- Evaluation
- Findings
- Application to web page metadata
- Acknowledgements and questions

4 The UK Data Archive and the UK Data Service
- Based at the University of Essex since 1967
- Curator of the largest collection of digital data in the social sciences and humanities in the UK; see data-archive.ac.uk for more details
- Makes these collections available via the new UK Data Service
- The UK Data Service also provides value-added services for UK Census data, government surveys and beyond
- The UK Data Service includes the Universities of Essex, Manchester (Mimas, CCSR), Leeds, Southampton and Edinburgh (EDINA), and University College London; see ukdataservice.ac.uk for more details

5 The UK Data Service: cataloguing standards
- The UK Data Service indexes over 5,000 digital data collections, and the number keeps growing
- All collections are catalogued at thematic level; many are also indexed at variable level
- The catalogue is available via Discover (ukdataservice.ac.uk)
- Every data collection is indexed with HASSET terms, and HASSET is used in search

6 HASSET
- Multidisciplinary thesaurus, originally developed to support the UK Data Archive/UK Data Service collections
- Covers the core subject areas of social science
- Uses the standard thesaurus relationships:
  - TT (top term)
  - BT (broader term)
  - NT (narrower term)
  - RT (related term)
  - USE (from a non-preferred term to the preferred term)
  - UF (used for: from a preferred term to its non-preferred terms)
- The role of HASSET in the Archive is twofold:
  - used internally to index studies and series with HASSET terms
  - also a separate product licensed to others
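To make the relationship types concrete, here is a minimal Python sketch of how a HASSET-style entry might be modelled. The `Term` class, the `preferred` and `top_term` helpers and the sample terms (VEHICLES, CARS, AUTOMOBILES) are hypothetical illustrations, not HASSET's actual data model or content.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A thesaurus entry with HASSET-style relationships (illustrative model only)."""
    label: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT
    use: Optional[str] = None                           # USE: non-preferred -> preferred
    used_for: list[str] = field(default_factory=list)  # UF: preferred -> non-preferred


def preferred(thesaurus: dict[str, Term], label: str) -> str:
    """Follow a USE reference, if any, from a non-preferred term to its preferred term."""
    term = thesaurus[label]
    return term.use if term.use else term.label


def top_term(thesaurus: dict[str, Term], label: str) -> str:
    """Walk BT links upwards to the top term (TT); takes the first BT if there are several."""
    current = thesaurus[preferred(thesaurus, label)]
    while current.broader:
        current = thesaurus[current.broader[0]]
    return current.label


# Hypothetical sample entries, not taken from HASSET.
thesaurus = {
    "VEHICLES": Term("VEHICLES", narrower=["CARS"]),
    "CARS": Term("CARS", broader=["VEHICLES"], used_for=["AUTOMOBILES"]),
    "AUTOMOBILES": Term("AUTOMOBILES", use="CARS"),
}

print(preferred(thesaurus, "AUTOMOBILES"))  # CARS
print(top_term(thesaurus, "AUTOMOBILES"))   # VEHICLES
```

Following USE before walking BT links means that a lookup starting from a non-preferred term still lands in the right place in the hierarchy.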

7 ELSST
- The European Language Social Science Thesaurus (ELSST) is a multilingual thesaurus based on core English terms taken from HASSET
- Translated into 11 languages (with more on the way)
- Closely connected with HASSET, but all of its terms must demonstrate international applicability

8 Applying SKOS to HASSET
SKOS/RDF: what is RDF?
- The Resource Description Framework (RDF) describes data using a simple subject, predicate, object format, e.g. car hasColour red
So, what is SKOS?
- Simple Knowledge Organization System
- SKOS is an RDF vocabulary whose properties describe the relationships between thesaurus terms, e.g.
  concept162 skos:prefLabel "CAR"
  concept162 skos:altLabel "AUTOMOBILE"
- It encodes thesauri in a standardised way, making their structures comparable and facilitating interaction between them
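The example statements above can be written out as RDF in N-Triples syntax. This sketch uses only the Python standard library; the SKOS and RDF namespace URIs are the standard W3C ones, but the `http://example.org/hasset/` base URI and the `concept_triples` helper are hypothetical, not SKOS-HASSET's real URI scheme.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/hasset/"  # hypothetical base URI, not the real SKOS-HASSET scheme


def literal(text: str, lang: str) -> str:
    """Quote a string as a language-tagged N-Triples literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_id, pref_label, alt_labels=(), broader=(), lang="en"):
    """Return N-Triples lines describing one skos:Concept."""
    s = f"<{BASE}{concept_id}>"
    lines = [f"{s} <{RDF_TYPE}> <{SKOS}Concept> ."]
    lines.append(f"{s} <{SKOS}prefLabel> {literal(pref_label, lang)} .")
    lines += [f"{s} <{SKOS}altLabel> {literal(alt, lang)} ." for alt in alt_labels]
    lines += [f"{s} <{SKOS}broader> <{BASE}{b}> ." for b in broader]
    return lines


for line in concept_triples("concept162", "CAR", alt_labels=["AUTOMOBILE"]):
    print(line)
```

In practice an RDF library or triple store would handle serialisation; the point here is only that each thesaurus relationship becomes one subject, predicate, object statement.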

9 Applying SKOS to HASSET (2)
SKOS has been applied to HASSET:
- Persistence via GUIDs
- Version control: all terms are date-stamped and all changes are recorded
- Live versions of the thesaurus products (SKOS-HASSET, ELSST) are released at agreed, regular intervals, with a recognised annual major version increment
- We are using Pubby to publish our SKOS; it provides a Linked Data interface to the RDF data held in the BrightstarDB triple store
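One way to read "persistence via GUIDs" together with date-stamped change recording is sketched below: each concept keeps a randomly generated identifier for life, while its label may change and every change is logged. The `ConceptRecord` class and the major.minor `next_version` scheme (minor bump for each regular release, major bump for the annual release) are our assumptions about how such a scheme could work, not a description of the Archive's actual tooling.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ConceptRecord:
    """A concept whose GUID never changes, even when its label does (hypothetical model)."""
    label: str
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))
    created: date = field(default_factory=date.today)
    history: list = field(default_factory=list)  # (date, description) pairs

    def change_label(self, new_label, when=None):
        """Record a date-stamped change, then apply it; the GUID is left untouched."""
        when = when or date.today()
        self.history.append((when, f"label changed from {self.label!r} to {new_label!r}"))
        self.label = new_label


def next_version(current: str, annual_release: bool = False) -> str:
    """Bump 'major.minor': minor for a regular release, major for the annual release."""
    major, minor = (int(part) for part in current.split("."))
    return f"{major + 1}.0" if annual_release else f"{major}.{minor + 1}"
```

Because links and indexes would point at the GUID rather than the label, relabelling a concept need not break anything that already refers to it.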

10 Automated indexing: four corpora
- Nesstar questions/variables (indexed by humans during the project): 17.5k
- Questionnaires (Survey Question Bank, SQB): 2.5k
- Catalogue records: 5.5k
- Publications (case studies and support/how-to guides): 0.25k

11 Automated indexing: the task
- Automatically index the four corpora using HASSET terms
- Evaluate the results
- Present them as a case study (via the SKOS-HASSET blog)
Pre-processing tasks:
- Conversion of PDFs to text
- Extraction of metadata (manual keywords): some were embedded within PDFs, others held externally in databases
- Extraction of the data into two file types: .txt (the actual text) and .key (gold-standard keywords)

12 Automated indexing: experimental work
Three methodologies were used:
- Term Frequency/Inverse Document Frequency (TF/IDF) model
- Keyphrase Extraction Algorithm (KEA)
- Solr search

13 TF/IDF model
- Because our text sample was small, we first considered this model, which requires no training data
- Processed 2.5k SQB documents, with no controlled vocabulary
- Results: the keywords returned carried little domain-specific information, even though ours is a domain-specific collection
- Mapping the extracted keywords to HASSET returned few matches; e.g. it failed to find a match for the keyword Liberal Party, which exists in HASSET but in different forms (BRITISH LIBERAL PARTY and LIBERAL PARTY (GREAT BRITAIN))
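The sketch below shows plain TF/IDF keyword scoring (term frequency within a document multiplied by the log of the inverse document frequency across the corpus), followed by an exact-string lookup against a thesaurus. It is an illustration written for this write-up, not the project's code: the tokenizer is deliberately naive, the mini-corpus is hypothetical, and `map_exact` shows why the Liberal Party example fails when HASSET holds the concept only under longer labels.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep alphabetic runs only (deliberately naive)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: tf * idf} dict per document, with idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return scores


def top_keywords(score, k=3):
    """Highest-scoring terms first; ties broken alphabetically."""
    return [t for t, _ in sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def map_exact(keywords, thesaurus_terms):
    """Keep only keywords that appear verbatim (case-insensitively) as thesaurus labels."""
    vocab = {t.upper() for t in thesaurus_terms}
    return [k for k in keywords if k.upper() in vocab]


docs = [  # hypothetical mini-corpus
    "the liberal party manifesto",
    "the labour party manifesto",
    "the survey of voting",
]
scores = tf_idf(docs)
print(top_keywords(scores[0]))  # ['liberal', 'manifesto', 'party']
print(map_exact(["liberal party"], ["BRITISH LIBERAL PARTY", "LIBERAL PARTY (GREAT BRITAIN)"]))  # []
```

A word that occurs in every document ("the") scores zero, which is the intended effect of the IDF factor; but nothing in the model knows about the thesaurus, so matching its output to HASSET is left to string comparison.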

14 Keyphrase Extraction Algorithm (KEA)
- Keyword indexing using a controlled vocabulary
- Uses training data (based on keyword coverage)
- Builds a classifier training model (WEKA)
- The algorithm is based on machine learning and works in two main steps:
  - candidate term identification: identifies phrases (n-grams) in the text and maps them to HASSET
  - filtering: uses a model learned from our training data to identify the most significant keywords based on features
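The candidate-identification step can be sketched as follows: generate n-grams that neither start nor end with a stop word, keep those that match a thesaurus label, and record two simple features per match (frequency, and relative position of first occurrence). This is a simplified stand-in under stated assumptions: real KEA normalises phrases (e.g. by stemming) before matching and passes its features to a learned classifier, whereas this sketch uses exact lower-case matching and stops before the filtering step. The stop-word list and vocabulary are illustrative.

```python
import re

STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "on"}  # tiny illustrative list


def ngrams(tokens, max_n=3):
    """Yield (position, phrase) for 1..max_n-grams that neither start nor end with a stop word."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            yield i, " ".join(gram)


def candidates(text, vocabulary, max_n=3):
    """Map n-grams to vocabulary terms and compute two simple features per matched term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    vocab = {v.lower(): v for v in vocabulary}
    found = {}  # term -> (frequency, first position)
    for pos, phrase in ngrams(tokens, max_n):
        term = vocab.get(phrase)
        if term is not None:
            freq, first = found.get(term, (0, pos))
            found[term] = (freq + 1, first)
    n = max(len(tokens), 1)
    return {t: {"frequency": f, "first_occurrence": first / n} for t, (f, first) in found.items()}


text = "Unemployment and housing. Housing costs rose while unemployment fell."
print(candidates(text, ["UNEMPLOYMENT", "HOUSING", "HOUSING COSTS"]))
```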

15 Keyphrase Extraction Algorithm (KEA) (2)
- Created a training model using the human indexers' keywords
- 80% of the text was used for training and 20% for testing
- Uses SKOS-HASSET as the controlled vocabulary
- Used a stop-word list and trained KEA to avoid method terms (e.g. "do you think", "closest to your view")
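The 80/20 split and the handling of questionnaire "method" phrases could look like this. Note that the slide says KEA was trained to avoid method terms; removing the phrases from the text beforehand, as `strip_method_phrases` does, is a different and simpler technique, shown here only for illustration. The seed value and helper names are hypothetical.

```python
import random
import re

# Hypothetical list, built from the two examples on the slide.
METHOD_PHRASES = ["do you think", "closest to your view"]


def train_test_split(doc_ids, train_fraction=0.8, seed=42):
    """Shuffle document ids reproducibly and split them into train and test lists."""
    ids = sorted(doc_ids)             # sort first so the result does not depend on input order
    random.Random(seed).shuffle(ids)  # seeded RNG makes the split repeatable
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def strip_method_phrases(text, phrases=METHOD_PHRASES):
    """Remove survey-method phrases (case-insensitive) and collapse the leftover whitespace."""
    for phrase in phrases:
        text = re.sub(re.escape(phrase), " ", text, flags=re.IGNORECASE)
    return " ".join(text.split())


train, test = train_test_split([f"doc{i}" for i in range(10)])
print(len(train), len(test))  # 8 2
```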

17 KEA automated indexing: step by step
- Wrapped the KEA jar file with a client
- Training mode:
    java -jar kea.jar -m <output:model_location> -t <input:data_location>
- Generation mode:
    java -jar kea.jar -d <input:data_location> -m <input:model_location> -n <output:max_no_keywords>
- Can also set: thesaurus/controlled-vocabulary file and format; stemmer type; document language and encoding; stop-words file; minimum and maximum number of words in a phrase; minimum number of occurrences of a phrase in a document
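A client wrapper along the lines described above might simply assemble these command lines and run them. The sketch uses only the flags shown on the slide (`-m`, `-t`, `-d`, `-n`); the function names are hypothetical, the jar is assumed to be `kea.jar` in the working directory, and we have not checked these options against any particular KEA release. The optional settings in the last bullet are left out because their flags are not given here.

```python
import subprocess


def kea_train_cmd(model_path, data_dir, jar="kea.jar"):
    """Training mode: build a model from the documents (and .key files) in data_dir."""
    return ["java", "-jar", jar, "-m", model_path, "-t", data_dir]


def kea_generate_cmd(data_dir, model_path, max_keywords, jar="kea.jar"):
    """Generation mode: extract up to max_keywords keywords per document using a trained model."""
    return ["java", "-jar", jar, "-d", data_dir, "-m", model_path, "-n", str(max_keywords)]


def run(cmd):
    """Run a command, raising CalledProcessError on failure, and return its standard output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


print(" ".join(kea_generate_cmd("docs", "hasset.model", 10)))
```

Building the argument list separately from running it keeps the client easy to test without a Java runtime.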

18 KEA automated indexing: step by step (2)
Evaluation:
- Used PowerShell to generate a spreadsheet for each document in a corpus, comparing manual and automatic keywords
- Used PowerShell to generate one summary spreadsheet per corpus: F1, recall, precision
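The per-document and per-corpus figures can be computed as below. This Python sketch stands in for the PowerShell scripts mentioned above and is not a port of them; it assumes keywords are compared as exact, case-insensitive strings and that the corpus summary is a simple mean of the per-document scores (a macro-average), which the slides do not specify.

```python
def prf(manual, auto):
    """Precision, recall and F1 for one document, comparing keywords case-insensitively."""
    gold = {k.upper() for k in manual}
    predicted = {k.upper() for k in auto}
    hits = len(gold & predicted)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def corpus_summary(documents):
    """Macro-average (precision, recall, F1) over (manual, auto) keyword pairs."""
    rows = [prf(manual, auto) for manual, auto in documents]
    if not rows:
        return 0.0, 0.0, 0.0
    return tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))


manual = ["VOTING", "ELECTIONS", "POLITICAL PARTIES", "GENERAL ELECTIONS"]  # hypothetical
auto = ["voting", "elections", "manifestos"]                               # hypothetical
print(prf(manual, auto))  # precision 2/3, recall 1/2, F1 4/7
```

Exact matching is strict: as the results below show, many automatic keywords that miss the gold standard were still judged relevant.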

19 KEA automated indexing: results (recall and precision)
Broad recall scores:
- Case studies/support guides: 0.73
- SQB: 0.5
- Nesstar: 0.36
- Catalogue records: 0.2 (low)
This suggests that KEA could usefully be employed to suggest new relevant terms for full-text corpora.
Broad precision scores:
- SQB: 0.47
- Catalogue records: 0.42
- Nesstar: 0.41
- Case studies/support guides: 0.25
Overall, this suggests that KEA keywords are very often relevant.

20 KEA automated indexing: results (more)
- Little overlap between KEA keywords and manual keywords (on average KEA found keywords per document across the four corpora, of which only 2.33 were exact matches with the manual keywords)
- However, a high percentage of KEA keywords were considered relevant/suitable even when they were not exact matches: 33% for the SQB corpus, with an average of 25% across all four corpora
- KEA could be a very useful tool for indexers

21 Solr indexing and search
- Runs against SKOS-HASSET
- Searches every word
- Finds phrases as well as single words
- Uses stop words
- Returns the preferred term if a synonym is found
- Uses a non-aggressive stemming approach
- Demo
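Two of the behaviours listed above, non-aggressive stemming and returning the preferred term when a synonym is found, can be illustrated without Solr itself. The sketch below is plain Python, not Solr configuration: `light_stem` only strips simple plural endings, which is what we take "non-aggressive" to mean here, and the concept map and helper names are hypothetical.

```python
def light_stem(word: str) -> str:
    """Strip simple English plural endings only; everything else is left alone."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"          # studies -> study, parties -> party
    if len(w) > 3 and w.endswith("s") and not w.endswith(("ss", "us", "is")):
        return w[:-1]                # surveys -> survey
    return w                         # census, class, analysis unchanged


def key(label: str) -> str:
    """Normalise a (possibly multi-word) label to a lookup key."""
    return " ".join(light_stem(w) for w in label.split())


def build_lookup(concepts: dict) -> dict:
    """Map every preferred and non-preferred label to its preferred label."""
    lookup = {}
    for pref, alternatives in concepts.items():
        for label in [pref, *alternatives]:
            lookup[key(label)] = pref
    return lookup


def resolve(word: str, lookup: dict, stop_words=frozenset()):
    """Return the preferred term for a word, or None for stop words and unknown words."""
    if word.lower() in stop_words:
        return None
    return lookup.get(key(word))


lookup = build_lookup({"CARS": ["AUTOMOBILES"], "SURVEYS": []})  # hypothetical concepts
print(resolve("automobile", lookup))  # CARS
```

A light stemmer like this conflates singular and plural forms without merging words with different meanings, which an aggressive stemmer can do.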

22 Solr indexing and search (2)
- Uses an inverted search index: the SKOS-HASSET RDF is used to create a Solr core, and the text entered is used to search the core one word at a time
- Phrase matching is achieved by de-inverting the search
- The search is inverted in this way because the text input (1000 chars) is much smaller than the thesaurus (7,0000 words)
- Multilingualism can be achieved by translating SKOS-HASSET and having one core per input language
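The word-at-a-time lookup with phrase matching can be sketched as a small in-memory index in which each thesaurus term is filed under its first word; at each input word, the longest term whose remaining words also follow in the text is accepted. This is our illustration of the idea, not the Solr core or query the project used, and `build_index` and `match_terms` are hypothetical names.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?()\"'"


def normalise(word: str) -> str:
    """Lower-case a word and strip surrounding punctuation."""
    return word.strip(PUNCTUATION).lower()


def build_index(terms):
    """File each thesaurus term under its first word, longest terms first."""
    index = defaultdict(list)
    for term in terms:
        words = [w for w in (normalise(x) for x in term.split()) if w]
        if words:
            index[words[0]].append((words, term))
    for entries in index.values():
        entries.sort(key=lambda entry: -len(entry[0]))  # prefer 'HOUSING COSTS' over 'HOUSING'
    return index


def match_terms(text, index, stop_words=frozenset()):
    """Scan the text one word at a time, accepting the longest term that matches at each position."""
    tokens = [t for t in (normalise(x) for x in text.split()) if t]
    found, i = [], 0
    while i < len(tokens):
        step = 1
        if tokens[i] not in stop_words:
            for words, term in index.get(tokens[i], []):
                if tokens[i:i + len(words)] == words:
                    found.append(term)
                    step = len(words)  # skip past the whole matched phrase
                    break
        i += step
    return found


index = build_index(["HOUSING", "HOUSING COSTS", "UNEMPLOYMENT"])
print(match_terms("Housing costs rose, and unemployment fell.", index, {"and"}))
```

Because only the short input is scanned, the cost per request grows with the length of the text rather than the size of the thesaurus, which is the reason given above for inverting the search.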

23 Automated indexing: findings
KEA:
- training needs a large corpus and takes tens of minutes
- generation is too slow to run in real time
Solr:
- no training
- size of corpus largely immaterial
- very fast; can be used in real time
- no learning, so it cannot suggest new terms

24 Automated indexing: findings (2)
Both:
- can easily be used with a different thesaurus
- can easily extend the stop-word list
BUT more work is needed to investigate further, and to see how they could be incorporated into our systems, both technically and in terms of process

25 Crude comparison of Solr and KEA
- Used the same 10 abstracts from Catalogue Records as input
- KEA found 99 unique HASSET terms
- Solr found 231 unique HASSET terms
- So Solr is better than KEA? Not so fast:
  - KEA found 24 terms not found by Solr
  - KEA found 3 phrases only partially found by Solr, e.g. INFORMATION/LIBRARY SYSTEMS AND SERVICES vs INFORMATION
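The comparison itself is set arithmetic over the unique terms each tool returned. In the sketch below, a KEA term counts as "partially found" when Solr returned a shorter term whose words are all contained in it, as with INFORMATION inside INFORMATION/LIBRARY SYSTEMS AND SERVICES. This definition, the helper names and the sample input are assumptions for illustration, not the project's data or scripts.

```python
import re


def words(term: str) -> set:
    """Split a thesaurus label into upper-case words, treating '/' as a separator."""
    return {w for w in re.split(r"[\s/]+", term.upper()) if w}


def compare(kea_terms, solr_terms):
    """Count unique terms per tool, list KEA-only terms, and find partial matches among them."""
    kea, solr = set(kea_terms), set(solr_terms)
    only_kea = kea - solr
    partial = {}
    for k in sorted(only_kea):
        shorter = sorted(s for s in solr if words(s) < words(k))  # proper subset of k's words
        if shorter:
            partial[k] = shorter
    return {
        "kea_unique": len(kea),
        "solr_unique": len(solr),
        "only_kea": sorted(only_kea),
        "partial": partial,
    }


print(compare(["INFORMATION/LIBRARY SYSTEMS AND SERVICES", "HOUSING"],   # hypothetical KEA output
              ["INFORMATION", "HOUSING", "SURVEYS"]))                     # hypothetical Solr output
```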

26 KEA tag cloud

27 Solr tag cloud

28 Application to web page metadata tags
Theoretical approach, for each page:
- identify the content section(s)
- feed the content into the keyword generator
- optionally review the suggested keywords
- insert the keywords into the metadata tags
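The four steps map naturally onto a small pipeline. This sketch uses Python's standard `html.parser`; it assumes the page's content section is marked up with a `<main>` element and that keywords go into a `<meta name="keywords">` tag inserted before `</head>`. The keyword generator is passed in as a function (for example, a wrapper around a Solr or KEA client), and the optional review step is another function; all names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect text inside <main> elements, assumed here to mark the content section."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:  # ignore navigation, headers, footers, etc.
            self.chunks.append(data)


def content_text(html):
    """Step 1: return the whitespace-normalised text of the content section."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def add_keywords_meta(html, keywords):
    """Step 4: insert a keywords meta tag just before </head>."""
    tag = f'<meta name="keywords" content="{escape(", ".join(keywords))}">'
    idx = html.lower().find("</head>")
    if idx == -1:
        raise ValueError("page has no </head>")
    return html[:idx] + tag + html[idx:]


def tag_page(html, generate_keywords, review=None):
    """Steps 1 to 4: extract content, generate keywords, optionally review, insert."""
    keywords = generate_keywords(content_text(html))
    if review is not None:
        keywords = review(keywords)
    return add_keywords_meta(html, keywords)
```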

29 Application to web page metadata tags
Example: UK Data Service "About our data" page
Generated keywords (Solr): ARCHIVES, CATALOGUES, CENTRAL GOVERNMENT, DATA, DIARIES, ECONOMIC INDICATORS, ECONOMICS, FIELDS, GOVERNMENT, GRANTS, IMAGE, MARKET RESEARCH, MATERIALS, PAPER, PHOTOGRAPHS, REPORTS, RESEARCH, RESEARCH GRANTS, SURVEYS, TEACHING

30 Want to find out more?
- Using Solr search in a .NET environment, Matthew Brumpton, Breakout 4 (13:30-14:30)
- SKOS-HASSET browser:
- HASSET browser:
- SKOS-HASSET project web site:
- SKOS-HASSET blog:

31 Acknowledgements
The automatic keyword generation work described here was undertaken as part of the JISC-funded SKOS-HASSET project.
- Project Manager: Lucy Bell
- Evaluators: Lorna Balkan, Suzanne Barbalet
- KEA programming: Mahmoud El-Haj
- SKOS/RDF programming: Darren Bell
- Solr programming: Oscar Dovao

32 Questions?

33 Contact
University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ
T +44 (0)
E info@data-archive.ac.uk
data-archive.ac.uk

Innovation in Thesaurus Management

Innovation in Thesaurus Management Innovation in Thesaurus Management Lucy Bell Management Information Manager UK Data Archive IASSIST 2013, Cologne 31 May 2013 Two thesauri; two projects SKOS-HASSET 10 month, Jisc-funded project to enhance

More information

PERSISTENT IDENTIFIERS FOR THE UK: SOCIAL AND ECONOMIC DATA

PERSISTENT IDENTIFIERS FOR THE UK: SOCIAL AND ECONOMIC DATA PERSISTENT IDENTIFIERS FOR THE UK: SOCIAL AND ECONOMIC DATA MATTHEW WOOLLARD.. ECONOMIC AND SOCIAL DATA SERVICE UNIVERSITY OF ESSEX... METADATA AND PERSISTENT IDENTIFIERS FOR SOCIAL AND ECONOMIC DATA,

More information

Implementing RESTful endpoints with BaseX. Darren Bell, Data & Services Developer, UK Data Archive 17 April 2014

Implementing RESTful endpoints with BaseX. Darren Bell, Data & Services Developer, UK Data Archive 17 April 2014 Implementing RESTful endpoints with BaseX Darren Bell, Data & Services Developer, UK Data Archive 17 April 2014 UK Data Archive based at the University of Essex since 1967 curator of the UK s largest collection

More information

Survey Question Bank: End of Award Report

Survey Question Bank: End of Award Report Survey Question Bank: End of Award Report The Survey Question Bank (SQB) is a service providing a set of online survey research resources. It was set up as one strand of the ESRC-funded Survey Resources

More information

Organizing Economic Information

Organizing Economic Information Organizing Economic Information An Overview of Application and Reuse Scenarios of an Economics Knowledge Organization System Tobias Rebholz, Andreas Oskar Kempf, Joachim Neubert ZBW Leibniz Information

More information

Natural Language Processing with PoolParty

Natural Language Processing with PoolParty Natural Language Processing with PoolParty Table of Content Introduction to PoolParty 2 Resolving Language Problems 4 Key Features 5 Entity Extraction and Term Extraction 5 Shadow Concepts 6 Word Sense

More information

The UNESCO Thesaurus

The UNESCO Thesaurus UNESCO Thesaurus Thésaurus de l UNESCO Tesauro de la UNESCO Тезаурус ЮНЕСКО The UNESCO Thesaurus UN-LINKS Meeting 28-30 November Meron Ewketu UNESCO Library Introduction There are too many ways of expressing

More information

Copyright 2012 Taxonomy Strategies. All rights reserved. Semantic Metadata. A Tale of Two Types of Vocabularies

Copyright 2012 Taxonomy Strategies. All rights reserved. Semantic Metadata. A Tale of Two Types of Vocabularies Taxonomy Strategies July 17, 2012 Copyright 2012 Taxonomy Strategies. All rights reserved. Semantic Metadata A Tale of Two Types of Vocabularies What is semantic metadata? Semantic relationships in the

More information

On practical aspects of enhancing semantic interoperability using SKOS and KOS alignment

On practical aspects of enhancing semantic interoperability using SKOS and KOS alignment On practical aspects of enhancing semantic interoperability using SKOS and KOS alignment Antoine ISAAC KRR lab, Vrije Universiteit Amsterdam National Library of the Netherlands ISKO UK Meeting, July 21,

More information

Edinburgh DataShare: Tackling research data in a DSpace institutional repository

Edinburgh DataShare: Tackling research data in a DSpace institutional repository Edinburgh DataShare: Tackling research data in a DSpace institutional repository Robin Rice EDINA and Data Library, Information Services University of Edinburgh, Scotland DSpace User Group Meeting Gothenburg,

More information

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London

ISSUES IN INFORMATION RETRIEVAL Brian Vickery. Presentation at ISKO meeting on June 26, 2008 At University College, London ISSUES IN INFORMATION RETRIEVAL Brian Vickery Presentation at ISKO meeting on June 26, 2008 At University College, London NEEDLE IN HAYSTACK MY BACKGROUND Plant chemist, then reports librarian Librarian,

More information

A brief introduction to SKOS

A brief introduction to SKOS Taxonomy standards and architecture A brief introduction to SKOS Taxonomy Boot Camp London 17 October 2018 Presented by Heather Hedden Senior Vocabulary Editor Gale, A Cengage Company 1 Controlled Vocabulary

More information

Wondering about either OWL ontologies or SKOS vocabularies? You need both!

Wondering about either OWL ontologies or SKOS vocabularies? You need both! Making sense of content Wondering about either OWL ontologies or SKOS vocabularies? You need both! ISKO UK SKOS Event London, 21st July 2008 bernard.vatant@mondeca.com A few words about Mondeca Founded

More information

Agricultural bibliographic data sharing & interoperability in China

Agricultural bibliographic data sharing & interoperability in China Agricultural bibliographic data sharing & interoperability in China Prof. Xuefu Zhang,Xian Guojian and Sun Wei Agricultural Information Institute of CAAS Asia Pacific Advanced Network Meeting, 29 Aug.,

More information

Indexing and subject organisation

Indexing and subject organisation Indexing and subject organisation Madely du Preez Dept of Information Science University of South Africa (UNISA) LIASA IGBIS WORKSHOP 2018: 16-18 August, Centurion Lake Hotel. Menu Subject organisation

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

A Semantic MediaWiki-Empowered Terminology Registry

A Semantic MediaWiki-Empowered Terminology Registry Proc. Int l Conf. on Dublin Core and Metadata Applications 2009 A Semantic MediaWiki-Empowered Terminology Registry Qing Zou School of Information Studies McGill University, Canada qing.zou2@mail.mcgill.ca

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Tagging behaviour with support from controlled vocabulary

Tagging behaviour with support from controlled vocabulary Tagging behaviour with support from controlled vocabulary Marianne Lykke Aalborg University, Denmark Anne Lyne HÄj Royal School of Library and Information Science Line NÄrgaard Madsen Royal School of Library

More information

Application Services for Knowledge Organisation and System Integration

Application Services for Knowledge Organisation and System Integration www.askosi.org Application Services for Knowledge Organisation and System Integration A Short Presentation May 2010 Christophe Dupriez dupriez@askosi.org Thesauri: Take a walk on the «Why?» slide! Search

More information

ResPubliQA 2010

ResPubliQA 2010 SZTAKI @ ResPubliQA 2010 David Mark Nemeskey Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary (SZTAKI) Abstract. This paper summarizes the results of our first

More information

STW (Thesaurus for Economics) web service applied to library applications

STW (Thesaurus for Economics) web service applied to library applications STW (Thesaurus for Economics) web service applied to library applications Timo Borst Joachim Neubert IT Development German National Library of Economics Leibniz Centre for Economics 8th European Networked

More information

Vocabulary Alignment for archaeological Knowledge Organization Systems

Vocabulary Alignment for archaeological Knowledge Organization Systems Vocabulary Alignment for archaeological Knowledge Organization Systems 14th Workshop on Networked Knowledge Organization Systems TPDL 2015 Poznan Lena-Luise Stahn September 17, 2015 1 / 20 Summary Introduction

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Linking Thesauri to the Linked Open Data Cloud for Improved Media Retrieval

Linking Thesauri to the Linked Open Data Cloud for Improved Media Retrieval biblio.ugent.be The UGent Institutional Repository is the electronic archiving and dissemination platform for all UGent research publications. Ghent University has implemented a mandate stipulating that

More information

Data is the new Oil (Ann Winblad)

Data is the new Oil (Ann Winblad) Data is the new Oil (Ann Winblad) Keith G Jeffery keith.jeffery@keithgjefferyconsultants.co.uk 20140415-16 JRC Workshop Big Open Data Keith G Jeffery 1 Data is the New Oil Like oil has been, data is Abundant

More information

Chinese Agricultural Thesaurus and its application on data sharing & interoperability

Chinese Agricultural Thesaurus and its application on data sharing & interoperability Chinese Agricultural Thesaurus and its application on data sharing & interoperability Prof. Xuefu Zhang,Xian Guojian and Sun Wei Agricultural Information Institute of CAAS Asia Pacific Advanced Network

More information

The Semantic Web DEFINITIONS & APPLICATIONS

The Semantic Web DEFINITIONS & APPLICATIONS The Semantic Web DEFINITIONS & APPLICATIONS Data on the Web There are more an more data on the Web Government data, health related data, general knowledge, company information, flight information, restaurants,

More information

Terminologies, Knowledge Organization Systems, Ontologies

Terminologies, Knowledge Organization Systems, Ontologies Terminologies, Knowledge Organization Systems, Ontologies Gerhard Budin University of Vienna TSS July 2012, Vienna Motivation and Purpose Knowledge Organization Systems In this unit of TSS 12, we focus

More information

Registry Interchange Format: Collections and Services (RIF-CS) explained

Registry Interchange Format: Collections and Services (RIF-CS) explained ANDS Guide Registry Interchange Format: Collections and Services (RIF-CS) explained Level: Awareness Last updated: 10 January 2017 Web link: www.ands.org.au/guides/rif-cs-explained The RIF-CS schema is

More information

Using Linked Data and taxonomies to create a quick-start smart thesaurus

Using Linked Data and taxonomies to create a quick-start smart thesaurus 7) MARJORIE HLAVA Using Linked Data and taxonomies to create a quick-start smart thesaurus 1. About the Case Organization The two current applications of this approach are a large scientific publisher

More information

Chapter 2. Architecture of a Search Engine

Chapter 2. Architecture of a Search Engine Chapter 2 Architecture of a Search Engine Search Engine Architecture A software architecture consists of software components, the interfaces provided by those components and the relationships between them

More information

Standards for classifying services and related information in the public sector

Standards for classifying services and related information in the public sector Standards for classifying services and related information in the public sector SCRAN Research Brief No.5 Abstract This report describes the role of standards in local government. It draws on the experience

More information

Metadata Standards and Applications. 6. Vocabularies: Attributes and Values

Metadata Standards and Applications. 6. Vocabularies: Attributes and Values Metadata Standards and Applications 6. Vocabularies: Attributes and Values Goals of Session Understand how different vocabularies are used in metadata Learn about relationships in vocabularies Understand

More information

Deliverable title: 8.7: rdf_thesaurus_prototype This version:

Deliverable title: 8.7: rdf_thesaurus_prototype This version: SWAD-Europe Thesaurus Activity Deliverable 8.7 RDF Thesaurus Prototype Abstract: This report describes the thesarus research prototype demonstrating the SKOS schema by means of the SKOS API web service

More information

Copyright 2012 Taxonomy Strategies. All rights reserved. Semantic Metadata. A Tale of Two Types of Vocabularies

Copyright 2012 Taxonomy Strategies. All rights reserved. Semantic Metadata. A Tale of Two Types of Vocabularies Taxonomy Strategies October 28, 2012 Copyright 2012 Taxonomy Strategies. All rights reserved. Semantic Metadata A Tale of Two Types of Vocabularies What is the semantic web? Making content web-accessible

More information

Striving for efficiency

Striving for efficiency Ron Dekker Director CESSDA Striving for efficiency Realise the social data part of EOSC How to Get the Maximum from Research Data Prerequisites and Outcomes University of Tartu, 29 May 2018 Trends 1.Growing

More information

National Centre for Text Mining NaCTeM. e-science and data mining workshop

National Centre for Text Mining NaCTeM. e-science and data mining workshop National Centre for Text Mining NaCTeM e-science and data mining workshop John Keane Co-Director, NaCTeM john.keane@manchester.ac.uk School of Informatics, University of Manchester What is text mining?

More information

Introduction to Information Retrieval. Lecture Outline

Introduction to Information Retrieval. Lecture Outline Introduction to Information Retrieval Lecture 1 CS 410/510 Information Retrieval on the Internet Lecture Outline IR systems Overview IR systems vs. DBMS Types, facets of interest User tasks Document representations

More information

SKOS. COMP62342 Sean Bechhofer

SKOS. COMP62342 Sean Bechhofer SKOS COMP62342 Sean Bechhofer sean.bechhofer@manchester.ac.uk Ontologies Metadata Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies

More information

The AGROVOC Concept Scheme - A Walkthrough

The AGROVOC Concept Scheme - A Walkthrough Journal of Integrative Agriculture 2012, 11(5): 694-699 May 2012 REVIEW The AGROVOC Concept Scheme - A Walkthrough Sachit Rajbhandari and Johannes Keizer Food and Agriculture Organization of the United

More information

RDM through a UK lens - New Roles for Librarians?

RDM through a UK lens - New Roles for Librarians? RDM through a UK lens - New Roles for Librarians? Stuart Macdonald Research Data Management Service Coordinator Research & Library Services University of Edinburgh Email: stuart.macdonald@ed.ac.uk Towards

More information

Dexterity: Data Exchange Tools and Standards for Social Sciences

Dexterity: Data Exchange Tools and Standards for Social Sciences Dexterity: Data Exchange Tools and Standards for Social Sciences Louise Corti, Herve L Hours, Matthew Woollard (UKDA) Arofan Gregory, Pascal Heus (ODaF) I-Pres, 29-30 September 2008, London Introduction

More information

COCHRANE LIBRARY. Contents

COCHRANE LIBRARY. Contents COCHRANE LIBRARY Contents Introduction... 2 Learning outcomes... 2 About this workbook... 2 1. Getting Started... 3 a. Finding the Cochrane Library... 3 b. Understanding the databases in the Cochrane Library...

More information

Bioethics Thesaurus Database Search Tips

Bioethics Thesaurus Database Search Tips Bioethics Thesaurus Database Search Tips Writers, Internet surfers, bloggers, indexers, journalists, health care professionals, librarians and students alike will find the Bioethics Thesaurus Database

More information

Fusing Corporate Thesaurus Management with Linked Data using PoolParty

Fusing Corporate Thesaurus Management with Linked Data using PoolParty Fusing Corporate Thesaurus Management with Linked Data using PoolParty Thomas Schandl PoolParty at a glance Developed by punkt. netservices Current release: PoolParty 2.8 Main focus on three application

More information

Ontologies SKOS. COMP62342 Sean Bechhofer

Ontologies SKOS. COMP62342 Sean Bechhofer Ontologies SKOS COMP62342 Sean Bechhofer sean.bechhofer@manchester.ac.uk Metadata Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies

More information

GIR experiements with Forostar at GeoCLEF 2007

GIR experiements with Forostar at GeoCLEF 2007 GIR experiements with Forostar at GeoCLEF 2007 Simon Overell 1, João Magalhães 1 and Stefan Rüger 2,1 1 Multimedia & Information Systems Department of Computing, Imperial College London, SW7 2AZ, UK 2

More information

From Open Data to Data- Intensive Science through CERIF

From Open Data to Data- Intensive Science through CERIF From Open Data to Data- Intensive Science through CERIF Keith G Jeffery a, Anne Asserson b, Nikos Houssos c, Valerie Brasse d, Brigitte Jörg e a Keith G Jeffery Consultants, Shrivenham, SN6 8AH, U, b University

More information

Eloquent WebSuite Planning Guide

Eloquent WebSuite Planning Guide ELOQUENT SYSTEMS INC Eloquent WebSuite Planning Guide Volume WS2 - Managing Authority Files Published on February 14, 2011 2/11/2011 This manual describes how the Eloquent WebSuite software controls the

More information

Research Data Management: Edinburgh University Library s Approach Dominic Tate

Research Data Management: Edinburgh University Library s Approach Dominic Tate Research Data Management: Edinburgh University Library s Approach Dominic Tate Head of Library Research Support Library & University Collections University of Edinburgh Overview Introduction to the University

More information

Information Retrieval. (M&S Ch 15)

Information Retrieval. (M&S Ch 15) Information Retrieval (M&S Ch 15) 1 Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

More information

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. Knowledge Retrieval Franz J. Kurfess Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A. 1 Acknowledgements This lecture series has been sponsored by the European

More information

Software Requirements Specification for the Names project prototype

Software Requirements Specification for the Names project prototype Software Requirements Specification for the Names project prototype Prepared for the JISC Names Project by Daniel Needham, Amanda Hill, Alan Danskin & Stephen Andrews April 2008 1 Table of Contents 1.

More information

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search. An introduction Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

More information

Reading group on Ontologies and NLP:

Reading group on Ontologies and NLP: Reading group on Ontologies and NLP: Machine Learning27th infebruary Automated 2014 1 / 25 Te Reading group on Ontologies and NLP: Machine Learning in Automated Text Categorization, by Fabrizio Sebastianini.

More information

Populating the Semantic Web with Historical Text

Populating the Semantic Web with Historical Text Populating the Semantic Web with Historical Text Kate Byrne, ICCS Supervisors: Prof Ewan Klein, Dr Claire Grover 9th December 2008 1 Outline Overview of My Research populating the Semantic Web the Tether

More information

Challenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio

Challenge. Case Study. The fabric of space and time has collapsed. What s the big deal? Miami University of Ohio Case Study Use Case: Recruiting Segment: Recruiting Products: Rosette Challenge CareerBuilder, the global leader in human capital solutions, operates the largest job board in the U.S. and has an extensive

More information

Wendy Thomas Minnesota Population Center NADDI 2014

Wendy Thomas Minnesota Population Center NADDI 2014 Wendy Thomas Minnesota Population Center NADDI 2014 Coverage Problem statement Why are there problems with interoperability with external search, storage and delivery systems Minnesota Population Center

More information

Linked Data: Standard s convergence

Linked Data: Standard s convergence Linked Data: Standard s convergence Enhancing the convergence between reporting standards Maria Mora Technical Manager maria.mora@cdp.net 1 Lets talk about a problem Lack of a perfect convergence between

More information

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003

The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003 The MEG Metadata Schemas Registry Schemas and Ontologies: building a Semantic Infrastructure for GRIDs and digital libraries Edinburgh, 16 May 2003 Pete Johnston UKOLN, University of Bath Bath, BA2 7AY

More information

Statistical Insight - Help

Statistical Insight - Help Statistical Insight has been completely redesigned to support a significantly better statistical search experience! What s NEW? New look and feel! Works like ProQuest Congressional with saved searches,

More information

PoolParty: Thesaurus Management, Semantic Search, Linked Data. ISKO UK, London, September 14, Andreas Blumauer
ISKO UK, London, September 14, 2010. Some thoughts on the Semantic Web: in the Semantic Web, it is not the Semantic which is new,...

Study and guidelines on Geospatial Linked Data as part of ISA Action 1.17 Resource Description Framework
DG Joint Research Center. 6th of May 2014. Danny Vandenbroucke, Diederik Tirry. Agenda: 1. Introduction...

How Co-Occurrence can Complement Semantics?
Atanas Kiryakov & Borislav Popov, ISWC 2006, Athens, GA. Semantic Annotations: 2002. Semantic Annotation: How and Why? Information extraction (text-mining) for...

A service based on Linked Data to classify Web resources using a Knowledge Organisation System
A proof of concept in the Open Educational Resources domain. Abstract: One of the reasons why Web resources...

Identification and Classification of A/E/C Web Sites and Pages
Construction Informatics Digital Library, http://itc.scix.net/, paper w78-2002-34.content

Computer-based Analysis of UK Annual Report Narratives. PhD Training Session at Bangor Business School: Analysing Annual Report Narratives
2 December 2014, Bangor. Steven Young, Mahmoud El-Haj. Research funded...

Chapter 5: Text. W. Bruce Croft
Background. Much of the information in digital library or digital information organization applications is in the form of text. Even when the application focuses on multimedia...

Creating a Corporate Taxonomy. Internet Librarian, November 2001. Betsy Farr Cogliano
7 November 2001. 2001 The MITRE Corporation, revised October 2001. Background: MITRE is a not-for-profit corporation operating three...

CS473: Course Review. Luo Si, Department of Computer Science, Purdue University
Basic Concepts of IR: Outline. Basic Concepts of Information Retrieval: task definition of ad-hoc IR; terminologies and...

Converting a thesaurus into an ontology: the use case of URBISOC
Advanced Information Systems Laboratory. Cost Action C2. J. Nogueras-Iso, J. Lacasta. Alcalá de Henares, 4-5 May 2007. http://iaaa.cps.unizar.es

Chapter 27: Introduction to Information Retrieval and Web Search
Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley. Chapter 27 Outline: Information Retrieval (IR) Concepts; Retrieval...

ARKive-ERA Project: Lessons and Thoughts
Semantic Web for Scientific and Cultural Organisations, Convitto della Calza, 17th June 2003. Paul Shabajee (ILRT, University of Bristol). Contents: Context; Digitisation...

Search Results Clustering in Polish: Evaluation of Carrot
Dawid Weiss, Jerzy Stefanowski, Institute of Computing Science, Poznań University of Technology. Introduction: search engines, tools of everyday use...

Towards a joint service catalogue for e-infrastructure services
Dr British Library. DI4R 2016 Workshop: Joint service catalogue for research, 29 September 2016. 15/09/15. Goal: a framework for creating a Catalogue...

Mapping between Digital Identity Ontologies through SISM
Matthew Rowe, The OAK Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK. m.rowe@dcs.shef.ac.uk

Digital repositories as research infrastructure: a UK perspective
Dr Liz Lyon, Director. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0. UKOLN is supported by: Presentation...

Testbed: a walk-through
Digital Preservation Planning: Principles, Examples and the Future with Planets, July 2008. Matthew Barr, HATII at the University of Glasgow. Contents: Definitions and goals; Achievements...

Content analysis and classification in mathematics
Wolfram Sperber (Zentralblatt Math), Patrick Ion (Math Reviews). UDC Seminar 2011: Classification & Ontology: Formal Approaches and Access to Knowledge. The...

Searching the Evidence in the Cochrane Library
Cambridge University Library, Medical Library. Searching the Evidence, January 2014 (due for revision July 2014). 1. How to access The...

Terminology server for improved resource discovery: analysis of model and functions
George Macgregor (1), Emma McCulloch (2) and Dennis Nicholson (2). (1) Information Strategy Group, Liverpool Business School,...

Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization
Katsuya Masuda*, Makoto Tanji**, and Hideki Mima***. Abstract: This study proposes a framework to access to the...

Enhancing information services using machine to machine terminology services
Gordon Dunsire. Presented to the IFLA 2009 Satellite Conference, Looking at the past and preparing for the future, 20-21 Aug 2009,...

Semantic Web Use Cases and Case Studies
Case Study: The Semantic Web for the Agricultural Domain, Semantic Navigation of Food, Nutrition and Agriculture Journal. Gauri Salokhe, Margherita Sini, and Johannes...

Global Agricultural Concept Scheme: The collaborative integration of three thesauri
Prof Dr Thomas Baker [1], Dr Osma Suominen [2]. DINI Annual Conference (Jahrestagung), Linked Data: Vision and Reality (Vision und Wirklichkeit), Frankfurt, 28 October...

Data Exchange and Conversion Utilities and Tools (DExT)
Louise Corti, Angad Bhat, Herve L'Hours, UK Data Archive. CAQDAS Conference, April 2007. An exchange format for qualitative data. Data exchange models...

Facilitate Open Science Training for European Research
Open access and research data management: Horizon 2020 and beyond. University College Cork, April 14th & 15th 2015. Using existing institutional repository...

KeaKAT: An Online Automatic Keyphrase Assignment Tool
2012 10th International Conference on Frontiers of Information Technology. Rabia Irfan, Sharifullah Khan, Irfan Ali Khan, Muhammad Asif Ali, School of...

0.1 Knowledge Organization Systems for Semantic Web
0.1.1 Knowledge Organization Systems. Why do we need to organize knowledge? Indexing, Retrieval, Organization...

Hibernate Search: A Successful Search, a Happy User. Make it Happen!
Emmanuel Bernard, Lead Developer at JBoss by Red Hat, September 2nd 2009. Emmanuel Bernard: Hibernate Search in Action, blog.emmanuelbernard.com

A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision
Guoqian Jiang (1,*), Harold R. Solbrig (1) and Christopher G. Chute (1). (1) Department of...

Building a Data Catalog
Promoting Data Reuse and Collaboration at an Academic Medical Center. Kevin Read, MLIS, MAS; Alisa Surkis, PhD, MLIS. External datasets. Internal datasets. NYU...

Utilizing, creating and publishing Linked Open Data with the Thesaurus Management Tool PoolParty
Thomas Schandl, Andreas Blumauer, punkt. NetServices GmbH, Lerchenfelder Gürtel 43, 1160 Vienna, Austria

An Introduction to the WERS-REPONSE Stata dataset. Version 1.0 (May 2016)
1. Introduction. The WERS-REPONSE Stata dataset ("the WR dataset" hereafter) was compiled as part of a research project to comparatively...

Current JISC initiatives for Repositories
Exchange of Experience on Institutional Repositories, 17th May 2007, Liverpool. Julie Allinson, Repositories Research Officer, UKOLN, University of Bath. UKOLN is...

Finding Topic-centric Identified Experts based on Full Text Analysis
Hanmin Jung, Mikyoung Lee, In-Su Kang, Seung-Woo Lee, Won-Kyung Sung. Information Service Research Lab., KISTI, Korea. jhm@kisti.re.kr

Best Practices for World-Class Search
Mary Holstege, Distinguished Engineer, MarkLogic, @mathling. 4 June 2018, MarkLogic Corporation. Search Application: Search for...

Candidate Link Generation Using Semantic Pheromone Swarm
Ms. Susan Geethu D.K. (1), Ms. R. Subha (2), Dr. S. Palaniswami (3). (1, 2) Assistant Professor, (1, 2) Department of Computer Science and Engineering, Sri Krishna...

Self-tuning ongoing terminology extraction retrained on terminology validation decisions
Alfredo Maldonado and David Lewis, ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin