AUTOMATIC KEYWORD GENERATION. JOHN SHEPHERDSON, TECHNICAL SERVICES DIRECTOR, UK DATA ARCHIVE, UNIVERSITY OF ESSEX
Transcription
1 AUTOMATIC KEYWORD GENERATION. JOHN SHEPHERDSON, TECHNICAL SERVICES DIRECTOR, UNIVERSITY OF ESSEX. DEVCON1, UNIVERSITY OF ESSEX, 12 APRIL 2013
2 Abstract HASSET is a subject thesaurus that has been developed by the UK Data Archive over more than 20 years. It provides a standard set of keywords that can be used to tag items that have content relating to the fields of Humanities and Social Science. In the past the tagging has been carried out by human experts, but we have recently developed a prototype system that can generate a list of keywords automatically, given an arbitrary piece of text. Learn about our experiences of generating keywords automatically, and the evaluation of the results. See a demo of automated keyword generation. Although we have used HASSET (Humanities and Social Science Electronic Thesaurus) keywords, the approach holds for other keyword sets.
3 Overview brief background: the UK Data Archive and the UK Data Service cataloguing practices and thesaurus tools SKOS-HASSET RDF version of thesaurus auto generation of keywords technologies used evaluation findings application to web page metadata acknowledgements and questions
4 The UK Data Archive and the UK Data Service: based at the University of Essex since 1967; curator of the largest collection of digital data in the social sciences and humanities in the UK (see data-archive.ac.uk for more details); makes these available via the new UK Data Service; UK Data Service also provides value-added services for UK Census data, government surveys and beyond; UK Data Service includes Universities of Essex, Manchester (Mimas, CCSR), Leeds, Southampton, Edinburgh (Edina) and University College London. See ukdataservice.ac.uk for more details
5 The UK Data Service: cataloguing standards the UK Data Service indexes over 5000 digital data collections and the number is ever growing all catalogued at thematic level many also indexed at variable level available via Discover ukdataservice.ac.uk and every data collection is indexed with HASSET terms HASSET is employed in the search
6 HASSET: multidisciplinary thesaurus; developed originally to support the UK Data Archive/UK Data Service collections; coverage in the core subject areas of social science; uses standard hierarchical relationships: TT (top term), BT (broader term), NT (narrower term), RT (related term), USE (from non-preferred term to preferred term), UF (from preferred term to non-preferred term). Role of HASSET in the Archive is twofold: used internally for indexing studies and series with HASSET terms; also a separate product licensed to others
7 ELSST European Language Social Science Thesaurus (ELSST) is a multi-lingual thesaurus, based on core English terms taken from HASSET translated into 11 languages (with more on the way) closely connected with HASSET, but must demonstrate international applicability of all terms
8 Applying SKOS to HASSET: SKOS/RDF. What is RDF? Resource Description Framework (RDF) describes data using a simple format: subject, predicate, object, e.g. car hascolour red. So, what is SKOS? Simple Knowledge Organization System. SKOS is a set of RDF predicates to describe relationships between thesaurus terms, e.g. skos:concept162 skos:prefLabel CAR; e.g. skos:concept162 skos:altLabel AUTOMOBILE. It encodes these products in a standardised way to make their structures comparable and to facilitate interaction
9 Applying SKOS to HASSET (2): SKOS has been applied to HASSET; persistence via GUIDs; version control: all terms date stamped, all changes recorded; live versions of thesaurus products (SKOS-HASSET, ELSST) made at agreed, regular intervals with recognised annual major incremental versioning; we are using Pubby to publish our SKOS; it provides a Linked Data interface to RDF data held in a BrightstarDB triple store
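The subject/predicate/object idea on this slide can be sketched in a few lines of Python. This is only an illustration using plain tuples, not an RDF library; the identifier `skos:concept162` is copied from the slide, and the helper names `objects` and `preferred_label_for` are hypothetical.

```python
# Triples stored as (subject, predicate, object) tuples.
# "skos:concept162" is the slide's example identifier; real SKOS-HASSET
# concept URIs are not shown in the talk and will differ.
TRIPLES = [
    ("skos:concept162", "rdf:type", "skos:Concept"),
    ("skos:concept162", "skos:prefLabel", "CAR"),
    ("skos:concept162", "skos:altLabel", "AUTOMOBILE"),
]


def objects(triples, subject, predicate):
    """Return every object asserted for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def preferred_label_for(triples, label):
    """Map any label (preferred or alternative) to the preferred label."""
    wanted = label.upper()
    for s, p, o in triples:
        if p in ("skos:prefLabel", "skos:altLabel") and o == wanted:
            prefs = objects(triples, s, "skos:prefLabel")
            return prefs[0] if prefs else None
    return None


print(preferred_label_for(TRIPLES, "automobile"))
```

Looking up a non-preferred label and getting the preferred one back is the same behaviour the USE/UF relationships give a human indexer.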
10 Automated indexing: four corpora. Nesstar questions/variables (humanly indexed during project): 17.5k; Questionnaires: 2.5k; catalogue records: 5.5k; publications (case studies and support/how to guides): 0.25k
11 Automated indexing: the task. Automatically index the four corpora, using HASSET terms; evaluate the results; present as a case study (via SKOS-HASSET blog). Pre-processing tasks: conversion of PDFs to text; extraction of metadata (manual keywords): some were embedded within PDFs, others held externally in databases; extraction of the data into two file types: .txt (actual text) and .key (gold-standard keywords)
12 Automated indexing: experimental work. Three methodologies used: Term Frequency/Inverse Document Frequency (TF/IDF) model; Keyphrase Extraction Algorithm (KEA); Solr search
13 TF/IDF model: our text sample was small so we considered this model, which requires no training data, first; processed 2.5k SQB documents; no controlled vocabulary. Results: keywords returned with low domain-specific information, although ours is a domain-specific collection; mapping extracted keywords to HASSET returned few matches, e.g. it failed to find matches for the keyword Liberal Party although it exists in HASSET but in a different form (BRITISH LIBERAL PARTY) and (LIBERAL PARTY (GREAT BRITAIN))
14 Keyphrase Extraction Algorithm (KEA): keyword indexing using a Controlled Vocabulary; uses training data (based on keyword coverage); builds a classifier training model (WEKA). The algorithm is based on machine learning and works in two main steps: candidate term identification identifies phrases (n-grams) from the text and maps these to HASSET; filtering uses a learned model (from our training data) to identify the most significant keywords based on features
15 Keyphrase Extraction Algorithm (KEA) (2): created a training model using human indexers' keywords; 80% of text used for training; 20% of text used for testing; uses SKOS-HASSET as controlled vocabulary; used stop-word list and trained KEA to avoid method terms ("do you think", "closest to your view")
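A minimal TF/IDF sketch, assuming whitespace tokenisation and the common `tf * log(N / df)` weighting; the talk does not say which variant or toolkit was used, so the function names and the sample sentences here are illustrative only.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tf_idf(documents):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    """Highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Made-up sample documents, not taken from the SQB corpus.
docs = [
    "the liberal party won the vote",
    "the survey asked about income",
    "the party survey",
]
print(top_keywords(tf_idf(docs)[0], k=2))
```

Because nothing here consults a controlled vocabulary, the extracted words are free text, which is why a keyword such as Liberal Party can fail to match a thesaurus entry stored in a different form.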
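The candidate term identification step can be sketched roughly as follows. This is not KEA's code: it is a simplified illustration that assumes whitespace tokenisation, no stemming, and a tiny hypothetical vocabulary mapping labels to preferred terms. The filtering step, which needs a trained model, is not shown.

```python
def ngrams(tokens, max_len=3):
    """Yield (start_index, phrase) for every n-gram of up to max_len words."""
    for start in range(len(tokens)):
        for n in range(1, max_len + 1):
            if start + n <= len(tokens):
                yield start, " ".join(tokens[start:start + n])


def candidate_terms(text, vocabulary, max_len=3):
    """Map n-grams found in text to preferred terms.

    Returns {preferred_term: relative position of first occurrence},
    one simple feature a later filtering model could use.
    """
    tokens = text.lower().split()
    found = {}
    for start, phrase in ngrams(tokens, max_len):
        preferred = vocabulary.get(phrase)
        if preferred is not None and preferred not in found:
            found[preferred] = start / len(tokens)
    return found


# Hypothetical vocabulary: label (lowercased) -> preferred term.
VOCAB = {
    "car": "CAR",
    "automobile": "CAR",
    "market research": "MARKET RESEARCH",
}

print(candidate_terms("Market research on automobile sales", VOCAB))
```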
17 KEA automated indexing: Step by step. Wrapped KEA Jar file with client. Training mode: java -jar kea.jar -m <output:model_location> -t <input:data_location>; Generation mode: java -jar kea.jar -d <input:data_location> -m <input:model_location> -n <output:max_no_keywords>. Can also set: thesaurus/cv file and format; stemmer type; document language and encoding; stopwords file; min and max no of words in a phrase; minimum occurrence of a phrase in a document
18 KEA automated indexing: Step by step (2). Evaluation: used PowerShell to generate a spreadsheet for each document in corpus, manual vs auto keywords; used PowerShell to generate one summary spreadsheet for each corpus: F1, recall, precision.
19 KEA automated indexing: results (Recall and Precision). Broad recall scores: case studies/support guides 0.73; SQB 0.5; Nesstar 0.36; catalogue records 0.2 (low). This suggests that KEA could be usefully employed to suggest new relevant terms for full-text corpora. Broad precision scores: SQB 0.47; catalogue records 0.42; Nesstar 0.41; case studies/support guides 0.25. Overall, this suggests that KEA keywords are very often relevant
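For scripting the two modes, the command lines on this slide could be assembled as argument lists. The flags (`-m`, `-t`, `-d`, `-n`) and the jar name are taken from the slide and belong to the project's own wrapper client; the function names and the example paths below are hypothetical, and nothing is executed here.

```python
def training_command(model_path, data_dir, jar="kea.jar"):
    """Argument list for training mode, as shown on the slide."""
    return ["java", "-jar", jar, "-m", model_path, "-t", data_dir]


def generation_command(data_dir, model_path, max_keywords, jar="kea.jar"):
    """Argument list for generation mode, as shown on the slide."""
    return ["java", "-jar", jar, "-d", data_dir,
            "-m", model_path, "-n", str(max_keywords)]


# Hypothetical locations, for illustration only.
print(" ".join(training_command("models/sqb.model", "corpora/sqb/train")))
print(" ".join(generation_command("corpora/sqb/test", "models/sqb.model", 10)))
```

Each list can be handed to `subprocess.run(cmd, check=True)` once the jar and data are in place; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.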
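The three summary measures can be computed per document from the manual (gold-standard) and automatic keyword lists. The sketch below uses exact matches only, so it is stricter than a "broad" score that also counts relevant near-matches; the function name and the sample keywords are illustrative.

```python
def evaluate(manual, automatic):
    """Return (precision, recall, f1) for exact keyword matches."""
    manual_set = {k.strip().upper() for k in manual}
    auto_set = {k.strip().upper() for k in automatic}
    matches = manual_set & auto_set
    precision = len(matches) / len(auto_set) if auto_set else 0.0
    recall = len(matches) / len(manual_set) if manual_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative keyword lists, not taken from the project's corpora.
p, r, f = evaluate(
    manual=["SURVEYS", "DATA", "ARCHIVES", "TEACHING"],
    automatic=["surveys", "data", "grants"],
)
print(round(p, 2), round(r, 2), round(f, 2))
```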
20 KEA automated indexing: results (more) Little overlap between KEA keywords and manual keywords (on average KEA found keywords per document across the four corpora, of which only 2.33 were exact matches with the manual keywords) However, a high percentage of KEA keywords were considered relevant/suitable even if they were not exact matches: 33% for the SQB corpus with an average of 25% across all four corpora KEA could be a very useful tool for indexers
21 Solr indexing and search: runs against SKOS-HASSET; searches every word; finds phrases as well as single words; uses stop words; returns preferred term if synonym is found; uses a non-aggressive stemming approach. Demo
22 Solr indexing and search (2): uses inverted search index; SKOS-HASSET RDF used to create Solr core; text entered is used to search core one word at a time; phrase-matching achieved by de-inverting search; used because text input (1000 chars) much smaller than thesaurus (7,0000 words); multilingualism can be achieved by translating SKOS-HASSET and having a core per input language
23 Automated indexing: findings. KEA: training needs a large corpus and takes tens of minutes; generation too slow to run in real time. Solr: no training; size of corpus largely immaterial; very fast, can be used in real time; no learning, so cannot suggest new terms
24 Automated indexing: findings (2). Both: can easily be used with a different thesaurus; stop words can easily be extended. BUT more work is needed to investigate further and to see how they could be incorporated, technically and in terms of process, into our systems
25 Crude Comparison: Solr and Kea. Used same 10 abstracts from Catalogue Records as input. Kea found 99 unique HASSET terms; Solr found 231 unique HASSET terms. So Solr is better than Kea? Not so fast: Kea found 24 not found by Solr; Kea found 3 phrases only partially found by Solr, e.g. INFORMATION/LIBRARY SYSTEMS AND SERVICES vs INFORMATION
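The word-at-a-time lookup with phrase confirmation can be illustrated without Solr. The sketch below builds a small inverted index from thesaurus labels, looks up each input word, then keeps only labels that occur whole in the input. It is an approximation of the idea, not the project's Solr configuration: the labels and stop words are hypothetical and there is no stemming.

```python
from collections import defaultdict

# Hypothetical sample: label (lowercased) -> preferred term.
LABELS = {
    "car": "CAR",
    "automobile": "CAR",
    "market research": "MARKET RESEARCH",
    "research": "RESEARCH",
}
STOP_WORDS = {"the", "a", "of", "and", "on"}


def build_index(labels):
    """Inverted index: each word points to the labels that contain it."""
    index = defaultdict(set)
    for label in labels:
        for word in label.split():
            index[word].add(label)
    return index


def suggest(text, labels, index):
    """Return the preferred terms whose labels occur in the text."""
    tokens = text.lower().split()
    candidates = set()
    for word in tokens:
        if word not in STOP_WORDS:
            candidates |= index.get(word, set())
    # Phrase check: a multi-word label must appear as a contiguous run.
    padded = " " + " ".join(tokens) + " "
    return sorted({labels[c] for c in candidates if " " + c + " " in padded})


index = build_index(LABELS)
print(suggest("Market research on the automobile industry", LABELS, index))
```

Driving the search from the input words keeps the work proportional to the short input text rather than to the size of the thesaurus, which is the motivation given on the slide.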
26 KEA tag cloud
27 Solr tag cloud
28 Application to web page metadata tags. Theoretical approach: for each page, identify content section(s); feed content into keyword generator; optionally review suggested keywords; insert keywords into metadata tags
29 Application to web page metadata tags Example: UK Data Service about our data page Generated keywords (Solr): ARCHIVES, CATALOGUES, CENTRAL GOVERNMENT, DATA, DIARIES, ECONOMIC INDICATORS, ECONOMICS, FIELDS, GOVERNMENT, GRANTS, IMAGE, MARKET RESEARCH, MATERIALS, PAPER, PHOTOGRAPHS, REPORTS, RESEARCH, RESEARCH GRANTS, SURVEYS, TEACHING
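The final step, inserting keywords into metadata tags, might look like the following sketch. It assumes the conventional HTML `<meta name="keywords">` element is the target, which the slides do not state explicitly; the function name is hypothetical.

```python
from html import escape


def keywords_meta_tag(keywords):
    """Build an HTML meta keywords tag, dropping blanks and duplicates."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    unique = list(dict.fromkeys(cleaned))  # keeps first-seen order
    content = escape(", ".join(unique), quote=True)
    return '<meta name="keywords" content="{}">'.format(content)


# A few of the Solr-generated keywords listed on the slide.
print(keywords_meta_tag(["ARCHIVES", "CATALOGUES", "CENTRAL GOVERNMENT", "DATA"]))
```

Escaping the attribute value matters if a thesaurus term ever contains quotes or ampersands.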
30 Want to find out more? Using Solr search in a .NET environment, Matthew Brumpton, Breakout 4 (13:30-14:30) SKOS-HASSET browser: HASSET browser: SKOS-HASSET Project web site: SKOS-HASSET blog:
31 Acknowledgements The automatic keyword generation work described here was undertaken as part of the JISC-funded SKOS-HASSET project Project Manager: Lucy Bell Evaluators: Lorna Balkan, Suzanne Barbalet KEA programming: Mahmoud El-Haj SKOS/RDF programming: Darren Bell Solr programming: Oscar Dovao
32 Questions?
33 CONTACT UNIVERSITY OF ESSEX WIVENHOE PARK COLCHESTER ESSEX CO4 3SQ..... T +44 (0) E info@data-archive.ac.uk data-archive.ac.uk
More information> Semantic Web Use Cases and Case Studies
> Semantic Web Use Cases and Case Studies Case Study: The Semantic Web for the Agricultural Domain, Semantic Navigation of Food, Nutrition and Agriculture Journal Gauri Salokhe, Margherita Sini, and Johannes
More informationGlobal Agricultural Concept Scheme The collaborative integration of three thesauri
Global Agricultural Concept Scheme The collaborative integration of three thesauri Prof Dr Thomas Baker [1] Dr Osma Suominen [2] Dini Jahrestagung Linked Data Vision und Wirklichkeit Frankfurt, 28. Oktober
More informationData Exchange and Conversion Utilities and Tools (DExT)
Data Exchange and Conversion Utilities and Tools (DExT) Louise Corti, Angad Bhat, Herve L Hours UK Data Archive CAQDAS Conference, April 2007 An exchange format for qualitative data Data exchange models
More informationFacilitate Open Science Training for European Research
Facilitate Open Science Training for European Research Open access and research data management: Horizon 2020 and beyond University College Cork, April 14 th & 15 th 2015 Using existing institutional repository
More informationKeaKAT An Online Automatic Keyphrase Assignment Tool
2012 10th International Conference on Frontiers of Information Technology KeaKAT An Online Automatic Keyphrase Assignment Tool Rabia Irfan, Sharifullah Khan, Irfan Ali Khan, Muhammad Asif Ali School of
More information0.1 Knowledge Organization Systems for Semantic Web
0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1.1 Knowledge Organization Systems Why do we need to organize knowledge? Indexing Retrieval Organization
More informationHibernate Search: A Successful Search, a Happy User Make it Happen!
Hibernate Search: A Successful Search, a Happy User Make it Happen! Emmanuel Bernard Lead Developer at JBoss by Red Hat September 2nd 2009 1 Emmanuel Bernard Hibernate Search in Action blog.emmanuelbernard.com
More informationA Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision
A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision Guoqian Jiang 1,* Harold R. Solbrig 1 and Christopher G. Chute 1 1 Department of
More informationBuilding a Data Catalog
Building a Data Catalog Promoting Data Reuse and Collaboration at an Academic Medical Center Kevin Read, MLIS, MAS Alisa Surkis, PhD, MLIS EXTERNAL DATASETS 2 EXTERNAL DATASETS INTERNAL DATASETS 3 NYU
More informationUtilizing, creating and publishing Linked Open Data with the Thesaurus Management Tool PoolParty
Utilizing, creating and publishing Linked Open Data with the Thesaurus Management Tool PoolParty Thomas Schandl, Andreas Blumauer punkt. NetServices GmbH, Lerchenfelder Gürtel 43, 1160 Vienna, Austria
More informationAn Introduction to the WERS-REPONSE Stata dataset. Version 1.0 (May 2016)
An Introduction to the WERS-REPONSE Stata dataset Version 1.0 (May 2016) 1. Introduction The WERS-REPONSE Stata dataset ( the WR dataset hereafter) was compiled as part of a research project to comparatively
More informationCurrent JISC initiatives for Repositories
Current JISC initiatives for Repositories Exchange of Experience on Institutional Repositories 17 th May 2007, Liverpool Julie Allinson Repositories Research Officer UKOLN, University of Bath UKOLN is
More informationFinding Topic-centric Identified Experts based on Full Text Analysis
Finding Topic-centric Identified Experts based on Full Text Analysis Hanmin Jung, Mikyoung Lee, In-Su Kang, Seung-Woo Lee, Won-Kyung Sung Information Service Research Lab., KISTI, Korea jhm@kisti.re.kr
More informationBest Practices for World-Class Search
Best Practices for World-Class Search MARY HOLSTEGE Distinguished Engineer, MarkLogic @mathling 4 June 2018 MARKLOGIC CORPORATION SLIDE: 2 4 June 2018 MARKLOGIC CORPORATION Search Application: Search for
More informationCANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM
CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM Ms.Susan Geethu.D.K 1, Ms. R.Subha 2, Dr.S.Palaniswami 3 1, 2 Assistant Professor 1,2 Department of Computer Science and Engineering, Sri Krishna
More informationSelf-tuning ongoing terminology extraction retrained on terminology validation decisions
Self-tuning ongoing terminology extraction retrained on terminology validation decisions Alfredo Maldonado and David Lewis ADAPT Centre, School of Computer Science and Statistics, Trinity College Dublin
More information