ARKive-ERA Project Lessons and Thoughts

Semantic Web for Scientific and Cultural Organisations
Convitto della Calza, 17th June 2003
Paul Shabajee (ILRT, University of Bristol)

Contents
- Context: Digitisation Projects Worldwide
- Goals: Repurposing in Educational Settings
- ARKive-ERA Project & Requirements
- Some Key Problems Identified
- Why Semantic Web and Other Tools?
- Conclusion

Digital Collections of Multimedia
- Millions (in $ and other currencies) invested in development of very large digital collections of:
  - priceless historic and cultural objects, images and films
  - 3D environments, places and things otherwise inaccessible
- Future: immersive 3D environments, wearable computing, new forms of collaboration and interaction
- Imagine if these could be used to support learning and teaching in any relevant subject, phase of education or level of education, and utilised to help provide personalised learning resources for individuals
- We want maximum educational value from digital collections

Digital Collections of Multimedia
In the UK, e.g.:
- ARKive: multimedia profiles of animals, plants & fungi, plus habitats
  - Phase 1: 30,000 images, 900 hrs video, audio, maps, text
  - Funding: UK Heritage Lottery Fund (£1.6m), New Opportunities Fund (£0.5m), HP Labs ($2m for technology research)
- Others: SCRAN, British Pathe, other NOF projects, commercial organisations and NGOs
ARKive-ERA (Educational Repurposing of Assets) Project
- Investigating key issues in designing systems to provide diverse users with appropriate access to multimedia assets
- Funded by HP Labs in support of work with ARKive

Repurposing
- Maximising use across a range of user groups and individuals
- Across types of media, platform, context
[Diagram labels: Original Image; External Image Database; Web pages Topic A (Web page Audience A1, Web page Audience A2); Web page Topic B; Presentations; Downloadable files (e.g. PDF); On-line Tutorials]

Repurposing
- Pulling together resources from different sources
- The best resources
[Diagram labels: External Media Database 1; Video 1; Image 2; External Media Resources; External Collection of Presentation (e.g. PowerPoint); Slides from Presentation; On-line resources in support of Single Lecture; Presentations; On-line Tutorials; External Modular On-line Tutorials; Module of On-line Tutorial; External Composite Learning Resources; Downloadable files (e.g. PDF)]

Repurposing: How?
ARKive - Defining the Collection?
[Diagram labels: Definition; Accession; Capturing other 'Pieces of Information'; Identifying Target Species; Defining Contents of Species Profile; Locating Potential Assets; Selecting Assets; Describing Assets]

Repurposing: How?
[Diagram labels: Human; Computer; Accession; Storage; Re-purposing; 'Raw' Asset; Metadata; Database of Assets + Metadata; Composite Object A (Species); Composite Object B (Habitat); Composite Object C (Animals & Art); Controlled Indexing Vocabularies (Terms, Consistency); Assets Extracted Using Metadata]
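The "Assets Extracted Using Metadata" step in the diagram can be sketched roughly as follows. This is a minimal illustration in plain Python, not ARKive's actual system: the `Asset` class, the `build_composite` helper, the identifiers and the index terms are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """A 'raw' asset plus its metadata (all field names are hypothetical)."""
    asset_id: str
    media_type: str
    terms: frozenset  # controlled-vocabulary index terms


def build_composite(assets, required_terms):
    """Return the assets whose index terms include every required term."""
    required = frozenset(required_terms)
    return [asset for asset in assets if required <= asset.terms]


# Hypothetical sample data standing in for the asset + metadata database.
ASSETS = [
    Asset("img-001", "image", frozenset({"barn owl", "flight", "farmland"})),
    Asset("vid-002", "video", frozenset({"barn owl", "feeding"})),
    Asset("img-003", "image", frozenset({"oak", "woodland"})),
]

# A species-style composite object: everything indexed under one species term.
species_profile = build_composite(ASSETS, {"barn owl"})
print([asset.asset_id for asset in species_profile])
```

The sketch makes the dependency visible: a composite object can only contain assets whose metadata already carries the terms being asked for.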

Some Problems & Issues
- Capturing ALL the relevant information
  - Comprehensive description of species and their habitats
- Describing resources
  - Choosing and creating core vocabularies
  - Inter-indexer consistency & assistance
- Capturing pieces of information
  - Capture and indexing of pieces of information
  - Representing controversial, partial and changing knowledge
- Finding resources
  - Example: The Flying Bird Problem
- Providing access to diverse audiences
  - Examples: The Hat Problem & A Fundamental Dilemma
- Interoperation with internal & external systems

Capturing ALL the Information
- Wanting to capture a comprehensive multimedia profile of a species: appearance, life cycle, behaviours, inter-specific relationships
- Different types of profile for different species, e.g. plants and animals, insects and mammals
- Requires a means of representing the content of those profiles:
  1) to check that there are materials for ALL elements of the profile
  2) to check what is missing, i.e. what is still to be obtained
- But new information/knowledge is always being discovered
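The two checks above (is every element of the profile covered, and what is still missing) can be sketched as a set difference against a profile template. This is only an illustration under stated assumptions: the template contents, the `missing_elements` helper and the sample asset identifiers are hypothetical, and the slides do not say how ARKive represented its profiles.

```python
# Hypothetical profile templates: which elements a profile of each type needs.
PROFILE_TEMPLATES = {
    "animal": {"appearance", "life cycle", "behaviour",
               "inter-specific relationships"},
    "plant": {"appearance", "life cycle", "inter-specific relationships"},
}


def missing_elements(species_type, collected):
    """Return, sorted, the profile elements that have no material yet.

    `collected` maps a profile element to the list of asset ids gathered for it;
    an element with an empty list counts as missing.
    """
    required = PROFILE_TEMPLATES[species_type]
    covered = {element for element, assets in collected.items() if assets}
    return sorted(required - covered)


# Hypothetical state of one animal profile part-way through research.
collected = {
    "appearance": ["img-001", "img-007"],
    "behaviour": ["vid-002"],
    "life cycle": [],
}
print(missing_elements("animal", collected))
# prints ['inter-specific relationships', 'life cycle']
```

Because new knowledge keeps arriving, the templates themselves would need to be editable data rather than fixed code, which is why they are kept in a dictionary here.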

Capturing Pieces of Information
- During the research phase (which is always!) information is gathered about a species
- These new pieces of information specific to a species need to be captured and integrated into the profiles: continuous revision of the profile [& new concepts!]
- New pieces of information about other things (e.g. key scientists, geographic regions, how different groups classify types of flight) need to be captured and made available, i.e. indexed so that they can be retrieved for repurposing/authoring new content
- And much knowledge is controversial, partial and changes

Describing Multimedia Resources
- Terms for a controlled vocabulary for describing appearance, life cycle, behaviours, inter-specific relationships
  - e.g. mapping terms from one biological sub-discipline to another
  - Other subject disciplines & perspectives? (see next slide)
- Inter-indexer consistency
  - 46% consistency in tagging moving images
- Not only descriptive terms: many other aspects of multimedia need describing too (technical, administrative, preservation) using metadata standards
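To make the idea of inter-indexer consistency concrete, here is a minimal sketch of one way such a figure can be computed. The slide does not say which measure produced the 46% figure; the overlap ratio used here (shared terms divided by all distinct terms used by either indexer) is an assumption, and the function name and sample terms are hypothetical.

```python
def indexer_consistency(terms_a, terms_b):
    """Overlap ratio between two indexers' term sets for the same asset.

    Computed as |A intersect B| / |A union B|. Two empty term sets are treated
    as fully consistent.
    """
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical indexers tagging the same film clip.
indexer_1 = {"bird", "flight", "coast", "flock"}
indexer_2 = {"bird", "flying", "coast", "sea"}
print(f"{indexer_consistency(indexer_1, indexer_2):.0%}")  # prints 33%
```

Note that "flight" and "flying" count as a disagreement here, which is one reason controlled vocabularies and indexing assistance matter.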

Providing Access to Diverse Audiences: The Hat Problem
- Diverse (esp. near-orthogonal) perspectives on any single asset
- Specialist vocabularies, e.g.
  - The Medical Subject Headings (MeSH): more than 19,000 main headings
  - The Art & Architecture Thesaurus (AAT): about 120,000 terms covering objects, textual materials, images, architecture and material culture from antiquity to the present
  - UK National Curriculum (for schools) Metadata Schema: some 2000 'subject keywords'
- Comprehensive description? Costs? Time? Expertise? Which to choose?

A Fundamental Dilemma
- Developers (e.g. ARKive) want to enable all potential educational users to gain maximum benefit from the assets held in their database
- They do not want to dictate or second-guess how people might use an asset
- However, to let anyone find anything in a usable and effective manner with finite time and budgets, a limited choice of metadata terms must be made when indexing assets, which in turn requires assumptions about the users and the likely uses to which the resource might be put!
- You don't want to (and can't) predict what your users will want to use the 'raw' multimedia assets for, but if you don't, your users can't get to the assets.

Finding Resources
- Both internal and external re-purposing require that assets/resources can be found at (re)authoring time
- If you can't find a resource when you need it, is it really there? Effectively, no.
- When traditional approaches break down, e.g. cross-searching across domains with little conceptual overlap
- Example: Do birds that fly, fly?

Interoperability
- BIG VALUE ADDED!!! if linked to other internal and external datasets, e.g. news content, geographic, conservation organisations, Web cams
- Biodiversity domain
  - Few standard vocabularies and much controversy
  - Diverse groups developing standards about related areas, e.g. conservation, government
  - Politics & Politics are major issues
- Human knowledge is just really messy!

Solutions: Overview/Key Issues?
- Metadata and knowledge representation
- Getting into data sets with minimal conceptual overlap
- Semantic bootstrapping
- Auto or assisted/semi-automatic indexing
- Comprehensive cross-subject-domain mark-up and/or retrieval of resources
- Metadata standards and standards mappings
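As an illustration of what a metadata standards mapping involves, here is a minimal crosswalk sketch. The local field names and the `to_dublin_core` helper are hypothetical; the target names (`title`, `creator`, `subject`, `description`) are elements of the Dublin Core Metadata Element Set, chosen here only as a familiar example and not because the slides name it.

```python
# Hypothetical crosswalk from a local image-database schema to Dublin Core
# element names. The left-hand field names are invented for illustration.
CROSSWALK = {
    "caption": "title",
    "photographer": "creator",
    "keywords": "subject",
    "notes": "description",
}


def to_dublin_core(record):
    """Split a local record into mapped fields and fields with no mapping."""
    mapped, unmapped = {}, {}
    for field_name, value in record.items():
        target = CROSSWALK.get(field_name)
        if target is None:
            unmapped[field_name] = value  # information the crosswalk would lose
        else:
            mapped[target] = value
    return mapped, unmapped


local_record = {
    "caption": "Barn owl hunting at dusk",
    "photographer": "A. N. Other",
    "keywords": ["barn owl", "flight"],
    "wing_posture": "gliding",
}
mapped, unmapped = to_dublin_core(local_record)
print(mapped)
print(unmapped)
```

Returning the unmapped fields separately makes the cost of a mapping explicit: specialist detail that has no counterpart in the target standard is exactly what gets lost in interoperation.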

Solutions?
Problems:
- Capturing ALL the relevant information
- Describing resources
- Capturing and indexing pieces of information
- Finding resources
- Providing access to diverse audiences
- Interoperation with internal & external systems
Technologies:
- Thesauri & simple taxonomies
- Ontologies
- RDF
- Data mining & concept extraction
- Content Based Image Retrieval
- Community annotation
- Future technologies?

Solutions: Some Limitations and Issues
- Many seemingly simple things are still difficult to represent [in RDF/ontology languages], e.g. all birds, except x, y & z, fly
- Many hard problems, e.g. controversial, partial and changing knowledge: human knowledge in most domains is messy!
- Ontologies are complex things and impose an invisible world view on users (academic issues)
- Rapid and distributed ontology development & management is very problematic
- Ontology mapping is still very problematic
- CBIR (Content Based Image Retrieval) and concept extraction tools for non-text media are not very good yet
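The "all birds, except x, y & z, fly" example is a default with exceptions. The sketch below shows how naturally that is expressed as an override lookup in ordinary code, which is a different mechanism from RDF/ontology-language inference and is offered only as a contrast. The `Kind` class and the sample hierarchy are hypothetical.

```python
class Kind:
    """A node in a simple hierarchy; properties act as overridable defaults."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = properties

    def get(self, prop):
        """Return the nearest value for `prop`, walking up to the ancestors."""
        kind = self
        while kind is not None:
            if prop in kind.properties:
                return kind.properties[prop]  # most specific statement wins
            kind = kind.parent
        return None  # nothing is known about this property


bird = Kind("bird", can_fly=True)                       # the default
penguin = Kind("penguin", parent=bird, can_fly=False)   # an exception
robin = Kind("robin", parent=bird)                      # inherits the default

print(robin.get("can_fly"), penguin.get("can_fly"))  # prints True False
```

The lookup works because a more specific statement is allowed to retract a more general one. Reasoning in which adding a fact can never withdraw an earlier conclusion has no direct place for that kind of override, which is one way to read the difficulty the slide points at.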

Conclusions
- ARKive was/is a very good example of a digitisation project
  - Majority of generic problems
- There were & are many problems!
  - Some tractable with existing technologies, others not yet
  - Some simply to do with people
- The Semantic Web technologies and approaches offer part of a more comprehensive set of solutions to these problems

Questions

Machine Readable Ontologies
Schreiber et al. (2001), Ontology-Based Photo Annotation, IEEE Intelligent Systems, May/June 2001

The Hat Problem
- Imagine that there is a database of 100,000 images of people in a wide variety of different settings
- The database is indexed using terms relating to identification of people (name, age, ...) and event (time, place, ...)
- Now a milliner might see great potential for studying how people use hats. The database is likely to be a very useful resource: they could search to see how the style of hats has changed over time, what types of hats are most popular, what percentage of women have bows on their hats, and many more specific questions
- However, the database is not indexed using the concept of 'hat'
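The Hat Problem can be reproduced in a few lines. The sketch below builds a tiny term index of the kind described (people and events only) and shows that a search for 'hat' finds nothing, whatever the images actually depict. The records, identifiers and `build_index` helper are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Build an inverted index: lower-cased index term -> set of image ids."""
    index = defaultdict(set)
    for image_id, terms in records.items():
        for term in terms:
            index[term.lower()].add(image_id)
    return index


# Hypothetical records, indexed only by person (name) and event (year, place).
# Both photographs might well show people wearing hats; the index cannot say.
RECORDS = {
    "img-101": ["Jane Smith", "1923", "Ascot"],
    "img-102": ["John Brown", "1951", "Bristol"],
}

index = build_index(RECORDS)
print(sorted(index.get("ascot", set())))  # prints ['img-101']
print(sorted(index.get("hat", set())))    # prints []
```

An empty result here means "not indexed", not "no hats": the retrieval system cannot distinguish the two, which is the milliner's problem.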