Simple Archive Architectures

Similar documents
Flexible Design for Simple Digital Library Tools and Services

SIMPLE DIGITAL LIBRARIES

Digitally Preserving African Heritage

Flexible Design for Simple Digital Library Tools and Services

VI-SEEM Data Repository. Presented by: Panayiotis Charalambous

The Open Archives Initiative and the Sheet Music Consortium

National Documentation Centre Open access in Cultural Heritage digital content


Preservation Planning in the OAIS Model

Share.TEC System Architecture

Digital Library Curriculum Development Module 4-b: Metadata Draft: 6 May 2008

The implementation of cultural heritage archives

Building Interoperable and Accessible ETD Collections: A Practical Guide to Creating Open Archives

Joining the BRICKS Network - A Piece of Cake

Data publication and discovery with Globus

The OAIS Reference Model: current implementations

Using DSpace for Digitized Collections. Lisa Spiro, Marie Wise, Sidney Byrd & Geneva Henry Rice University. Open Repositories 2007 January 23, 2007

Preserva'on*Watch. What%to%monitor%and%how%Scout%can%help. KEEP%SOLUTIONS%

The Design of a DLS for the Management of Very Large Collections of Archival Objects

Persistent identifiers, long-term access and the DiVA preservation strategy

Metadata Harvesting Framework

Metadata Workshop 3 March 2006 Part 1

Edinburgh DataShare: Tackling research data in a DSpace institutional repository

Persistent Identifiers for Audiovisual Archives and Cultural Heritage

Research Data Edinburgh: MANTRA & Edinburgh DataShare. Stuart Macdonald EDINA & Data Library University of Edinburgh

Masters Proposal. Meta-standardisation of Interoperability Protocols

The National Digital Library Finna Among Digital Research Infrastructures in Finland

Links, languages and semantics: linked data approaches in The European Library and Europeana. Valentine Charles, Nuno Freire & Antoine Isaac

Metadata Catalogue Issues. Daan Broeder Max-Planck Institute for Psycholinguistics

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

RVOT: A Tool For Making Collections OAI-PMH Compliant

DRI: Preservation Planning Case Study Getting Started in Digital Preservation Digital Preservation Coalition November 2013 Dublin, Ireland

The CARARE project: modeling for Linked Open Data

Comparing Open Source Digital Library Software

Persistent identifiers: jnbn, a JEE application for the management of a national NBN infrastructure

Ponds, Lakes, Ocean: Pooling Digitized Resources and DPLA. Emily Jaycox, Missouri Historical Society SLRLN Tech Expo 2018

Assessing the Design of Web Interoperability Protocols

The MIND Approach. Fabio Crestani University of Strathclyde, Glasgow, UK. Open Archive Forum Workshop Berlin, Germany, March 2003

GETTING STARTED WITH DIGITAL COMMONWEALTH

Geospatial Multistate Archive and Preservation Partnership Metadata Comparison

Research on the Interoperability Architecture of the Digital Library Grid

DRI: Dr Aileen O Carroll Policy Manager Digital Repository of Ireland Royal Irish Academy

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

Getting Started with the Digital Commonwealth. Robin L. Dale Director of Digital & Preservation Services LYRASIS

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

OAI-PMH for dummies: how to build an institutional repository with limited resources?

Nuno Freire National Library of Portugal Lisbon, Portugal

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

The Metadata Challenge:

Pipe Dreams: Harvesting Local Collections into Primo Using OAI-PMH

Working with Islandora

Building Virtual Collections

How to contribute information to AGRIS

Non-text theses as an integrated part of the University Repository

CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA

BPMN Processes for machine-actionable DMPs

Omeka. iskills Workshop February 16, Leslie Barnes, Digital Scholarship Librarian

Using Linked Data to Reduce Learning Latency for e-book Readers

Building a Digital Repository on a Shoestring Budget

Ubiquitous and Open Access: the NextGen library. Edmund Balnaves, Phd. Information Officer, IFLA IT Section

Institutional Repositories

School of Computing and Information Sciences

August 14th - 18th 2005, Oslo, Norway. Web crawling : The Bibliothèque nationale de France experience

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

The Scottish Collections Network: landscaping the Scottish common information environment. Gordon Dunsire

Linked Open Europeana: Semantics for the Digital Humanities

How do you know what your users want?

Selecting an Electronic Records Repository Platform

The e-depot in practice. Barbara Sierman Digital Preservation Officer Madrid,

Digital Libraries: Interoperability

Mary Manning Eric Schnell

FLAT: A CLARIN-compatible repository solution based on Fedora Commons

Programme Specification

Europeana and the Mediterranean Region

Transforming Our Data, Transforming Ourselves RDA as a First Step in the Future of Cataloging

Using the WorldCat Digital Collection Gateway

The digital preservation technological context

Invenio: A Modern Digital Library for Grey Literature

Institutional repositories: description of VITAL as an example of a Fedora-based digital assets management system.

Evolving the digital library for digital scholarship enablement

Repository models and policies for preservation

RDA: Where We Are and

Digital Curation and Preservation: Defining the Research Agenda for the Next Decade

Demos: DMP Assistant and Dataverse

Data Curation Profile Human Genomics

The Ohio State University s Institutional Repository: The Knowledge Bank. Knowledge Bank Users Group. 2 nd Annual Meeting May 11, Welcome!

The Open Archives Initiative in Practice:

Research Data Repository Interoperability Primer

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Google indexed 3,3 billion of pages. Google s index contains 8,1 billion of websites

Data Curation Profile Plant Genetics / Corn Breeding

Digital Archives: Extending the 5S model through NESTOR

Horizon Societies of Symbiotic Robot-Plant Bio-Hybrids as Social Architectural Artifacts. Deliverable D4.1

CARARE: project overview

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

School of Computing and Information Sciences. Course Title: Mobile Application Development Date: 8/23/10

The Metadata Assignment and Search Tool Project. Anne R. Diekema Center for Natural Language Processing April 18, 2008, Olin Library 106G

CONTENTdm & The Digital Collection Gateway New Looks for Discovery and Delivery

Data Exchange and Conversion Utilities and Tools (DExT)

EPrints: Repositories for Grassroots Preservation. Les Carr,

Transcription:

Simple Archive Architectures Lighton Phiri and Hussein Suleman Digital Libraries Laboratory Department of Computer Science University of Cape Town IFLA '15 Workshop on Digital Libraries: research methods and tools

www.martinwest.uct.ac.za 2

lloydbleekcollection.cs.uct.ac.za 3

Contextual Overview Problems and challenges Preservation costs Technical skills and expertise Computing resources Proposed solution Explicit simplicity and minimalism Principled design of DL tools and services Motivation Successes of minimalism---project Gutenburg 4

Research goals Is it feasible to implement DLSes based on simple architectures? How should simplicity for DLS storage and service architectures be defined? Derivation of design principles Simple repository prototype + case studies What are the implications of simplifying DLS? Developer user study Performance evaluation What are some of the comparative advantages and disadvantages of simple architectures? DSpace 3.1 comparative evaluation 5

Claim #1: Simplicity for DL storage and services can be defined through derivation 6

Design Principles (1) Meta-analysis of popular software applications 12 candidate tools were considered---even split between DL and non-dl tools Tool attributes that potentially influenced design of tools identified Pair-wise comparison done to assess most appropriate attributes Eight guiding design principles derived [1] Applicable for simple and minimalistic architectures 7

Design Principles (2) Principles mapped to potential repository architectural design decisions Applicable principles derived during mapping 8

Simple Repository Prototype File-based Digital objects stored on OS Hierarchical collection structure Metadata objects DC plain text files Object organisation Metadata stored along content Nested objects 9

Case studies Two case studies involving two different collections The Bleek and Lloyd Collection Honours project: Bonolo [5] SARU archaeological database Honours project: The School of Rock Art [6] 10

The Digital Bleek and Lloyd 18,924 content objects with a total size of 6.2GB Two-level collection structure Virtual content objects representing stories Bonolo [5] DLS implemented using repository sub-layer 11

SARU Archaeological database 72,333 content objects with a total size of 283GB Four-level collection structure The School of Rock Art [6] implemented using repository sublayer 12

Claim #2: There are desirable features and advantages possessed by DL tools and services implemented using simple architectures 13

User Study (1) Developer-oriented study Assess simplicity and flexibility of simple repository architecture Target population 34 computer science honours students split into 12 groups of twos and threes Basic developer skills and DL knowledge Approach Participants tasked to build layered services using simple repository Post-experiment survey 14

User Study (2) Wide variety of layered services Wide variety of programming languages used Choice of language not influenced by repository design; only 15% indicated that it did 15

User Study (3) Dublin Core XMLencoded files perceived simple& easy to work with 69% and 61% respectively Repository perceived simple but not easily understandable 62% and 46% respectively 16

User Study (4) Simplicity resulted in more understandable repository layer Most participants found Dublin Core XMLencoded metadata files easy and simple to work with Most participants found hierarchical structure simple but not easily understandable Flexibility of interaction with repository layer unaffected by simplicity No influence on programming languages 17

Performance Evaluation (1) Assess and benchmark performance relative to collection size Typical DL service aspects evaluated. Ingestions, search, OAI-PMH data provider and feed provider Log analysis of production repository informed aspects Comparative assessment with DSpace 3.1 Experimental design Metrics---Response time Factors---Collection size and structure 18

Performance Evaluation (2) Three datasets with 15 linearly increasing workloads; data from NDLTD Union Catalog One-, two- and three-level collection structures Varying objects in different collection structures 19

Performance Evaluation (3) Performance within acceptable limits for medium-sized collections Collections > 12,800 objects affected Information-discovery services---feed, fulltext search and OAIPMH data provider--affected 20

Performance Evaluation (4) Performance benchmarking Performance within acceptable limits for medium sized collections Performance degradation beyond 12 800 objects Performance degradation adversely affects information discovery services; ingestion process unaffected by collection scale Comparison with DSpace 3.1 Ingestion performance outperformed DSpace 3.1 Information discovery services outperformed by DSpace 3.1 21

Conclusions Principled DL design approach undertaken Feasibility of simple DL architectures Minimalism does not affect flexibility and extensibility of DL tools and services Performance acceptable for small- and medium-sized collection Comparable results with well-established solutions 22

Bibliography [1] Lighton Phiri and Hussein Suleman. In Search of Simplicity: Redesigning the Digital Bleek and Lloyd. DESIDOC 12 32(4): 306 312, 2012. [2] Lighton Phiri et al. Bonolo: A General Digital Library System for File-based Collections. ICADL 12 7634:49 58, 2012. [3] Lighton Phiri and Hussein Suleman. Flexible Design for Simple Digital Library Tools and Services. SAICSIT 13 160 169, 2013 [4] Lighton Phiri and Hussein Suleman. Managing cultural heritage: information systems architecture. Facet Publishing 13 134, 2015 [5] Stuart Hammar and Miles Robinson. Bonolo Project URL: http: //goo.gl/etblcr [6] Kaitlyn Crawford et al. The School of Rock Art. URL: http://goo. gl/u092eh 23

Questions? Additional information http://dl.cs.uct.ac.za