Building Collections Using Greenstone

Similar documents
How to Build a Digital Library

Mary Manning Eric Schnell

Compound or complex object: a set of files with a hierarchical relationship, associated with a single descriptive metadata record.

5. Digital Library Creation and Management 5.2. Greenstone digital library software practical

Metadata and Encoding Standards for Digital Initiatives: An Introduction

Creating and Customizing Digital Library Collections with the Greenstone Librarian Interface

Comparing Open Source Digital Library Software

Greenstone in Practice: Implementations of an Open Source Digital Library System

Repository Interoperability and Preservation: The Hub and Spoke Framework

International Implementation of Digital Library Software/Platforms 2009 ASIS&T Annual Meeting Vancouver, November 2009

Using DSpace for Digitized Collections. Lisa Spiro, Marie Wise, Sidney Byrd & Geneva Henry Rice University. Open Repositories 2007 January 23, 2007

Greenstone Publications

CREATING DIGITAL LIBRARIES USING GSDL

Building up a Digital Library with Greenstone A Self-Instructional Guide for Beginner's

Practical Experiences with Ingesting Materials for Long-Term Preservation

SobekCM. Compiled for presentation to the Digital Library Working Group School of Oriental and African Studies

Importance of cultural heritage:

The Case of the 35 Gigabyte Digital Record: OCR and Digital Workflows

You may print, preview, or create a file of the report. File options are: PDF, XML, HTML, RTF, Excel, or CSV.

Building for the Future

CCS Content Conversion Specialists. METS / ALTO introduction

An introduction to the Metadata Encoding and Transmission Standard (METS)

Alphabet Soup: A Metadata Overview Melanie Schlosser Metadata Librarian

Data Exchange and Conversion Utilities and Tools (DExT)

Evaluation of Islandora & SobekCM

GETTING STARTED WITH DIGITAL COMMONWEALTH

International Journal of Current Multidisciplinary Studies Available Online at Vol. 3, Issue,04, pp , April, 2017

The Argus Digital Collection: A Digital Archive for Access and Preservation

LIMB Processing Release Notes

Virtual Collections. Challenges in Harvesting and Transforming Metadata from Harvard Catalogs for Topical Collections

CREATION OF A DIGITAL REPOSITORY FOR THE MASTER THESIS OF THE FACULTY OF JOURNALISM, LIBRARY AND INFORMATION SCIENCE OF THE OSLO UNIVERSITY COLLEGE

Building A Digital Library of Agricultural Documents Using Open Source Software

QUARK AUTHOR THE SMART CONTENT TOOL. INFO SHEET Quark Author

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Common Specification (FGS-PUBL) for deposit of single electronic publications to the National Library of Sweden - Kungl.

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Kalaivani Ananthan Version 2.0 October 2008 Funded by the Library of Congress

CONTENTdm & The Digital Collection Gateway New Looks for Discovery and Delivery

PAWN: A Novel Ingestion Workflow Technology for Digital Preservation. Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall

Next Generation Library Catalogs: opportunities. September 26, 2008

EMC DOCUMENT SCIENCES INTERACTIVE DOCUMENT DEVELOPMENT KIT

WEB APPLICATION DEVELOPMENT. How the Web Works

Creating An Xml Mapping Schema In Excel 2007

Discovering Information through Summon:

Creating Accessible PDFs

In 2012, KnowledgeLake Professional Services successfully completed Phase I of a multi phase ECM engagement with

Automated Classification. Lars Marius Garshol Topic Maps

Web Mapping Applications with ArcGIS. Bernie Szukalski Derek Law

Blackwell Synergy New Design Preview

The Swedish National Archives digital preservation. Mats Berggren, IT-department,

Managing Learning Objects in Large Scale Courseware Authoring Studio 1

Preservation Standards (& Specifications) (&& Best Practices)

Building Digital Collection with Greenstone: Development and Customization

Its All About The Metadata

7.3. Introduction to metadata

Document. conversion. Internal format. Classify. Document. conversion. Internal format

Greenstone: Open source Software for Building Digital Library Collections

INFOLIB2015 USER INSTRUCTION GUIDE

Content Syndication Implementation Guide

Open Archives Initiatives Protocol for Metadata Harvesting Practices for the cultural heritage sector

Lessons Learned. Implementing Rosetta in the Harold B. Lee Library

Orbis Cascade Alliance Archives & Manuscripts Collections Service. ArchivesSpace Usage Manual: Digital Objects. Introduction to Digital Objects

Frederick Zarndt Semblanza

Web-based workflow software to support book digitization and dissemination. The Mounting Books project

Exactly User Guide. Contact information. GitHub repository. Download pages for application. Version

Below is an example workflow of file inventorying at the American Geographical Society Library at UWM Libraries.

A Workflow for Document Level Interoperability

Phyllis Kaiden. Product Manager, Digital Collection Services. Project Client Server Catcher Website: End-User Experience Redesign

SciX Open, self organising repository for scientific information exchange. D15: Value Added Publications IST

Hello, I'm Melanie Feltner-Reichert, director of Digital Library Initiatives at the University of Tennessee. My colleague. Linda Phillips, is going

METS For The Cultural Heritage Community: A Literature Review

Comparative evaluation of open source digital library packages

Update on the TDL Metadata Working Group's activities for

CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA

The Open Archives Initiative and the Sheet Music Consortium

Dexterity: Data Exchange Tools and Standards for Social Sciences

Dr. MIQUEL MONTANER CTO at Easy Innova. Dr. VÍCTOR MUÑOZ R&D Manager at Easy Innova. XAVI TARRÉS Project Manager at Easy Innova

Adobe. Using DITA XML for Instructional Documentation. Andrew Thomas 08/10/ Adobe Systems Incorporated. All Rights Reserved.

Digital object and digital object component records

Regular Forum of Lreis. Speechmaker: Gao Ang

Tutorial 8 Sharing, Integrating and Analyzing Data

DigiTool for Course Support at Notre Dame. Pascal Calarco, University of Notre Dame IGeLU 2007 Brno, Czech Republic September 3, 2007

Pipe Dreams: Harvesting Local Collections into Primo Using OAI-PMH

CONTENTdm 4.3. Russ Hunt Product Specialist Barcelona October 30th 2007

Slide 1 & 2 Technical issues Slide 3 Technical expertise (continued...)

Norcom. e-fileplan Electronic Cabinet System

Assessment of product against OAIS compliance requirements

Using the WorldCat Digital Collection Gateway

Brown University Libraries Technology Plan

Archivists Toolkit: Description Functional Area

Building The Czech Digital Mathematics Library upon DSpace System

SSQA Seminar Series. Server Side Testing Frameworks. Sachin Bansal Sr. Quality Engineering Manager Adobe Systems Inc. February 13 th, 2007

2013, Active Commerce 1

Mitigating Preservation Threats: Standards and Practices in the National Digital Newspaper Program

Addressing the E-Journal Preservation Conundrum: Understanding Portico

July 2016 HOW TO VIEW PRODUCT DATA USING THE PRODUCT DETAILS SCREEN QUICK REFERENCE GUIDE. Figure 1 Product Detail Search

INSTITUTIONAL REPOSITORY

SpringCM. Release Notes December 2017

Transcription:

Building Collections Using Greenstone
Tod A. Olson <tod@uchicago.edu>, Sr. Programmer/Analyst, Digital Library Development Center, University of Chicago Library
http://www.lib.uchicago.edu/dldc/talks/2003/dlf-greenstone/

Greenstone
Developed by the New Zealand Digital Library Project at the University of Waikato, in cooperation with UNESCO and the Human Info NGO. International: used on every continent. Examples: academic digitization projects; classes on digital libraries; non-academic UNESCO humanitarian documentation.

Greenstone features
Works with existing documents and imports several formats. Searching: full text and metadata (Dublin Core or custom metadata). Browsing. Structured documents: indexing and access. Extensible and customizable. Open-source software (GPL).

User interface overview
Finding documents: search full-text and metadata indexes; classifiers provide browse lists for navigating collections.
Navigating documents: navigate hierarchical documents by their logical structure; simple page turning (not shown); a single page for simple documents (not shown).

Greenstone architecture
[Figure: receptionists communicate with collection servers through the receptionist protocol; collection servers work from collection databases and indexes, each produced by import. Redrawn from Witten & Bainbridge, How to Build a Digital Library, p. 356.]

Greenstone architecture
Receptionist: provides the user interface; accepts user input and sends it to the appropriate collection server; accepts the results; generates pages dynamically.
Collection server: handles collection content; searches and filters information; returns results; can serve multiple collections.
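The following toy Python model illustrates the division of labour between receptionist and collection server. It is not Greenstone's implementation. The class names, the substring search, and the HTML output are illustrative assumptions. The model shows that the receptionist owns the user-facing page, while collection servers own content and search.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionServer:
    """Toy collection server: holds collections and answers searches."""
    collections: dict[str, list[str]] = field(default_factory=dict)

    def search(self, collection: str, term: str) -> list[str]:
        # Search and filter: keep documents that contain the term.
        docs = self.collections.get(collection, [])
        return [doc for doc in docs if term.lower() in doc.lower()]


class Receptionist:
    """Toy receptionist: takes user input, routes it, builds the page."""

    def __init__(self) -> None:
        self.routes: dict[str, CollectionServer] = {}

    def register(self, server: CollectionServer) -> None:
        # One server may hold several collections.
        for name in server.collections:
            self.routes[name] = server

    def handle(self, collection: str, term: str) -> str:
        server = self.routes[collection]        # appropriate collection server
        hits = server.search(collection, term)  # accept its results
        items = "".join(f"<li>{hit}</li>" for hit in hits)
        return f"<h1>{collection}</h1><ul>{items}</ul>"  # page built on demand


if __name__ == "__main__":
    music = CollectionServer({"chopin": ["Nocturne op. 9", "Mazurka op. 6"]})
    docs = CollectionServer({"demo": ["Sample text"], "unesco": ["Water supply"]})
    receptionist = Receptionist()
    receptionist.register(music)
    receptionist.register(docs)
    print(receptionist.handle("chopin", "nocturne"))
    # prints <h1>chopin</h1><ul><li>Nocturne op. 9</li></ul>
```

Keeping routing in the receptionist means collection servers can be added or moved without changing the user interface.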

Building collections
[Figure: source documents (HTML, PDF) are imported into GSAF, which the build step turns into the collection database and indexes.]

Building collections
Create a collection framework, or work with an existing collection. Select documents. Import the documents, which converts them to the internal XML format (GSAF). Build the collection, which creates the search indexes and browse listings.
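The two stages can be sketched as follows, assuming plain-text sources whose first line is the title. The function names `import_docs` and `build` are hypothetical, and a Python dict stands in for the internal format. Greenstone's own import writes GSAF files, and its build step uses its own indexing engine.

```python
import re
from collections import defaultdict


def import_docs(raw: dict[str, str]) -> list[dict[str, str]]:
    """Hypothetical import step: normalise sources into uniform records.

    Assumes each source is plain text whose first line is its title.
    """
    records = []
    for doc_id, text in sorted(raw.items()):
        title, _, body = text.partition("\n")
        records.append({"id": doc_id, "Title": title.strip(), "text": body})
    return records


def build(records: list[dict[str, str]]):
    """Hypothetical build step: full-text index plus an A-Z browse list."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        for word in re.findall(r"\w+", rec["text"].lower()):
            index[word].add(rec["id"])
    browse = sorted((rec["Title"], rec["id"]) for rec in records)
    return dict(index), browse


if __name__ == "__main__":
    sources = {
        "d2": "Mazurkas\nFour mazurkas for piano",
        "d1": "Nocturnes\nTwo nocturnes for piano",
    }
    index, browse = build(import_docs(sources))
    print(sorted(index["piano"]))  # ['d1', 'd2']
    print(browse)                  # [('Mazurkas', 'd2'), ('Nocturnes', 'd1')]
```

Separating import from build mirrors the slide's design. Sources are converted once into a uniform representation, and indexes and browse lists are then derived from that representation alone.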

GSAF: internal XML format
  <Section>
    <Description>
      <Metadata name="Title" value="...">
    <Content>
      [Text, images, links, etc.]
    <Section>
      <Description>
        <Metadata name="Title">
      <Content>
      <Section>
      <Section>
      <Section>

GSAF: internal XML format
Each section has a description (metadata fields) and content (text with internal markup, images). There is no limit on the number or depth of sections: they nest in a tree structure, which supports hierarchical documents.
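Here is a sketch of the nesting using Python's `xml.etree.ElementTree`. It assumes each metadata value is the text of a `<Metadata name="...">` element, as in Greenstone 2 archive files, rather than the `value=` notation on the slide. The helper names `make_section` and `depth` are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_section(title: str, content: str = "", children=()) -> ET.Element:
    """Build one GSAF-style <Section>: Description, Content, subsections."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    ET.SubElement(description, "Metadata", name="Title").text = title
    ET.SubElement(section, "Content").text = content
    for child in children:
        section.append(child)  # sections nest to any depth
    return section


def depth(section: ET.Element) -> int:
    """Number of section levels, counting this one."""
    subsections = section.findall("Section")
    return 1 + max((depth(s) for s in subsections), default=0)


if __name__ == "__main__":
    archive = ET.Element("Archive")
    archive.append(
        make_section("Book", children=[
            make_section("Part 1", children=[
                make_section("Chapter 2", "Chapter text ..."),
            ]),
        ])
    )
    print(depth(archive.find("Section")))  # 3
    print(ET.tostring(archive, encoding="unicode"))
```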

Config file: collect.cfg
The collection-specific configuration file, collect.cfg, specifies: the file types to import; the indexes and browse lists; the index level (document, section, or paragraph, with paragraph level available for text indexes only); the display of search results and browse listings; and the document display.
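The sketch below reads a collect.cfg-style file. It assumes one directive per line, with the first word naming the directive, whitespace-separated arguments, quotes grouping words, and `#` starting a comment. The sample directives and the plugin and classifier names are illustrative, modelled on Greenstone 2 conventions; check them against the documentation for your release. `parse_cfg` and `index_levels` are hypothetical helpers, not Greenstone code.

```python
import shlex

# Illustrative collect.cfg-style text; directive and plugin names are
# assumptions modelled on Greenstone 2, not a verified configuration.
SAMPLE_CFG = """
# comment lines start with '#'
plugin    HTMLPlug
plugin    PDFPlug
indexes   document:text section:text document:Title
defaultindex document:text
classify  AZList -metadata Title
format    VList "<td>[link][Title][/link]</td>"
"""


def parse_cfg(text: str) -> dict[str, list[list[str]]]:
    """Group directive lines by their first word."""
    config: dict[str, list[list[str]]] = {}
    for line in text.splitlines():
        words = shlex.split(line, comments=True)  # honours quotes and '#'
        if words:
            config.setdefault(words[0], []).append(words[1:])
    return config


def index_levels(config: dict[str, list[list[str]]]) -> list[tuple[str, str]]:
    """Split each 'level:field' index spec into a (level, field) pair."""
    specs = [word for args in config.get("indexes", []) for word in args]
    return [tuple(spec.split(":", 1)) for spec in specs]


if __name__ == "__main__":
    cfg = parse_cfg(SAMPLE_CFG)
    print(index_levels(cfg))
    # [('document', 'text'), ('section', 'text'), ('document', 'Title')]
```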

Chopin Early Editions
Over 400 early-edition Chopin scores, 1830s to 1880s. Target audience: music scholars and musicians. Delivered on the web as page-turnable JPEG images. Went online in March 2003; currently 372 scores in the online collection. Usage: nearly 100 hits per day, with over 30% of use international.

Build overview
[Figure: catalog records, scanned images, and structural metadata (human processing) are assembled into METS & MODS, which XSLT transforms into the Greenstone Archive Format for the Greenstone Digital Library Software (XML-based automated processing).]

Catalog records
A detailed MARC/AACR2 record for each score. This is a luxury: few print music collections have this much metadata.

Scanned score images
Scanned by Preservation staff as archival TIFF images (400 dpi, 24-bit color, uncompressed), with JPEG derivatives for web-based delivery.

Structural and other metadata
Sample rows from the CSV export:
  "chopin","108","001","","1",""
  "chopin","108","002","","1",""
  "chopin","108","003","1","1","nocturne, no.15"
  "chopin","108","004","2","1",""
  "chopin","108","005","3","1",""

Structural metadata
Identifies each image by document (score) number and sequential image number, and records image content: the page number as printed, and milestones. Staff use a familiar relational database product and export the data in CSV format. Technical metadata is recorded but not yet used.
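Here is a sketch of consuming the CSV export, assuming the column order of the sample rows. The field names are guesses from the slide text; the fifth column is unidentified. `image_filename` encodes a hypothetical naming convention, because the project's actual convention is not given.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the slide. Field names are assumptions: project, score
# (document) no., sequential image no., printed page no., an unidentified
# fifth field, and a milestone label.
SAMPLE = '''"chopin","108","001","","1",""
"chopin","108","002","","1",""
"chopin","108","003","1","1","nocturne, no.15"
"chopin","108","004","2","1",""
"chopin","108","005","3","1",""
'''
FIELDS = ["project", "score", "image", "page", "field5", "milestone"]


def load_structure(text: str) -> dict[str, list[dict[str, str]]]:
    """Group image rows by score, ordered by sequential image number."""
    scores: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        scores[row["score"]].append(row)
    for rows in scores.values():
        rows.sort(key=lambda r: int(r["image"]))
    return dict(scores)


def image_filename(row: dict[str, str]) -> str:
    """Hypothetical naming convention: <project>/<score>/<image>.jpg."""
    return f'{row["project"]}/{row["score"]}/{row["image"]}.jpg'


if __name__ == "__main__":
    for row in load_structure(SAMPLE)["108"]:
        print(image_filename(row), row["page"] or "-", row["milestone"])
```

Keeping the export flat lets staff work in a familiar spreadsheet-like tool, leaving the tree structure to be reconstructed downstream.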

METS & MODS
[Figure: the catalog record (MARC) becomes MODS in the dmdSec; the scanned images (JPEG) are listed in the fileSec by URL (page1.jpg, page2.jpg); the structural metadata becomes the structMap, a div with DMDID=1 containing divs with FILEID=1 and FILEID=2.]

METS & MODS
A program uses the structural metadata to generate the structMap and to generate image URLs for the fileSec (images are stored by naming convention). The structural metadata also carries the catalog record number, so the program extracts the MARC record from the catalog, crosswalks it to MODS, and embeds the result in the dmdSec.
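The METS-generation step might look like this in Python. The project's own program is not shown in the talk. This sketch assumes one score with a single MODS title and one JPEG per page. In METS proper, a structMap `<div>` refers to a file through an `<fptr FILEID="...">` child, which the slide abbreviates as `div FILEID=1`. The IDs (`DMD1`, `FILE1`, and so on) and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("mods", MODS)
ET.register_namespace("xlink", XLINK)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(title: str, image_urls: list[str]) -> ET.Element:
    mets = ET.Element(q(METS, "mets"))

    # dmdSec: a one-field MODS stand-in for the crosswalked MARC record.
    dmd = ET.SubElement(mets, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    data = ET.SubElement(wrap, q(METS, "xmlData"))
    title_info = ET.SubElement(ET.SubElement(data, q(MODS, "mods")), q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # fileSec: one <file> per page image, located by URL.
    grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"))
    for n, url in enumerate(image_urls, start=1):
        f = ET.SubElement(grp, q(METS, "file"), ID=f"FILE{n}", MIMETYPE="image/jpeg")
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): url})

    # structMap: the score div points at the dmdSec; page divs at files.
    smap = ET.SubElement(mets, q(METS, "structMap"))
    score = ET.SubElement(smap, q(METS, "div"), TYPE="score", DMDID="DMD1")
    for n in range(1, len(image_urls) + 1):
        page = ET.SubElement(score, q(METS, "div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, q(METS, "fptr"), FILEID=f"FILE{n}")
    return mets


if __name__ == "__main__":
    urls = [f"http://example.org/chopin/108/{i:03d}.jpg" for i in (1, 2)]
    print(ET.tostring(build_mets("Nocturnes", urls), encoding="unicode"))
```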

GSAF
An XML format for internal storage with a hierarchical document structure: nested sections, e.g. part 1, chapter 2.
METS to GSAF via XSLT: there is a natural mapping from METS to GSAF. Map the structural hierarchy, then follow its links to the descriptive metadata and the file content.

METS to GSAF
  METS: dmdSec (MODS: Title, ...); fileSec (page1.jpg, page2.jpg); structMap (div: Score, containing div: Page 1 and div: Page 2)
  GSAF: Section (Description: Metadata: Title, ...; Content: Title, ...), containing Section (Content: Page 1, page1.jpg) and Section (Content: Page 2, page2.jpg)

METS to GSAF
Walk the structural metadata to create the tree of <Section> elements. Descriptive metadata goes into <Description>, crosswalked to the desired metadata names. Any metadata that should be displayed is formatted into <Content>. File data also goes into <Content>: inline text, links to images, etc.
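The project performed this mapping with XSLT. The following Python rendering of the same walk is for illustration only. It assumes that page divs carry `LABEL` attributes, that the MODS title becomes the Greenstone `Title` metadata, and that the page label and image URL are stored as plain content text. All three are assumptions. `mets_to_gsaf` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}
HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DMD1"><mets:mdWrap MDTYPE="MODS"><mets:xmlData>
    <mods:mods><mods:titleInfo><mods:title>Nocturnes</mods:title></mods:titleInfo></mods:mods>
  </mets:xmlData></mets:mdWrap></mets:dmdSec>
  <mets:fileSec><mets:fileGrp>
    <mets:file ID="F1"><mets:FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></mets:file>
    <mets:file ID="F2"><mets:FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></mets:file>
  </mets:fileGrp></mets:fileSec>
  <mets:structMap><mets:div TYPE="score" DMDID="DMD1">
    <mets:div TYPE="page" LABEL="Page 1"><mets:fptr FILEID="F1"/></mets:div>
    <mets:div TYPE="page" LABEL="Page 2"><mets:fptr FILEID="F2"/></mets:div>
  </mets:div></mets:structMap>
</mets:mets>"""


def mets_to_gsaf(mets: ET.Element) -> ET.Element:
    # Index dmdSecs and files by ID so structMap divs can follow links.
    dmd = {d.get("ID"): d for d in mets.findall("mets:dmdSec", NS)}
    files = {f.get("ID"): f.find("mets:FLocat", NS).get(HREF)
             for f in mets.iter("{http://www.loc.gov/METS/}file")}

    def walk(div: ET.Element) -> ET.Element:
        section = ET.Element("Section")
        desc = ET.SubElement(section, "Description")
        parts = []
        if div.get("DMDID") in dmd:
            # Crosswalk: MODS title -> Greenstone Title metadata (assumed).
            title = dmd[div.get("DMDID")].findtext(".//mods:title", namespaces=NS)
            ET.SubElement(desc, "Metadata", name="Title").text = title
            parts.append(title)  # metadata formatted for display
        if div.get("LABEL"):
            parts.append(div.get("LABEL"))
        parts += [files[fp.get("FILEID")] for fp in div.findall("mets:fptr", NS)]
        ET.SubElement(section, "Content").text = " ".join(parts)
        for child in div.findall("mets:div", NS):
            section.append(walk(child))  # nested divs become nested Sections
        return section

    archive = ET.Element("Archive")
    for top in mets.findall("mets:structMap/mets:div", NS):
        archive.append(walk(top))
    return archive


if __name__ == "__main__":
    gsaf = mets_to_gsaf(ET.fromstring(SAMPLE_METS))
    print(ET.tostring(gsaf, encoding="unicode"))
```

Recursing over the structMap keeps the mapping structural: the depth of the GSAF tree always matches the depth of the METS div hierarchy.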

Customizing the Chopin collection
Focus on navigation. Metadata for custom access, e.g. genre and dedicatee, which are not in MARC/AACR2; METS, MODS, and Greenstone can all support this. Custom document navigation: the description is separated from the scores. Custom page navigation improves usability. Branding comes in the next phase.

Comments on Chopin Early Editions
Data were created by staff using familiar tools; structural metadata was created in a desktop application. Catalog records were a luxury. The catalog is the database of record: project IDs are stored in the 909 field, and POIs point into Greenstone. METS/MODS objects are assembled by a program, and we expect to repurpose the METS for other applications. Customization focused on navigation, not branding, which made it faster to bring up the collection and get user reaction.

Greenstone benefits for Chopin
A robust, mature system that recovered time in the project. Fast to bring up a UI out of the box. Dynamic page generation. Incremental customization. XML compliant, with a natural mapping from METS to GSAF.

Future work: Chopin
Add the DjVu image format. Repurpose METS for other applications, e.g. OAI. Standardize the new digitization production flow: the project was a first for METS, MODS, Greenstone, and six departments. Standardize the collection of structural metadata; plug in descriptive metadata as appropriate; store archival descriptive metadata in the METS object; repurpose via XSLT for delivery.

Other custom UI examples
Lehigh Digital Bridges: extensive changes to the look. Washington Research Libraries Consortium (WRLC): custom page banner; pop-up page turner in Perl; Greenstone as a component of a digital library suite.

Ongoing work: Greenstone
The Greenstone Librarian Interface (GLI) and Greenstone 3.

Greenstone Librarian Interface (GLI)
A collection management tool, informed by work at Greenstone sites. It assists the collection designer and supports all phases of the collection build process without dictating a workflow. A Java-based GUI tool, formerly called the Gatherer, with two years in development. In beta outside the lab at Bangalore and other sites; included in the current distribution.

GLI functions
Establish a new collection (or work on an existing one); select files to include in the collection; enrich files with metadata; select indexes and classifiers; build the collection; customize its appearance; preview the collection.

Greenstone 3
Greenstone 2 is mature, with 5+ years of wide deployment, but it is constrained by the need to support legacy systems. Meanwhile other technologies have matured, notably Java and XML. Greenstone 3 is a rewrite in Java, XML, and XSLT, with a distributed architecture based on SOAP and METS as its internal format. A group has been assembled to define Greenstone METS profile(s), and OAI support is planned. One year in development; alpha testing in the lab.

Conclusion
Positive experiences. Good direction for development. Strong user community. Proven in real digital library projects.

Links & further information
Chopin Early Editions: http://chopin.lib.uchicago.edu/
Greenstone: http://www.greenstone.org/ (downloads, documentation, examples)
New Zealand Digital Library Project: http://www.nzdl.org/ (UNESCO and related collections, many demos)
Witten & Bainbridge. How to Build a Digital Library. Morgan Kaufmann, 2003.