How to Build a Digital Library

Ian H. Witten & David Bainbridge

Contents

Preface
Acknowledgements

1. Orientation: The world of digital libraries
  One: Supporting human development
  Two: Pushing on the frontiers of science
  Three: Preserving a traditional culture
  Four: Exploring popular music
  The scope of digital libraries
  1.1. Libraries and digital libraries
  1.2. The changing face of libraries
    In the beginning
    The information explosion
    The Alexandrian principle
    Early technodreams
    The library catalog
    The changing nature of books
  1.3. Digital libraries in developing countries
    Disseminating humanitarian information
    Disaster relief
    Preserving indigenous culture
    Locally produced information
    The technological infrastructure
  1.4. The Greenstone software
  1.5. The pen is mighty: wield it wisely
    Copyright
    Collecting from the Web
    Illegal and harmful material
    Cultural sensitivity
  1.6. Notes and sources

2. Preliminaries: Sorting out the ingredients
  2.1. Sources of material
    Ideology
    Converting an existing library
    Building a new collection
    Virtual libraries
  2.2. Bibliographic organization
    Objectives of a bibliographic system
    Bibliographic entities
  2.3. Modes of access
  2.4. Digitizing documents
    Scanning
    Optical character recognition
    Interactive OCR systems
    Page handling
    Planning an image digitization project
    Inside an OCR shop
    An example project
  2.5. Notes and sources

3. Presentation: User interfaces
  3.1. Presenting documents
    Hierarchically structured documents
    Plain, unstructured text documents
    Page images
    Page images and extracted text
    Audio and photographic images
    Video
    Music
    Foreign languages
  3.2. Presenting metadata
  3.3. Searching
    Types of query
    Case-folding and stemming
    Phrase searching
    Different query interfaces
  3.4. Browsing
    Browsing alphabetical lists
    Ordering lists of words in Chinese
    Browsing by date
    Hierarchical classification structures
  3.5. Phrase browsing
    A phrase browsing interface
    Keyphrases
  3.6. Browsing using extracted metadata
    Acronyms
    Language identification
  3.7. Notes and sources

4. Documents: The raw material
  4.1. Representing characters
    Unicode
    The Unicode character set
    Composite and combining characters
    Unicode character encodings
    Hindi and related scripts
    Using Unicode in a digital library
  4.2. Representing documents
    Plain text
    Indexing
    Word segmentation
  4.3. Page description languages: PostScript and PDF
    PostScript
    Fonts
    Text extraction
    Using PostScript in a digital library
    Portable Document Format: PDF
    PDF and PostScript
  4.4. Word-processor documents
    Rich Text Format
    Native Word formats
    LaTeX format
  4.5. Representing images
    Lossless image compression: GIF and PNG
    Lossy image compression: JPEG
    Progressive refinement
  4.6. Representing audio and video
    Multimedia compression: MPEG
    MPEG video
    MPEG audio
    Mixing media
    Other multimedia formats
    Using multimedia in a digital library
  4.7. Notes and sources

5. Markup and metadata: Elements of organization
  5.1. Hypertext markup language
    Basic HTML
    Using HTML in a digital library
  5.2. Extensible markup language: XML
    Development of markup and stylesheet languages
    The XML metalanguage
    Parsing XML
    Using XML in a digital library
  5.3. Presenting marked up documents
    Cascaded style sheets: CSS
    Extensible stylesheet language: XSL
  5.4. Bibliographic metadata
    MARC
    Dublin Core
    BibTeX
    Refer
  5.5. Metadata for images and multimedia
    Image metadata: TIFF
    Multimedia metadata: MPEG-7
  5.6. Extracting metadata
    Extracting document metadata
    Generic entity extraction
    Bibliographic references
    Language identification
    Acronym extraction
    Keyphrase extraction
    Phrase hierarchies
  5.7. Notes and sources

6. Construction: Building collections
  6.1. Why Greenstone?
    What it does
    How to use it
  6.2. Using the Collector
    Creating a new collection
    Working with existing collections
    Document formats
  6.3. Walkthrough
    Getting started
    Making a framework for the collection
    Importing the documents
    Building the indexes
    Installing the collection
  6.4. Importing and building
    Files and directories
    Object identifiers
    Plugins
    The import process
    The build process
  6.5. Greenstone archive documents
    Document metadata
    Inside the documents
  6.6. Collection configuration file
    Default configuration file
    Subcollections and supercollections
  6.7. Getting the most out of your documents
    Plugins
    Classifiers
    Format statements
  6.8. Building collections graphically
  6.9. Notes and sources

7. Delivery: How Greenstone works
  7.1. Users, processes and protocols
    Processes
    The null protocol implementation
    The CORBA protocol implementation
  7.2. Preliminaries
    The macro language
    The collection information database
  7.3. Responding to user requests
    Performing a search
    Retrieving a document
    Browsing a hierarchical classifier
    Generating the home page
    Using the protocol
    Actions
  7.4. Operational aspects
    Configuring the receptionist
    Configuring the site
  7.5. Notes and sources

8. Interoperability: Standards and protocols
  8.1. More markup
    Names
    Links
    Types
  8.2. Resource description
    Collection-level metadata
  8.3. Document exchange
    Open ebook
  8.4. Query languages
    Common command language
    XML Query
  8.5. Protocols
    Z39.50
    Supporting the Z39.50 protocol
    The Open Archives Initiative
    Supporting the OAI protocol
  8.6. Research protocols
    Dienst
    Simple digital library interoperability protocol
    Translating between protocols
    Discussion
  8.7. Notes and sources

9. Visions: Future, past, and present
  9.1. Libraries of the future
    Today's visions
    Tomorrow's visions
    Working inside the digital library
  9.2. Preserving the past
    The problem of preservation
    A tale of preservation in the digital era
    The digital dark ages
    Preservation strategies
  9.3. Generalized documents: a challenge for the present
    Digital libraries of music
    Other media
    Generalized documents in Greenstone
    Digital libraries for oral cultures
  9.4. Notes and sources

References
Glossary of terms

Appendix A Installing and operating Greenstone
  A.1 Installation procedure
    Windows
    Unix
    How to find Greenstone
    Testing and troubleshooting
    Greenstone collections
    Associated software
  A.2 Setting up the web server
    Apache web server
    Security
    PWS and IIS web servers
    File permissions
  A.3 Managing your site
    Personalizing the Greenstone home page
    Redirecting a URL to Greenstone
    Administrative facility
    Configuration files and logs
    User management
    Technical information

Appendix B Greenstone source code
  B.1 Foundations
    Text_t object
    Library code
    Protocol API
  B.2 Collection server
    Search object
    Source object
    Filter object
    Collection server code
  B.3 Receptionist
    Actions
    Formatting
    Macro language
    Receptionist code
  B.4 Initialization