Frederick Zarndt: Biographical Sketch
Frederick Zarndt has worked with historic and contemporary newspaper, journal, magazine, book, and records digitization since computer speeds, software, technology, storage, and costs first made it practical. He worked with the Library of Congress on the pilot implementation of its National Digital Newspaper Program (NDNP, 2003), with the University of Utah since the beginning of its newspaper digitization program (2002), with the New Zealand National Library on its Papers Past and Parliamentary Papers digitization projects (2006), with the Singapore National Library Board on its historic and born-digital newspaper conversion projects (2006), with the National Library of Australia and the State Library of Victoria on the Australian Newspapers Digitization Program (2008), and with many other institutions both small and large.
Frederick has experience in every aspect of digitization projects, including project requirements development, project management, conversion operations (both in-house and outsourced), acceptance testing, and software development for production and delivery of digital data. He is the current chair of the IFLA Newspapers Section (the first non-librarian to serve as chair). He presently works as a technical, business development, and sales consultant for Digital Divide Data (since 2008), Content Conversion Specialists (since 2005), and DL Consulting (since 2001). Previously he was President of Planman Consulting North America, a subsidiary of Planman Technologies. Until 2005 he was Chief Technology Officer and one of the co-founders of iarchives / Footnote. While he was CTO at iarchives, his engineering team created a custom genealogical records data entry application for FamilySearch.org, which is today used by over 400,000 "crowdsource" volunteers worldwide. Frederick has 25+ years of experience in software development and is a member of ACM and IEEE and a Certified Software.
digital projects best practices
Frederick Zarndt
why a digital library?
- to enhance accessibility of the content in libraries
- to increase collaboration and cooperation between libraries
- to promote research
- to provide opportunities for entrepreneurs
hows and whats of a digital library
- what is a (good) digital library?
- how should a digital library be designed?
- how should a digital library be created?
- how is a digital library measured?
- how should a digital project be executed?
- how should a digital library or a digital project be managed?
about me
digital library overview
- collections: organized groups of objects
- objects: digital materials
- metadata: information about objects and collections
- initiatives: programs or projects to create, manage, and preserve collections
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
project / program / initiative
- clearly articulate the goals and deliverables of the project
- conduct formative evaluation to validate the initial goals and deliverables of the project
- identify what work needs to be done to accomplish the goals and deliverables of the project
- break down the work into manageable sub-tasks, and identify dependencies between the sub-tasks
- estimate and allocate the time and resources required to successfully complete each sub-task
- create a project plan that includes an estimated timetable for the completion of the sub-tasks, estimates the resource requirements for the completion of each sub-task, and identifies key milestones and deliverables in the project
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
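The sub-task breakdown above can be sketched in a few lines of Python. This is only an illustration: the task names, durations, and dependencies are invented, and the calculation assumes unlimited staff so that independent sub-tasks can run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical sub-tasks: name -> (estimated days, prerequisite sub-tasks).
SUBTASKS = {
    "select content": (5, set()),
    "clear rights": (10, {"select content"}),
    "write requirements": (7, {"select content"}),
    "digitize pilot batch": (15, {"clear rights", "write requirements"}),
    "acceptance testing": (5, {"digitize pilot batch"}),
}


def plan(subtasks):
    """Return (task order, earliest finish day per task)."""
    graph = {name: deps for name, (_, deps) in subtasks.items()}
    # static_order() yields every task after all of its prerequisites.
    order = list(TopologicalSorter(graph).static_order())
    finish = {}
    for name in order:
        days, deps = subtasks[name]
        start = max((finish[dep] for dep in deps), default=0)
        finish[name] = start + days
    return order, finish


order, finish = plan(SUBTASKS)
for name in order:
    print(f"{name}: done by day {finish[name]}")
```

The finish day of the last task gives a first estimate of the timetable; milestones can be read off the same table.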
project phases
- assess
- design
- implement
- measure
- manage
- preserve
project phases: waterfall
for each project repeat {
    assess
    create architecture
    design
    implement
    measure
    manage
    preserve
} until (funding is cut off)
project phases: agile
assess project
create architecture
create requirements, acceptance criteria
repeat {
    digitize (small) pilot batch
    test data against acceptance criteria
    adjust requirements and acceptance criteria
} until (no more adjustments are necessary)
digitize more data
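The pilot loop above might look like this in Python. It is a sketch under stated assumptions: `pilot_loop` and the three callbacks are hypothetical names, and the toy relationship between scanning resolution and OCR accuracy is invented purely to make the loop terminate.

```python
def pilot_loop(digitize_batch, meets_criteria, adjust, requirements, max_rounds=10):
    """Run small pilot batches until one passes without further adjustment."""
    for round_number in range(1, max_rounds + 1):
        batch = digitize_batch(requirements)
        if meets_criteria(batch, requirements):
            return requirements, round_number
        requirements = adjust(batch, requirements)
    raise RuntimeError("pilot did not converge; revisit the requirements")


# Toy stand-ins for the real work (all values are invented).
def digitize_batch(req):
    # Pretend OCR accuracy (in percent) rises with scanning resolution.
    return {"ocr_accuracy": min(99, 80 + req["dpi"] // 40)}


def meets_criteria(batch, req):
    return batch["ocr_accuracy"] >= req["min_ocr_accuracy"]


def adjust(batch, req):
    # Raise the resolution and try another pilot batch.
    return {**req, "dpi": req["dpi"] + 100}


final, rounds = pilot_loop(
    digitize_batch, meets_criteria, adjust,
    {"dpi": 200, "min_ocr_accuracy": 90},
)
print(final["dpi"], rounds)
```

The `max_rounds` guard is a design choice: a pilot that never converges usually means the requirements themselves need rethinking.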
assess
- identify the users
- select the collection or content
- identify ownership and legal risks
- define the goals
- identify applicable standards
- evaluate capabilities
design: standards
- METS: XML container for descriptive, structural, technical, and administrative metadata
- descriptive metadata
  - MARC: representation and communication of descriptive metadata
  - Metadata Object Description Schema (MODS): selected metadata from MARC
  - Dublin Core: fundamental group of text elements for describing and cataloging
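As a small illustration of descriptive metadata, the sketch below builds a record with a few Dublin Core elements using only the Python standard library. The `record` wrapper element and the sample values are assumptions made for the example; in a real project the elements would normally sit inside a container such as a METS descriptive metadata section.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 core Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a <record> element (hypothetical wrapper) holding dc:* children."""
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


record = dublin_core_record({
    "title": "Example Gazette, 1 January 1900",  # invented sample values
    "type": "Text",
    "language": "en",
    "identifier": "example-gazette-1900-01-01",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```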
design: standards
- administrative metadata
  - technical metadata
  - ALTO for OCR text
  - PREMIS for digital preservation
  - MIX for images
- structural metadata
  - structural map
  - structural links
design: standards
image and file standards
- TIFF
- JPEG 2000
- PDF, PDF/A, PDF/A-1b, PDF/A-1a
- TEI
- JPEG
- GIF
- MPEG, WMV, AIFF
- ISAD(G)
- etc., etc.
design: access
- user community
- user interface (UI)
- search
- authentication and user management
- digital object presentation
- portability
- administration
implement: in-house
reasons for in-house production
- collection cannot be moved
- collection is badly organized
- digitization must be done slowly over a long period
- digitization is very simple
implement: outsource
reasons for outsourced production
- originals can't be scanned in-house because
  - equipment is too expensive
  - output data is beyond staff experience
  - labor is too expensive
- large volume of work in a short time
- insufficient space, infrastructure, or staff
implement: software
- commercial off-the-shelf (COTS)
- open source
- customized COTS
- customized open source
- custom in-house
measure: acceptance criteria
- automatic quality checks
  - is the digital object complete?
  - is the digital object verifiable?
  - is the digital object uncorrupted?
- manual quality checks
  - does the metadata meet accuracy specifications?
  - does the text meet accuracy specifications?
  - is the image quality satisfactory?
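The automatic checks can be approximated with a checksum manifest. The sketch below assumes a hypothetical delivery format in which each digital object arrives as a folder plus a manifest mapping file names to SHA-256 digests; the file names are invented.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_object(folder, manifest):
    """Return a list of problems; an empty list means the object passed."""
    problems = []
    for name, expected in manifest.items():
        path = Path(folder) / name
        if not path.is_file():
            problems.append(f"missing: {name}")            # completeness
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")  # corruption
    return problems


# Demonstration with a throwaway folder and fake file contents.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page-0001.tif"
    page.write_bytes(b"fake image bytes")
    manifest = {"page-0001.tif": sha256_of(page), "page-0001.xml": "0" * 64}
    problems_before = check_object(tmp, manifest)
    page.write_bytes(b"fake image bytez")  # simulate corruption
    problems_after = check_object(tmp, manifest)

print(problems_before)
print(problems_after)
```

A check like this covers "complete" and "uncorrupted"; whether the metadata, text, and images are good enough still needs the manual checks.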
measure: image quality
- for images which are ultimately to be viewed by human beings, the only correct method of quantifying visual image quality is through subjective evaluation. in practice, however, subjective evaluation is usually too inconvenient, time-consuming and expensive
- the best way to assess the quality of an image is to look at it, because human eyes are the ultimate viewers of most images
Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing. April 2004
Zhou Wang, Alan C. Bovik, and Ligang Lu. Why is image quality assessment so difficult? Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). May 2002
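When looking at every image is too expensive, an objective score can at least flag candidates for manual review. The sketch below computes mean squared error and peak signal-to-noise ratio (PSNR) for two images given as flat lists of 8-bit pixel values. Note that this is a deliberately simple error-based measure, not the structural similarity index from the paper cited above, and such measures only roughly track what a human viewer perceives. The pixel values are invented.

```python
import math


def mse(reference, candidate):
    """Mean squared error between two equal-length pixel sequences."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(reference, candidate)) / len(reference)


def psnr(reference, candidate, max_value=255):
    """Peak signal-to-noise ratio in decibels; higher means closer to the reference."""
    error = mse(reference, candidate)
    if error == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / error)


reference = [0, 64, 128, 255]  # e.g. pixels from the master scan
derived = [0, 66, 126, 255]    # e.g. pixels from a compressed derivative
print(mse(reference, derived))
print(round(psnr(reference, derived), 2))
```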
measure: use
- who is using the collection?
- what is the collection being used for?
- how many page views per day / week / month?
- how long do visitors to the collection stay?
- how many repeat visitors to the collection?
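Two of these questions, page views per day and repeat visitors, can be answered from a simple event list. The sketch below assumes the web server log has already been parsed into (visitor id, date) pairs; the data and the definition of a repeat visitor (seen on more than one day) are assumptions made for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical, already-parsed page-view events: (visitor id, day of visit).
EVENTS = [
    ("v1", date(2024, 1, 1)),
    ("v1", date(2024, 1, 1)),
    ("v2", date(2024, 1, 1)),
    ("v1", date(2024, 1, 2)),
    ("v3", date(2024, 1, 2)),
]


def usage_summary(events):
    """Return (page views per day, visitors seen on more than one day)."""
    views_per_day = Counter(day for _, day in events)
    days_per_visitor = {}
    for visitor, day in events:
        days_per_visitor.setdefault(visitor, set()).add(day)
    repeat_visitors = sorted(
        visitor for visitor, days in days_per_visitor.items() if len(days) > 1
    )
    return views_per_day, repeat_visitors


views_per_day, repeat_visitors = usage_summary(EVENTS)
print(dict(views_per_day))
print(repeat_visitors)
```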
manage
- the most important factor in project management is open, honest, and effective communication.
- the 2nd most important factor in project management is communication.
- the 3rd most important factor in project management is communication.
in other words: communicate with your colleagues and project stakeholders!
manage
- use project management software to develop a timeline
- update the timeline as needed
- develop progress reports and measurement reports specific to the project or use others' examples
- leverage others' experiences by attending professional conferences
preserve
- bit rot
- format obsolescence
- media obsolescence / decay
- migration to new media or hardware
preserve: bit rot
gradual decay of
- storage media because of media quality
- storage media because of improper storage
- data due to random events (bit-flip, ...)
- software due to interface changes
- software due to non-obvious or inadvertent configuration changes
preserve: media decay
- a report by NIST and the Library of Congress says that
  - virtually all CD-Rs tested indicated an estimated life expectancy beyond 15 years
  - only 47 percent of recordable DVDs indicated an estimated life expectancy beyond 15 years; some had a life expectancy as short as 1.9 years
- in practice actual lifetimes may be considerably shorter
preserve: media obsolescence
- 5¼-inch floppy disks
- 8-track tapes
- 3½-inch floppy disks
- ZIP drives
- CD-R, CD-RW, Blu-ray
- microfilm
preserve: migration
- file format changes
- file name differences: case sensitive / insensitive
- extended file attributes
- file permissions
- soft links / hard links
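The file name problem is easy to check before a migration. The sketch below finds paths that differ only by letter case and would therefore collide on a case-insensitive file system. Using `str.casefold()` is an approximation, since real file systems apply their own comparison rules; the paths are invented.

```python
from collections import defaultdict


def case_collisions(paths):
    """Group paths that are identical once letter case is ignored."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.casefold()].append(path)
    # Keep only groups with more than one member, in a stable order.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


paths = [
    "scans/Page1.tif",
    "scans/page1.tif",  # collides with the entry above
    "scans/page2.tif",
    "META/mets.xml",
]
print(case_collisions(paths))
```

Running a check like this on the source system, before any files are copied, avoids silently overwriting one file with another.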
Open Archival Information System (OAIS) reference model: the simple view
OAIS reference model: the complicated view
collection
A digital collection consists of digital objects that are selected and organized to facilitate their discovery, access, and use. Objects, metadata, and the user interface together create the user experience of a collection.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
collection
1. A good digital collection is created according to an explicit collection development policy.
2. Collections should be described so that a user can discover characteristics of the collection, including scope, format, restrictions on access, ownership, and any information significant for determining the collection's authenticity, integrity, and interpretation.
3. A good collection is curated, which is to say, its resources are actively managed during their entire lifecycle.
4. A good collection is broadly available and avoids unnecessary impediments to use. Collections should be accessible to persons with disabilities, and usable effectively in conjunction with adaptive technologies.
5. A good collection respects intellectual property rights.
6. A good collection has mechanisms to supply usage data and other data that allows standardized measures of usefulness to be recorded.
7. A good collection is interoperable.
8. A good collection integrates into the users' own workflow.
9. A good collection is sustainable over time.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
objects
A digital object represents a discrete unit and is comprised of a digital file or files as well as descriptive metadata. Digital objects begin life in one of two ways:
1. As a digitized file produced as a surrogate for materials that exist in analog format.
2. As a "born digital" entity, with no analog counterpart.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
objects
1. A good object exists in a format that supports its intended current and future use.
2. A good object is preservable.
3. A good object is meaningful and useful outside of its local context.
4. A good object will be named with a persistent, globally unique identifier that can be resolved to the current address of the object.
5. A good object can be authenticated.
6. A good object has associated metadata.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
metadata
Metadata is structured information associated with an object for purposes of discovery, description, use, management, and preservation.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
metadata
1. Good metadata conforms to community standards in a way that is appropriate to the materials in the collection, users of the collection, and current and potential future uses of the collection.
2. Good metadata supports interoperability.
3. Good metadata uses authority control and content standards to describe objects and collocate related objects.
4. Good metadata includes a clear statement of the conditions and terms of use for the digital object.
5. Good metadata supports the long-term curation and preservation of objects in collections.
6. Good metadata records are objects themselves and therefore should have the qualities of good objects, including authority, authenticity, archivability, persistence, and unique identification.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
digital project / program
Digital programs provide the framework that pulls together people, policies and tools. Projects are activities within programs that have specific goals and are of finite duration. Project planning and program planning have common principles, and both must include plans for ongoing sustainability.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
project / program / initiatives
1. A good digital initiative has a substantial design and planning component.
2. A good digital initiative has an appropriate level of staffing with necessary expertise to achieve its objectives.
3. A good digital initiative follows best practices for project management.
4. A good digital initiative has an evaluation component.
5. A good digital initiative markets itself and broadly disseminates information about the initiative's process and outcomes.
6. A good digital initiative considers the entire lifecycle of the digital collection and associated services.
NISO Framework Working Group. A Framework of Guidance for Building Good Digital Collections, 3rd edition, Dec 2007.
references
- METS, MODS, ALTO, PREMIS, etc.: http://www.loc.gov/standards
- OAIS: http://public.ccsds.org/publications/refmodel.aspx
- NISO standards and guidelines: http://www.niso.org/publications/rp
- Good Practice Guides: http://www.ukoln.ac.uk
- and many, many more
Frederick Zarndt
This work is licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/
Digital Projects Best Practices
Do you have heavily used collections? Do you have materials that are crumbling from age, improper storage, heavy use, bad paper, etc.? Is your boss demanding that you digitize something to keep up with worldwide digitization trends? Are you planning a digitization project? If so, this presentation is for you.
Digitization is no longer rocket science. Since software, computers, storage, and the internet first made such projects possible at the beginning of this century, a broad range of theoretical and practical knowledge about digital projects and digital libraries has become readily available, along with a plethora of information about best practices and first-hand experiences.
However, digitization projects can be daunting, even for experienced practitioners. The variety of standards, confusing and constantly changing file formats, legal issues, and digital preservation issues will make anyone planning such a project think carefully, even the seasoned digital librarian. For example, have you considered:
- how the digital content will be used?
- user communities?
- copyright?
- file formats?
- metadata?
- standards?
- content delivery?
- digital preservation?
- workflow?
- quality assurance?
- project management?
- in-house versus outsourced production?
The list doesn't end there. Even though the list is lengthy and may sound like a foreign language, do not be discouraged: it holds no mysteries. A well-planned and carefully executed digital project or library will succeed. Others have done it, and you can do it too!
In this tutorial we will talk about each of the aforementioned aspects of digital projects and more, from project inception to project conclusion. Some topics will be covered in depth, others only touched upon, according to the audience's wishes.