Final Report. Phase 2. Virtual Regional Dissertation & Thesis Archive. August 31, Texas Center Research Fellows Grant Program

Similar documents
QUESTIONS AND CONTACTS

Thesis/Dissertation Submission Guidelines The Graduate School Valdosta State University

Archivists Workbench: White Paper

Introduction to Archivists Toolkit Version (update 5)

Response to the Validation Panel for the DIT Foundation Programmes

UNIFORM STANDARDS FOR PLT COURSES AND PROVIDERS

Metadata Framework for Resource Discovery

Plunkett Research Online

REQUEST FOR PROPOSALS

CTE Program Proposal. NAME OF COLLEGE: Bakersfield College. FACULTY CONTACT: Creighton Magers DATE: 11/19/2015

OPPORTUNITY FOR PLACEMENT AHRC Knowledge Exchange Fellow FuseBox/Wired Sussex

Beginning To Define ebxml Initial Draft

NORTH/WEST PASSAGE. Operations and Travel Information Integration Sharing (OTIIS) Website Structure and Ownership. August 2016

USER EXPERIENCE DESIGN (UXD)

Development of Web Applications for Savannah River Site

XML-based production of Eurostat publications

The Journal of Insect Science

First Session of the Asia Pacific Information Superhighway Steering Committee, 1 2 November 2017, Dhaka, Bangladesh.

Component Specification NFQ Level 5. Web Authoring 5N Component Details. Web Authoring. Level 5. Credit Value 15

DRS Policy Guide. Management of DRS operations is the responsibility of staff in Library Technology Services (LTS).

Update on the TDL Metadata Working Group s activities for

PROJ 302. Project Report, Poster and Digest Guidelines. for Industrial Engineering Students. Summer 2017

Florida Coastal Everglades LTER Program

AIM. 10 September

The Cisco Show and Share mobile client for Apple ios devices will provide the following features when connected to a Cisco Show and Share system:

PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS. INTRODUCTION TO INTERNET SOFTWARE DEVELOPMENT CSIT 2230 (formerly CSIT 2645)

CISER Data Archive Collection Policy

Metadata Workshop 3 March 2006 Part 1

The main website for Henrico County, henrico.us, received a complete visual and structural

Development of an e-library Web Application

USING EPORTFOLIOS TO PROMOTE STUDENT SUCCESS THROUGH HIGH- IMPACT PRACTICES

DEPARTMENT OF HEALTH and HUMAN SERVICES. HANDBOOK for

CTI Short Learning Programme in Internet Development Specialist

ebook library PAGE 1 HOW TO OPTIMIZE TRANSLATIONS AND ACCELERATE TIME TO MARKET

STRATEGIC PLAN

Community College of Beaver County. Request for Proposal. April 15, 2019

Development and Implementation of an Online Database for Bibliographic Control of Governments Documents in a Selective Depository

Living Library Federica Campus Virtuale Policy document

trends in ARCHIVES PRACTICE MODULE 3 DESIGNING DESCRIPTIVE AND ACCESS SYSTEMS Daniel A. Santamaria CHICAGO

About the Digital Bible Library

ECE Senior Design Team 1702 Project Proposal

POSITION DETAILS. Content Analyst/Developer

KM COLUMN. How to evaluate a content management system. Ask yourself: what are your business goals and needs? JANUARY What this article isn t

Qualification details

THE JOURNEY OVERVIEW THREE PHASES TO A SUCCESSFUL MIGRATION ADOPTION ACCENTURE IS 80% IN THE CLOUD

CS7026: Authoring for Digital Media. Introduction Markup Languages

PG Certificate Web Design and Development. Course Structure. Course Overview. Web Development and User Experience - ARMC243S7 Overview

Delaware State University School of Graduate Studies and Research. Electronic Thesis and Dissertation

Cameron Stewart Technical Publications Product Manager, xmatters. MadCap Flare native XML singlesource content authoring software

COMPLETION OF WEBSITE REQUEST FOR PROPOSAL JUNE, 2010

Programme Specification Section 1

A Digital Preservation Roadmap for Public Media Institutions

COMPUTER INFORMATION SYSTEMS

Digital Library on Societal Impacts Draft Requirements Document

PRISM - FHF The Fred Hollows Foundation

Microsoft SharePoint Server 2013 Plan, Configure & Manage

Using the Web in Your Teaching

Invitation to Tender Content Management System Upgrade

Interactive Media CTAG Alignments

Study on XML-based Heterogeneous Agriculture Database Sharing Platform

Submitting your Dissertation/ Thesis Electronically: A Guide for Graduate Students

Quantum, a Data Storage Solutions Leader, Delivers Responsive HTML5-Based Documentation Centers Using MadCap Flare

ImageNow eforms. Getting Started Guide. ImageNow Version: 6.7. x

PeopleSoft Applications Portal and WorkCenter Pages

ETSETB Academic Regulations for MET and MEE Master Degree Theses (TFM - Treball Fi de Màster)

Web Engineering. Introduction. Husni

Advanced Solutions of Microsoft SharePoint Server 2013 Course Contact Hours

Entering Finding Aid Data in ArchivesSpace

Oracle BI Publisher 11g R1: Fundamentals

Advanced Solutions of Microsoft SharePoint 2013

Foundation of Web Goal 4: Proficiency in Adobe Dreamweaver CC

Information Technology Accessibility Policy

DIGITAL STEWARDSHIP SUPPLEMENTARY INFORMATION FORM

Creating descriptive metadata for patron browsing and selection on the Bryant & Stratton College Virtual Library

CTI Higher Certificate in Information Systems (Internet Development)

Editorial Guidelines for EHLT News e.publication for staff

Creating a Course Web Site

Computer Science Department

ProQuest Dissertations and Theses Overview. Austin McLean and Marlene Coles CGS Summer Workshop, July 2017

Brown University Libraries Technology Plan

Deanship of Academic Development. Comprehensive eportfolio Strategy for KFU Dr. Kathryn Chang Barker Director, Department of Professional Development

ETD Submission via ProQuest Step-by-Step

Usability Testing Report of College of Liberal Arts & Sciences (CLAS) Website

Automating Publishing Workflows through Standardization. XML Publishing with SDL

Importance of cultural heritage:

Supervisors of Elections DOS Online Grants System User Manual

Institutional Repository using DSpace. Yatrik Patel Scientist D (CS)

Data Exchange and Conversion Utilities and Tools (DExT)

I. General regulations

Identify how the use of different browsers and devices affects the look of a webpage. Competencies

Preservation and Access of Digital Audiovisual Assets at the Guggenheim

INF - INFORMATION SCIENCES

MALAYSIA THESES ONLINE (MYTO): AN APPROACH FOR MANAGING UNIVERSITIES ELECTRONIC THESES AND DISSERTATIONS

XML. Objectives. Duration. Audience. Pre-Requisites

User Guide CDU HE Learnline Master Template

Glossary of Exchange Network Related Groups

Creating a System for the Online Delivery of Oral History Content

KEY PROGRAMME INFORMATION. Originating institution(s) Bournemouth University. Faculty responsible for the programme Faculty of Science and Technology

DIABLO VALLEY COLLEGE CATALOG

Response to the. ESMA Consultation Paper:

Transcription:

Final Report Phase 2 Virtual Regional Dissertation & Thesis Archive August 31, 2006 Submitted to: Texas Center Research Fellows Grant Program 2005-2006 Submitted by: Fen Lu, MLS, MS Automated Services, Killam Library Jeanette Hatcher, MLS, MAPAFF Special Collections & Archives, Killam Library

Page 1 Table of Contents I. Background Information... 2 II. Research Statement of Purpose... 2 III. Necessity of Research... 2 IV. Research Methodology... 3 Encoded Archival Description (EAD)... 4 Collection Development Parameters... 4 Implementation Methodology... 5 V. Findings... 6 Digitization... 6 Encoding... 6 Transformation and Delivery... 8 VI. Conclusion... 9 Recommendations... 9 Future Phases... 10 Currently our project can be viewed at: http://library.tamiu.edu/depts/sparc/theses/index.htm Table of Figures Figure 1 -- Custom mapping code in Java & support text-to- EAD DTD mapping... 7 Figure 2 -- Snapshot of XSL file... 8 Figure 3 -- HTML display using XSL file... 9

Page 2 I. Background Information Grant Program: Texas Center Research Fellows Grant Program 2005-2006 Project Title: Phase 2 -- Virtual Regional Dissertation & Thesis Archive Research Team: Fen Lu, MLS, MS Automated Services, Killam Library Jeanette Hatcher, MLS, MAPAFF Special Collections & Archives, Killam Library II. Research Statement of Purpose The research team has developed an innovative historical preservation resource by creating a Regional Dissertation & Thesis Archive. As part of this endeavor we sought to provide electronic access, via the Internet, to vital information for each volume that is placed in the archive. This information includes the Title/Approval page, the table of contents, and the abstract for each dissertation & thesis. This initiative was documented in order to provide useful information to other institutions interested in undertaking a similar project. III. Necessity of Research Currently the Killam Library has an exceptional resource of information for the border region in that of the TAMIU/LSU Thesis Collection. Housed in the Library s Special Collections & Archives, this collection consists of 600+ volumes, the majority of which provide snapshots of demographic studies in our region. The majority of these theses deal with various topics of research whose scope of study is limited to the border region. These topics range from myriad issues in K-12 education in South Texas, to the potential impact of an International Trade Data System on the southbound commercial border. Faculty & student researchers, as well as private citizens and government entities, use these resources. The Special Collections & Archives, in the course of proctoring, arranging and providing access to these materials recognized the need for the following: 1. The need to provide improved physical and intellectual access to these materials. 2. The need to produce an expanded reference resource of this kind, which incorporates works about our region written by researchers at other institutions.

Page 3 In accordance with these observations, the Special Collections & Archives and Automated Services departments of Killam Library project was a collaborative, multi phase, two-tiered project. This project sought to develop a new research resource and provide improved physical and intellectual access to these materials by providing Internet access to a Finding Aid created in an Encoded Archival Description (EAD) format. IV. Research Methodology The focus of this report is only the second phase of this project; however, it is important to give an overview of the entire endeavor. Future directions for this project such as the possibility of moving from open source software to licensed software largely depends upon the outcome of each successive phase. This project included the following proposed method of research: Phase 1 consisted of: Tier 1 The creation of the TAMIU/LSU Thesis Collection - Receipt of TAMIU/LSU Theses - Provision of separate shelving for thesis collection - The initial purchase and receipt of dissertations & theses whose subject matter deals with regional issues and/or examines regional groups authored by students seeking to fulfill academic requirements at institutions other than TAMIU. Tier 2 The creation of a Finding Aid for the TAMIU/LSU Thesis Collection - The creation of a 2nd generation Finding Aid (EXCEL format) for the Thesis Collection, which includes title, author, submission date, call number and statement of purpose information.

Page 4 Phase 2 consisted of: Tier 1 The creation of a Regional Dissertation & Thesis Archive - Review and selection of TAMIU/LSU theses appropriate for inclusion in the new virtual collection - Purchase of additional materials for the archive Tier 2 The creation of an Encoded Archival Description Finding Aid for this archive - Creation of a template for an Encoded Archival Description (EAD) Finding Aid - Scanning of the Title/Approval page, the table of contents, and the abstract of selected TAMIU/LSU theses - Formatting scanned dissertation/thesis information to conform to the EAD Finding Aid - Provision of access to the EAD Finding Aid via the internet Encoded Archival Description (EAD) It is important to note that by choosing to create an Encoded Archival Description (EAD) format finding aid, we are in line with widely accepted standards and best practices of the archival community. By using EAD, TAMIU Special Collections will be able, in future, to contribute content to larger archival initiatives such as those at the University of North Texas or The University of Texas at Austin and thus gain a wider audience for our collections. Collection Development Parameters Special Collections supports a broad range of regional research. Materials in Special Collections are collected because they hold long-term, historical research potential; possess unique physical characteristics, such as binding, printing or are heavily illustrated; are seminal, original works in a relevant area of study; or are inherently rare or scarce. Materials acquired can vary considerably according to the intrinsic qualities of the material itself, or the research needs of a particular program supported by the Special

Page 5 Collections & Archives. The purpose of Special Collections is to preserve, conserve and provide access to these special materials. Defined Project parameters: Subject Matter Works dealing with border populations and/or issues including (but not limited to) art, culture, architecture, business, government, science, education and history Focus (but not limited to) -- The geographic region of South Texas, particularly the area of both sides of the Texas-Mexico border with a special emphasis on Laredo, Texas and Nuevo Laredo, Tamaulipas and the environs Language parameters English and Spanish are the primary languages for this collection Format -- Dissertations, Theses, and Occasional Papers that have been accepted in partial fulfillment of requirements for completing a graduate course of study Implementation Methodology The project was implemented in three steps: digitization, encoding and delivery. It began with the digitization of 600 items selected from current TAMIU/LSU theses archives. Scanning technology was used to convert materials to a digital format. In order to observe copyright guidelines, only the cover page, abstract and table of content of these items were scanned. Parts of the current unpublished Excel format finding aid for the TAMI/LSU Thesis Collection were used for our project. To facilitate the implementation of EAD, the finding aid structure and content was standardized. The finding aid was encoded using Extensible Markable Language (XML) conforming to EAD definition version 2.0. During phase 2 of the project, open source software was used to perform the data conversion. To date, most Internet browsers are capable of reading XML-encoded documents but not all of them are. An XSLT file was developed to transform XML to HTML. Project staff mapped out the contents and navigation needs of the digital archive s Web page.

Page 6 V. Findings Digitization The project began with digitization of 600 items selected from current TAMIU/LSU theses archives. Because we do not have any agreements with authors regarding the distribution of their theses and dissertations, we decided to scan only cover pages, table of contents and abstracts in order to avoid any potential copyright issues. In addition, all theses and dissertation were written in different formats. Even with the best scanning equipment, it would be very difficult to scan the table of contents and cover pages as text files and make them searchable. Therefore, we made the decision that the table of contents and cover pages were to be scanned as images and saved in PDF files. The abstracts were scanned as text and saved in separate files. Student workers conducted the digitization process with two scanners. For three months approximately 29 hours each week was devoted to the scanning portion of this project. Student workers did the digitization following the procedure documentation, which defined the workflow, naming convention and all other relevant procedural details. Encoding Encoding is the most challenging part of implementation. The original inventory we created for our theses and dissertation collection is just a simple excel file in which each item in this collection is described with less than 20 features. Basically, it is just a table. However, EAD is a very complicated schema. It not only can present extensive and interrelated descriptive information found in finding aids, it also can preserve the hierarchical relationships that exist between different levels of description. The difficulty we faced is not just a simple changing of data from one format to another. The research team needed to change data from a format that contains no hierarchy to a format that has the potential to contain a very complicated hierarchy. Moreover, EAD also includes the descriptive information for finding aids. All kinds of pertinent information can be included. For example, the language being used to describe the collection or the type of character encoding being used in your finding aid can be included when using EAD.

Page 7 Fortunately, the descriptive information within our finding aid is uniform. We can treat it as constant data and add it to our EAD XML file. The following is the detailed description of implementation. First, in MS Excel, we exported our excel format inventory to text format. In this text file, each line represents a single row in the Excel file, i.e. a thesis/dissertation in our collection. Secondly, a Python program was developed to divide this text file into several files. Lastly, we downloaded Altova Mapforce 2006 trial version to generate a custom mapping code in Java and support the text-to- EAD DTD mapping (Figure 1). The Java codes generated by this application were used to convert all text files to XML files that conform to the EAD standard. We used the free version of Borland JBuilder Foundation to run the Java code. Figure 1 Custom mapping code in Java & support text-to- EAD DTD mapping

Page 8 Transformation and Delivery EAD markup designates only the structure and content of the document, not how it will appear on in print or on a computer screen. A stylesheet must be employed to format the file for display. There are several standardized stylesheet languages, such as CSS (Cascading Style Sheets), XSL (Extensible Style Language), DSSSL (Document Style Semantics and Specification Language), and FOSI (Format Output Specification Instance). We chose XSL to perform this conversion. XSL converts document trees. In our case, it transforms our XML file to an HTML file that can be displayed in a web browser. The properties of XSL, also provides the ability to repeat the same data in two different parts of the display. The free version of Altova Spy Home Edition is the tool we used to write XSL codes. This version of software can be downloaded from the Altova website. Figure 2 is a snapshot of the XSL file written for this project. Figure 3 is the html display using this XSL file. In addition, to facilitate browsing by our users, we arranged our collection by author names, subjects and titles. Currently, our web interface is available at: http://library.tamiu.edu/depts/sparc/theses/index.htm Figure 2 Snapshot of XSL file

Page 9 Figure 3 HTML display using XSL file VI. Conclusion Recommendations If other groups are contemplating starting a project similar to the one that this research team has undertaken, we have a few advisory comments to share with them. First, carefully consider the labor- intensive nature of this type of project and the skills that your workforce needs to possess in order to produce a viable product. Make sure that a knowledgeable and hardworking group is available for your project. Secondly, if you do not have the appropriate equipment at the outset of your project, remember to include extra time in your work timeline to acquire said equipment. Third, consider that once you create this database it can grow rather quickly. While free software can be appropriate for small or temporary databases and simple web interfaces, it may not be able to handle larger datasets or complicated search functions. Lastly, this was actually a multi-phase, multi-tiered project. Digitization projects of this kind take many months or years to be fully realized and require integration into the department s workflow.

Page 10 Future Phases There are several aspects that the research team is interested in exploring during future phases of this project. We are confident that with more research and more time, the process for the inputting of information into this database could be streamlined further; thus, allowing for a more efficient use of workers time. The changes that we are going to explore will call for a shift in terms of training and our workers necessary skill set. Streamlining input procedures will also allow for more time to engage in quality control activities. Although at first it may seem superficial, the research team has spent some time discussing the visual representation of this project. As librarians, the research team understands the importance of having a visually engaging web interface. Naturally, utility of the tool is paramount in importance; however, it is extremely important to create an interface that is inviting and clearly mapped for the ease of users. In terms of aesthetics, the research team plans to continue evolving the look of the database s web interface. There are also plans to eventually provide users with a printer friendly page option. There are also several possibilities in terms of increasing the utility of this tool. Namely by learning more about EAD descriptors and applying that knowledge to later versions of this database, the research team could greatly increase the usefulness of this product. Modifications of our current search interface and better use of EAD could greatly increase the search capabilities of this tool in later phases of this project. It seems that much of the free software that was used during the earlier phases of this project will be useful for only a limited time. As the project grows, the research team will be under increasing pressure to purchase software and materials to continue development. Any improvements made in later phases will require increasing levels of technical expertise and be labor intensive. It appears that it will be necessary to seek future funding opportunities in order to fully develop later phases of this project.