oatd.org
Discovery for Open Access Theses and Dissertations
An ASERL Webinar, October 15, 2013
These slides: http://goo.gl/muxq15
Thomas Dowling, dowlintp@wfu.edu

I Can Haz ASERL ETDs?
- 34 of 37 ASERL universities provide open access ETDs.
- 25 of 37 members provide OA ETDs through a harvestable repository. [Or, 9 members provide OA ETDs but do not make them harvestable.]
- As of September 30, 2013, OATD indexes 99,857 records from ASERL members.

What Is OATD?
A discovery service for open access graduate-level theses and dissertations, harvested from repositories worldwide.

What Is OATD?
- 1.89 million records
- 850+ universities
- 360+ repositories
- 75,000 records with "semi-full text turbo boost"
  - Search hits from first 40 pages
  - Sample images from PDF
  - Not a full-text index
- One Amazon small server + 200GB

What Is OATD?
Steering Committee:
- Martin Courtois (Kansas State)
- John Hagen (WVU)
- Molly Keener (WFU)
- Caitlin Nelson (Florida Virtual Lib)
- Ryan Steans (Texas Digital Lib)
- Zoe Stewart-Marshall (past president, LITA)
Generous support from the Z. Smith Reynolds Library, Wake Forest University.

What Needs Does OATD Meet?
Current search tools for ETDs:
- Point to closed-access copies when OA is available
- Lump OA ETDs in with overwhelming numbers of other documents
- Rely on the kindness of vendors
- Have under-developed, uninformative user interfaces
- Have no enhancement request process

What Needs Does OATD Meet?
Meanwhile, in every other library search interface...
- Massive investment of time, energy, and money
- Google-driven user expectations
- Simpler search
- Concentration on what users can do with results

OATD Components
- Getting Metadata (OAI-PMH harvesting)
- Cleaning Up Metadata (XML conversion)
- Indexing Metadata (Solr)
- Web User Interface
- Web Crawler for PDFs
- Web Crawlers for [a few] non-OAI repositories

OAI-PMH
- Built into most major repository platforms (DSpace, DigitalCommons, ContentDM, EPrints...)
- [Chart: OAI-harvested repositories in OATD, by platform]

OAI-PMH
But...
- It may not be enabled
- It may not be configured well
- It may break without alerting you
Wait. Didn't Google quit using OAI-PMH? Why should we still care about it?

OAI-PMH
Talks to our highly structured metadata:
    <title>Emulating Data Synthesis for Virtual Simulations</title>
    <dc:creator>Aaronson, A. Arthur</dc:creator>
    <dc:contributor role="Committee Chair">Berenson, Barbara B.</dc:contributor>

OAI-PMH
Talks to our highly structured metadata:
    <mods:dateaccessioned>2013-09-16T09:15:00Z</mods:dateaccessioned>
    <mods:dateavailable>2014-09-16T09:15:00Z</mods:dateavailable>

OAI-PMH
Six "Verbs" (and various adverbs)
[Odds are, you don't need to know any of this...]
- Identify: Tell me about yourself
- ListMetadataFormats: Tell me what metadata "flavors" you offer (DC, Qualified DC; ETD-MS, UKETD, xmetadiss; METS, MODS)
- ListSets: Tell me how you subdivide your repository

OAI-PMH
Six "Verbs" (and various adverbs)
- ListIdentifiers: List record identifiers [in this set] [from a date] [until a date] [available in this metadata format]
- GetRecord: Give me one record [with this identifier] [in this metadata format]
- ListRecords: Give me all records [in this set] [from a date] [until a date] [in this metadata format]

OAI-PMH
Six "Verbs" (and various adverbs)
So for example:
    http://archive.foo.edu/oai?verb=ListRecords&set=etds&from=2013-10-01&metadataPrefix=oai_etdms
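The example request can be assembled programmatically. The following is a minimal sketch, assuming the placeholder endpoint from the slide (http://archive.foo.edu/oai); the helper name build_oai_request is hypothetical and not part of OATD. Note that OAI-PMH verb and argument names are case-sensitive (ListRecords, metadataPrefix).

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, arguments=None):
    """Return an OAI-PMH request URL for one verb plus its optional arguments.

    build_oai_request is a hypothetical helper for illustration only.
    """
    params = {"verb": verb}
    # Skip arguments that were left unset (None).
    for name, value in (arguments or {}).items():
        if value is not None:
            params[name] = value
    return f"{base_url}?{urlencode(params)}"


# "All ETD-MS records in the 'etds' set changed since October 1, 2013."
url = build_oai_request(
    "http://archive.foo.edu/oai",  # placeholder repository from the slide
    "ListRecords",
    {"set": "etds", "from": "2013-10-01", "metadataPrefix": "oai_etdms"},
)
print(url)
```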

Cleanup and Conversion
    ...
    <metadata>
      <title>A New Theory on the Brontosaurus</title>
      <creator>Elke, Anne</creator>
      <degree>
        <name>Doctor of Philosophy</name>
        <level>doctoral</level>
        <discipline>Dinosaur Studies</discipline>
        <grantor>Foo Tech</grantor>
      </degree>
    </metadata>
    ...

Cleanup and Conversion
Harvested:
    <title>A New Theory on the Brontosaurus</title>
    <creator>Elke, Anne</creator>
    <degree>Doctor of Philosophy</degree>
    <grantor>Foo Tech</grantor>
Converted:
    <field name="title">A New Theory on the Brontosaurus</field>
    <field name="author">Elke, Anne</field>
    <field name="degree">PhD</field>
    <field name="publisher">Foo Institute of Technology and Science</field>
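A conversion like the Brontosaurus example might be sketched as follows. This is an illustration only, not OATD's actual converter: the lookup tables DEGREE_ABBREVIATIONS and PUBLISHER_NAMES and the function to_index_fields are hypothetical, and the record is the namespace-free sample from the slides rather than a real ETD-MS response.

```python
import xml.etree.ElementTree as ET

RECORD = """<metadata>
  <title>A New Theory on the Brontosaurus</title>
  <creator>Elke, Anne</creator>
  <degree>
    <name>Doctor of Philosophy</name>
    <level>doctoral</level>
    <discipline>Dinosaur Studies</discipline>
    <grantor>Foo Tech</grantor>
  </degree>
</metadata>"""

# Hypothetical normalization tables; a real harvester would need many more entries.
DEGREE_ABBREVIATIONS = {"doctor of philosophy": "PhD", "master of science": "MS"}
PUBLISHER_NAMES = {"foo tech": "Foo Institute of Technology and Science"}


def to_index_fields(xml_text):
    """Flatten one harvested record into the field names used for indexing."""
    root = ET.fromstring(xml_text)

    def text(path):
        # Missing elements become "", and internal whitespace is collapsed.
        return " ".join(root.findtext(path, default="").split())

    degree = text("degree/name")
    grantor = text("degree/grantor")
    return {
        "title": text("title"),
        "author": text("creator"),
        # Fall back to the harvested value when no normalized form is known.
        "degree": DEGREE_ABBREVIATIONS.get(degree.lower(), degree),
        "publisher": PUBLISHER_NAMES.get(grantor.lower(), grantor),
    }


print(to_index_fields(RECORD))
```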

Cleanup and Conversion
Harvested:
    <subject>Thesis (M.S.) - Archeology</subject>
    <contributor>Foo Tech, School of Social Work</contributor>
    <date>Spring 2010</date>
    <date>2010-04</date>
Converted:
    <field name="degree">ms</field>
    <field name="level">masters</field>
    <field name="discipline">Archeology</field>
    <field name="discipline">Social Work</field>
    <field name="date">2010-04-01</field>
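Two of the cleanup steps in this example, choosing a machine-readable date and pulling the degree out of a free-text subject, could look roughly like the sketch below. The function names and regular expressions are assumptions made for illustration; they handle only the patterns shown on the slide.

```python
import re


def normalize_date(values):
    """Return the first YYYY, YYYY-MM, or YYYY-MM-DD value padded to a full date.

    Values such as "Spring 2010" are skipped because they do not match.
    """
    for value in values:
        match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", value.strip())
        if match:
            year, month, day = match.groups()
            return f"{year}-{month or '01'}-{day or '01'}"
    return None


def parse_thesis_subject(subject):
    """Split a subject like 'Thesis (M.S.) - Archeology' into degree and discipline."""
    match = re.fullmatch(
        r"Thesis \(([^)]+)\)\s*-\s*(.+)", subject.strip(), flags=re.IGNORECASE
    )
    if match is None:
        return None
    degree = match.group(1).replace(".", "").lower()
    return {"degree": degree, "discipline": match.group(2).strip()}


print(normalize_date(["Spring 2010", "2010-04"]))
print(parse_thesis_subject("Thesis (M.S.) - Archeology"))
```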

Solr
- Free, open source search engine
- Search engine used by Netflix, StubHub, Instagram, Internet Archive, Zappos, Smithsonian...
- Also VuFind and Blacklight library catalog interfaces.
- Reindexes records without creating dups.
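The "no dups" behavior comes from Solr's unique key: by default, adding a document whose key matches an existing document replaces the old one. The toy in-memory model below illustrates only that idea. It is not a Solr client, and the class name ToyIndex and the record identifier are made up for the example.

```python
class ToyIndex:
    """In-memory stand-in for an index keyed on a unique field (not Solr itself)."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self._docs = {}

    def add(self, doc):
        # A document with an already-seen key overwrites the earlier copy.
        self._docs[doc[self.unique_key]] = dict(doc)

    def get(self, key):
        return self._docs.get(key)

    def __len__(self):
        return len(self._docs)


index = ToyIndex()
index.add({"id": "oai:archive.foo.edu:123", "title": "A New Theory on the Brontosaurus"})
# Re-harvesting the same record later updates it in place.
index.add({"id": "oai:archive.foo.edu:123", "title": "A New Theory on the Brontosaurus, Revised"})
print(len(index))
```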

Semi-Full Text Turbo Boost
- Covers ~75k mostly recent ETDs
- Very lightweight web crawler
  - Pauses if it has hit a site within the last 2 minutes
  - Should not re-pull the same PDF more than once a year
  - Gets record IDs from OATD: does not run searches on your site
  - Pulls OATD's URL and looks for first likely PDF link
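The two politeness rules (a 2-minute pause per site, and at most one pull of a given PDF per year) can be expressed as a small bookkeeping class. This is a sketch under assumptions: the class CrawlPolicy, its method names, and the use of 365 days for "a year" are illustrative, not OATD's actual crawler code.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse

SITE_PAUSE = timedelta(minutes=2)   # from the slide: pause between hits to one site
PDF_REFRESH = timedelta(days=365)   # assumption: "once a year" taken as 365 days


class CrawlPolicy:
    """Tracks when each site and each PDF URL was last fetched."""

    def __init__(self):
        self.last_site_hit = {}   # host -> datetime of last request
        self.last_pdf_fetch = {}  # PDF URL -> datetime of last download

    def seconds_to_wait(self, url, now):
        """How long to pause before requesting anything from this URL's host."""
        last = self.last_site_hit.get(urlparse(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, (SITE_PAUSE - (now - last)).total_seconds())

    def should_fetch_pdf(self, url, now):
        """True if this PDF was never fetched, or was fetched at least a year ago."""
        last = self.last_pdf_fetch.get(url)
        return last is None or now - last >= PDF_REFRESH

    def record_fetch(self, url, now):
        self.last_site_hit[urlparse(url).netloc] = now
        self.last_pdf_fetch[url] = now


policy = CrawlPolicy()
start = datetime(2013, 10, 1, 12, 0, 0)
policy.record_fetch("http://archive.foo.edu/etd/1.pdf", start)
# Thirty seconds later, a different PDF on the same host must still wait.
print(policy.seconds_to_wait("http://archive.foo.edu/etd/2.pdf", start + timedelta(seconds=30)))
```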

Semi-Full Text Turbo Boost
- Grabs page images for "front matter" (first 7 pages)
- Indexes pages 8 to 40
  - Used for search highlighting only
  - Not searchable in the UI
- Extracts first 11 "real" images
  - Not front matter
  - Not too big, not too small
  - No PDF "soft mask" images
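The page split described here (images for the first 7 pages, text for pages 8 to 40) might be planned as in the sketch below. The function plan_pages and its return shape are hypothetical. The image-size thresholds are not given in the slides, so image filtering is left out.

```python
FRONT_MATTER_PAGES = 7   # pages captured as images
LAST_INDEXED_PAGE = 40   # text is indexed from page 8 through this page


def plan_pages(page_count):
    """Return 1-based page numbers to render as images and to index as text."""
    front = list(range(1, min(page_count, FRONT_MATTER_PAGES) + 1))
    indexed = list(range(FRONT_MATTER_PAGES + 1, min(page_count, LAST_INDEXED_PAGE) + 1))
    return {"front_matter_images": front, "indexed_text": indexed}


plan = plan_pages(250)
print(plan["front_matter_images"], plan["indexed_text"][0], plan["indexed_text"][-1])
```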

Semi-Full Text Turbo Boost

Web Crawling for Non-OAI Sites
- Last Resort
- Depends on findable, parseable browse pages; parseable record pages; persistent URLs
- Very labor-intensive
- Prone to breaking whenever you tweak your site
- Requires re-crawling the entire site every time
If you really can't do OAI, but have good metadata, call me.

Frequently Unanswered Questions (And How Your Metadata Can Help)
Is this an ETD?
- Entire repository is ETDs
- Set ABCD is all of our ETDs
- dc:type=thesis, dc:type=dissertation
Is it open access? How about Creative Commons?
- dc:rights=unrestricted
- dc:rights=licensed under a Creative Commons CC-BY-SA license...
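A harvester could answer these two questions from the metadata hints listed above with checks along the following lines. This is a sketch only: the record layout (a dict with "sets", "dc:type", and "dc:rights" lists) and both function names are assumptions made for the example.

```python
ETD_TYPES = {"thesis", "dissertation"}


def is_etd(record, etd_sets=(), all_etd_repository=False):
    """Apply the three hints: whole repository, a known ETD set, or dc:type."""
    if all_etd_repository:
        return True
    if set(etd_sets) & set(record.get("sets", [])):
        return True
    types = {value.strip().lower() for value in record.get("dc:type", [])}
    return bool(types & ETD_TYPES)


def is_open_access(record):
    """True if any dc:rights value says 'unrestricted' or names a Creative Commons license."""
    for rights in record.get("dc:rights", []):
        text = rights.strip().lower()
        if text == "unrestricted" or "creative commons" in text:
            return True
    return False


record = {
    "sets": ["ABCD"],
    "dc:type": ["Text"],
    "dc:rights": ["Licensed under a Creative Commons CC-BY-SA license"],
}
print(is_etd(record, etd_sets={"ABCD"}), is_open_access(record))
```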

Frequently Unanswered Questions (And How Your Metadata Can Help)
What school is this from?
- Use a good dc:publisher value
What department or discipline is this from?
- Use ETD-MS or UKETD
- dc.contributor=[something very consistent]

Frequently Unanswered Questions (And How Your Metadata Can Help)
Is this a doctoral dissertation or a master's thesis? What's the degree?
- Use ETD-MS or UKETD
- dc.subject=[something very consistent]
What's the complete citation?
- Confirm that you export author, title, year, URL

Frequently Unanswered Questions (And How Your Metadata Can Help)
What's the embargo situation?
- [Now] dc:rights=restricted
- [Later] dc:rights=unrestricted
- dc:date.accessioned=...
- dc:date.available=...
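When a repository exports a machine-readable availability date, the embargo question reduces to a date comparison. The sketch below assumes UTC timestamps in the format used in the earlier slides (for example 2014-09-16T09:15:00Z); the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a timestamp such as 2014-09-16T09:15:00Z into an aware datetime."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_under_embargo(date_available, now):
    """An item is embargoed until its availability date has passed."""
    return now < parse_utc(date_available)


webinar_day = datetime(2013, 10, 15, tzinfo=timezone.utc)
print(is_under_embargo("2014-09-16T09:15:00Z", webinar_day))
```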

Not Really That Helpful
    <publisher>Digitorium @ State!</publisher>
    <dc:rights>No access until 2010. Campus-only access until 2012.</dc:rights>
    <dc:title>Embargo Test #2</dc:title>

The Mad Libs Guide To Next Steps
OATD is a harvested index of ETD records in repositories from around the world.
[way cool name] is a harvested index of [discipline OR content type] records in repositories from [geographical region OR library consortium OR repository type].

Remember When Every Library Presentation Had an Obligatory Silo Slide?
[Image: "Three Silos" by Theresa L. Wysocki, from Flickr]
