Drexel University From the SelectedWorks of James Gross June 4, 2012 The DOI Identifier James Gross, Drexel University Available at: https://works.bepress.com/jamesgross/26/
The DOI Identifier James Gross Drexel University, INFO 756 Professor Matienzo Extra credit report 6/4/2012
The issue of expired URL links is a problem, as it can impede access to web-based research data. Researchers, authors, and publishers often have a common interest in ensuring that users can readily locate published information. But access to this online content can be lost if the URL becomes obsolete. In this short paper, we will briefly examine the introduction of the DOI identifier for registering research data sets and discuss what this usage means for future access to web-based research data.

Issue: The issue of URLs and unique identifiers was brought up in our class, INFO 756, in Part 2 of the Personal Digital Information project. One exercise in this class assignment was the creation of a DOI to connect an online data file to a hyperlink. Students used the UC3 EZID demonstration service to create a temporary DOI (California, 2012a, n.p.).

Background: According to Jan Brase, DataCite's Manager, the DOI was developed to help counter three problems facing unpublished scientific literature: (1) large quantities of data being stored privately and lost, (2) lack of access to this data causing unnecessary duplication of research efforts, and (3) quantities of research funds being unduly spent every year to try to re-create existing data (Brase, 2010, p. 3). Brase noted that the problems with locating and accessing scientific research were caused primarily by three factors: (1) poor preservation properties, such as broken links if an author left one's academic institution, (2) poor documentation quality, and (3) limited academic recognition, as datasets cannot be easily searched or located (Ibid., p. 3).
Solution: Per Brase, the development of the Digital Object Identifier was a way to address proper dataset identification (2011, p. 3). Brase noted that dataset identification was "a key element for allowing citation and long term integration of datasets into text as well as supporting a variety of data management activities" (Brase, 2011, p. 3).

So, what exactly is a DOI and why is it helpful for academic publishing? CrossRef's definition: "A DOI is a permanent link to published full text like journal articles conference papers, or other content" (CrossRef, n.d., n.p.). The ISO documentation on the DOI was published as: Information and documentation, Digital Object Identifier System (ISO, 2012, n.p.). Per the ISO website, a DOI is an efficient means of identifying an entity over the internet and is used primarily for sharing with an interested user community or managing as intellectual property (ISO, 2012, n.p.). Per the ISO website, a DOI name is an identifier of an entity, physical, digital or abstract, on digital networks. It provides information about that object, including where the object, or information about it, can be found on the internet (Ibid., n.p.). The ISO website documentation explained that this standard "gives the syntax, description and resolution functional components of the digital object identifier system. It also gives the general principles for the creation, registration and administration of DOI names" (Ibid., n.p.).

What is DataCite? It is a DOI registration agency for research data. The DataCite website lists the following three goals: (1) establish easier access to research data, (2) increase acceptance of research data as legitimate contributions in the scholarly record, and (3) support data archiving to permit results to be verified and re-purposed for future study (DataCite, 2012, n.p.). Max Wilkinson, affiliated with the British Library's DataCite initiative, listed the following goals for DataCite: (1) improve the scholarly infrastructure around datasets, (2) provide standards, workflows and best practice, and (3) support persistent identification of data using the DOI system (Wilkinson, 2010, p. 5). A brief explanation from the DataCite guide: "An assigned DOI name serves as a portal hyperlink to the dataset and allows downloading of the data" (DataCite, n.d., p. 3).

What is CrossRef? CrossRef is an independent membership association, the Publishers International Linking Association (CrossRef, 2002a, n.p.). Per its website, CrossRef is the official DOI link registration agency for scholarly and professional publications. CrossRef's mandate is to connect users to primary research content by enabling publishers to work collectively (CrossRef, 2002a, n.p.). The CrossRef website documentation notes: "A DOI is a unique alphanumeric string assigned to a digital object, such as an electronic journal, article, report, or thesis. Each DOI name is unique and serves as a persistent link to the full-text of an electronic item on the internet" (CrossRef, 2002b, n.p.). As with DataCite, CrossRef uses DOIs to permanently identify and track scholarly items on the web (CrossRef, 2002c, n.p.). Publishers of electronic scholarly content join CrossRef as members and are assigned a DOI prefix (CrossRef, 2002c, n.p.). A suffix is then assigned to each item, so that the prefix and suffix together form the item's DOI (CrossRef, 2002c, n.p.).

Why use a DOI? The advantage of using a DOI as opposed to a URL is that the DOI is a persistent identifier. Even if the virtual location of the file changes, the publisher or author can simply update the URL registered for the DOI. The DOI itself stays the same, so existing citations continue to lead to the file. EZID is an example of a service that can create a DOI. The EZID website is run by the University of California Curation Center (UC3), a service of the California Digital Library (California, 2012b, n.p.). Per the EZID website, the advantages of the service are that it can: (1) create identifiers for anything, including texts, data, or files, (2) store citation metadata for identifiers in a variety of formats, and (3) update current URL locations to prevent citation links from becoming broken (California, 2012b, n.p.).

According to a PowerPoint presentation by DataCite's manager, Jan Brase, there are currently 9 DOI registration agencies, and approximately 60 million DOI names have been assigned globally, of which over 90% are for scholarly articles (Brase, 2012, n.p.). Of this number, CrossRef has registered approximately 53 million DOIs and DataCite
has registered 1.3 million (Ibid., n.p.).

An identified need: As Brase noted back in 2007, "Many publications are based on scientific data that can not be accessed, therefore re-evaluation or re-analysis of data is almost impossible" (2007, p. 1). Brase also noted, "Providing datasets with identifiers is an absolutely essential pre-requisite for citing, locating, retrieving, using, and even receiving credit for creating them" (Brase, 2011, n.p.). Wilkinson observed that the DOI serves three purposes: (1) it supports researchers by enabling them to locate, identify, and cite research datasets, (2) it supports data centers, as it provides persistent identifiers for datasets and workflows, and (3) it supports publishers by enabling research articles to be linked to the underlying data (Wilkinson, 2010, p. 9).

In summary, we have briefly discussed the CrossRef and DataCite services. We have seen how the DOI acts as a persistent link, enabling users to access published data sets even if the underlying URL has changed. As I see it, the introduction of the persistent DOI identifier is a very positive development for the access and exchange of scholarly information. Wilkinson noted that this service would "increase acceptance of research data as legitimate, citable contributions to scholarly communication" (Wilkinson, 2010, n.p.).
This citation initiative should enable researchers to better locate, identify, and cite research datasets. As Brase observed, the gathering and exchanging of research data is a global imperative (2007, p. 1). I agree. The development and implementation of this identifier scheme appear to be a very positive step for researchers and authors alike.
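As a brief illustration of two points discussed in this paper, the prefix and suffix structure of a DOI and the way a DOI stays fixed while its registered URL is updated, the following is a minimal sketch in Python. It is not how CrossRef, DataCite, or EZID are implemented. The ToyResolver class, the split_doi function, the simplified pattern, the DOI "10.1234/example-dataset", and the example.org URLs are all hypothetical and were invented for this illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Simplified pattern, assumed for illustration only: a prefix that starts
# with "10." followed by digits (optionally subdivided by dots), a slash,
# and then a suffix chosen by the registrant.
DOI_PATTERN = re.compile(r"^(10\.\d+(?:\.\d+)*)/(\S+)$")


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into its (prefix, suffix) parts."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI name under the simplified pattern: {doi!r}")
    return match.group(1), match.group(2)


@dataclass
class ToyResolver:
    """Hypothetical in-memory stand-in for a DOI resolution service."""

    _locations: dict[str, str] = field(default_factory=dict)

    def register(self, doi: str, url: str) -> None:
        """Record the first URL for a new DOI."""
        split_doi(doi)  # raises ValueError if the name is malformed
        if doi in self._locations:
            raise KeyError(f"{doi} is already registered")
        self._locations[doi] = url

    def update_url(self, doi: str, new_url: str) -> None:
        """Point an existing DOI at a new location; the DOI is unchanged."""
        if doi not in self._locations:
            raise KeyError(f"{doi} is not registered")
        self._locations[doi] = new_url

    def resolve(self, doi: str) -> str:
        """Return the URL currently registered for the DOI."""
        return self._locations[doi]


if __name__ == "__main__":
    doi = "10.1234/example-dataset"  # hypothetical DOI name
    resolver = ToyResolver()
    resolver.register(doi, "https://old.example.org/data.csv")
    # The file moves to a new host: only the registered URL is updated.
    resolver.update_url(doi, "https://new.example.org/archive/data.csv")
    print(split_doi(doi))
    print(resolver.resolve(doi))
```

In this sketch a citation stores only the DOI. When the file moves, the registrant calls update_url once, and every existing citation still resolves, which is the persistence property described under "Why use a DOI?".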
References

Brase, J. (2010). DataCite - A global registration agency for research data, German Data Forum, RatSWD Working Paper Series, No. 149. Retrieved 5/30/2012 from: http://www.ratswd.de/download/ratswd_wp_2010/ratswd_wp_149.pdf

Brase, J. (2011). Access to research data, D-Lib Magazine, 17, (1/2). Retrieved 5/30/2012 from: http://www.dlib.org/dlib/january11/brase/01brase.print.html

Brase, J. (2012). Digital object identifiers. [Lecture], Presentation at the Workshop: Metadata and persistent identifiers for social science data, Berlin, Germany, May 7, 2012. Retrieved 5/30/2012 from: http://www.ratswd.de/ver/docs_pid_2012/brase_pid2012.pdf

California (2012a). EZID: Create Simple Demo ID, University of California Curation Center, University of California, [website]. Retrieved 5/30/2012 from: http://n2t.net/ezid/demo/simple

California (2012b). EZID, University of California Curation Center, University of California, [website]. Retrieved 5/30/2012 from: http://n2t.net/ezid

California (2012c). Services and projects, University of California Curation Center, University of California, [website]. Retrieved 5/30/2012 from: http://www.cdlib.org/services/uc3/

CrossRef (n.d.). DOIs, the library and the researcher, [Handout]. Retrieved 5/30/2012 from: http://www.crossref.org/08downloads/handouts/library_researcher.pdf

CrossRef (2002a). CrossRef Homepage. CrossRef, a Division of Publishers International Linking Association. Retrieved 5/30/2012 from: http://www.crossref.org/

CrossRef (2002b). About DOIs/What is a DOI. CrossRef, a Division of Publishers International Linking Association. Retrieved 5/30/2012 from: http://www.crossref.org/help/content/01_about_dois/what_is_a_doi.htm

CrossRef (2002c). CrossRef Help. CrossRef, a Division of Publishers International Linking Association. Retrieved 5/30/2012 from: http://www.crossref.org/help/crossref_help.htm

DataCite (n.d.). Best practise guide for DataCitation, DataCite, International data citation initiative, Part 1, Examples. Retrieved 5/30/2012 from: http://forskningsdata.deff.wikispaces.net/file/view/bestpractiseguide.pdf

DataCite (2012). Frequently asked questions. Retrieved 5/30/2012 from: http://datacite.org/faqs

[ISO] International Organization for Standardization (2012). Digital object identifier (DOI) becomes an ISO standard. Press release May 10, 2012, International Organization for Standardization. Retrieved 5/30/2012 from: http://www.iso.org/iso/pressrelease?refid=ref1561

[SIIA] Software & Information Industry Association (2001). The digital object identifier: The keystone for digital rights management, Software & Information Industry Association. Retrieved 5/30/2012 from: http://www.siia.net/estore/download/doi-01.pdf

Wilkinson, M. (2010). DataCite: The International Data Citation Initiative, Datasets Programme, German Data Forum, RatSWD Working Paper Series, No. 163. Retrieved 5/30/2012 from: http://www.ratswd.de/download/ratswd_wp_2010/ratswd_wp_163.pdf
Drexel honesty statement: