The Curator's Approach to Data Management and Sustainability
Nic Weber & Megan Senseney
Center for Informatics Research in Science & Scholarship
Graduate School of Library & Information Science
University of Illinois at Urbana-Champaign
Digital Humanities at Oxford Summer School, 14-18 July 2014
Agenda
- Data management ... as a DH technique
  - valued ends
  - available resources
- DMP
  - Agency mandates
  - DMP beyond two pages
- Sustainability
  - Significant properties
  - 2 case studies in DH sustainability
"I'm trying to deflate the idea of digital humanities from a domain to an underlying set of practices"
6 July, DH 2014
DM as a DH Technique Many different Techniques
Data Management as a DH Technique
"the ensemble of practices by which one uses available resources in order to achieve certain valued ends." (Harold Lasswell)
Valued Ends
- Preservation of knowledge (the material artifacts that are produced, as well as ways of knowing)
- Maximize the value of public investment
- Increase the efficiency of doing digital humanities research, both immediate and long-term
The Royal Society Science Policy Centre. (2012). Science as an open enterprise. Page 60.
Data management
- Is highly personal
- Interpersonal when collaborating
- Intrapersonal in our relationship with institutions, organizations and funding agencies
Data management techniques include concerns of
- Planning (more in a bit) / Costing
- Documentation
- Formatting
- Storage
- Copyright / IP / Licensing
Documentation
Documentation: Tricks and Tips
- Include a header line that describes the variables as the first line in the table.
- Use plain ASCII text for your file names, variable names, and data values.
- Develop naming schemes, and record them.
- When you export from an analysis environment (e.g. SPSS, R, Gephi, etc.), record the transformations in a separate readme_(filename).txt file.
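The tips above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the function names, the example file name, and the choice to build the readme name from the file name without its extension are all assumptions made for the example.

```python
import csv
from pathlib import Path


def is_plain_ascii(text):
    """Return True if every character in text is plain ASCII."""
    return text.isascii()


def write_table_with_readme(folder, filename, header, rows, transformations):
    """Write a CSV whose first line names the variables, plus a readme file.

    All names here are hypothetical; only the practices come from the slide.
    """
    # Check the file name and variable names before writing anything.
    bad = [name for name in [filename, *header] if not is_plain_ascii(name)]
    if bad:
        raise ValueError(f"non-ASCII names: {bad}")

    data_path = Path(folder) / filename
    # encoding="ascii" makes the write fail loudly on non-ASCII data values.
    with open(data_path, "w", newline="", encoding="ascii") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # header line describing the variables
        writer.writerows(rows)

    # readme_(filename).txt: one recorded transformation per line.
    # Assumption: (filename) means the name without its extension.
    readme_path = Path(folder) / f"readme_{data_path.stem}.txt"
    readme_path.write_text("\n".join(transformations) + "\n", encoding="ascii")
    return data_path, readme_path
```

Calling write_table_with_readme(".", "letters_1850.csv", ["sender", "year"], rows, notes) would leave letters_1850.csv and readme_letters_1850.txt side by side, so the record of transformations travels with the data.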
Storage & Formatting!
Storage: DIY Cyberinfrastructure
Formatting & Storage: Tricks and Tips
- Store data in nonproprietary formats (e.g., a comma-delimited text file, .csv). Proprietary software (e.g., Excel, Access) can become unavailable, whereas plain text files remain readable without it.
- During the analysis stage, store an uncorrected (raw) data file. Do not make any corrections to this file; make corrections in a script instead.
Modified from: https://www.nceas.ucsb.edu/content/simple-guidelines-effective-data-management
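One way to keep the raw file untouched is to let a script read it and write the corrections to a second CSV. The sketch below assumes well-formed CSV input with a header line; the function names are hypothetical, and the whitespace-stripping "correction" is only a stand-in for whatever fixes a project actually needs.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def correct_row(row):
    """Hypothetical correction: strip stray whitespace from every value."""
    return {k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()}


def write_corrected_copy(raw_path, corrected_path):
    """Read the raw CSV, apply corrections, and write them to a separate file."""
    raw_path, corrected_path = Path(raw_path), Path(corrected_path)
    if raw_path.resolve() == corrected_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")

    before = sha256_of(raw_path)
    with open(raw_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = [correct_row(row) for row in reader]

    with open(corrected_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # The raw file must be byte-for-byte identical after the run.
    if sha256_of(raw_path) != before:
        raise RuntimeError("raw data file changed during processing")
    return len(rows)
```

Because every correction lives in correct_row, the script itself documents what was changed, and the corrected file can be regenerated from the raw file at any time.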
Copyright / IP slide
IP: Tricks of the Trade
Melissa Levine's checklist in the DH Curation Guide: http://guide.dhcuration.org/legal/policy/#p05
Data Management Planning
- Is highly social
- Dialectic (optimal vs. practical)
- Plans change
DMP Mandates (Funding Agencies)
AHRC
- Peer reviewed: Yes
- Components: Summary of Digital Outputs and Digital Technologies; Technical Methodology; Standards and Formats; Hardware and Software; Data Acquisition, Processing, Analysis and Use; Technical Support and Relevant Experience; Preservation, Sustainability and Use; Preserving Your Data; Ensuring Continued Access and Use of Your Digital Outputs
- Enforcement: Unclear
NEH
- Peer reviewed: Yes
- Components: Expected types of data; Period of data retention; Data forms and dissemination; Data storage and preservation
- Enforcement: Yes
EU
- Peer reviewed: No
- Components: Data set reference and name; Data set description; Standards and metadata; Data sharing; Archiving and preservation
- Enforcement: Sliding
AHRC Example Project: Kitchen Cosmology Project, University of Bristol. PI: Dr. Rita Langer. Link: http://bit.ly/1n0evun
NEH Example Project: A unified approach to preserving cultural software objects and their development histories, UC Santa Cruz. PI: Noah Wardrip-Fruin. Link: http://1.usa.gov/1knxm8n
completed worksheets
Costing: Tricks and Tips
4C: Overview of 10 curation cost models: http://bit.ly/1ldmuft
Provides a short description of each model and a presentation of its core features.
More tricks of the trade
- Advertise your data
- Say how you would like it to be cited (paper? data? both?)
- State known limitations (fit-for-purpose)
- Rely on journals, repositories and colleagues for guidance
- Don't rely on journals, repositories or colleagues for guidance
How do projects end? SUSTAINABILITY
Why this matters to DC
Fundamental questions of digital preservation:
1. What must you retain to ensure the integrity and authenticity of the digital object?
2. What can you lose without potential implications?
Significant Properties
"characteristics of an information object that must be maintained to ensure that object's continued access, use, and meaning over time as it is moved to new technologies." (Wilson, 2007)
Five categories of SPs
- Content
- Context
- Rendering
- Structure
- Behavior
Criteria for deciding significance Grace, S. & Knight, G. (2008)
Case study 1: Sustainability
GLOBALIZATION AND AUTONOMY ONLINE COMPENDIUM
Rockwell, Day, Yu, and Engel (2014). Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly, 8(2). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html
Then we came to (planning for) the end
http://globalautonomy.ca/global1/index.jsp
End of what?
- XML files with content;
- A MySQL bibliographic database;
- A metadata database of the content for generating topical pages and for searching;
- A full text index for searching the text;
- The code that handles the dynamic generation of the site, the searching, linking, and the XSL transforms;
- Some HTML pages and CSS stylesheets;
- And various images that are embedded in pages.
"The experience of the Compendium is that the intellectual work is not only in the individual articles, or even in the bibliographic data - it is in the interaction between these, mediated by code and in the user experience." (Rockwell et al. 2014)
What was deposited?
- Content: the texts, including the bibliography and glossary. We also considered the text on the HTML pages to be content.
- Code: HTML, CSS, and the XSLT code that generated much of the interface.
- Process: materials (but not all) that document the editorial processes, including the editorial backend, which strictly speaking was not part of the Compendium as experienced.
- The User Experience: information about the experience of the Compendium as an interactive work, captured by writing a narrative along with screenshots of typical use of the Compendium, stored as PDFs.
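A deposit organized this way could be inventoried with a small script. The sketch below is not how the Compendium team worked (the source does not describe their tooling); it simply assumes a folder of deposit files and a hand-written mapping from each file to one of the four categories above, and adds a checksum per file so that integrity can be verified later. All file and function names are hypothetical.

```python
import hashlib
from pathlib import Path

# The four deposit categories named on the slide.
CATEGORIES = ("Content", "Code", "Process", "User Experience")


def build_manifest(deposit_dir, category_of):
    """List every file in deposit_dir with its category and SHA-256 checksum.

    category_of is a hypothetical dict mapping relative paths (with forward
    slashes) to one of CATEGORIES.
    """
    deposit_dir = Path(deposit_dir)
    files = sorted(p for p in deposit_dir.rglob("*") if p.is_file())
    entries = []
    for path in files:
        rel = path.relative_to(deposit_dir).as_posix()
        category = category_of.get(rel)
        if category not in CATEGORIES:
            raise ValueError(f"{rel}: missing or unknown category {category!r}")
        entries.append({
            "path": rel,
            "category": category,
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return entries
```

Refusing to continue when a file has no category forces an explicit decision about every item in the deposit, which is the same question the significant-properties discussion asks: what is this, and why is it being kept?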
Five categories of SPs
- Content
- Context
- Rendering
- Structure
- Behavior
Rockwell's categories
- Content
- Code
- Process
- User Experience
Case study 2: Sustainability
PERSEUS DIGITAL LIBRARY
How would Perseus end? (Hint: not by beheading Medusa)
RESOURCE LIST
Rockwell, Day, Yu, and Engel (2014). Burying Dead Projects: Depositing the Globalization Compendium. Digital Humanities Quarterly, 8(2). http://www.digitalhumanities.org/dhq/vol/8/2/000179/000179.html
Grace, S. & Knight, G. (2008). What are significant properties and why should I care? Presentation delivered at Digital Curation 101, 7 October 2008, Edinburgh, Scotland.