TIFFANY C. CHAO
Graduate School of Library and Information Science
University of Illinois
IASSIST 2013
ROADMAP
Introduction
Background: sharing practices in the sciences
Methodological Design
Preliminary results & Discussion
Future Directions
ACCESS TO RESEARCH DATA
Increased expectations by US federal agencies and offices for open access to both the publications and data generated through research
Growing number of journals that require authors to provide access to data as part of publication
  Ex: PLoS, Nature, American Economic Review
Emerging institution-level support and services for research data management throughout their lifecycle
  DMP Tool (https://dmp.cdlib.org/)
  Data Curation Profiles Toolkit (http://datacurationprofiles.org)
RESEARCHER DATA SHARING PRACTICES AND PERCEPTIONS IN THE SCIENCES
Disciplinary norms and research cultures influence sharing practice:
What data are shared
  Digital data from sensors are more likely to be shared than hand-processed data in habitat ecology research (Borgman et al., 2007); raw data are not typically shared or deposited within agricultural sciences (Diekmann, 2012).
When data are shared
  Scientists opt for post-publication release or a prescribed embargo period before releasing data to the public; very rarely are data shared prior to publication (Cragin et al., 2010).
How data are made available
  Informal sharing (e.g. personal contact, website) is more common than formal mechanisms (e.g. domain repository, archive) in the social sciences (Pienta et al., 2010); the norm for genomics researchers is to submit to a data repository (e.g. GenBank) (Swan & Brown, 2008).
Who has access to the data
  Some researchers make data available only to those with whom they have worked closely (e.g. research group, collaborators); those in the physical sciences were more committed to open data sharing than those in other disciplines (PARSE, 2009).
  Across disciplines, scientists are willing to share some of their data with third-party researchers only with conditions in place (Tenopir et al., 2011).
CONTINUUM FOR DATA ACCESS
Data curation continuum model for access (Treloar et al., 2007)
Closed Access <-> Open Access
Closed access
  Sensitive information
  Commercial/industry connections
  Intentional withholding (Campbell et al., 2002; Blumenthal et al., 2006)
Open access
  Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based. (OECD, 2007)
  Available after embargo period/ post-publication
  No restrictions
THE SPACE BETWEEN: CONDITIONS FOR DATA ACCESS
Closed Access <- ? -> Open Access
Conditions for sharing data (Tenopir et al., 2011)
  collaboration opportunity
  mandatory reprints provided
  co-authorship
  results of analyses need data provider's approval prior to dissemination
  cost recovery
  legal permissions obtained
THE SPACE BETWEEN: CONDITIONS FOR DATA ACCESS
Closed Access <- ? -> Open Access
Conditions for sharing data (Tenopir et al., 2011)
  collaboration opportunity
  mandatory reprints provided
  co-authorship
  results of analyses need data provider's approval prior to dissemination
  cost recovery
  legal permissions obtained
Disciplines shown: Environmental Science & Ecology; Physical Sciences; Social Sciences
PROPOSED INQUIRY
While researchers are willing to share their data, it is not clear how intended conditions are presented to third-party researchers interested in reuse.
What conditions for use and access of data are made visible through metadata description?
What similarities and differences exist for use and access conditions of data across disciplines?
METHODOLOGICAL DESIGN OVERVIEW
Sample scope
  Three disciplines/domain areas: Geochemistry; Environmental Impacts; Population Studies
Source
  Dataset metadata records from GCMD (Global Change Master Directory), which follow the DIF (Directory Interchange Format)
  Collected in Fall 2012
  Records include fields for <access constraints> and <use constraints>
Approach
  Extracted descriptions for each dataset record
  Applied analytic framework for available <use> and <access> constraints
  Reviewed initial framework codes and recoded descriptions with new list
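The extraction step could be sketched roughly as follows. This is a minimal illustration, not the procedure used in the study: it assumes each DIF record is available as XML with <Access_Constraints> and <Use_Constraints> elements, and the sample record and the helper names (local_name, extract_constraints) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DIF-style record; real GCMD records carry many more fields
# and usually an XML namespace, which local_name() below ignores.
SAMPLE_DIF = """<DIF>
  <Entry_ID>EXAMPLE_001</Entry_ID>
  <Access_Constraints>Public</Access_Constraints>
  <Use_Constraints>Please acknowledge the data provider.</Use_Constraints>
</DIF>"""


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def extract_constraints(dif_xml):
    """Return the access and use constraint text of one record (None if absent)."""
    wanted = {"Access_Constraints": "access", "Use_Constraints": "use"}
    found = {"access": None, "use": None}
    for element in ET.fromstring(dif_xml).iter():
        key = wanted.get(local_name(element.tag))
        if key and element.text and element.text.strip():
            found[key] = element.text.strip()
    return found


print(extract_constraints(SAMPLE_DIF))
```

Empty or missing fields come back as None, so records without any constraint information can be counted separately from records that carry a description.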
DISCIPLINES/ DOMAIN AREAS
Geochemistry
  Involves the study of the chemical composition of the Earth, along with the physical and chemical processes at work in its formation
  Examples of data content: biogeochemistry, hydration, ion exchange
Environmental Science: Human Dimensions (ESHD)
  Relates to the study of interactions and interdependencies of humans and their physical and social environments
  Examples of data content: GIS, air quality, chemical traces, vegetation species
Demography/ Population Studies (DPS)
  Concerns the number of individuals in a specified area that constitute a particular race, class, or group
  Examples of data content: mortality rate, population distribution, population density
DIF DEFINITIONS OF ACCESS AND USE CONSTRAINTS
<Access_Constraints>
  This field allows the author to provide information about any constraints for accessing the data set. This includes any special restrictions, legal prerequisites, limitations and/or warnings on obtaining the data set. Some words that may be used in this field include: Public, In-house, Limited. Additional detailed instructions on how to access the data can be entered in this field.
<Use_Constraints>
  This field allows the author to describe how the data may or may not be used after access is granted to assure the protection of privacy or intellectual property. This includes any special restrictions, legal prerequisites, terms and conditions, and/or limitations on using the data set. Data providers may request acknowledgement of the data from users and claim no responsibility for quality and completeness of data.
Retrieved from http://gcmd.nasa.gov/add/difguide/index.html
DDI DESCRIPTION FOR USE AND ACCESS CONDITIONS
<Conditions>
  Indicates any additional information that will assist the user in understanding the access and use conditions of the data collection.
  Example: The data are available without restriction. Potential users of these datasets are advised, however, to contact the original principal investigator Dr. J. Smith (Institute for Social Research, The University of Michigan, Box 1248, Ann Arbor, MI 48106), about their intended uses of the data. Dr. Smith would also appreciate receiving copies of reports based on the datasets.
Retrieved from DDI-Codebook version 2.5
FGDC CONTENT STANDARD FOR DIGITAL GEOSPATIAL METADATA: USE AND ACCESS CONSTRAINTS
1.7 Access Constraints
  Definition: Restrictions and legal prerequisites for accessing the dataset. These include any access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the dataset.
  Example: Access Constraints: None
  Example: Access Constraints: CIESIN offers unrestricted access and use of data without charge, unless specified in the documentation for particular data. All other rights are reserved.
1.8 Use Constraints
  Definition: Restrictions and legal prerequisites for using the dataset after access is granted. These include any use constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on using the dataset.
  Example: Use Constraints: None
  Example: Use Constraints: The Wildlife Conservation Society (WCS) and Trustees of Columbia University in the City of New York hold the copyright of this dataset. Users are prohibited from any commercial, non-free resale, or redistribution without explicit written permission from WCS or CIESIN. Users should acknowledge WCS and CIESIN as the source used in the creation of any reports, publications, new datasets, derived products, or services resulting from the use of this dataset. WCS or CIESIN also request reprints of any publications and notification of any redistributing efforts.
Retrieved from http://www.fgdc.gov/csdgmgraphical/ideninfo.htm
ANALYTICAL FRAMEWORK
ACCESS CONDITIONS
  special restrictions, legal prerequisites, limitations and/or warnings
  terms to use: Public; In-house; Limited
  instructions on how to access the data
USE CONDITIONS
  special restrictions, legal prerequisites, terms and conditions, and/or limitations
  data providers may request acknowledgement of the data from users
  data providers may claim no responsibility for quality and completeness of data
PRELIMINARY RESULTS & DISCUSSION
Observed characterizations of Access conditions/ Use conditions
Comparative analysis of conditions across disciplines
Relationship between conditions and sharing practices and perceptions
DISTRIBUTION OF AVAILABLE METADATA RECORDS WITH INFORMATION ON ACCESS AND/OR USE CONDITIONS
[Figure: bar chart. y-axis: % of metadata records (0% to 100%); x-axis: discipline/ domain areas (Geochemistry, DPS, ESHD); categories: No information, Only Use, Only Access, Both Access & Use]
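A distribution of this kind could be tallied as in the sketch below. The four category labels come from the chart legend; the record structure (a dict with optional "access" and "use" text) and the sample records are assumptions made for illustration, not data from the study.

```python
from collections import Counter

CATEGORIES = ("Both Access & Use", "Only Access", "Only Use", "No information")


def categorize(record):
    """Place one record in one of the four chart categories."""
    has_access = bool(record.get("access"))
    has_use = bool(record.get("use"))
    if has_access and has_use:
        return "Both Access & Use"
    if has_access:
        return "Only Access"
    if has_use:
        return "Only Use"
    return "No information"


def distribution(records):
    """Return the percentage of records that fall in each category."""
    counts = Counter(categorize(r) for r in records)
    total = len(records)
    return {c: 100.0 * counts[c] / total if total else 0.0 for c in CATEGORIES}


# Hypothetical records for one domain area.
sample_records = [
    {"access": "Public", "use": "Cite the provider."},
    {"access": "Limited", "use": None},
    {"access": None, "use": None},
    {"access": None, "use": None},
]
print(distribution(sample_records))
```

Running the same function once per discipline would give the per-discipline percentages that the chart compares.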
TYPES OF ACCESS CONDITIONS
Users: access limited to certain personnel
Permission: contact investigator and request access to data
Fees: account subscriptions for data download
License agreement: acknowledge and agree to conditions
Retrieval: reference to where and how data can be accessed
Embargo period: not all data are publicly available yet
TYPES OF USE CONDITIONS
Proper acknowledgement: includes specific citation to use or attribution statement
Disclaimer: statements regarding accuracy of available data; providers cannot guarantee quality of data
Sensitivity: respect for subject confidentiality
Copyright: data cannot be used for commercial purposes without consent of provider; must secure written permission
Publication restrictions: states who can publish
Fees: assessed for data but not the metadata, which was freely available
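To make the coding categories concrete, the sketch below assigns use-condition codes to a free-text description by keyword matching. This is a swapped-in, simplified technique for illustration only: the slides describe applying and revising an analytic framework, not automated matching, and the keyword lists and the name code_description are hypothetical.

```python
# Hypothetical keyword lists, loosely based on the use-condition types above.
USE_CODES = {
    "Proper acknowledgement": ("acknowledg", "cite", "citation", "attribution"),
    "Disclaimer": ("no responsibility", "not guarantee", "no warranty", "accuracy"),
    "Sensitivity": ("confidential", "privacy"),
    "Copyright": ("copyright", "commercial", "written permission"),
    "Publication restrictions": ("publish",),
    "Fees": ("fee", "charge", "cost"),
}


def code_description(text, codebook=USE_CODES):
    """Return every code whose keywords occur in the description (may be empty)."""
    if not text or not text.strip():
        return []
    lowered = text.lower()
    return [code for code, keywords in codebook.items()
            if any(keyword in lowered for keyword in keywords)]


# Hypothetical use-constraint statement.
example = "Please cite the dataset; data may not be used for commercial purposes."
print(code_description(example))
```

Substring matching is crude (for instance, "fee" also matches "feed"), so output like this would at best be a first pass that a human coder then reviews.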
TOP 3 CONDITIONS FOR EACH DISCIPLINE/ DOMAIN AREA
Demography/ Population Studies
  Access conditions: None (80.65%); Retrieval (9.68%); Limited use (6.45%)
  Use conditions: None (61.11%); Copyright/Disclaimer (22.22%); Proper acknowledgment (11.11%)
Environmental Science: Human Dimensions
  Access conditions: None (70%); Retrieval (15%); Embargo (10%)
  Use conditions: Copyright (34.78%); License (26.09%); None (13.04%)
Geochemistry
  Access conditions: None (40.76%); Limited use (16.92%); Retrieval (13.86%)
  Use conditions: Proper acknowledgment (31.67%); None (26.67%); License (17.5%)
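A ranking like the one above could be derived from coded records as sketched here. The sketch assumes one code per record and percentages taken over all coded records in a domain area; how the reported percentages were actually computed is not stated, and the example codes are hypothetical.

```python
from collections import Counter


def top_conditions(codes, n=3):
    """Return the n most frequent codes with their share of all codes (%)."""
    counts = Counter(codes)
    total = sum(counts.values())
    return [(code, round(100.0 * count / total, 2))
            for code, count in counts.most_common(n)]


# Hypothetical access codes for ten records in one domain area.
example_codes = ["None"] * 5 + ["Retrieval"] * 3 + ["Embargo"] * 2
print(top_conditions(example_codes))
```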
COMPARISON OF CONDITIONS WITH KNOWN SHARING PERCEPTIONS
Sharing conditions from Tenopir et al. (2011): collaboration opportunity; mandatory reprints provided; co-authorship; legal permissions obtained
From metadata records:
  Geochemistry: No Access conditions; Use conditions center on proper acknowledgement
  Environmental Science: Human Dimensions: No Access conditions; Use conditions focus on copyright and license agreements
  Demography/ Population Studies: No Access or Use conditions
CONTINUUM OF DATA ACCESS: RE-VISITED
Closed Access: Data only available to certain personnel
Major Conditions: Permission from data provider; Fees/license agreement
Minor Conditions: Attribution to data provider; Retrieval from specified space; Embargo period
Open Access: No conditions or constraints
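One way to read the revisited continuum is as a mapping from condition codes to a position between closed and open access. The sketch below is an interpretation under stated assumptions: the code names reuse the access and use condition types listed earlier, the rule that the most restrictive code wins is an assumption, and codes the slide does not place (such as Disclaimer) are reported as "Unplaced".

```python
# Grouping of condition codes following the revisited continuum slide.
CLOSED = {"Users"}  # data only available to certain personnel
MAJOR = {"Permission", "Fees", "License agreement"}
MINOR = {"Proper acknowledgement", "Retrieval", "Embargo period"}


def continuum_position(codes):
    """Return the most restrictive continuum position implied by a set of codes."""
    codes = set(codes)
    if not codes:
        return "Open access"  # no conditions or constraints
    if codes & CLOSED:
        return "Closed access"
    if codes & MAJOR:
        return "Major conditions"
    if codes & MINOR:
        return "Minor conditions"
    return "Unplaced"  # codes not positioned on the slide


print(continuum_position(["Retrieval", "Fees"]))
```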
FUTURE DIRECTIONS
1) Identify how conditions are conveyed in informal sharing venues
2) Relation between data type and conditions
  Preliminary analysis of records from the 3 disciplines indicates PDF is the most prevalent format for distribution of data
  No pattern in the type of data associated with particular constraints
3) Assess relevance of condition information to decisions regarding reuse of data
SUMMARY & CONCLUSIONS
Current metadata schemas recognize information for data access and use conditions, yet the actual inclusion of conditions varies
There are multiple types of conditions for access and use of data, with some more prevalent across different disciplines than others
Potential implications for design of data repository systems and services to accommodate different conditions and facilitate sharing
Thank you!
Contact: Tiffany Chao; tchao at illinois.edu
CIRSS: Center for Informatics Research in Science and Scholarship
REFERENCES
Blumenthal, D., Campbell, E. G., Gokhale, M., Yucel, R., Clarridge, B., Hilgartner, S., & Holtzman, N. A. (2006). Data withholding in genetics and the other life sciences: Prevalences and predictors. Academic Medicine: Journal of the Association of American Medical Colleges, 81(2), 137-145.
Borgman, C. L., Wallis, J. C., & Enyedy, N. (2007). Little Science confronts the data deluge: Habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7(1), 17-30. doi:10.1007/s00799-007-0022-9
Campbell, E. G., Clarridge, B. R., Gokhale, M., Birenbaum, L., Hilgartner, S., Holtzman, N. A., & Blumenthal, D. (2002). Data withholding in academic genetics: Evidence from a national survey. JAMA, 287(4), 473.
Cragin, M. H., Palmer, C. L., Carlson, J. R., & Witt, M. (2010). Data sharing, small science and institutional repositories. Philosophical Transactions of the Royal Society A, 368(1926), 4023-4038. doi:10.1098/rsta.2010.0165
Diekmann, F. (2012). Data practices of agricultural scientists: Results from an exploratory study. Journal of Agricultural & Food Information, 13(1), 14-34. doi:10.1080/10496505.2012.636005
OECD (Organisation for Economic Co-operation and Development). (2007). OECD Principles and Guidelines for Access to Research Data from Public Funding. Retrieved from http://www.oecd.org/science/sci-tech/38500813.pdf
PARSE. (2009). PARSE.Insight: INSIGHT into issues of permanent access to records of science in Europe. Retrieved from http://www.parse-insight.eu/downloads/parse-insight_d3-4_surveyreport_final_hq.pdf
Pienta, A. M., Alter, G. C., & Lyle, J. A. (2010, April). The enduring value of social science research: The use and reuse of primary research data. Presented at The Organisation, Economics and Policy of Scientific Research workshop, Torino, Italy. Retrieved from http://hdl.handle.net/2027.42/78307
Swan, A., & Brown, S. (2008). To share or not to share: Publication and quality assurance of research data outputs: Main report. Research Information Network. Retrieved from http://www.rin.ac.uk/our-work/data-management-and-curation/share-or-not-share-research-data-outputs
Tenopir, C., Allard, S., Douglass, K., Aydinoglu, A. U., Wu, L., Read, E., ... Frame, M. (2011). Data sharing by scientists: Practices and perceptions. PLoS ONE, 6(6), e21101. doi:10.1371/journal.pone.0021101
Treloar, A., Groenewegen, D., & Harboe-Ree, C. (2007). The Data curation continuum: Managing data objects in institutional repositories. D-Lib Magazine, 13(9/10). Retrieved from http://www.dlib.org/dlib/september07/treloar/09treloar.html