Welcome to the Pure International Conference
  Jill Lindmeier, HR, Brand and Event Manager
  Oct 31, 2018
Mendeley Data: Use Synergies with Pure to Showcase Additional Research Outputs
  Nikhil Joshi, Solutions Manager, Research Data Management
  Oct 31, 2018
Key take-away: "my own" means that I (and anybody else) could not re-use my own data
  Source: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
The research life cycle depends heavily on two data life cycles:
  1. Within the lab, covering all active data in all domains
  2. Within the world at large
  Research lifecycle (diagram): Design; Find Topic; Identify gaps; Plan & Fund; Execute; Discover data, people, methods & protocols; Prepare, reproduce, re-use & benchmark; Collect, analyze & visualize; Store & Share; Publish; Disseminate
  Data life cycles (diagram): 1. Lab data; 2. Open data: data publicly available
Taking the institutional lens, we can speak of 3 interlocking data life cycles
  Research lifecycle (diagram): Design; Find Topic; Identify gaps; Plan & Fund; Execute; Discover data, people, methods & protocols; Prepare, reproduce, re-use & benchmark; Collect, analyze & visualize; Store & Share; Publish; Disseminate
  Data life cycles (diagram): 1. Lab data; 2. Open data: data publicly available; 3. Metrics on data: monitoring and reporting on institutional data
  Institution (diagram): Benchmark; Rank; Evaluate; Manage; Preserve
  Re-using research data improves outcomes for the research life cycle.
  This means improving the research data life cycles (1) within the lab and (2) to the world at large.
  This also means keeping track of the institutional data life cycles and (3) reporting on them.
What is Mendeley Data?
  Benefits for researchers:
    Prevent re-work: save time searching, collecting and sharing data
    Comply with funders' mandates
    Improve impact: increase data reuse
  Benefits for institutions:
    Keep track of your data inside and outside your institution
    Showcase institutional research outputs
    Improve collaborations within/across institutions
  How we deliver:
    1. Open system & open APIs; modular approach enables integrations across many research data solutions
    2. Data remains at/owned by institution
    3. System is integrated with the researcher workflows: we make it simple & obvious
    4. Your researchers keep working like they do today while avoiding additional bureaucracy & administration
What is Mendeley Data?
  Search within 9 million datasets from over 30 worldwide data repositories, growing all the time, including your institutional repository (if you want)
  Find and report on your data inside and outside your institution; engage with researchers when they actually have data
  Trusted data repository; showcasing your data; automatically linked with Pure
  Active data management: manage data in projects; custom metadata & co-editing; local data integration; integration hub between internal and external data
  Lab Notebook (Hivebench): annotate your data with protocols and experiments
  Diagram: Institution; 3. Metrics on data: monitoring and reporting on institutional data; Benchmark; Rank; Evaluate; Manage; Preserve
How did we get here?
  Timeline: 2014, 2015, 2016, 2017, 2018
  Mendeley Data Notebook (Hivebench): lab notebook used since 2014 by researchers who wanted to collect their data in the lab in a more structured way (https://hivebench.com)
  Mendeley Data Repository: in use since 2015 by researchers who wanted to find a safe place to get credit for their data (https://data.mendeley.com)
  Data Search: in use by researchers since 2016 to discover data (https://datasearch.elsevier.com/)
  Launched now: Mendeley Data, the platform for institutions. Worked with the community & 10 development partners to meet institutional needs
Where is Mendeley Data in the Elsevier ecosystem?
  Diagram labels: Notify new articles to Monitor for data sharing compliance; Datasets as Scopus records; Sync datasets, projects, grants, equipment, showcase on portal; Users; Produce and consume data metrics (appears twice); Submit / link datasets with publications
  Legend: existing integration; planned integration
Types of integration between CRIS and RDM
  Diagram: RDM platform (datasets: data, metadata) and CRIS
  CRIS to RDM: Projects; Grants; People; Organisational units
  RDM to CRIS: Dataset metadata; Links to publications; Access permissions; Data metrics
Mendeley Data already integrates through open APIs with the global Research Data Management ecosystem
  Diagram labels: Mint DOIs; Import/export datasets, notebooks, experiments; Open API with any other tool; Repository indexed by OpenAIRE; Index datasets metadata; Integrate with machine readable DMPs; Zenodo indexed by DataSearch; Long-term preservation of published datasets; Publish links between articles and datasets; Datasets indexed by DataSearch; + 30 repositories
  Legend: existing integration; planned integration
Use case: Librarian
  Help curate and validate research outputs from an institution
  Search for publication records to be curated / merged
  Merge publications
  Send comments and notes
  Validate existing publications
  Import datasets
For example: Manchester has datasets that researchers have uploaded to the university repository or to external repositories
Search for publications to clean up and merge
Merge publications
Curate and validate data
Interact with users and record changes, validate
Import datasets
Example result: Manchester Research Explorer Pure portal
Pure integrations
  RDM Module | Integration | Status
  Data Repository | Published datasets appear in Pure catalogue |
  Data Monitor | Import/Export Data into Pure workflow | a) Datasets from backfiles can be discovered and imported for Pure customers today; b) Ongoing monitoring will be ready mid 2019
  Data Monitor | Import researchers from Pure for active ongoing engagement with researchers | Will be ready mid 2019
Thank you
  Wouter, Alberto & Nikhil
  w.haak@elsevier.com
  a.zigoni@elsevier.com
  n.joshi@elsevier.com
  www.elsevier.com/research-intelligence
Section 1
FAIR data? How to link Pure to the actual researcher data workflow
  Wouter Haak, VP Research Data Management
  Alberto Zigoni, RDM Development Director
  Nikhil Joshi, Solutions Manager, Research Data Management
  Nov 1, 2018
FAIR data
  Findable: have sufficiently rich metadata and a unique and persistent identifier
  Accessible: retrievable by humans and machines through a standard protocol; open and free by default; authentication and authorization where necessary
  Interoperable: metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation
  Reusable: metadata provide rich and accurate information; clear usage license; detailed provenance
  See the original guiding principles at https://www.force11.org/node/6062
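A minimal sketch of how two of these criteria (a persistent identifier and sufficiently rich metadata, plus a usage license) might be checked on a dataset record. The record layout, the list of required fields, and the function name `metadata_gaps` are assumptions for illustration; they are not taken from the FAIR principles, Mendeley Data, or Pure. The DOI pattern is a common heuristic, not a full validator.

```python
import re

# Heuristic pattern for a DOI such as "10.1234/abcd"; an assumption, not a spec.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Illustrative choice of fields; a real policy would define its own list.
REQUIRED_FIELDS = ("title", "description", "keywords", "license")


def metadata_gaps(record):
    """Return the names of fields that are missing or empty in a record.

    `record` is a plain dict (hypothetical layout). An empty result means
    the record passes this partial check, not that it is fully FAIR.
    """
    gaps = []
    if not DOI_PATTERN.match(record.get("doi", "")):
        gaps.append("doi")
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            gaps.append(field)
    return gaps


# Made-up example record with no keywords and no license.
example_record = {
    "doi": "10.1234/example",
    "title": "Sample dataset",
    "description": "Measurements collected in the lab",
    "keywords": [],
}
print(metadata_gaps(example_record))
```

A check like this only covers the parts of Findable and Reusable that can be tested mechanically; Accessible and Interoperable depend on protocols and vocabularies that a field check cannot verify.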
Why FAIR Data?
  Good scientific practice depends on communicating the evidence.
  Open research data are essential for reproducibility and self-correction.
  Academic publishing has not kept up with the age of digital data.
  Danger of a replication / evidence / credibility gap.
  Boulton: to fail to communicate the data that supports scientific assertions is malpractice.
  Open data practices have transformed certain areas of research: genomics and related biomedical sciences; crystallography; astronomy; areas of earth systems science; various disciplines using remote sensing data.
  FAIR data helps use of data at scale, by machines, harnessing technological potential.
  Research data often have considerable potential for reuse, reinterpretation, and use in different studies.
  Open data foster innovation and accelerate scientific discovery through reuse of data within and outside the academic system.
  Research data produced by publicly funded research are a public asset.
  From: Simon Hodson, Executive Director, CODATA, Open Science Conference 2018, Berlin, 14 March 2018
Research Data Management (RDM) needs a holistic approach
  When talking about data, we talk about all forms of research data, which includes everything needed to reproduce and reuse:
    Raw data
    Processed data
    Protocols, methods, workflows
    Machine & environment settings
    Scripts, analyses, algorithms
The impact of RDM on publications: the state of Pennsylvania
RDM is growing very fast in Pennsylvania
  Chart labels: CAGR: 1.4%; CAGR: 29.5%
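For readers unfamiliar with the metric: CAGR (compound annual growth rate) is the constant yearly rate that would take a starting value to an end value over a number of years. A minimal sketch follows; the counts in the example are hypothetical, since the slide gives only the two rates and not the underlying numbers or which series each rate belongs to.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.295 means 29.5%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical counts, NOT taken from the slide.
example_rate = cagr(start_value=1000, end_value=2000, years=3)
print(f"{example_rate:.1%}")
```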
Pure dataset discovery and import
  1. Get publications from Pure
  2. Search for linked datasets in Scholix
  3. Retrieve dataset metadata from repositories (or DataCite)
  4. Match publication authors and dataset contributors
  5. Prepare import file for Pure
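The matching and import-file steps of this workflow can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the input dictionaries stand in for what the Pure, Scholix, and repository or DataCite lookups would return, all DOIs and names are made up, the name-matching rule is a simple heuristic, and the CSV column layout is hypothetical because the slide does not describe the Pure import format.

```python
import csv
import io
import unicodedata


def normalize_name(name):
    """Reduce a personal name to a comparable key.

    Strips accents and punctuation, lowercases, and sorts the tokens so
    that "Smith, Jane" and "Jane Smith" produce the same key.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def discover_datasets(publications, links, dataset_metadata):
    """Return one row per (publication, dataset) pair that shares a person."""
    rows = []
    for pub in publications:
        author_keys = {normalize_name(a) for a in pub["authors"]}
        for dataset_doi in links.get(pub["doi"], []):
            meta = dataset_metadata.get(dataset_doi)
            if meta is None:
                continue  # no metadata available for this dataset
            matched = [c for c in meta["contributors"]
                       if normalize_name(c) in author_keys]
            if matched:
                rows.append({
                    "publication_doi": pub["doi"],
                    "dataset_doi": dataset_doi,
                    "dataset_title": meta["title"],
                    "matched_people": "; ".join(matched),
                })
    return rows


def to_import_csv(rows):
    """Serialise rows to CSV text (hypothetical column layout)."""
    fields = ["publication_doi", "dataset_doi", "dataset_title", "matched_people"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data standing in for the results of steps 1 to 3.
publications = [{"doi": "10.1234/pub.1", "authors": ["Jane Smith", "Ana Müller"]}]
links = {"10.1234/pub.1": ["10.1234/data.1", "10.1234/data.2"]}
dataset_metadata = {
    "10.1234/data.1": {"title": "Sample dataset", "contributors": ["Smith, Jane"]},
    "10.1234/data.2": {"title": "Unrelated dataset", "contributors": ["Bob Jones"]},
}

rows = discover_datasets(publications, links, dataset_metadata)
print(to_import_csv(rows))
```

Matching on normalised names alone will miss initials-only names and can confuse people who share a name; a production workflow would likely prefer persistent person identifiers where the metadata provides them.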
University of Manchester datasets import statistics
  ~180K publications analysed
  6,036 datasets discovered
  ~3% of the publications have datasets
  Datasets from Mendeley Data are imported via standard integration
  4 additional datasets from Mendeley Data added (free users with personal email)
Data showcasing Manchester and Mendeley Data
FAIR starts with an F: Findable
"Bowel" missing as metadata
FAIR is about combining all 3 interlocking data life cycles
  Diagram: Institution; 3. Metrics on data: monitoring and reporting on institutional data; Benchmark; Rank; Evaluate; Manage; Preserve
Therefore we integrate CRIS and RDM
  Diagram: Mendeley Data (datasets: data, metadata) and Pure
  CRIS to RDM: Projects; Grants; People; Organisational units
  RDM to CRIS: Dataset metadata; Links to publications; Access permissions; Data metrics
Thank you
  Wouter, Alberto & Nikhil
  w.haak@elsevier.com
  a.zigoni@elsevier.com
  n.joshi.1@elsevier.com
  www.elsevier.com/research-intelligence