Metadata Models for Experimental Science Data Management

Brian Matthews
Facilities Programme Manager, Scientific Computing Department, STFC
Co-Chair, RDA Photon and Neutron Science Interest Group
Task lead, NA on metadata standardisation, NFFA-Europe

Large-Scale Analytic Facilities
Key challenges of the 21st century: energy, global climate, health and security
Study matter at scales from single atoms (10^-10 m) to living cells (10^-6 m) to whole systems (10^-3 to 1 m)
High-resolution microscopes: intense beams of particles
Specialist sources
Requires large-scale research infrastructures that are beyond the capability of any single university or research group
Diamond, ISIS
Photons (X-rays) see electric charge: high atomic number nuclei
Neutrons see nucleons, including hydrogen atoms

Experimental Method
Fundamental in science: the defining feature
Experimental methodology
  A subject of study
  Controlled environmental conditions
  Vary chosen parameters
  Measure and take data
  Analysis to interpret data
  Compare with hypothesis (model)
Data alone is useless
With some simple descriptive metadata
Need full context of the experiment
  Restartability
  Validation
  Reproducibility

The science we do - Structure of materials
Visit facility on research campus
Place sample in beam
Diffraction pattern from sample
Fitting experimental data to model
A particular view on what an experiment is
  Structural determination of materials
  Possibly multiple runs, multiple techniques
  Compare and contrast with computational models
  Increasingly dynamics
May be used in a wider context
  E.g. drug candidates
May differ from other views of experiments
  Observations and measurements
  Longitudinal studies
  Etc.
But a useful subclass
And may be generalisable (?)
[Figure: Structure of cholesterol in crude oil]

Data Management Systems
ICAT Data Management Suite
Integrated data management pipelines
  From data acquisition to storage to publication
  Data acquisition, data storage, data access
Metadata as middleware
A catalogue of experimental data
  Automated metadata capture
  Integrated with the User Office and data acquisition system
Providing access to the user
  TopCAT web front end
  Integrated into analysis frameworks: Mantid for neutrons, DAWN for X-rays
[Architecture diagram: ICAT API and ICAT DB (metadata catalogue); TopCAT (web interface to ICATs); IDS (ICAT Data Service) downloader; StorageD client and StorageD data (de-)aggregator and metadata ingestor; Lustre file store; CASTOR storage system (disk, tape); FUSE data browser; Python and desktop app clients; clusters/HPC; data transfer protocols; ICAT Job Portal; AA plugins; ICAT Manager; ICAT + DAWN (Eclipse plugin); ICAT + Mantid (desktop client)]
15 years' effort to build data management systems
  DLS archive: 4.7 PB, 1100 million files (April 2016), cf. 2.2 PB, 620 million files (January 2015)
  ISIS Data Archive: ~50 TB
  Full experimental metadata
ICAT Open Source Collaboration: www.icatproject.org

Facility Data Lifecycle
[Cycle diagram around a central Metadata Catalogue: Proposal, Approval, Scheduling, Experiment, Data reduction, Data analysis, Publication]
http://www.icatproject.org
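The lifecycle slide can be read as a catalogue record that gathers metadata as an investigation moves from stage to stage. The sketch below illustrates that reading only. The stage order is inferred from the diagram labels, and `Stage`, `CatalogueRecord`, `annotate` and `advance` are hypothetical names, not part of the ICAT API.

```python
from enum import IntEnum


class Stage(IntEnum):
    # Assumed ordering of the lifecycle stages named on the slide.
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_REDUCTION = 5
    DATA_ANALYSIS = 6
    PUBLICATION = 7


class CatalogueRecord:
    """Hypothetical catalogue entry that accumulates metadata per stage."""

    def __init__(self, investigation_id):
        self.investigation_id = investigation_id
        self.stage = Stage.PROPOSAL
        self.metadata = {Stage.PROPOSAL: {}}

    def annotate(self, **fields):
        # Attach metadata to whichever stage the investigation is in now.
        self.metadata[self.stage].update(fields)

    def advance(self):
        # Move to the next stage; earlier metadata stays in the record.
        if self.stage is Stage.PUBLICATION:
            raise ValueError("already at the final stage")
        self.stage = Stage(self.stage + 1)
        self.metadata[self.stage] = {}
        return self.stage


# Illustrative values only.
record = CatalogueRecord("EXAMPLE-0001")
record.annotate(title="Example structural study")
for _ in range(3):
    record.advance()
record.annotate(instrument="example-instrument")
print(record.stage.name, sorted(s.name for s in record.metadata))
```

Keeping one record per investigation, rather than one per stage, mirrors the idea that the catalogue sits at the centre of the cycle and is fed by every step.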

Core Scientific Metadata Model (CSMD)
The Core Metadata model forms the information model for ICAT.
Designed to describe facilities-based experiments in Structural Science throughout a facility's scientific workflow.
- Uses a Facilities view of an experiment: an Investigation (proposal)
- For use within the repository for organising data
[Entity diagram: Facility, Instrument, Publication, Authorisation, Investigation, Investigator, Dataset, Sample, Sample Parameter, Datafile, Related Datafile, Dataset Parameter, Datafile Parameter, Parameter]
http://purl.org/net/csmd
http://icatproject.org/csmd/
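The entity names on the CSMD slide suggest a containment hierarchy: an Investigation holds Datasets, a Dataset holds Datafiles, and Parameters can be attached to samples, datasets and datafiles. A minimal sketch of that shape follows. The field names and example values are assumptions for illustration and do not reproduce the actual CSMD or ICAT schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    # A named measurement or setting; units are free text in this sketch.
    name: str
    value: float
    units: str = ""


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Datafile:
    name: str
    location: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    sample: Optional[Sample] = None
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    # The facility's view of an experiment, rooted in a proposal.
    title: str
    facility: str
    instrument: str
    investigators: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def all_datafiles(self) -> List[Datafile]:
        # Flatten the Investigation -> Dataset -> Datafile hierarchy.
        return [f for ds in self.datasets for f in ds.datafiles]


# Illustrative values only.
sample = Sample("example-sample", [Parameter("temperature", 295.0, "K")])
run = Dataset(
    name="run-1",
    sample=sample,
    datafiles=[
        Datafile("run-1-raw.nxs", "/archive/example/run-1-raw.nxs"),
        Datafile("run-1-reduced.nxs", "/archive/example/run-1-reduced.nxs"),
    ],
)
investigation = Investigation(
    title="Example structural study",
    facility="Example Facility",
    instrument="example-instrument",
    investigators=["A. Researcher"],
    datasets=[run],
)
print([f.name for f in investigation.all_datafiles()])
```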

An open access resource for experimental & theoretical nanoscience
Information and Data Management Repository Platform for nanoscience
  An integrated platform covering the full research cycle by the users
  Automatic acquisition of key metadata
  A data repository for future data access
Defining metadata standards for data sharing in nanoscience
  To represent data from nanoscience experiments and theoretical analysis
  Use currently available standards, e.g. from the PaNData project
STFC, CNR-IOM, ESRF, KIT, FORTH
Materials IG and International Materials Resource Registries WG

Metadata for Nanomaterials
Data workflow for nano-structured science
Metadata focussed around the Project
A user-centred view
NFFA Deliverable 11.2: Draft Metadata Standard, 29th February 2016

Core vocabulary for Entities
Experiment Concepts: Research User, Instrument Scientist, Project, Proposal, Facility, Instrument, Experiment, Measurement, Sample
Data Concepts: Raw Data, Analyzed Data, Data Asset, Data Analysis, Data Analysis Software, Data Archive, Data Policy, Data Manager, Data Curation Activity

Relations between Entities

Not just us of course: Chemical Process Description
Concepts: Experimental process, Measurement parameters, Sample description/preparation, Observation/outcome description, Data analysis, Reaction transformation, Equipment/apparatus, Laboratory/environmental parameters
Metadata used in:
  Data models (e.g., oreChem)
  XML standards (e.g., AnIML, S88)
  Methods ontologies (e.g., ChMO)
  Analytical terminology (e.g., IUPAC Orange Book)
  Incident analysis (e.g., BowTie)

RDA Metadata IG Common Concepts
RDA Metadata IG: defining common concepts
  Straw man
  Need to agree good definitions
  Take into account models of experimental science
Common concepts (Keith Jeffery): Organisation, Person, Project, Name/Title, Keywords, Publication (document), Facility, Equipment, Product (dataset), Indicator, Spatial coordinates, Temporal coordinates, Classification Term, Measurement, Service, Classification Scheme

For RDA
FAIR: Interoperability, Reusability
Entities in a Core Metadata Vocabulary
  Agreed definitions
  Nature of relationship between entities
  Base attributes for all entities
Based on models for research processes
  General enough to be in common
  Specific enough to be useful
Role of PIDs
  PIDs for everything!
Relationships to other metadata
  Provenance, Preservation
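One way to picture "PIDs for everything" with typed relationships is a small registry in which every entity carries a persistent identifier and relationships are stored as (subject, predicate, object) triples over those identifiers. The sketch below is a hedged illustration only. The class names, the predicate strings and the `pid:example/...` identifiers are all hypothetical and do not follow any agreed RDA vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    # Base attributes shared by every entity in this sketch.
    pid: str
    kind: str
    name: str


@dataclass(frozen=True)
class Relationship:
    subject: str    # PID of the source entity
    predicate: str  # nature of the relationship
    obj: str        # PID of the target entity


class Registry:
    """Hypothetical in-memory store of entities and their relationships."""

    def __init__(self):
        self.entities = {}
        self.relationships = []

    def add(self, entity):
        self.entities[entity.pid] = entity

    def relate(self, subject, predicate, obj):
        # Only allow relationships between entities that have a known PID.
        for pid in (subject, obj):
            if pid not in self.entities:
                raise KeyError(pid)
        self.relationships.append(Relationship(subject, predicate, obj))

    def related(self, pid, predicate):
        # Follow one kind of relationship outward from a given entity.
        return [
            self.entities[r.obj]
            for r in self.relationships
            if r.subject == pid and r.predicate == predicate
        ]


# Illustrative identifiers and predicates only.
registry = Registry()
registry.add(Entity("pid:example/sample-1", "Sample", "Example sample"))
registry.add(Entity("pid:example/raw-1", "Raw Data", "Example raw dataset"))
registry.add(Entity("pid:example/analysed-1", "Analyzed Data", "Example fit"))
registry.relate("pid:example/raw-1", "measuredFrom", "pid:example/sample-1")
registry.relate("pid:example/analysed-1", "derivedFrom", "pid:example/raw-1")
print([e.name for e in registry.related("pid:example/analysed-1", "derivedFrom")])
```

Because relationships refer to entities only through their PIDs, provenance chains such as analysed data back to raw data back to sample can be followed without knowing where each record is held.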