PaNSIG (PaNData), and the interactions between SB-IG and PaNSIG


Erica Yang (erica.yang@stfc.ac.uk)
Scientific Computing Department, STFC Rutherford Appleton Laboratory
Structural Biology IG, RDA Plenary, Dublin, 27 March 2014
RDA-PanSIG

What is ICAT?
  Example ISIS proposal:
    GEM: high intensity, high resolution neutron diffractometer
    H2-(zeolite) vibrational frequencies vs polarising potential of cations
    β-lactoglobulin protein interfacial structure
  Proposals: Once you are awarded beamtime at ISIS, an entry is created in ICAT that describes your proposed experiment.
  Experiment: Data collected from your experiment is indexed by ICAT (with additional experimental conditions) and made available to your experimental team.
  Analysed Data: You can upload any analysed data you wish and associate it with your experiments.
  Publication: Using ICAT you can also associate publications with your experiment and even reference data from your publications.
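
The proposal-to-publication chain on this slide can be pictured as a small record structure. The sketch below is illustrative only: the class names, field names and file names are hypothetical and are not taken from the actual ICAT schema or API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified record types; NOT the real ICAT schema.

@dataclass
class Datafile:
    name: str
    analysed: bool = False  # False means raw data collected at the instrument

@dataclass
class Investigation:
    proposal_title: str
    facility: str
    instrument: str
    datafiles: List[Datafile] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)

    def raw_data(self) -> List[Datafile]:
        """Return only the files collected during the experiment."""
        return [f for f in self.datafiles if not f.analysed]

# 1. Proposal: beamtime is awarded, so a catalogue entry is created.
inv = Investigation("example proposal (placeholder)", "ISIS", "GEM")
# 2. Experiment: collected data is indexed against the entry.
inv.datafiles.append(Datafile("run_0001.nxs"))  # placeholder file name
# 3. Analysed data: uploaded later and associated with the same entry.
inv.datafiles.append(Datafile("fit_0001.dat", analysed=True))
# 4. Publication: linked back to the experiment and its data.
inv.publications.append("placeholder reference")
```

Keeping everything under one investigation record is what allows a publication to point back at the raw and analysed data it was derived from.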

PaNSIG: IG for Photon and Neutron Science
  Co-chairs: Amber Boehnlein/SLAC, Frank Schluenzen/DESY, Brian Matthews/STFC
  A follow-on from the EU PaNData project
  Focusing on discussing coordinated efforts to:
    Address rising challenges around research data
    Identify opportunities for exploiting research data
  Participants (so far):
    Computing people: facility operation, data analysis code development, data infrastructure and HPC operation, and system integration
    Friendly scientists (need more!)

PaNData: Photon and Neutron Data Infrastructure
  Established in 2007 with 4 facilities, now standing at 13, with friends around the world
  Combined number of unique users: more than 35000 in 2011
  Combines scientific and IT staff from the collaborating facilities
  European Framework 7 projects:
    PaNdata-Europe: SA, 2009-11
    PaNdata-Open Data Infrastructure: IP, 2011-14
  Guesstimates:
    Investment > 4.000.000.000
    Running costs > 500.000.000/yr
    Publications > 10.000/yr
    Running costs per publication ~ 50.000
    Data volume >>>> 10 PB/yr
  Credit: Brian Matthews
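
The cost-per-publication guesstimate follows directly from the two figures above it. A minimal check, treating the slide's lower bounds as exact values (the slide does not state a currency, so none is assumed here):

```python
# Lower-bound figures quoted on the slide, treated as exact for the estimate.
running_costs_per_year = 500_000_000
publications_per_year = 10_000

cost_per_publication = running_costs_per_year / publications_per_year
print(cost_per_publication)  # 50000.0
```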

Counting Users
Number of users shared between facilities (source: http://pan-data.eu/users2012-results)

            ALBA  BER II    DESY     DLS ELETTRA    ESRF  FRM-II     ILL    ISIS     LLB    SINQ     SLS  SOLEIL neutron  photon     all
ALBA         773       7      61      58      51     281       2      51      13       5      10      77     105      69     400     773
BER II         7    1563     115      46      27     179     157     383     198      98     191      62      36     580     329    1563
DESY          61     115    4197     137     222     851     116     255     113      62      95     315     188     469    1294    4197
DLS           58      46     137    4407     102     810      30     267     399      33      52     229     192     546    1130    4407
ELETTRA       51      27     222     102    3167     433      11      77      35      20      18     179     367     141     900    3167
ESRF         281     179     851     810     433   10287     139     900     369     190     174     963    1286    1313    3586   10287
FRM-II         2     157     116      30      11     139    1095     347     137      89     161      33      29     509     259    1095
ILL           51     383     255     267      77     900     347    4649     731     301     395     156     222    1518    1347    4649
ISIS          13     198     113     399      35     369     137     731    2880      89     233      94      56     936     745    2880
LLB            5      98      62      33      20     190      89     301      89    1235      74      39     151     391     323    1235
SINQ          10     191      95      52      18     174     161     395     233      74    1219     224      31     590     415    1219
SLS           77      62     315     229     179     963      33     156      94      39     224    3827     399     371    1470    3827
SOLEIL       105      36     188     192     367    1286      29     222      56     151      31     399    4568     394    1817    4568
neutron       69    1563     469     546     141    1313    1095    4649    2880    1235    1219     371     394   10023    2334   10023
photon       773     329    4197    4407    3167   10287     259    1347     745     323     415    3827    4568    2334   25336   25336
all          773    1563    4197    4407    3167   10287    1095    4649    2880    1235    1219    3827    4568   10023   25336   33025

Slide annotations: Neutron diffraction; X-ray diffraction; Multiple facilities, and multiple techniques; High-quality structure refinement; Experimental, computational, and maybe, data-intensive (?)
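
The totals in the table are consistent with simple inclusion-exclusion: users of both neutron and photon facilities are counted once in the overall total. A small check using only numbers from the table (the variable names are illustrative):

```python
# Values read from the "neutron", "photon" and "all" rows of the table.
neutron_users = 10023
photon_users = 25336
both = 2334            # users of at least one neutron AND one photon facility
all_users_reported = 33025

# Inclusion-exclusion: |N or P| = |N| + |P| - |N and P|
all_users = neutron_users + photon_users - both
assert all_users == all_users_reported

# Share of all users who work at both kinds of facility (roughly 7%).
share_both = both / all_users
print(f"{share_both:.1%}")
```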

PaN-Data Developments
  Common data environment, common user experience
  Shared Data Policy Framework
  Federated User Authentication
  Federated Data Catalogue
  Common Software Catalogue: PanSoft
  Common Ontology: PanKOS
  Common Data Format: NeXus

ICAT Tool Suite and Clients
  ICAT is a database:
    With a well-defined data model
    With well-defined APIs to access, search and download experiment data
    Covering proposal to publication
    Being rolled out in PaNData-ODI
  User interactions with ICAT:
    TopCAT (web interface to ICATs)
    ICAT Data Explorer (Eclipse plugin)
    ICAT + Mantid (desktop client)
    ICAT Job Portal
    Desktop app
  Other diagram components: ICAT APIs + Web Services; IDS (ICAT Data Service); Disk; Tape; Clusters/HPC

Interfaces: TopCAT (some screenshots)

Interfaces: Mantid and DAWN (some screenshots)

ICAT: current deployments
  In production:
    1. ISIS/UK: neutron
    2. DLS/UK: synchrotron
    3. CLF/UK: high power lasers
    4. SNS/US: neutron
    5. ILL/France: neutron
  In prototype deployment:
    1. ALBA/Spain: synchrotron
    2. ESRF/France: synchrotron
    3. DESY/Germany: synchrotron
    4. Elettra/Italy: synchrotron

Behind the ICATs (Diverse experiments at RAL)
  Network monitoring
  Data synchronisation
  Data monitoring
  Data cataloguing
  Data archive

NeXus and ICAT
  How does diverse experiment metadata get into ICAT?
  NeXus Application Profile for SAS: http://download.nexusformat.org/
  A facility can put any metadata into ICAT as long as it conforms to the above model. But it doesn't help searching and discovering data!
  (Data model diagram credit: Andy Götz et al./ESRF)

PanKOS: Ontology for Facility Science
  Facilities, instruments, and techniques using Linked Data
  Applications: cataloguing, annotation, searching, and linking
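
To make the Linked Data idea concrete, facilities, instruments and techniques can be written as subject-predicate-object triples and searched by following links. The sketch below uses plain Python tuples; every identifier and predicate name in it is hypothetical and is not taken from the PanKOS vocabulary itself.

```python
# Hypothetical identifiers and predicates, for illustration only;
# they are NOT part of the actual PanKOS ontology.
triples = [
    ("facility:ISIS", "hasInstrument", "instrument:GEM"),
    ("instrument:GEM", "supportsTechnique", "technique:neutron_diffraction"),
    ("facility:DLS", "hasInstrument", "instrument:EXAMPLE_MX"),  # made-up instrument
    ("instrument:EXAMPLE_MX", "supportsTechnique", "technique:xray_diffraction"),
]

def facilities_offering(technique, triples):
    """Follow technique -> instrument -> facility links."""
    instruments = {s for s, p, o in triples
                   if p == "supportsTechnique" and o == technique}
    return sorted(s for s, p, o in triples
                  if p == "hasInstrument" and o in instruments)

print(facilities_offering("technique:neutron_diffraction", triples))
```

With a shared vocabulary of this kind, the same query could in principle run across several facility catalogues, which is the searching and linking use case named on the slide.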

How is it relevant to SB-IG?

An integrated structural biology platform for the UK
  Cell Biology
  OPPF-UK
  MPL
  RC@H
  UK XFEL HuB@Diamond
  Fluorescence microscopy (CLF/STFC & DLS)
  Computational environment / CCP4
  X-ray imaging
  Cryo-EM/ET
  Diamond beamlines: Macromolecular Crystallography, Scattering, X-ray spectroscopy, infrared
  CCP-EM
  Diagram credit: Martin Walsh

Challenges facing SB user communities
  Data volume: e.g. 25 PB per 75 days for an XFEL SFX instrument, comparable to LHC data per year
  Diversity of data (and techniques)
  Complex workflows (multiple techniques)
  Variety of data analysis software/frameworks
  Data reduction and analysis may be facility-specific
  Users' lack of capability (hardware, software and infrastructure expertise) to exploit third-party software, HPC, and advanced distributed computing frameworks
  Variety of data formats (true?)
  Vast variety of metadata collected:
    Experiments
    Simulations
    Data analysis software
  Hard to keep track (and keep a record) of data analysis workflows
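
For a sense of scale, the 25 PB in 75 days quoted above can be turned into an average rate. This is a rough sketch assuming decimal units (1 PB = 10^15 bytes) and continuous acquisition over the whole period; neither assumption is stated on the slide.

```python
PB = 10**15                      # decimal petabyte (assumption)
volume_bytes = 25 * PB           # figure quoted on the slide
duration_s = 75 * 24 * 3600      # 75 days, assuming continuous data taking

rate_bytes_per_s = volume_bytes / duration_s
tb_per_day = volume_bytes / 75 / 10**12

print(f"{rate_bytes_per_s / 1e9:.2f} GB/s")  # 3.86 GB/s
print(f"{tb_per_day:.0f} TB/day")            # 333 TB/day
```

A sustained average of several gigabytes per second is part of why the slide treats storage, networking and analysis capacity as a facility-level problem, not something individual user groups can absorb.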

A Facility-based Post-Experiment Data Analysis Environment An opportunity to go hand-in-hand with the SB community?

Infrastructure for managing data flows
  Workflow: Scan -> Reconstruct -> Segment + Quantify -> 3D mesh + Image-based Modelling -> Predict + Compare
  Infrastructure: Data Catalogue, Petabyte Data Storage, Parallel File System, HPC (CPU+GPU), Visualisation
  Infrastructure + Software + Expertise!
  Some image credit: Avizo, Visualization Sciences Group (VSG)

Between SB-IG and PanSIG
  Starting points for collaborations?
    PanSoft
    PanKOS
    NeXus