NorStore: a national infrastructure for scientific data. Andreas O Jaunsen, UNINETT Sigma AS


About UNINETT Sigma
UNINETT Sigma AS is a private company established by the Ministry of Education and Research (Kunnskapsdepartementet) and owned by UNINETT AS. Participation in national e-infrastructures is organized in a consortium with UiT, NTNU, UiB and UiO. Sigma has a coordinating role for:
- Notur: national infrastructure for high-performance computing
- NorStore: national infrastructure for scientific data
- WLCG (Nordic Tier-1): distributed (grid) services
Sigma is the national representative in PRACE, EGI and EUDAT.
12-11-2012

NorStore and funding
- NorStore is on the national roadmap for research infrastructures.
- NorStore is funded by the Research Council of Norway (RCN, 67%) and the partners UiO, UiB, NTNU and UiT (33%) as part of the FORINFRA programme.
- The current funding period runs from 1 January 2010 to 30 June 2013.
- RCN funding over these 3.5 years is 37 MNOK.
- The total project budget, including partner and user in-kind contributions, is 58 MNOK.
- Long-term funding is seen as a requirement, given the scope of most of the services provided and the need to secure commitments from the user communities.
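The funding figures can be cross-checked with a little arithmetic. This is a minimal sketch under two assumptions that the slide does not state: that the 67%/33% split applies to cash funding only, and that RCN funding is spread evenly over the 3.5-year period. The derived partner and remainder figures are therefore illustrative, not reported numbers.

```python
# Back-of-the-envelope check of the NorStore funding figures (values in MNOK).
# ASSUMPTIONS (not from the slide): the 67%/33% split applies to cash funding,
# and RCN funding is spread evenly over the funding period.

RCN_FUNDING = 37.0     # RCN funding over the whole period
RCN_SHARE = 0.67       # RCN share of cash funding
PARTNER_SHARE = 0.33   # UiO, UiB, NTNU and UiT combined
TOTAL_BUDGET = 58.0    # includes partner and user in-kind contributions
PERIOD_YEARS = 3.5     # 1 January 2010 to 30 June 2013

# Average RCN funding per year.
rcn_per_year = RCN_FUNDING / PERIOD_YEARS
# If RCN covers 67%, the partners cover 33/67 of the RCN amount.
implied_partner = RCN_FUNDING * PARTNER_SHARE / RCN_SHARE
# Whatever is left of the total budget would then be user in-kind contributions.
implied_remainder = TOTAL_BUDGET - RCN_FUNDING - implied_partner

if __name__ == "__main__":
    print(f"RCN funding per year:    {rcn_per_year:.1f} MNOK")
    print(f"Implied partner funding: {implied_partner:.1f} MNOK")
    print(f"Implied remainder:       {implied_remainder:.1f} MNOK")
```

If the split were instead computed over the full 58 MNOK budget, the partner and remainder figures would change, which is why they should be read only as a consistency check.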

NorStore - human resources
In total approximately 12 FTEs:
- project manager (1 FTE)
- administration (0.5 FTE)
- operations (1 FTE)
- user support (2-3 FTE)
- advanced user support (4-5 FTE)
- training (0.5 FTE)
- dissemination and outreach (0.5 FTE)
- development (1 FTE)
- technology coordinator / data manager (1 FTE)

Metacenter organisation
A distributed e-infrastructure also needs to manage distributed manpower.
- The Metacenter (Notur and NorStore, plus PRACE, EGI and EUDAT) meets once or twice per year.
- Metacenter activities are organised in task forces, which meet face to face from once per month to a few times per year, and by video conference when needed.
- The NorStore project group meets bi-weekly via video conference.
- Progress and results are disseminated and documented on wiki pages.
- The Metacenter traditionally also meets during the annual Notur conference.

NorStore - usage statistics
Primary disciplines:
- Computational Fluid Dynamics 11%
- Geosciences 39%
- Other 11%
- Chemistry 4%
- Biosciences 25%
- Mathematics and Informatics 7%
- Physics 4%
Affiliation:
- UiO 39%
- UiB / Uni AS 29%
- IFE/Kjeller 4%
- Nansensenteret 4%
- NTNU 11%
- met.no 14%

NorStore - usage statistics
Staffing:
- Permanent staff 44%
- MSc students 13%
- Guests 6%
- PostDocs 18%
- PhD students 19%
Funding:
- Other 29%
- EU 11%
- Industry 2%
- Universities 25%
- SFF/SFI 12%
- NFR programmes 22%

NorStore - usage statistics
[Figure: bar charts with axes "# people" and "# articles"; categories AccessOther, DataManager and DataPlan, with a yes/no legend.]

NorStore - user requirements
- large, scalable storage capacity with redundant copies of data
- computing near the data
- secure storage and handling of sensitive (person-identifiable) data
- tools and user support for data management
- sharing of data with colleagues
- long-term archiving, including curation and preservation of data
- tighter coupling between computing resources and data services
- easy access and authentication for non-traditional user groups (e.g. WebDAV, cloud)
- an authentication and authorization infrastructure (AAI)
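The requirement for redundant copies of data is usually paired with fixity checking: each replica's checksum is compared with the value recorded at ingest, and damaged copies are restored from an intact one. The sketch below illustrates that idea with SHA-256 from the Python standard library. The function names and the location names ("site-a", "site-b") are hypothetical, and the in-memory dictionary stands in for real storage backends.

```python
from __future__ import annotations

import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest, used here as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def check_replicas(expected: str, replicas: dict[str, bytes]) -> dict[str, bool]:
    """Map each (hypothetical) storage location to True if its copy is intact."""
    return {loc: sha256_of(data) == expected for loc, data in replicas.items()}


def repair(expected: str, replicas: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Replace corrupted copies with an intact one.

    Returns the repaired replica map and the list of locations that were fixed.
    Raises RuntimeError if no intact copy is left.
    """
    status = check_replicas(expected, replicas)
    intact = [loc for loc, ok in status.items() if ok]
    if not intact:
        raise RuntimeError("no intact replica available")
    good_copy = replicas[intact[0]]
    repaired = {loc: (data if status[loc] else good_copy) for loc, data in replicas.items()}
    bad = [loc for loc, ok in status.items() if not ok]
    return repaired, bad


if __name__ == "__main__":
    original = b"model output, run 42"
    checksum = sha256_of(original)  # recorded once, at ingest
    copies = {"site-a": original, "site-b": b"model output, run 43"}
    fixed, repaired_sites = repair(checksum, copies)
    print(repaired_sites)  # ['site-b']
```

Keeping the checksum from ingest, rather than recomputing it from any one copy, is what lets the check detect which replica has drifted.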

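NorStore - e-infrastructure
[Figure: layered view of the NorStore e-infrastructure. Research projects (Infrastructure A, Community B, Project C) sit on top of the research infrastructure, which provides discipline services, core services and user support, and hardware and operations. UiT, NTNU, UiB and UiO are connected by 10G links; UiA, Org. B and Inst. C also appear in the figure.]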

User community services (bioinformatics)

Data life cycle

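OAIS model
[Figure: OAIS functional model with the entities Ingest, Archival Storage, Data Management, Access, Preservation Planning and Administration. A SIP enters Ingest; AIPs pass from Ingest to Archival Storage and from Archival Storage to Access; descriptive information flows to Data Management, which answers queries with result sets; Access handles orders and delivers a DIP.]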
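The OAIS flow can be illustrated with a few data classes: a producer submits a SIP, Ingest turns it into an AIP kept in Archival Storage while its descriptive information goes to Data Management, and Access answers queries and fulfils orders by delivering a DIP. This is a deliberately simplified sketch; the class and method names are illustrative and omit most of the OAIS reference model (ISO 14721), including Preservation Planning and Administration.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a producer hands over."""
    identifier: str
    content: bytes
    description: dict


@dataclass
class AIP:
    """Archival Information Package: content plus preservation metadata."""
    identifier: str
    content: bytes
    description: dict
    fixity: str  # SHA-256 checksum recorded at ingest


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    identifier: str
    content: bytes
    description: dict


@dataclass
class Archive:
    # Archival Storage holds AIPs; Data Management holds descriptive info.
    archival_storage: dict = field(default_factory=dict)
    data_management: dict = field(default_factory=dict)

    def ingest(self, sip: SIP) -> AIP:
        """Ingest: turn a SIP into an AIP and register its description."""
        checksum = hashlib.sha256(sip.content).hexdigest()
        aip = AIP(sip.identifier, sip.content, dict(sip.description), checksum)
        self.archival_storage[aip.identifier] = aip
        self.data_management[aip.identifier] = aip.description
        return aip

    def query(self, **criteria) -> list[str]:
        """Query Data Management; the returned identifiers are the result set."""
        return [
            pid for pid, desc in self.data_management.items()
            if all(desc.get(key) == value for key, value in criteria.items())
        ]

    def order(self, identifier: str) -> DIP:
        """Access: fetch the AIP, verify fixity and deliver a DIP."""
        aip = self.archival_storage[identifier]
        if hashlib.sha256(aip.content).hexdigest() != aip.fixity:
            raise ValueError(f"fixity check failed for {identifier}")
        return DIP(aip.identifier, aip.content, dict(aip.description))


if __name__ == "__main__":
    archive = Archive()
    archive.ingest(SIP("ds-001", b"sea surface temperatures", {"discipline": "marine"}))
    hits = archive.query(discipline="marine")
    print(hits)  # ['ds-001']
    print(archive.order(hits[0]).content)
```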

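NorStore - services
[Figure: NorStore service architecture. Storage areas WORK, PREPARE and PRESERVE (science archive), with iRODS in the data management layer. Access through application servers (ssh), web clients (query, access, annotate, ingest), an https API over SSL and THREDDS, with authentication through an AAI based on a nationally federated identity (e.g. Feide). Data can be staged to and from the Notur HPC systems. Generic services are complemented by discipline-specific ones for humanities, marine science, climate and environment, language technology, medical imaging and bioinformatics. A user-funding component is also shown.]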

Lessons learned (so far...)
- Planning of services must be based on user input and requirements.
- Sustainability of services and funding is critical to ensure commitment from the community.
- It is important to secure, build or recruit competence (data management, service and infrastructure design, storage technology, AAI, long-term archiving and preservation, etc.).
- Secure commitment from user communities and partners (including funding).
- Staffing must exceed critical mass.
- Distributed project management is a challenge (there is a need for frequent face-to-face meetings).
- Task forces are one way to structure the work and secure ownership.

Future infrastructure model
[Figure: an on-premises iRODS server with its iCAT catalogue sits in front of on-premises storage devices, an on-premises OpenStack Swift server and cloud-backed storage in Amazon S3. Clients reach the data through WebDAV clients, iRODS clients, cloud clients or direct file system access; numbered arrows (1 to 10) mark the data paths.]
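In this model one iRODS server presents a single logical namespace, catalogued in iCAT, while the bytes may live on on-premises devices, a Swift server or Amazon S3. The sketch below mimics that separation of catalogue and storage with pluggable backends. It is not the iRODS API: the `Backend`, `MemoryBackend` and `Namespace` classes and the path `/project/run1.nc` are hypothetical, and the backends keep data in dictionaries instead of talking to real storage.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical storage interface; not the iRODS or any cloud SDK API."""

    @abstractmethod
    def write(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...


class MemoryBackend(Backend):
    """In-memory stand-in for an on-premises device, Swift or S3."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def read(self, key: str) -> bytes:
        return self._objects[key]


class Namespace:
    """One logical namespace over several backends, loosely like iCAT."""

    def __init__(self, backends: dict[str, Backend], default: str) -> None:
        self.backends = backends
        self.default = default
        self.catalog: dict[str, list[str]] = {}  # logical path -> backend names

    def put(self, path: str, data: bytes, resource: str | None = None) -> None:
        """Store an object on a chosen (or the default) backend and catalogue it."""
        name = resource or self.default
        self.backends[name].write(path, data)
        self.catalog.setdefault(path, []).append(name)

    def replicate(self, path: str, resource: str) -> None:
        """Copy an object to another backend and record the new replica."""
        if resource in self.catalog[path]:
            return
        self.backends[resource].write(path, self.get(path))
        self.catalog[path].append(resource)

    def get(self, path: str) -> bytes:
        """Clients use logical paths only; the catalogue picks a replica."""
        return self.backends[self.catalog[path][0]].read(path)


if __name__ == "__main__":
    ns = Namespace(
        {"onprem": MemoryBackend(), "swift": MemoryBackend(), "s3": MemoryBackend()},
        default="onprem",
    )
    ns.put("/project/run1.nc", b"netcdf bytes")
    ns.replicate("/project/run1.nc", "s3")
    print(ns.catalog)  # {'/project/run1.nc': ['onprem', 's3']}
```

The design point is that clients never address a backend directly, so storage can be moved to or replicated into the cloud without changing the paths users see.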

Future infrastructure model
[Figure: two federated zones, Zone A and Zone B, each with an iRODS server, a Swift server and Nova (OpenStack compute) servers; a client connects to both iRODS servers, and numbered arrows mark the data paths.]
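In iRODS, logical paths begin with the zone name (for example /ZoneA/home/...), and federation lets users of one zone reach objects in another. In a real federation the client typically talks to its local server, which contacts the remote zone on its behalf; the sketch below shows only how the owning zone is derived from a path and whether a request crosses zones. The zone names and hostnames are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical zone names and hostnames for two federated zones.
ZONE_SERVERS = {
    "ZoneA": "irods-a.example.org",
    "ZoneB": "irods-b.example.org",
}


def zone_of(logical_path: str) -> str:
    """The zone is the first component of an absolute logical path."""
    parts = PurePosixPath(logical_path).parts
    if len(parts) < 2 or parts[0] != "/":
        raise ValueError(f"not an absolute logical path: {logical_path!r}")
    return parts[1]


def route(logical_path: str, local_zone: str) -> tuple[str, bool]:
    """Return (server that owns the path, whether the request crosses zones)."""
    zone = zone_of(logical_path)
    if zone not in ZONE_SERVERS:
        raise KeyError(f"zone {zone!r} is not federated with {local_zone!r}")
    return ZONE_SERVERS[zone], zone != local_zone


if __name__ == "__main__":
    print(route("/ZoneB/home/alice/run1.nc", local_zone="ZoneA"))
    # ('irods-b.example.org', True)
```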

Collaborative data infrastructure
Figure: data generators and users are served by three layers, with trust and data curation spanning all of them:
- User functionalities: data capture and transfer, virtual research environments
- Community support services: data discovery and navigation, workflow generation, annotation, interpretability
- Core data services: persistent storage, identification, authenticity, workflow execution, mining
Figure from "Riding the Wave", EC High Level Expert Group on Scientific Data.

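NorStore
[Figure: data flow from WORK through ARCHIVE to PUBLISH.]
- WORK: processing and analysis of the generated data
- ARCHIVE: registration and ingest, quality control
- PUBLISH: science archive for the research community, with data documentation (annotation), publication and a persistent identifier (PID), e.g. doi:xxxx:yyyyyy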
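This workflow can be read as a small state machine: data produced in WORK is registered and ingested into ARCHIVE, must pass quality control, and only then is published with a persistent identifier. The sketch below encodes those transitions. The quality-control rule `has_title_and_creator` and the PID callback `example_pid` are hypothetical placeholders, since the slide specifies neither how quality control is done nor how identifiers are minted.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Callable, Optional


class Stage(Enum):
    WORK = auto()     # processing and analysis
    ARCHIVE = auto()  # registered, ingested and quality-controlled
    PUBLISH = auto()  # documented and given a persistent identifier


class Dataset:
    def __init__(self, name: str, documentation: dict) -> None:
        self.name = name
        self.documentation = documentation  # the "data-doc (annotate)" step
        self.stage = Stage.WORK
        self.pid: Optional[str] = None

    def archive(self, quality_ok: Callable[[Dataset], bool]) -> None:
        """Register and ingest; refuse datasets that fail quality control."""
        if self.stage is not Stage.WORK:
            raise RuntimeError(f"{self.name}: can only archive from WORK")
        if not quality_ok(self):
            raise ValueError(f"{self.name}: failed quality control")
        self.stage = Stage.ARCHIVE

    def publish(self, mint_pid: Callable[[Dataset], str]) -> str:
        """Assign a persistent identifier; only archived data can be published."""
        if self.stage is not Stage.ARCHIVE:
            raise RuntimeError(f"{self.name}: archive before publishing")
        self.pid = mint_pid(self)
        self.stage = Stage.PUBLISH
        return self.pid


def has_title_and_creator(ds: Dataset) -> bool:
    """Hypothetical QC rule: title and creator must be documented."""
    return all(ds.documentation.get(key) for key in ("title", "creator"))


def example_pid(ds: Dataset) -> str:
    """Hypothetical stand-in for a real PID/DOI registration service."""
    return f"example-pid:{ds.name}"


if __name__ == "__main__":
    ds = Dataset("ocean-run-1", {"title": "Ocean model run", "creator": "A. Researcher"})
    ds.archive(has_title_and_creator)
    print(ds.publish(example_pid))  # example-pid:ocean-run-1
```

Passing the QC rule and PID service in as callables keeps the sketch neutral about which checks and which identifier scheme an archive would actually use.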