NorStore: a national infrastructure for scientific data
Andreas O. Jaunsen, UNINETT Sigma AS
About UNINETT Sigma
- UNINETT Sigma AS is a private company established by the Ministry of Education and Research (Kunnskapsdepartementet) and owned by UNINETT AS.
- Participation in national e-infrastructures is organized in a consortium with UiT, NTNU, UiB and UiO.
- Sigma has a coordinating role for:
  - Notur: national infrastructure for high-performance computing
  - NorStore: national infrastructure for scientific data
  - WLCG (Nordic Tier-1): distributed (grid) services
- Sigma is the national representative in PRACE, EGI and EUDAT.
12-11-2012
NorStore and funding
- NorStore is on the national roadmap for research infrastructures.
- NorStore is funded by the Research Council of Norway (RCN, 67%) and the partners UiO, UiB, NTNU and UiT (33%) as part of the FORINFRA programme.
- The current funding period is 1.1.2010 to 30.6.2013.
- RCN funding over these 3.5 years is 37 MNOK.
- The total project budget (including partner and user in-kind contributions) is 58 MNOK.
- Long-term funding is seen as a requirement, given the scope of most of the services provided and the need to secure commitments from the user communities.
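As a quick consistency check on the funding figures, the sketch below derives the length of the funding period and the average annual RCN contribution. It also shows that RCN's 37 MNOK is about 64% of the 58 MNOK total, a little below the 67% split; presumably this is because the total also counts user in-kind contributions, but that reading is an assumption rather than something the slide states.

```python
from datetime import date

# Figures taken from the funding slide (MNOK = million Norwegian kroner)
RCN_FUNDING_MNOK = 37.0
TOTAL_BUDGET_MNOK = 58.0   # includes partner and user in-kind contributions
START = date(2010, 1, 1)
END = date(2013, 6, 30)


def months_inclusive(start: date, end: date) -> int:
    """Count calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


months = months_inclusive(START, END)
years = months / 12
rcn_per_year = RCN_FUNDING_MNOK / years
rcn_share_of_total = RCN_FUNDING_MNOK / TOTAL_BUDGET_MNOK

print(f"{months} months = {years} years")                      # 42 months = 3.5 years
print(f"RCN funding per year: {rcn_per_year:.1f} MNOK")        # 10.6 MNOK
print(f"RCN share of total budget: {rcn_share_of_total:.0%}")  # 64%
```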
NorStore - human resources
In total approx. 12 FTEs:
- project manager (1 FTE)
- administration (0.5 FTE)
- operations (1 FTE)
- user support (2-3 FTE)
- advanced user support (4-5 FTE)
- training (0.5 FTE)
- dissemination and outreach (0.5 FTE)
- development (1 FTE)
- technology coordinator / data manager (1 FTE)
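Summing the allocations, taking the low and high ends of the two ranges, shows that the "approx. 12 FTEs" figure sits inside the 11.5 to 13.5 FTE span implied by the list. A minimal check:

```python
# FTE allocations from the slide, each given as (low, high)
fte = {
    "project manager": (1.0, 1.0),
    "administration": (0.5, 0.5),
    "operations": (1.0, 1.0),
    "user support": (2.0, 3.0),
    "advanced user support": (4.0, 5.0),
    "training": (0.5, 0.5),
    "dissemination and outreach": (0.5, 0.5),
    "development": (1.0, 1.0),
    "technology coordinator / data manager": (1.0, 1.0),
}

low = sum(lo for lo, _ in fte.values())
high = sum(hi for _, hi in fte.values())
print(f"Total: {low}-{high} FTE (midpoint {(low + high) / 2})")  # Total: 11.5-13.5 FTE (midpoint 12.5)
```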
Metacenter organisation
- A distributed e-infrastructure also needs to manage distributed manpower.
- The Metacenter (Notur, NorStore + PRACE, EGI, EUDAT) meets 1-2 times per year.
- Metacenter activities are organised in task forces.
- Task forces meet face to face (F2F) from once per month to a few times per year, and by video conference when needed.
- The NorStore project group meets bi-weekly by video conference.
- Progress and results are disseminated and documented on wiki pages.
- The Metacenter traditionally also meets during the annual Notur conference.
NorStore - usage statistics
Primary disciplines: Geosciences 39%, Biosciences 25%, Computational Fluid Dynamics 11%, Other 11%, Mathematics and Informatics 7%, Chemistry 4%, Physics 4%
Affiliation: UiO 39%, UiB / Uni AS 29%, met.no 14%, NTNU 11%, IFE/Kjeller 4%, Nansensenteret 4%
NorStore - usage statistics
Staffing: Permanent staff 44%, PhD students 19%, PostDoc 18%, MSc students 13%, Guests 6%
Funding: Other 29%, Universities 25%, NFR programmes 22%, SFF/SFI 12%, EU 11%, Industry 2%
NorStore - usage statistics
[Figure: bar charts of yes/no answers for AccessOther, DataManager and DataPlan (axis: # people), with a second panel on the axis # articles]
NorStore - user requirements
- large, scalable storage capacities with redundant copies of data
- computing near the data
- secure storage and handling of sensitive (person-identifiable) data
- tools and user support for data management
- sharing of data with colleagues
- long-term archiving, including curation and preservation of data
- tighter coupling between computing resources and data services
- easy access and authentication for non-traditional user groups (e.g. WebDAV, cloud)
- an authentication and authorization infrastructure (AAI)
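Keeping redundant copies only helps if one can tell when a copy has gone bad. One common approach, sketched here on the assumption that the replicas are plain files reachable from one host, is to compare SHA-256 checksums and flag any copy that disagrees with the majority. The function names are hypothetical and this is an illustration, not a description of NorStore's actual replication mechanism.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_bad_replicas(replicas: list[Path]) -> list[Path]:
    """Return replicas whose checksum differs from the most common checksum.

    Majority voting is a simplifying assumption; on a tie, the checksum seen first wins.
    """
    digests = {p: sha256_of(p) for p in replicas}
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return [p for p, d in digests.items() if d != majority]


# Demo: three copies of the same file, one of them silently corrupted
with tempfile.TemporaryDirectory() as tmp:
    copies = [Path(tmp) / f"replica{i}.dat" for i in range(3)]
    for p in copies:
        p.write_bytes(b"observation data")
    copies[2].write_bytes(b"observation dat4")
    bad = find_bad_replicas(copies)
    print([p.name for p in bad])  # ['replica2.dat']
```

With only two copies a mismatch can be detected but not attributed, which is one reason three or more replicas are often preferred.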
NorStore - e-infrastructure
[Figure: NorStore e-infrastructure in layers: research projects (Infrastructure A, Commun. B, Project C) with discipline services; research infrastructure with core services and user support; hardware and operations at UiO, UiB, NTNU and UiT, linked by 10G connections; other sites shown: UiA, Org. B, Inst. C]
User community services (bioinf)
Data life cycle
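OAIS model
[Figure: OAIS reference model. A Submission Information Package (SIP) enters Ingest, which passes an Archival Information Package (AIP) to Archival Storage and descriptive information to Data Management. Access sends queries to Data Management and receives result sets, handles orders, retrieves AIPs from Archival Storage and delivers a Dissemination Information Package (DIP). Preservation Planning and Administration oversee the whole.]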
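To make the package flow concrete, here is a deliberately simplified Python sketch of the SIP → AIP → DIP path through Ingest, Archival Storage, Data Management and Access. The class and method names (`ToyArchive`, `ingest`, `query`, `order`), the ID scheme and the SHA-256 fixity check are illustrative assumptions; this is not a conformant OAIS implementation, and Preservation Planning and Administration are left out.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a producer hands to Ingest."""
    content: bytes
    descriptive_info: dict


@dataclass
class AIP:
    """Archival Information Package: what Archival Storage keeps."""
    aip_id: str
    content: bytes
    descriptive_info: dict
    fixity_sha256: str


@dataclass
class DIP:
    """Dissemination Information Package: what Access returns to a consumer."""
    content: bytes
    descriptive_info: dict


@dataclass
class ToyArchive:
    storage: dict = field(default_factory=dict)    # Archival Storage: aip_id -> AIP
    catalogue: dict = field(default_factory=dict)  # Data Management: aip_id -> descriptive info

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP and register its descriptive info."""
        aip_id = f"aip-{len(self.storage) + 1:04d}"  # hypothetical ID scheme
        digest = hashlib.sha256(sip.content).hexdigest()
        self.storage[aip_id] = AIP(aip_id, sip.content, sip.descriptive_info, digest)
        self.catalogue[aip_id] = sip.descriptive_info
        return aip_id

    def query(self, **criteria) -> list:
        """Access -> Data Management: a query returns a result set of AIP ids."""
        return [aid for aid, info in self.catalogue.items()
                if all(info.get(k) == v for k, v in criteria.items())]

    def order(self, aip_id: str) -> DIP:
        """Access: fetch the AIP, check fixity, and deliver a DIP."""
        aip = self.storage[aip_id]
        if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
            raise ValueError(f"fixity check failed for {aip_id}")
        return DIP(aip.content, dict(aip.descriptive_info))


archive = ToyArchive()
aid = archive.ingest(SIP(b"sea surface temperature", {"discipline": "Marine"}))
hits = archive.query(discipline="Marine")
dip = archive.order(hits[0])
print(aid, hits, dip.content)  # aip-0001 ['aip-0001'] b'sea surface temperature'
```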
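NorStore - services
Diagram components:
- WORK: storage on application servers (ssh), with data staged to and from Notur HPC
- PREPARE (iRODS) and PRESERVE (science archive), with web clients to query, access, annotate and ingest data (Feide login)
- generic services and discipline-specific services (e.g. THREDDS) for Humanities, Marine, Climate & Environment, Language technology, Medical imaging and Bioinformatics
- access over https, an API and SSL, with an AAI based on nationally federated identity (e.g. Feide)
- user-funding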
Lessons learned (so far...)
- planning of services must be based on user input and requirements
- sustainability of services and funding is critical to ensure commitment from the community
- it is important to secure, build and recruit competence (data management, service and infrastructure design, storage technology, AAI, long-term archiving and preservation, etc.)
- secure commitment from user communities and partners (including funding)
- staffing must be above critical mass
- distributed project management is a challenge (frequent F2F meetings are needed)
- task forces are one way to structure the work and secure ownership
Future infrastructure model
[Figure: an on-premises iRODS server with its iCAT catalogue, on-premises storage devices, a Swift server (on-premises OpenStack storage) and cloud-backed storage on AWS S3 (Amazon storage service), accessed by iRODS clients, WebDAV clients, cloud clients and direct file-system access]
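The diagram places on-premises disks, Swift and S3 behind one iRODS server, whose iCAT catalogue records where each data object lives. The Python sketch below illustrates that idea in miniature: one logical path, one replica per backend, and reads that fall back to the next backend if a copy is missing. All class names are hypothetical, the in-memory backends stand in for real Swift or S3 clients, the on-premises-first read order is an assumption, and none of this is the iRODS API.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface for one storage resource (disk, Swift, S3, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real backend; objects live in a dict."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if missing

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


class ReplicatingCatalogue:
    """Toy catalogue: one logical path, one replica per backend, read in priority order."""

    def __init__(self, backends: list[StorageBackend]) -> None:
        self.backends = backends
        self.replicas: dict[str, list[str]] = {}  # logical path -> backend names

    def put(self, logical_path: str, data: bytes) -> None:
        for backend in self.backends:
            backend.put(logical_path, data)
        self.replicas[logical_path] = [b.name for b in self.backends]

    def get(self, logical_path: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(logical_path)
            except KeyError:
                continue  # replica missing here; try the next backend
        raise FileNotFoundError(logical_path)


onprem, swift, s3 = (InMemoryBackend(n) for n in ("onprem", "swift", "s3"))
catalogue = ReplicatingCatalogue([onprem, swift, s3])
catalogue.put("/zoneA/home/alice/run42.nc", b"model output")
onprem.delete("/zoneA/home/alice/run42.nc")         # simulate a lost on-premises copy
print(catalogue.get("/zoneA/home/alice/run42.nc"))  # b'model output' (served from swift)
```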
Future infrastructure model
[Figure: two zones, A and B, each with a Swift server, an iRODS server and Nova servers; a client accesses both zones through iRODS and Swift]
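In iRODS, absolute logical paths start with the zone name (for example `/tempZone/home/...`), which is what lets a client address data held in another, federated zone. The sketch below shows only that routing idea: a hypothetical table maps zone names to the server responsible for them. The zone names and host names are invented for illustration, and in a real deployment iRODS federation resolves remote zones on the server side rather than in client code like this.

```python
# Hypothetical zone table: zone name -> iRODS server responsible for that zone
ZONES = {
    "zoneA": "irods-a.example.org",
    "zoneB": "irods-b.example.org",
}


def resolve(logical_path: str) -> str:
    """Return the server for an absolute logical path of the form '/<zone>/...'."""
    if not logical_path.startswith("/"):
        raise ValueError(f"not an absolute logical path: {logical_path!r}")
    zone = logical_path.split("/", 2)[1]  # '/zoneB/home/x' -> ['', 'zoneB', 'home/x']
    if zone not in ZONES:
        raise LookupError(f"unknown zone: {zone!r}")
    return ZONES[zone]


print(resolve("/zoneA/home/alice/run42.nc"))  # irods-a.example.org
print(resolve("/zoneB/projects/marine/ctd"))  # irods-b.example.org
```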
Collaborative data infrastructure
- Data generators and users
- User functionalities: data capture and transfer, virtual research environments
- Community support services: data discovery and navigation, workflow generation, annotation, interpretability
- Core data services: persistent storage, identification, authenticity, workflow execution, mining
- Trust and data curation (across the layers)
Fig. from "Riding the wave", EC High Level Expert Group on Scientific Data
NorStore data flow
- WORK: data generated; processing and analysis
- ARCHIVE: register and ingest; quality control
- science archive
- PUBLISH: data-doc (annotate); publication with PID (doi:xxxx:yyyyyy); research community
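The WORK → ARCHIVE → PUBLISH flow can be read as a small state machine. The Python sketch below encodes it under several assumptions the slide does not state: the stage names, that data failing quality control return to WORK, that an annotation (data-doc) is required before publishing, and that published data are frozen. The PID format is a hypothetical local placeholder; real DOIs are minted through a registration agency, which is not modelled here.

```python
from enum import Enum, auto
from itertools import count


class Stage(Enum):
    WORK = auto()       # processing and analysis
    INGESTED = auto()   # registered and ingested into the science archive
    QC_PASSED = auto()  # quality control completed
    PUBLISHED = auto()  # annotated, PID assigned, citable


# Allowed transitions; returning to WORK after failed QC is an assumption
ALLOWED = {
    Stage.WORK: {Stage.INGESTED},
    Stage.INGESTED: {Stage.QC_PASSED, Stage.WORK},
    Stage.QC_PASSED: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),  # assumption: published data are never changed
}

_pid_counter = count(1)  # hypothetical local PID sequence, not a real DOI service


class DataSet:
    def __init__(self, name: str) -> None:
        self.name = name
        self.stage = Stage.WORK
        self.pid = None  # set when the data set is published
        self.annotations: dict[str, str] = {}

    def advance(self, target: Stage) -> None:
        if target not in ALLOWED[self.stage]:
            raise ValueError(f"{self.stage.name} -> {target.name} is not allowed")
        if target is Stage.PUBLISHED:
            if not self.annotations:
                raise ValueError("annotate the data (data-doc) before publishing")
            self.pid = f"PID-{next(_pid_counter):06d}"
        self.stage = target


ds = DataSet("ocean-run-42")
ds.advance(Stage.INGESTED)
ds.advance(Stage.QC_PASSED)
ds.annotations["description"] = "Hypothetical ocean model run"
ds.advance(Stage.PUBLISHED)
print(ds.stage.name, ds.pid)  # PUBLISHED PID-000001
```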