SURFsara Data Services

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "SURFsara Data Services"

Transcription

1 SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Axel Berg

2 Dutch Scientific Challenges From High Energy Physics to atomic and molecular physics (DNA); Life sciences (cell biology); Human interaction (all human sciences from linguistics to even phobia studies); and from the big bang; to astronomy; science of the solar system; earth (climate and geophysics); into life and biodiversity. Picture courtesy of UvA FNWI

3 High Energy Physics, discovery of Higgs boson SURFsara and NIKHEF are a tier 1 site for the Large Hadron Collider at CERN

4 LOFAR, discovery of two new pulsars Using computing resources provided by SURFsara, which is part of the Dutch and European Grid Infrastructure, Coenen and the team needed only a month to search through a set of LOFAR images that would have occupied a single computer for more than a century.

5 twinl: searching NL tweets Prof. Antal van den Bosch Dr. Erik Tjong Kim Sang

6 twinl usage

7 THE DATA EXPLOSION Production Cartesius Projects COSMO GRID (2009) RUMC (2013) OWLS (2006) ESSENCE (2006) ENTRAIN (2011) ITAMOC (2011) EAGLE (2013) TITAN (2014) HPC ARCHIVE (1993) LOFAR Volume 105TB 90TB 70TB 41TB 26TB 25TB 2,5TB (600TB) (1PB) 1172TB 12PB Community LOFAR LHC/ATLAS LHC/LHCb BBMRI MOLEPI LHC/ALICE ILDG DANS OTHERS Volume 5,8PB 3.9PB 1,4PB 115TB 78TB 74TB 43TB 18TB 50TB Start LHC

8 The world of the many Many different users (well organised (international) user communities, research groups, universities, research institutes, individual researchers, etc.) Many different use cases (central data, distributed data, small files, large files, static data, dynamic data, etc.) Many different user requirements (storage, meta data, data searching, persistent identifiers, long-term preservation, privacy, visualisation, data analytics, data sharing, data re-use, semantic annotation, work flows etc. etc. etc. etc.) Many different V s of Big Data (varieties, volumes, velocities, veracities, validities, volatilities)

9 CREATING DATA: designing, planning consent, collection and management, capturing and creating metadata CREATING DATA" RE-USING DATA: for follow-ups, new research, research reviews, scrutinising, teaching & learning RE-USING DATA" PROCESSING DATA" PROCESSING DATA: entering, transcribing, checking & validating, anonymising and describing TRUST ACCESS TO DATA: distributing, sharing, controlling access, promoting GIVING ACCESS TO DATA" ANALYSING DATA" ANALYSING DATA: interpreting, deriving, producing outputs & publishing, preparing for sharing Ref: UK Data Archive: PRESERVING DATA" PRESERVING DATA: migrating, backing-up, storing, creating metadata and documentation, archiving Research Data Life Cycle

10 CREATING DATA" Data ingest service PID Service Trusted Digital Repository Beehub B2SAFE (EUDAT) PID service Trusted Digital Repository B2SAFE (EUDAT) RE-USING DATA" GIVING ACCESS TO DATA" SURFsara services supporting the Research Data Life Cycle PROCESSING DATA" ANALYSING DATA" Grid Storage Beehub SURFdrive Grid Storage Beehub SURFdrive Hadoop HPC Cloud Remote Visualisation Cartesius, Lisa cluster Ref: UK Data Archive: PRESERVING DATA" Central data archive PID Service Trusted Digital Repository B2SAFE (EUDAT) Supporting the complete Research Data Life Cycle Collaboration with DANS Collaboration within EUDAT

11 SURFsara data strategy SURFsara is providing integrated ICT services (e.g. computing, visualisation, data storage, data analysis) for the scientific research community inthe Netherlands SURFsara is going to provide a Trusted Digital Repository to: - to enable long-term archiving of research data at SURFsara - improve the quality of research data with metadata and persistent identifiers enrich access with open standard protocols - improve search and discoverability via harvesting SURFsara is going to serve the Long Tail Data (e.g. community clouds) SURFsara is providing services for data analysis (Hadoop, HPC Cloud, Grid, remote visualisation) Collaborate with Universities, Institutes and research groups on data management issues (e.g. community clouds) and data management plans Engage with funding agencies and data owners to come to transparent models on preservation, policies, costs, security and privacy of research data and services

12 SURFsara (Big) Data Centric infrastructure services

13 SURFsara (Big) Data Centric infrastructure services

14 SURFsara (Big) Data Centric infrastructure services

15 SURFsara Data Storage Services Central Archive GRID SE BEEHUB Usage HPC, External GRID, HPC Cloud, Hadoop, Workflows, External Collaboration platform, Define your own access rules Preservation Medium, Long term Medium, Long term Short, Medium Media Disk + Tape Disk + Tape Disk Capacity (current) Bandwidth (aggregated) 240TB + 2.5PB 5PB + 20PB 80TB 1GB/s + 1GB/s >16GB/s + 2GB/s 1GB/s Latency Direct + High (50s) Low + High (100s) Low Objects Medium, Large (60M) Large (30M) Small, Medium Protocols NFS, GridFTP, (hpn-) SCP, Rsync SRM, GridFTP, HTTP, Webdav, NFS, Xrootd, dcap Webdav

16 SURFsara Data Services Data Ingest Service Easy way to upload large data from disk onto SURFsara facilities Upload data from 45 disks in parallel Used for archiving large sequencing datasets EPIC/Handle PID service Persistent references to physical locations to make data objects findable and citable PID is comparable to SBN of Books EPIC/Handle system is comparable to DNS for Data Objects EPIC consortium: Providing a sustainable service for storing an maintaining large volumes of PIDs

17 Trusted Digital Repository Data repository service to deposit data sets and objects Long-term preservation of research data Provides quality to data sets and objects via metadata descriptions Makes data sets and objects citable and findable in the future via Persistent Identifiers (EPIC PID service) Makes data sets discoverable via metadata harvesting Improves data access and re-usability of research data Ensures trust to researchers via regular auditing against standardized certifications (e.g. Data Seal of Approval, ISO16363, DIN 31644) Trusted Digital Repository service is currently under construction at SURFsara Collaboration with DANS and Universities to provide service to manage research data

18 Grid infrastructure and usage National Grid infrastructure (NGI_NL): - Operated in collaboration with Nikhef and RUG-CIT - Grid compute clusters at Nikhef, RUG-CIT and SURFsara - Grid Services Cluster at SURFsara - Life Science Grid (operated by SURFsara) - Grid Front-End storage at Nikhef and SURFsara (total capacity 5.2 PB) - Part of European Grid Infrastructure (EGI) Grid Usage: - In 2013 more than 53 million core hours (6050 core years) - Scientific projects/users: Number of e-infra projects: 34 - Number of users: 226

19 Life Science Grid sites LUMC Leiden WUR Wageningen NKI Amsterdam AMC Amsterdam Radboud Universiteit Nijmegen ErasmusMC Rotterdam Technische Universiteit - Delft Rijksuniversiteit Groningen Keygene Wageningen Each of these clusters has 128 cores and 40 Tb of storage space. SURFsara 3000 cores and large tiered storage (disk and tape) NIKHEF: 3300 cores, RUG-CIT: 1000 cores

20 Grid computing example UMCG, LUMC, EMC, VUMC and UMCU jointly work on the Genome of the Netherlands DNA sequencing data of 750 individuals, sequenced in China Information is stored and processed on the Life Science Grid at SURFsara

21 HPC Cloud You can clone your laptop or server Create an image and upload it into the cloud And expand it Largest VM goes up to 32 cores and 256 GB RAM Or clone it to multiple copies, create your own cluster Start multiple worker nodes when needed But remember: you are your own sys admin

22 HPC Cloud Current HPC Cloud infrastructure 30 HPC Nodes: 32 cores, 256 GB RAM, 1 TB local disk 10 "light User Service Nodes: 8 cores/node, 64 GB RAM, 6 TB local disk 1250 cores total 400TB shared storage (ISCSI, NFS, CIFS, CDMI...)

23 HPC Cloud usage num_of_users Number of cloud users actively starting VMs

24 HPC Cloud usage in core hours (2013) Red: reserved capacity based on users applications Blue: actual usage

25 Memory-intensive computing High memory compute nodes 2 TB RAM - 40 core machines 8TB disk storage 1 as part of the Life Science Grid 1 as part of the HPC cloud Service available Q Added value: Particular in Life Sciences there are still many applications which use few cores and scale in RAM. (e.g. de novo assembly of genomes)

26 Hadoop & nosql HADOOP can be used to process large datafiles in a distributed way. It comes with an easy to use programmer interface and is able to handle huge datafiles, in structured, semistructured or unstructured form. Hadoop infrastructure cores (4x capacity of 2012) - 2 PB of HDFS storage (2x capacity of 2012) Usage 2013: - 35 different users and user groups CPU-years of processing - 160TB data stored New: nosql service - Schemaless databases, not everything fits into rows and columns easily - Faster development, easy to scale

27 Cloud storage for research (Beehub) Capacity: 80 TB Nr of users: 544 users (69 use SURFconext) Usage: 21 TB in user folders, 2 TB in group folders, 23 TB stored Interface: webdav

28 Visualisation: SURF Collaboratorium

29

30 Remote visualisation infrastructure Remote visualisation, benefits for users: - Remote visualisation cluster with GPUs next to data in SURFsara datacenter - Visualisation application runs on remote high-end GPU cluster (possibly in parallel) - User accesses application through remote desktop (VNC) - No datasets, only pixels are transferred Remote Visualisation Service: - 9 render nodes, per node: 2x 6-core Intel Xeon E5-2620, 2x Nvidia Geforce GTX780Ti, 64 GB RAM, 800 GB scratch - 33 TB home file system - Software stack maintained by SURFsara: ParaView, VisIt, VTK, VMD, MIPAV, Blender, any other OpenGL application - Expert visualisation support

31 Remote visualisation SURFsara E.P. van der Poel & R.O. Monico - University of Twente - Rayleigh-Bénard convection simulation J. van der Dussen - TU Delft - Numerical simulation of stratocumulus cloud breakup - Institute for Marine and Atmospheric research Utrecht - High-resolution movie generation of oceanographic simulation - Streaming from RVS cluster to tiled display at IMAU

32 Personal cloud storage (SURFdrive) Trusted community cloud for personal storage Collaboration between SURFsara, SURFnet and Dutch universities Advantages: Trusted community solution for personal cloud storage Hosted, managed and served by community, data stored at SURFsara Login through SURFconext Specifications and service determined by end-users (universities) Initial capacity: 200 TB, 100 GB storage capacity per user Based on owncloud Operational: April 1 st, 2014

33 SURFsara national datacenter HPC services have datacenter requirements that are not commonly served by commercial datacenters: - High energy density per rack - Advanced cooling techniques - Relatively low redundancy European procurement resulted in a partnership agreement between SURFsara and TelecityGroup for the development of a national datacenter for higher education at Science Park Amsterdam New SURFsara national datacenter operational medio 2016 Datacenter in a datacenter 800m 2 and on-demand power 1.8MW Low PUE (1.2) à sustainable and low costs Available for national e-infrastructure services, for higher education and for SMEs

34 SURFsara provides user support on (nearly) all levels

35 SURF Outreach & Support (SOS) SURF s operating companies offer a world-class communication, computing and support infrastructure to facilitate scientific and scholarly research - data transfer - compute - (international) collaboration - visualisation - storage For these services we aim to provide a single point of contact for researchers and ICT staff SOS offers: - to raise awareness among researchers of the possibilities of SURF e- infrastructure for scientific research - to foster collaboration and to enable research projects

36 What SURF can do for researchers Activities for researchers and research support officers in 2014 SURF Roadshow visit universities and university medical centers Integrated services portfolio description of services and support at Central information point direct advice and support via

37 Questions? Axel Berg,

Data management and discovery

Data management and discovery Data management and discovery using EUDAT services Hans van Piggelen SURFsara, The Netherlands EUDAT Mission offer common data services in CDI to all European researchers services will address the needs

More information

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011 e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011 Science Park Amsterdam a world of science in a city of inspiration > Faculty of Science

More information

Connecting the e-infrastructure chain

Connecting the e-infrastructure chain Connecting the e-infrastructure chain Internet2 Spring Meeting, Arlington, April 23 rd, 2012 Peter Hinrich & Migiel de Vos Topics - About SURFnet - Motivation: Big data & collaboration - Collaboration

More information

A national approach for storage scale-out scenarios based on irods

A national approach for storage scale-out scenarios based on irods A national approach for storage scale-out scenarios based on irods Christine Staiger Ton Smeele SURFsara Utrecht University Science Park 140, ITS/RDM Amsterdam, The Heidelberglaan 8, Netherlands Utrecht,

More information

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license Inge Van Nieuwerburgh OpenAIRE NOAD Belgium Tools&Services OpenAIRE EUDAT can be reused under the CC BY license Open Access Infrastructure for Research in Europe www.openaire.eu Research Data Services,

More information

The EUDAT Collaborative Data Infrastructure

The EUDAT Collaborative Data Infrastructure The EUDAT Collaborative Data Infrastructure CSCS, Lugano, 8-9 September 2 2016 (Plan-E workshop) Damien Lecarpentier Project Director, Research Infrastructures, CSC EUDAT Project Director EUDAT receives

More information

SARA Overview. Walter Lioen Group Leader Supercomputing & Senior Consultant

SARA Overview. Walter Lioen Group Leader Supercomputing & Senior Consultant SARA Overview Walter Lioen Walter.Lioen@sara.nl Group Leader Supercomputing & Senior Consultant Science Park Amsterdam Faculty of Science of the Universiteit van Amsterdam National Institute for Nuclear

More information

Big Data infrastructure and tools in libraries

Big Data infrastructure and tools in libraries Line Pouchard, PhD Purdue University Libraries Research Data Group Big Data infrastructure and tools in libraries 08/10/2016 DATA IN LIBRARIES: THE BIG PICTURE IFLA/ UNIVERSITY OF CHICAGO BIG DATA: A VERY

More information

EUDAT - Open Data Services for Research

EUDAT - Open Data Services for Research EUDAT - Open Data Services for Research Johannes Reetz EUDAT operations Max Planck Computing & Data Centre Science Operations Workshop 2015 ESO, Garching 24-27th November 2015 EUDAT receives funding from

More information

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd EUDAT Data Services & Tools for Researchers and Communities Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd CSC IT CENTER FOR SCIENCE! Founded in 1971 as a technical support

More information

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members Agenda Background information Services Common Data Infrastructure

More information

Data Management Checklist

Data Management Checklist Data Management Checklist Managing research data throughout its lifecycle ensures its long-term value and prevents data from falling into digital obsolescence. Proper data management is a key prerequisite

More information

Using EUDAT services to replicate, store, share, and find cultural heritage data

Using EUDAT services to replicate, store, share, and find cultural heritage data Using EUDAT services to replicate, store, share, and find cultural heritage data in PSNC and beyond Maciej Brzeźniak, Norbert Meyer, PSNC Damien Lecarpentier, CSC 3 October 203 DIGITAL PRESERVATION OF

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland CESSDA workshop Tampere, 5 October 2012 EUDAT Towards a pan-european Collaborative

More information

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice 2014 年 3 月 13 日星期四 From Big Data to Big Value Infrastructure Needs and Huawei Best Practice Data-driven insight Making better, more informed decisions, faster Raw Data Capture Store Process Insight 1 Data

More information

Inauguration Cartesius June 14, 2013

Inauguration Cartesius June 14, 2013 Inauguration Cartesius June 14, 2013 Hardware is Easy...but what about software/applications/implementation/? Dr. Peter Michielse Deputy Director 1 Agenda History Cartesius Hardware path to exascale: the

More information

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as NorStore a national infrastructure for scientific data Andreas O Jaunsen UNINETT Sigma as About UNINETT Sigma UNINETT Sigma AS is a private company established by the Ministry of science and education

More information

Dutch View on URN:NBN and Related PID Services

Dutch View on URN:NBN and Related PID Services Dutch View on URN:NBN and Related PID Services Arjan Hogenaar DANS PID-workshop Cologne 1 Dutch view DANS-view Agreement to some extent Minor differences Mainly the view of the institute DANS PID-workshop

More information

e-infrastructure: objectives and strategy in FP7

e-infrastructure: objectives and strategy in FP7 "The views expressed in this presentation are those of the author and do not necessarily reflect the views of the European Commission" e-infrastructure: objectives and strategy in FP7 National information

More information

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove High Performance Data Analytics for Numerical Simulations Bruno Raffin DataMove bruno.raffin@inria.fr April 2016 About this Talk HPC for analyzing the results of large scale parallel numerical simulations

More information

Deliverable D5.3. World-wide E-infrastructure for structural biology. Grant agreement no.: Prototype of the new VRE portal functionality

Deliverable D5.3. World-wide E-infrastructure for structural biology. Grant agreement no.: Prototype of the new VRE portal functionality Deliverable D5.3 Project Title: Project Acronym: World-wide E-infrastructure for structural biology West-Life Grant agreement no.: 675858 Deliverable title: Lead Beneficiary: Prototype of the new VRE portal

More information

DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION

DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION DEVELOPING, ENABLING, AND SUPPORTING DATA AND REPOSITORY CERTIFICATION Plato Smith, Ph.D., Data Management Librarian DataONE Member Node Special Topics Discussion June 8, 2017, 2pm - 2:30 pm ASSESSING

More information

European digital repository certification: the way forward

European digital repository certification: the way forward Data Archiving and Networked Services European digital repository certification: the way forward Ingrid Dillo (DANS) EUDAT 3 rd User Forum Prague, 24 April 2014 DANS is an institute of KNAW en NWO Content

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Big Data in Research: Research Analytics Industry Solution. Stuart Long CTO - Oracle Systems Asia Pacific and Japan

Big Data in Research: Research Analytics Industry Solution. Stuart Long CTO - Oracle Systems Asia Pacific and Japan Big Data in Research: Research Analytics Industry Solution Stuart Long CTO - Oracle Systems Asia Pacific and Japan Information Architecture Capability Model Data Data technology Technology Management management

More information

Microsoft s Cloud. Delivering operational excellence in the cloud Infrastructure. Erik Jan van Vuuren Azure Lead Microsoft Netherlands

Microsoft s Cloud. Delivering operational excellence in the cloud Infrastructure. Erik Jan van Vuuren Azure Lead Microsoft Netherlands Microsoft s Cloud Delivering operational excellence in the cloud Infrastructure Erik Jan van Vuuren Azure Lead Microsoft Netherlands 4 Megatrends are driving a revolution 44% of users (350M people) access

More information

Data Transfers Between LHC Grid Sites Dorian Kcira

Data Transfers Between LHC Grid Sites Dorian Kcira Data Transfers Between LHC Grid Sites Dorian Kcira dkcira@caltech.edu Caltech High Energy Physics Group hep.caltech.edu/cms CERN Site: LHC and the Experiments Large Hadron Collider 27 km circumference

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona

EUDAT. Towards a pan-european Collaborative Data Infrastructure. Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland EUDAT User Forum, Barcelona Date: 7 March 2012 EUDAT Key facts Content Project Name

More information

Journey Towards Science DMZ. Suhaimi Napis Technical Advisory Committee (Research Computing) MYREN-X Universiti Putra Malaysia

Journey Towards Science DMZ. Suhaimi Napis Technical Advisory Committee (Research Computing) MYREN-X Universiti Putra Malaysia Malaysia's Computational Journey Towards Science DMZ Suhaimi Napis Technical Advisory Committee (Research Computing) MYREN-X Universiti Putra Malaysia suhaimi@upm.my In the Beginning... Research on parallel/distributed

More information

RESEARCH DATA DEPOT AT PURDUE UNIVERSITY

RESEARCH DATA DEPOT AT PURDUE UNIVERSITY Preston Smith Director of Research Services RESEARCH DATA DEPOT AT PURDUE UNIVERSITY May 18, 2016 HTCONDOR WEEK 2016 Ran into Miron at a workshop recently.. Talked about data and the challenges of providing

More information

Helix Nebula The Science Cloud

Helix Nebula The Science Cloud Helix Nebula The Science Cloud CERN 14 May 2014 Bob Jones (CERN) This document produced by Members of the Helix Nebula consortium is licensed under a Creative Commons Attribution 3.0 Unported License.

More information

Certification as a means of providing trust: the European framework. Ingrid Dillo Data Archiving and Networked Services

Certification as a means of providing trust: the European framework. Ingrid Dillo Data Archiving and Networked Services Certification as a means of providing trust: the European framework Ingrid Dillo Data Archiving and Networked Services Trust and Digital Preservation, Dublin June 4, 2013 Certification as a means of providing

More information

5 Fundamental Strategies for Building a Data-centered Data Center

5 Fundamental Strategies for Building a Data-centered Data Center 5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse

More information

Grid: data delen op wereldschaal

Grid: data delen op wereldschaal Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/ Grid: data delen op wereldschaal David Groep, Nikhef Rotary Krommenie-Wormerveer David Groep,

More information

Using Cartesius and Lisa. Zheng Meyer-Zhao - Consultant Clustercomputing

Using Cartesius and Lisa. Zheng Meyer-Zhao - Consultant Clustercomputing Zheng Meyer-Zhao - zheng.meyer-zhao@surfsara.nl Consultant Clustercomputing Outline SURFsara About us What we do Cartesius and Lisa Architectures and Specifications File systems Funding Hands-on Logging

More information

DRS Update. HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017

DRS Update. HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017 Update HL Digital Preservation Services & Library Technology Services Created 2/2017, Updated 4/2017 1 AGENDA DRS DRS DRS Architecture DRS DRS DRS Work 2 COLLABORATIVELY MANAGED DRS Business Owner Digital

More information

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations Argonne National Laboratory Argonne National Laboratory is located on 1,500

More information

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal

EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal EUDAT B2FIND A Cross-Discipline Metadata Service and Discovery Portal Heinrich Widmann, DKRZ DI4R 2016, Krakow, 28 September 2016 www.eudat.eu EUDAT receives funding from the European Union's Horizon 2020

More information

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April

Europe and its Open Science Cloud: the Italian perspective. Luciano Gaido Plan-E meeting, Poznan, April Europe and its Open Science Cloud: the Italian perspective Luciano Gaido (gaido@to.infn.it) Plan-E meeting, Poznan, April 27 2017 Background Italy has a long-standing expertise and experience in the management

More information

EUDAT Towards a Collaborative Data Infrastructure

EUDAT Towards a Collaborative Data Infrastructure EUDAT Towards a Collaborative Data Infrastructure Daan Broeder - MPI for Psycholinguistics - EUDAT - CLARIN - DASISH Bielefeld 10 th International Conference Data These days it is so very easy to create

More information

IN THE FRAME. Computacenter Public Sector Frameworks FRAMEWORK

IN THE FRAME. Computacenter Public Sector Frameworks FRAMEWORK IN THE FRAME Computacenter Public Sector Frameworks FRAMEWORK SOLUTION PUBLIC SECTOR FRAMEWORK ACCELERATE TRANSFORMATION Put digitalisation in the fast lane with cost-effective, compliant and centralised

More information

DRIVER Step One towards a Pan-European Digital Repository Infrastructure

DRIVER Step One towards a Pan-European Digital Repository Infrastructure DRIVER Step One towards a Pan-European Digital Repository Infrastructure Norbert Lossau Bielefeld University, Germany Scientific coordinator of the Project DRIVER, Funded by the European Commission Consultation

More information

Darren D Souza ICT Manager, UQ Diamantina Institute

Darren D Souza ICT Manager, UQ Diamantina Institute Darren D Souza ICT Manager, UQ Diamantina Institute Who is UQDI? The University of Queensland Diamantina Institute was established as UQ s sixth research institute in 2007 The Institute was formed through

More information

BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved.

BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST. Copyright 2016 EMC Corporation. All rights reserved. BUSINESS DATA LAKE FADI FAKHOURI, SR. SYSTEMS ENGINEER, ISILON SPECIALIST 1 UNSTRUCTURED DATA GROWTH 75% 78% 80% 2015 71 EB 2016 106 EB 2017 133 EB Total Capacity Shipped, Worldwide % of Unstructured Data

More information

Writing a Data Management Plan A guide for the perplexed

Writing a Data Management Plan A guide for the perplexed March 29, 2012 Writing a Data Management Plan A guide for the perplexed Agenda Rationale and Motivations for Data Management Plans Data and data structures Metadata and provenance Provisions for privacy,

More information

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF

Conference The Data Challenges of the LHC. Reda Tafirout, TRIUMF Conference 2017 The Data Challenges of the LHC Reda Tafirout, TRIUMF Outline LHC Science goals, tools and data Worldwide LHC Computing Grid Collaboration & Scale Key challenges Networking ATLAS experiment

More information

USE CASES IN SEISMOLOGY. Alberto Michelini INGV

USE CASES IN SEISMOLOGY. Alberto Michelini INGV USE CASES IN SEISMOLOGY Alberto Michelini INGV PROJECTS NERA (Network of European Research Infrastructures for Earthquake Risk Assessment and Mitigation) VERCE (Virtual Earthquake and seismology Research

More information

NARCIS: The Gateway to Dutch Scientific Information

NARCIS: The Gateway to Dutch Scientific Information NARCIS: The Gateway to Dutch Scientific Information Elly Dijk, Chris Baars, Arjan Hogenaar, Marga van Meel Department of Research Information, Royal Netherlands Academy of Arts and Sciences (KNAW) PO Box

More information

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY

Data Staging and Data Movement with EUDAT. Course Introduction Helsinki 10 th -12 th September, Course Timetable TODAY Data Staging and Data Movement with EUDAT Course Introduction Helsinki 10 th -12 th September, 2013 1400 Introduction Adam Carter Course Timetable TODAY 1430 The EUDAT Safe Replication Service Claudio

More information

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna

CLARIN s central infrastructure. Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna CLARIN s central infrastructure Dieter Van Uytvanck CLARIN-PLUS Tools & Services Workshop 2 June 2016 Vienna CLARIN? Common Language Resources and Technology Infrastructure Research Infrastructure for

More information

Building a Materials Data Facility (MDF)

Building a Materials Data Facility (MDF) Building a Materials Data Facility (MDF) Ben Blaiszik (blaiszik@uchicago.edu) Ian Foster (foster@uchicago.edu) Steve Tuecke, Kyle Chard, Rachana Ananthakrishnan, Jim Pruyne (UC) Kandace Turner-Jones, John

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Giuseppe Fiameni (g.fiameni@cineca.it) Claudio Cacciari SuperComputing, Application and Innovation CINECA Johannes Reatz RZG, Germany Damien

More information

European Grid Infrastructure

European Grid Infrastructure EGI-InSPIRE European Grid Infrastructure A pan-european Research Infrastructure supporting the digital European Research Area Michel Drescher Technical Manager, EGI.eu Michel.Drescher@egi.eu TPDL 2013

More information

SUPERMICRO, VEXATA AND INTEL ENABLING NEW LEVELS PERFORMANCE AND EFFICIENCY FOR REAL-TIME DATA ANALYTICS FOR SQL DATA WAREHOUSE DEPLOYMENTS

SUPERMICRO, VEXATA AND INTEL ENABLING NEW LEVELS PERFORMANCE AND EFFICIENCY FOR REAL-TIME DATA ANALYTICS FOR SQL DATA WAREHOUSE DEPLOYMENTS TABLE OF CONTENTS 2 THE AGE OF INFORMATION ACCELERATION Vexata Provides the Missing Piece in The Information Acceleration Puzzle The Vexata - Supermicro Partnership 4 CREATING ULTRA HIGH-PERFORMANCE DATA

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? -

EUDAT. Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? - EUDAT Towards a pan-european Collaborative Data Infrastructure - A Nordic Perspective? - Damien Lecarpentier CSC-IT Center for Science, Finland NeIC Conference Trondheim, 16 May 2013 Data trends Exponential

More information

Spark and HPC for High Energy Physics Data Analyses

Spark and HPC for High Energy Physics Data Analyses Spark and HPC for High Energy Physics Data Analyses Marc Paterno, Jim Kowalkowski, and Saba Sehrish 2017 IEEE International Workshop on High-Performance Big Data Computing Introduction High energy physics

More information

Improving Network Infrastructure to Enable Large Scale Scientific Data Flows and Collaboration (Award # ) Klara Jelinkova Joseph Ghobrial

Improving Network Infrastructure to Enable Large Scale Scientific Data Flows and Collaboration (Award # ) Klara Jelinkova Joseph Ghobrial Improving Network Infrastructure to Enable Large Scale Scientific Data Flows and Collaboration (Award # 1659348) Klara Jelinkova Joseph Ghobrial NSF Campus Cyberinfrastructure PI and Cybersecurity Innovation

More information

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Data Staging: Moving large amounts of data around, and moving it close to compute resources Data Staging: Moving large amounts of data around, and moving it close to compute resources PRACE advanced training course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari

More information

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St

Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, Centre for eresearch 24 Symonds St Research Data Preserve, Share, Reuse, Publish, or Perish Mark Gahegan Director, 24 Symonds St Outline The need The approach The progress to date What do we need? Data storage services that are reliable

More information

Wat verandert het toekomstige Internet voor architecten? Sogeti DYA Dag 2017

Wat verandert het toekomstige Internet voor architecten? Sogeti DYA Dag 2017 Wat verandert het toekomstige Internet voor architecten? Sogeti DYA Dag 2017 Leon Gommans Science Officer Air France KLM Group IT Technology Office R&D Guest Researcher, University of Amsterdam FNWI- SNE

More information

Streamlining CASTOR to manage the LHC data torrent

Streamlining CASTOR to manage the LHC data torrent Streamlining CASTOR to manage the LHC data torrent G. Lo Presti, X. Espinal Curull, E. Cano, B. Fiorini, A. Ieri, S. Murray, S. Ponce and E. Sindrilaru CERN, 1211 Geneva 23, Switzerland E-mail: giuseppe.lopresti@cern.ch

More information

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017 Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2017 InfiniBand Accelerates Majority of New Systems on TOP500 InfiniBand connects 77% of new HPC

More information

Trusted Digital Archives

Trusted Digital Archives Data Archiving and Networked Services Trusted Digital Archives Peter Doorn Data Archiving and Networked Services 14th General Assembly 29/30 April 2013 Berlin Berlin-Brandenburg Academy of Sciences and

More information

Atos announces the Bull sequana X1000 the first exascale-class supercomputer. Jakub Venc

Atos announces the Bull sequana X1000 the first exascale-class supercomputer. Jakub Venc Atos announces the Bull sequana X1000 the first exascale-class supercomputer Jakub Venc The world is changing The world is changing Digital simulation will be the key contributor to overcome 21 st century

More information

The CEDA Archive: Data, Services and Infrastructure

The CEDA Archive: Data, Services and Infrastructure The CEDA Archive: Data, Services and Infrastructure Kevin Marsh Centre for Environmental Data Archival (CEDA) www.ceda.ac.uk with thanks to V. Bennett, P. Kershaw, S. Donegan and the rest of the CEDA Team

More information

DSpace Fedora. Eprints Greenstone. Handle System

DSpace Fedora. Eprints Greenstone. Handle System Enabling Inter-repository repository Access Management between irods and Fedora Bing Zhu, Uni. of California: San Diego Richard Marciano Reagan Moore University of North Carolina at Chapel Hill May 18,

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics

EUDAT & AAI. Daan Broeder MPI for Psycholinguistics EUDAT & AAI Daan Broeder MPI for Psycholinguistics Initially six research communities on Board EPOS: European Plate Observatory System CLARIN: Common Language Resources and Technology Infrastructure ENES:

More information

RZG Visualisation Infrastructure

RZG Visualisation Infrastructure Visualisation of Large Data Sets on Supercomputers RZG Visualisation Infrastructure Markus Rampp Computing Centre (RZG) of the Max-Planck-Society and IPP markus.rampp@rzg.mpg.de LRZ/RZG Course on Visualisation

More information

IJDC General Article

IJDC General Article Developing a Data Vault Stuart Lewis Lorraine Beard Edinburgh University Library The University of Manchester Library Mary McDerby Robin Taylor IT Services The University of Manchester Edinburgh University

More information

Data Staging: Moving large amounts of data around, and moving it close to compute resources

Data Staging: Moving large amounts of data around, and moving it close to compute resources Data Staging: Moving large amounts of data around, and moving it close to compute resources Digital Preserva-on Advanced Prac--oner Course Glasgow, July 19 th 2013 c.cacciari@cineca.it Definition Starting

More information

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours)

Certification. F. Genova (thanks to I. Dillo and Hervé L Hours) Certification F. Genova (thanks to I. Dillo and Hervé L Hours) Perhaps the biggest challenge in sharing data is trust: how do you create a system robust enough for scientists to trust that, if they share,

More information

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director

Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director Coupled Computing and Data Analytics to support Science EGI Viewpoint Yannick Legré, EGI.eu Director yannick.legre@egi.eu Credit slides: T. Ferrari www.egi.eu This work by EGI.eu is licensed under a Creative

More information

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Tom Sas HP. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee Advanced PRESENTATION Data Reduction TITLE GOES HERE Concepts Tom Sas HP Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee SNIA Legal Notice The material contained in this tutorial

More information

Big Computing and the Mitchell Institute for Fundamental Physics and Astronomy. David Toback

Big Computing and the Mitchell Institute for Fundamental Physics and Astronomy. David Toback Big Computing and the Mitchell Institute for Fundamental Physics and Astronomy Texas A&M Big Data Workshop October 2011 January 2015, Texas A&M University Research Topics Seminar 1 Outline Overview of

More information

PLAN-E Workshop Switzerland. Welcome! September 8, 2016

PLAN-E Workshop Switzerland. Welcome! September 8, 2016 PLAN-E Workshop Switzerland Welcome! September 8, 2016 The Swiss National Supercomputing Centre Driving innovation in computational research in Switzerland Michele De Lorenzi (CSCS) PLAN-E September 8,

More information

e-infrastructures in FP7 INFO DAY - Paris

e-infrastructures in FP7 INFO DAY - Paris e-infrastructures in FP7 INFO DAY - Paris Carlos Morais Pires European Commission DG INFSO GÉANT & e-infrastructure Unit 1 Global challenges with high societal impact Big Science and the role of empowered

More information

Metadata Models for Experimental Science Data Management

Metadata Models for Experimental Science Data Management Metadata Models for Experimental Science Data Management Brian Matthews Facilities Programme Manager Scientific Computing Department, STFC Co-Chair RDA Photon and Neutron Science Interest Group Task lead,

More information

Fundamentals of Data Infrastructures

Fundamentals of Data Infrastructures Fundamentals of Data Infrastructures Dublin, March 2014 Welcome & Introduction Adam Carter EPCC, The University of Edinburgh Training Coordinator, EUDAT Timetable 09:00 Registration & Coffee 09:15 Welcome

More information

Data Management Plan

Data Management Plan Data Management Plan Mark Sanders, Martina Chýlková Document Identifier D1.9 Data Management Plan Version 1.0 Date Due M6 Submission date 30 November, 2015 WorkPackage WP1 Management and coordination Lead

More information

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed?

What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? Simple to start What is the maximum file size you have dealt so far? Movies/Files/Streaming video that you have used? What have you observed? What is the maximum download speed you get? Simple computation

More information

The KM3NeT Data Management Plan

The KM3NeT Data Management Plan The KM3NeT Data Management Plan ADMIN DETAILS Project Name: KM3NeT2.0 Funder: European Commission (Horizon 2020) Grant Title: 739560 Principal Investigator / Researcher: Maarten de Jong, FOM, The Netherlands

More information

High Performance Computing from an EU perspective

High Performance Computing from an EU perspective High Performance Computing from an EU perspective DEISA PRACE Symposium 2010 Barcelona, 10 May 2010 Kostas Glinos European Commission - DG INFSO Head of Unit GÉANT & e-infrastructures 1 "The views expressed

More information

The LHC Computing Grid

The LHC Computing Grid The LHC Computing Grid Visit of Finnish IT Centre for Science CSC Board Members Finland Tuesday 19 th May 2009 Frédéric Hemmer IT Department Head The LHC and Detectors Outline Computing Challenges Current

More information

To Compete You Must Compute

To Compete You Must Compute To Compete You Must Compute Presentation to Society of High Performance Computing Professionals November 3, 2010 Susan Baldwin Executive Director Compute Canada susan.baldwin@computecanada.org Vision To

More information

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT

EUDAT. A European Collaborative Data Infrastructure. Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT EUDAT A European Collaborative Data Infrastructure Daan Broeder The Language Archive MPI for Psycholinguistics CLARIN, DASISH, EUDAT OpenAire Interoperability Workshop Braga, Feb. 8, 2013 EUDAT Key facts

More information

Towards a Strategy for Data Sciences at UW

Towards a Strategy for Data Sciences at UW Towards a Strategy for Data Sciences at UW Albrecht Karle Department of Physics June 2017 High performance compu0ng infrastructure: Perspec0ves from Physics Exis0ng infrastructure and projected future

More information

Garuda : The National Grid Computing Initiative Of India. Natraj A.C, CDAC Knowledge Park, Bangalore.

Garuda : The National Grid Computing Initiative Of India. Natraj A.C, CDAC Knowledge Park, Bangalore. Garuda : The National Grid Computing Initiative Of India Natraj A.C, CDAC Knowledge Park, Bangalore. natraj@cdacb.ernet.in 1 Agenda About CDAC Garuda grid highlights Garuda Foundation Phase EU-India grid

More information

CernVM-FS beyond LHC computing

CernVM-FS beyond LHC computing CernVM-FS beyond LHC computing C Condurache, I Collier STFC Rutherford Appleton Laboratory, Harwell Oxford, Didcot, OX11 0QX, UK E-mail: catalin.condurache@stfc.ac.uk Abstract. In the last three years

More information

How FAIR am I? FAIR Principles and Interoperability of Data and Tools

How FAIR am I? FAIR Principles and Interoperability of Data and Tools How FAIR am I? FAIR Principles and Interoperability of Data and Tools Peter Doorn, DANS @pkdoorn @dansknaw Plan-Europe - Platform of National escience Centers in Europe PLAN-E meeting, April 27 & 28, 2017,

More information

Science-as-a-Service

Science-as-a-Service Science-as-a-Service The iplant Foundation Rion Dooley Edwin Skidmore Dan Stanzione Steve Terry Matthew Vaughn Outline Why, why, why! When duct tape isn t enough Building an API for the web Core services

More information

CC-IN2P3: A High Performance Data Center for Research

CC-IN2P3: A High Performance Data Center for Research April 15 th, 2011 CC-IN2P3: A High Performance Data Center for Research Toward a partnership with DELL Dominique Boutigny Agenda Welcome Introduction to CC-IN2P3 Visit of the computer room Lunch Discussion

More information

Protecting Future Access Now Models for Preserving Locally Created Content

Protecting Future Access Now Models for Preserving Locally Created Content Protecting Future Access Now Models for Preserving Locally Created Content By Amy Kirchhoff Archive Service Product Manager, Portico, ITHAKA Amigos Online Conference Digital Preservation: What s Now, What

More information

SHARING YOUR RESEARCH DATA VIA

SHARING YOUR RESEARCH DATA VIA SHARING YOUR RESEARCH DATA VIA SCHOLARBANK@NUS MEET OUR TEAM Gerrie Kow Head, Scholarly Communication NUS Libraries gerrie@nus.edu.sg Estella Ye Research Data Management Librarian NUS Libraries estella.ye@nus.edu.sg

More information

Virtualization in a Grid Environment. Nils Dijk - Hogeschool van Amsterdam Instituut voor Informatica

Virtualization in a Grid Environment. Nils Dijk - Hogeschool van Amsterdam Instituut voor Informatica Virtualization in a Grid Environment Nils Dijk - nils.dijk@hva.nl Hogeschool van Amsterdam Instituut voor Informatica July 8, 2010 Abstract Date: July 8, 2010 Title: Virtualization in a Grid Environment

More information

How to make your data open

How to make your data open How to make your data open Marialaura Vignocchi Alma Digital Library Muntimedia Center University of Bologna The bigger picture outside academia Thursday 29th October 2015 There is a strong societal demand

More information

Lessons Learned in the NorduGrid Federation

Lessons Learned in the NorduGrid Federation Lessons Learned in the NorduGrid Federation David Cameron University of Oslo With input from Gerd Behrmann, Oxana Smirnova and Mattias Wadenstein Creating Federated Data Stores For The LHC 14.9.12, Lyon,

More information

Business Model for Global Platform for Big Data for Official Statistics in support of the 2030 Agenda for Sustainable Development

Business Model for Global Platform for Big Data for Official Statistics in support of the 2030 Agenda for Sustainable Development Business Model for Global Platform for Big Data for Official Statistics in support of the 2030 Agenda for Sustainable Development Introduction This note sets out a business model for a Global Platform

More information

EUROPEANA METADATA INGESTION , Helsinki, Finland

EUROPEANA METADATA INGESTION , Helsinki, Finland EUROPEANA METADATA INGESTION 20.11.2012, Helsinki, Finland As of now, Europeana has: 22.322.604 Metadata (related to a digital record) in CC0 3.698.807 are in the Public Domain 697.031 Digital Objects

More information