SURFsara Data Services

1 SURFsara Data Services SUPPORTING DATA-INTENSIVE SCIENCES Axel Berg

2 Dutch Scientific Challenges From High Energy Physics to atomic and molecular physics (DNA); Life sciences (cell biology); Human interaction (all human sciences from linguistics to even phobia studies); and from the big bang; to astronomy; science of the solar system; earth (climate and geophysics); into life and biodiversity. Picture courtesy of UvA FNWI

3 High Energy Physics, discovery of Higgs boson SURFsara and NIKHEF are a tier 1 site for the Large Hadron Collider at CERN

4 LOFAR, discovery of two new pulsars Using computing resources provided by SURFsara, which is part of the Dutch and European Grid Infrastructure, Coenen and the team needed only a month to search through a set of LOFAR images that would have occupied a single computer for more than a century.
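As a back-of-the-envelope reading of the LOFAR slide, "a month instead of more than a century" implies a speed-up of roughly three orders of magnitude. The sketch below assumes exactly 100 years and exactly one month; both values are illustrative bounds taken from the wording of the slide, not measured figures.

```python
# Rough speed-up implied by "a month on the grid vs. more than a century on one computer".
# ASSUMPTION: exactly 100 years and exactly 1 month; the slide only gives these as bounds.
single_computer_years = 100
grid_months = 1

# Express both durations in months and take the ratio.
speedup = single_computer_years * 12 / grid_months
print(f"Implied speed-up: at least {speedup:.0f}x")
```

Under these assumptions the search behaved as if about 1200 single computers had worked on it continuously.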

5 twinl: searching NL tweets Prof. Antal van den Bosch Dr. Erik Tjong Kim Sang

6 twinl usage

7 THE DATA EXPLOSION (chart)
Production Cartesius Projects: COSMO GRID (2009), RUMC (2013), OWLS (2006), ESSENCE (2006), ENTRAIN (2011), ITAMOC (2011), EAGLE (2013), TITAN (2014), HPC ARCHIVE (1993), LOFAR
Volume: 105 TB, 90 TB, 70 TB, 41 TB, 26 TB, 25 TB, 2.5 TB, (600 TB), (1 PB), 1172 TB, 12 PB
Community: LOFAR, LHC/ATLAS, LHC/LHCb, BBMRI, MOLEPI, LHC/ALICE, ILDG, DANS, OTHERS
Volume: 5.8 PB, 3.9 PB, 1.4 PB, 115 TB, 78 TB, 74 TB, 43 TB, 18 TB, 50 TB
Marker on chart: Start LHC

8 The world of the many
- Many different users (well-organised (international) user communities, research groups, universities, research institutes, individual researchers, etc.)
- Many different use cases (central data, distributed data, small files, large files, static data, dynamic data, etc.)
- Many different user requirements (storage, metadata, data searching, persistent identifiers, long-term preservation, privacy, visualisation, data analytics, data sharing, data re-use, semantic annotation, workflows, etc. etc. etc. etc.)
- Many different Vs of Big Data (varieties, volumes, velocities, veracities, validities, volatilities)

9 Research Data Life Cycle (Ref: UK Data Archive)
- CREATING DATA: designing, planning consent, collection and management, capturing and creating metadata
- PROCESSING DATA: entering, transcribing, checking & validating, anonymising and describing
- ANALYSING DATA: interpreting, deriving, producing outputs & publishing, preparing for sharing
- PRESERVING DATA: migrating, backing-up, storing, creating metadata and documentation, archiving
- GIVING ACCESS TO DATA: distributing, sharing, controlling access, promoting
- RE-USING DATA: for follow-ups, new research, research reviews, scrutinising, teaching & learning
Centre of the cycle: TRUST

10 SURFsara services supporting the Research Data Life Cycle (diagram; Ref: UK Data Archive)
Life-cycle stages shown: CREATING DATA, PROCESSING DATA, ANALYSING DATA, PRESERVING DATA, GIVING ACCESS TO DATA, RE-USING DATA
Services placed around the cycle: Data ingest service, PID service, Trusted Digital Repository, BeeHub, B2SAFE (EUDAT), Grid Storage, SURFdrive, Hadoop, HPC Cloud, Remote Visualisation, Cartesius, Lisa cluster, Central data archive
- Supporting the complete Research Data Life Cycle
- Collaboration with DANS
- Collaboration within EUDAT

11 SURFsara data strategy
- SURFsara provides integrated ICT services (e.g. computing, visualisation, data storage, data analysis) for the scientific research community in the Netherlands
- SURFsara is going to provide a Trusted Digital Repository to:
  - enable long-term archiving of research data at SURFsara
  - improve the quality of research data with metadata and persistent identifiers
  - enrich access with open standard protocols
  - improve search and discoverability via harvesting
- SURFsara is going to serve the Long Tail Data (e.g. community clouds)
- SURFsara provides services for data analysis (Hadoop, HPC Cloud, Grid, remote visualisation)
- Collaborate with universities, institutes and research groups on data management issues (e.g. community clouds) and data management plans
- Engage with funding agencies and data owners to arrive at transparent models for preservation, policies, costs, security and privacy of research data and services

12 SURFsara (Big) Data Centric infrastructure services

13 SURFsara (Big) Data Centric infrastructure services

14 SURFsara (Big) Data Centric infrastructure services

15 SURFsara Data Storage Services
Central Archive
- Usage: HPC, External
- Preservation: Medium, Long term
- Media: Disk + Tape
- Capacity (current): 240 TB + 2.5 PB
- Bandwidth (aggregated): 1 GB/s + 1 GB/s
- Latency: Direct + High (50s)
- Objects: Medium, Large (60M)
- Protocols: NFS, GridFTP, (hpn-)SCP, Rsync
GRID SE
- Usage: GRID, HPC Cloud, Hadoop, Workflows, External
- Preservation: Medium, Long term
- Media: Disk + Tape
- Capacity (current): 5 PB + 20 PB
- Bandwidth (aggregated): >16 GB/s + 2 GB/s
- Latency: Low + High (100s)
- Objects: Large (30M)
- Protocols: SRM, GridFTP, HTTP, WebDAV, NFS, Xrootd, dcap
BEEHUB
- Usage: Collaboration platform, Define your own access rules
- Preservation: Short, Medium
- Media: Disk
- Capacity (current): 80 TB
- Bandwidth (aggregated): 1 GB/s
- Latency: Low
- Objects: Small, Medium
- Protocols: WebDAV
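To put the capacity and bandwidth figures of the storage overview in perspective, the sketch below estimates how long it would take to move a full store at its aggregated bandwidth. It assumes decimal units (1 TB = 1000 GB), a perfectly sustained transfer rate, and that the first number in each "x + y" pair refers to disk; the helper name `transfer_time_hours` is hypothetical.

```python
def transfer_time_hours(size_tb: float, bandwidth_gb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate in GB/s (decimal units)."""
    seconds = size_tb * 1000 / bandwidth_gb_per_s
    return seconds / 3600

# 80 TB (BeeHub capacity) at 1 GB/s aggregated bandwidth.
beehub_hours = transfer_time_hours(80, 1)

# 5 PB (Grid SE, ASSUMED to be the disk part) at 16 GB/s, the lower bound given for it.
grid_hours = transfer_time_hours(5000, 16)

print(f"BeeHub: {beehub_hours:.1f} h, Grid SE disk: {grid_hours:.1f} h")
```

Even at the quoted aggregate rates, moving a complete store takes on the order of a day for BeeHub and several days for the Grid storage disk pool, which is one reason to bring the computation to the data.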

16 SURFsara Data Services
Data Ingest Service
- Easy way to upload large data from disk onto SURFsara facilities
- Upload data from 45 disks in parallel
- Used for archiving large sequencing datasets
EPIC/Handle PID service
- Persistent references to physical locations to make data objects findable and citable
- A PID is comparable to the ISBN of a book
- The EPIC/Handle system is comparable to DNS for data objects
- EPIC consortium: providing a sustainable service for storing and maintaining large volumes of PIDs
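The "DNS for data objects" comparison can be illustrated with a toy resolver: the identifier stays fixed while the location behind it may change. This is a minimal in-memory sketch, not the EPIC or Handle API; the class `PidRegistry`, the prefix `99999` and all URLs are hypothetical.

```python
from typing import Dict, List


class PidRegistry:
    """Toy in-memory registry mapping persistent identifiers to current locations."""

    def __init__(self) -> None:
        self._records: Dict[str, List[str]] = {}

    def register(self, pid: str, url: str) -> None:
        # A PID may point at several replicas of the same data object.
        self._records.setdefault(pid, []).append(url)

    def move(self, pid: str, old_url: str, new_url: str) -> None:
        # The data moved; only the record changes, the PID itself does not.
        urls = self._records[pid]
        urls[urls.index(old_url)] = new_url

    def resolve(self, pid: str) -> List[str]:
        # Return a copy so callers cannot modify the registry by accident.
        return list(self._records[pid])


registry = PidRegistry()
pid = "99999/example-dataset-1"  # hypothetical prefix/suffix
registry.register(pid, "https://old-storage.example.org/data/1")
registry.move(pid, "https://old-storage.example.org/data/1",
              "https://archive.example.org/data/1")
print(registry.resolve(pid))
```

A citation that uses the PID keeps working after the move, because readers resolve the identifier instead of storing the physical location.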

17 Trusted Digital Repository
- Data repository service to deposit data sets and objects
- Long-term preservation of research data
- Provides quality to data sets and objects via metadata descriptions
- Makes data sets and objects citable and findable in the future via Persistent Identifiers (EPIC PID service)
- Makes data sets discoverable via metadata harvesting
- Improves data access and re-usability of research data
- Ensures trust to researchers via regular auditing against standardized certifications (e.g. Data Seal of Approval, ISO 16363, DIN 31644)
- Trusted Digital Repository service is currently under construction at SURFsara
- Collaboration with DANS and universities to provide a service to manage research data

18 Grid infrastructure and usage
National Grid infrastructure (NGI_NL):
- Operated in collaboration with Nikhef and RUG-CIT
- Grid compute clusters at Nikhef, RUG-CIT and SURFsara
- Grid Services Cluster at SURFsara
- Life Science Grid (operated by SURFsara)
- Grid Front-End storage at Nikhef and SURFsara (total capacity 5.2 PB)
- Part of European Grid Infrastructure (EGI)
Grid usage:
- In 2013 more than 53 million core hours (6050 core years)
- Scientific projects/users:
  - Number of e-infra projects: 34
  - Number of users: 226
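The conversion from core hours to core years quoted on the grid usage slide can be checked with a one-line calculation. The sketch assumes a 365-day year and takes "more than 53 million" as exactly 53 million.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years

# ASSUMPTION: "more than 53 million core hours" taken as exactly 53 million.
core_hours_2013 = 53_000_000

core_years = core_hours_2013 / HOURS_PER_YEAR
print(f"{core_years:.0f} core years")
```

Read differently, this corresponds to roughly 6050 cores kept busy around the clock for the whole of 2013.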

19 Life Science Grid sites
- LUMC Leiden
- WUR Wageningen
- NKI Amsterdam
- AMC Amsterdam
- Radboud Universiteit Nijmegen
- ErasmusMC Rotterdam
- Technische Universiteit Delft
- Rijksuniversiteit Groningen
- Keygene Wageningen
Each of these clusters has 128 cores and 40 TB of storage space.
SURFsara: 3000 cores and large tiered storage (disk and tape); NIKHEF: 3300 cores; RUG-CIT: 1000 cores

20 Grid computing example
- UMCG, LUMC, EMC, VUMC and UMCU jointly work on the Genome of the Netherlands
- DNA sequencing data of 750 individuals, sequenced in China
- Information is stored and processed on the Life Science Grid at SURFsara

21 HPC Cloud
- You can clone your laptop or server: create an image and upload it into the cloud
- And expand it: the largest VM goes up to 32 cores and 256 GB RAM
- Or clone it to multiple copies and create your own cluster: start multiple worker nodes when needed
- But remember: you are your own sys admin

22 HPC Cloud
Current HPC Cloud infrastructure:
- 30 HPC nodes: 32 cores, 256 GB RAM, 1 TB local disk
- 10 "light" User Service Nodes: 8 cores/node, 64 GB RAM, 6 TB local disk
- 1250 cores total
- 400 TB shared storage (iSCSI, NFS, CIFS, CDMI...)

23 HPC Cloud usage (chart): number of cloud users actively starting VMs

24 HPC Cloud usage in core hours (2013) (chart). Red: reserved capacity based on users' applications. Blue: actual usage.

25 Memory-intensive computing
High-memory compute nodes:
- 2 TB RAM, 40-core machines
- 8 TB disk storage
- 1 as part of the Life Science Grid
- 1 as part of the HPC Cloud
Service available Q
Added value: particularly in the Life Sciences there are still many applications that use few cores and scale in RAM (e.g. de novo assembly of genomes).

26 Hadoop & NoSQL
Hadoop can be used to process large data files in a distributed way. It comes with an easy-to-use programming interface and can handle huge data files in structured, semi-structured or unstructured form.
Hadoop infrastructure:
- cores (4x capacity of 2012)
- 2 PB of HDFS storage (2x capacity of 2012)
Usage 2013:
- 35 different users and user groups
- CPU-years of processing
- 160 TB data stored
New: NoSQL service
- Schemaless databases: not everything fits into rows and columns easily
- Faster development, easy to scale
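The programming model behind Hadoop's distributed processing is MapReduce: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. The sketch below simulates the three phases locally in plain Python on a word count; it does not use Hadoop itself, and the function names and the input lines are made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values of one key into a single result."""
    return key, sum(values)


# Hypothetical input; on a cluster each line could live on a different node.
lines = ["big data big compute", "data archive"]

mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each call to `map_phase` and `reduce_phase` only sees its own line or key, the same logic can be spread over many machines, which is what makes the model suitable for very large files.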

27 Cloud storage for research (BeeHub)
- Capacity: 80 TB
- Number of users: 544 (69 use SURFconext)
- Usage: 21 TB in user folders, 2 TB in group folders, 23 TB stored
- Interface: WebDAV
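Since the interface is WebDAV, any WebDAV-capable client can list or transfer files. As a minimal sketch, the code below only builds (and does not send) a standard WebDAV `PROPFIND` request with Python's standard library; the host `webdav.example.org`, the folder path and the helper name `build_propfind` are placeholders, not the real service address.

```python
import urllib.request


def build_propfind(url: str, depth: int = 1) -> urllib.request.Request:
    """Build a WebDAV PROPFIND request that asks for all properties of a collection."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PROPFIND",
        # Depth 1 = the collection itself plus its direct children.
        headers={"Depth": str(depth), "Content-Type": "application/xml"},
    )


# Placeholder URL; a real call would also need authentication.
request = build_propfind("https://webdav.example.org/home/alice/")
print(request.get_method(), request.get_header("Depth"))
```

Sending the request with `urllib.request.urlopen(request)` against a real server would return a multi-status XML document describing the folder contents.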

28 Visualisation: SURF Collaboratorium

29

30 Remote visualisation infrastructure
Remote visualisation, benefits for users:
- Remote visualisation cluster with GPUs next to data in SURFsara datacenter
- Visualisation application runs on remote high-end GPU cluster (possibly in parallel)
- User accesses application through remote desktop (VNC)
- No datasets, only pixels are transferred
Remote Visualisation Service:
- 9 render nodes, per node: 2x 6-core Intel Xeon E5-2620, 2x Nvidia GeForce GTX 780 Ti, 64 GB RAM, 800 GB scratch
- 33 TB home file system
- Software stack maintained by SURFsara: ParaView, VisIt, VTK, VMD, MIPAV, Blender, any other OpenGL application
- Expert visualisation support

31 Remote visualisation SURFsara
- E.P. van der Poel & R.O. Monico - University of Twente - Rayleigh-Bénard convection simulation
- J. van der Dussen - TU Delft - Numerical simulation of stratocumulus cloud breakup
- Institute for Marine and Atmospheric research Utrecht - High-resolution movie generation of oceanographic simulation - Streaming from RVS cluster to tiled display at IMAU

32 Personal cloud storage (SURFdrive)
Trusted community cloud for personal storage
Collaboration between SURFsara, SURFnet and Dutch universities
Advantages:
- Trusted community solution for personal cloud storage
- Hosted, managed and served by the community; data stored at SURFsara
- Login through SURFconext
- Specifications and service determined by end-users (universities)
- Initial capacity: 200 TB, 100 GB storage capacity per user
- Based on ownCloud
- Operational: April 1st, 2014
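The SURFdrive capacity figures give a quick sense of scale. The sketch below assumes decimal units (1 TB = 1000 GB) and no over-provisioning, which the slide does not state.

```python
initial_capacity_tb = 200   # from the slide
quota_gb_per_user = 100     # from the slide

# ASSUMPTION: 1 TB = 1000 GB and every user fills the quota completely.
users_at_full_quota = initial_capacity_tb * 1000 // quota_gb_per_user
print(users_at_full_quota)
```

In practice most users stay well below their quota, so a service of this kind can usually admit more users than this worst-case figure.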

33 SURFsara national datacenter
HPC services have datacenter requirements that are not commonly served by commercial datacenters:
- High energy density per rack
- Advanced cooling techniques
- Relatively low redundancy
European procurement resulted in a partnership agreement between SURFsara and TelecityGroup for the development of a national datacenter for higher education at Science Park Amsterdam
- New SURFsara national datacenter operational mid-2016
- Datacenter in a datacenter: 800 m² and on-demand power of 1.8 MW
- Low PUE (1.2) → sustainable and low costs
- Available for national e-infrastructure services, for higher education and for SMEs
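PUE (power usage effectiveness) is the ratio of total facility power to the power consumed by the IT equipment, so a PUE of 1.2 means 20% overhead for cooling and power distribution. The sketch below assumes, purely for illustration, that the 1.8 MW refers to the IT load; the slide does not say whether it is IT power or total facility power.

```python
pue = 1.2            # from the slide
it_power_mw = 1.8    # ASSUMPTION: treated here as IT load, not total facility power

# PUE = total facility power / IT power
total_power_mw = it_power_mw * pue
overhead_mw = total_power_mw - it_power_mw
print(f"total {total_power_mw:.2f} MW, overhead {overhead_mw:.2f} MW")
```

Under that assumption, cooling and distribution losses would add about 0.36 MW on top of the IT load.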

34 SURFsara provides user support on (nearly) all levels

35 SURF Outreach & Support (SOS)
SURF's operating companies offer a world-class communication, computing and support infrastructure to facilitate scientific and scholarly research:
- data transfer
- compute
- (international) collaboration
- visualisation
- storage
For these services we aim to provide a single point of contact for researchers and ICT staff.
SOS aims:
- to raise awareness among researchers of the possibilities of the SURF e-infrastructure for scientific research
- to foster collaboration and to enable research projects

36 What SURF can do for researchers
Activities for researchers and research support officers in 2014:
- SURF Roadshow: visit universities and university medical centers
- Integrated services portfolio: description of services and support at
- Central information point: direct advice and support via

37 Questions? Axel Berg,

Data management and discovery

Data management and discovery Data management and discovery using EUDAT services Hans van Piggelen SURFsara, The Netherlands EUDAT Mission offer common data services in CDI to all European researchers services will address the needs

More information

towards a federated infrastructure enabling integrated life science research

towards a federated infrastructure enabling integrated life science research towards a federated infrastructure enabling integrated life science research ELIXIR Innovation & SME forum March 18 2015 Ruben Kok & Jaap Heringa www.dtls.nl ZOOMING IN AND OUT OF LIFE LIFE @ ALL LEVELS

More information

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011

e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011 e-research Infrastructures for e-science Axel Berg SARA national HPC & e-science support center RAMIRI, June 15, 2011 Science Park Amsterdam a world of science in a city of inspiration > Faculty of Science

More information

SURFSARA, TECHNOLOGICAL EXPERTISE & SURF E- INFRASTRUCTURE Axel Berg Information event etec-big call NLeSC-SURFsara Amsterdam, April 9, 2019

SURFSARA, TECHNOLOGICAL EXPERTISE & SURF E- INFRASTRUCTURE Axel Berg Information event etec-big call NLeSC-SURFsara Amsterdam, April 9, 2019 SURFSARA, TECHNOLOGICAL EXPERTISE & SURF E- INFRASTRUCTURE Axel Berg Information event etec-big call NLeSC-SURFsara Amsterdam, April 9, 2019 We are SURF Fields of work Education Flexible education Diverse

More information

Grid Computing: dealing with GB/s dataflows

Grid Computing: dealing with GB/s dataflows Grid Computing: dealing with GB/s dataflows Jan Just Keijser, Nikhef janjust@nikhef.nl David Groep, NIKHEF 3 May 2012 Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/

More information

Connecting the e-infrastructure chain

Connecting the e-infrastructure chain Connecting the e-infrastructure chain Internet2 Spring Meeting, Arlington, April 23 rd, 2012 Peter Hinrich & Migiel de Vos Topics - About SURFnet - Motivation: Big data & collaboration - Collaboration

More information

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013

Data Replication: Automated move and copy of data. PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Data Replication: Automated move and copy of data PRACE Advanced Training Course on Data Staging and Data Movement Helsinki, September 10 th 2013 Claudio Cacciari c.cacciari@cineca.it Outline The issue

More information

CSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science

CSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science CSD3 The Cambridge Service for Data Driven Discovery A New National HPC Service for Data Intensive science Dr Paul Calleja Director of Research Computing University of Cambridge Problem statement Today

More information

Netherlands Institute for Radio Astronomy. May 18th, 2009 Hanno Holties

Netherlands Institute for Radio Astronomy. May 18th, 2009 Hanno Holties Netherlands Institute for Radio Astronomy Update LOFAR Long Term Archive May 18th, 2009 Hanno Holties LOFAR Long Term Archive (LTA) Update Status Architecture Data Management Integration LOFAR, Target,

More information

A national approach for storage scale-out scenarios based on irods

A national approach for storage scale-out scenarios based on irods A national approach for storage scale-out scenarios based on irods Christine Staiger Ton Smeele SURFsara Utrecht University Science Park 140, ITS/RDM Amsterdam, The Heidelberglaan 8, Netherlands Utrecht,

More information

I data set della ricerca ed il progetto EUDAT

I data set della ricerca ed il progetto EUDAT I data set della ricerca ed il progetto EUDAT Casalecchio di Reno (BO) Via Magnanelli 6/3, 40033 Casalecchio di Reno 051 6171411 www.cineca.it 1 Digital as a Global Priority 2 Focus on research data Square

More information

BiG Grid HPC Cloud Beta

BiG Grid HPC Cloud Beta BiG Grid HPC Cloud Beta Floris Sluiter SARA Computing and Networking services Amsterdam www.cloud.sara.nl About BiG Grid and Sara The BiG Grid project is a collaboration between NCF, Nikhef and NBIC, and

More information

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license

Inge Van Nieuwerburgh OpenAIRE NOAD Belgium. Tools&Services. OpenAIRE EUDAT. can be reused under the CC BY license Inge Van Nieuwerburgh OpenAIRE NOAD Belgium Tools&Services OpenAIRE EUDAT can be reused under the CC BY license Open Access Infrastructure for Research in Europe www.openaire.eu Research Data Services,

More information

EUDAT- Towards a Global Collaborative Data Infrastructure

EUDAT- Towards a Global Collaborative Data Infrastructure EUDAT- Towards a Global Collaborative Data Infrastructure FOT-Net Data Stakeholder Meeting Brussels, 8 March 2016 Yann Le Franc, PhD e-science Data Factory, France CEO and founder EUDAT receives funding

More information

De BiG Grid e-infrastructuur digitaal onderzoek verbonden

De BiG Grid e-infrastructuur digitaal onderzoek verbonden Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/ De BiG Grid e-infrastructuur digitaal onderzoek verbonden David Groep, Nikhef KennisKring Amsterdam

More information

The EUDAT Collaborative Data Infrastructure

The EUDAT Collaborative Data Infrastructure The EUDAT Collaborative Data Infrastructure CSCS, Lugano, 8-9 September 2 2016 (Plan-E workshop) Damien Lecarpentier Project Director, Research Infrastructures, CSC EUDAT Project Director EUDAT receives

More information

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow

Data Management Plans. Sarah Jones Digital Curation Centre, Glasgow Data Management Plans Sarah Jones Digital Curation Centre, Glasgow sarah.jones@glasgow.ac.uk Twitter: @sjdcc Data Management Plan (DMP) workshop, e-infrastructures Austria, Vienna, 17 November 2016 What

More information

SARA Overview. Walter Lioen Group Leader Supercomputing & Senior Consultant

SARA Overview. Walter Lioen Group Leader Supercomputing & Senior Consultant SARA Overview Walter Lioen Walter.Lioen@sara.nl Group Leader Supercomputing & Senior Consultant Science Park Amsterdam Faculty of Science of the Universiteit van Amsterdam National Institute for Nuclear

More information

The Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research

The Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research The Cambridge Bio-Medical-Cloud An OpenStack platform for medical analytics and biomedical research Dr Paul Calleja Director of Research Computing University of Cambridge Global leader in science & technology

More information

How Five International Networks are Enabling International Data-Intensive Research. Internet2 Global Summit 2014

How Five International Networks are Enabling International Data-Intensive Research. Internet2 Global Summit 2014 How Five International Networks are Enabling International Data-Intensive Research Internet2 Global Summit 2014 CONTENTS Brief introduction to EYR and EYR-Global Introduction to 2 selected projects Large

More information

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan

Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan Storage Virtualization In computer science, storage virtualization uses virtualization to enable better functionality

More information

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd

EUDAT Data Services & Tools for Researchers and Communities. Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd EUDAT Data Services & Tools for Researchers and Communities Dr. Per Öster Director, Research Infrastructures CSC IT Center for Science Ltd CSC IT CENTER FOR SCIENCE! Founded in 1971 as a technical support

More information

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands Unleash Your Data Center s Hidden Power September 16, 2014 Molly Rector CMO, EVP Product Management & WW Marketing

More information

EUDAT - Open Data Services for Research

EUDAT - Open Data Services for Research EUDAT - Open Data Services for Research Johannes Reetz EUDAT operations Max Planck Computing & Data Centre Science Operations Workshop 2015 ESO, Garching 24-27th November 2015 EUDAT receives funding from

More information

EUDAT and Cloud Services

EUDAT and Cloud Services EUDAT and Cloud Services Space Data & Cloud Computing Infrastructures: Policies and Regulations ESRIN, Frascati, 7 July 2017 Per Öster CSC-IT Center for Science Finland www.eudat.eu EUDAT receives funding

More information

Building on to the Digital Preservation Foundation at Harvard Library. Andrea Goethals ABCD-Library Meeting June 27, 2016

Building on to the Digital Preservation Foundation at Harvard Library. Andrea Goethals ABCD-Library Meeting June 27, 2016 Building on to the Digital Preservation Foundation at Harvard Library Andrea Goethals ABCD-Library Meeting June 27, 2016 What do we already have? What do we still need? Where I ll focus DIGITAL PRESERVATION

More information

Big Data infrastructure and tools in libraries

Big Data infrastructure and tools in libraries Line Pouchard, PhD Purdue University Libraries Research Data Group Big Data infrastructure and tools in libraries 08/10/2016 DATA IN LIBRARIES: THE BIG PICTURE IFLA/ UNIVERSITY OF CHICAGO BIG DATA: A VERY

More information

Grid Computing: dealing with GB/s dataflows

Grid Computing: dealing with GB/s dataflows Grid Computing: dealing with GB/s dataflows Jan Just Keijser, Nikhef janjust@nikhef.nl David Groep, NIKHEF 21 March 2011 Graphics: Real Time Monitor, Gidon Moont, Imperial College London, see http://gridportal.hep.ph.ic.ac.uk/rtm/

More information

Data Discovery - Introduction

Data Discovery - Introduction Data Discovery - Introduction Why (benefits of reusing data) How EUDAT's services help with this (in general) Adam Carter In days gone by: Design an experiment Getting Your Data Conduct the experiment

More information

EUDAT & SeaDataCloud

EUDAT & SeaDataCloud EUDAT & SeaDataCloud SeaDataCloud Kick-off meeting Damien Lecarpentier CSC-IT Center for Science www.eudat.eu EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-infrastructures.

More information

HPC Cloud at SURFsara

HPC Cloud at SURFsara HPC Cloud at SURFsara Offering cloud as a service SURF Research Boot Camp 21st April 2016 Ander Astudillo Markus van Dijk What is cloud computing?

More information

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members

EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture. Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members EUDAT Training 2 nd EUDAT Conference, Rome October 28 th Introduction, Vision and Architecture Giuseppe Fiameni CINECA Rob Baxter EPCC EUDAT members Agenda Background information Services Common Data Infrastructure

More information

SARA Computing & Networking Services

SARA Computing & Networking Services SARA Computing & Networking Services Ronald van der Pol rvdp@sara.nl Outline! About SARA! National and International Collaborations! Overview of Services! Main Operational Tasks! Organisation! Operational

More information

Fair data and open data: differences and consequences

Fair data and open data: differences and consequences Fair data and open data: differences and consequences 1. To share or not to share: what is fair? Alex Burdorf, Erasmus MC Rotterdam 2. Data sharing: consequences for informed consent Marie-José Bonthuis,

More information

Using EUDAT services to replicate, store, share, and find cultural heritage data

Using EUDAT services to replicate, store, share, and find cultural heritage data Using EUDAT services to replicate, store, share, and find cultural heritage data in PSNC and beyond Maciej Brzeźniak, Norbert Meyer, PSNC Damien Lecarpentier, CSC 3 October 203 DIGITAL PRESERVATION OF

More information

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012

SGI Overview. HPC User Forum Dearborn, Michigan September 17 th, 2012 SGI Overview HPC User Forum Dearborn, Michigan September 17 th, 2012 SGI Market Strategy HPC Commercial Scientific Modeling & Simulation Big Data Hadoop In-memory Analytics Archive Cloud Public Private

More information

EUDAT. Towards a pan-european Collaborative Data Infrastructure

EUDAT. Towards a pan-european Collaborative Data Infrastructure EUDAT Towards a pan-european Collaborative Data Infrastructure Damien Lecarpentier CSC-IT Center for Science, Finland CESSDA workshop Tampere, 5 October 2012 EUDAT Towards a pan-european Collaborative

More information

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as

NorStore. a national infrastructure for scientific data. Andreas O Jaunsen UNINETT Sigma as NorStore a national infrastructure for scientific data Andreas O Jaunsen UNINETT Sigma as About UNINETT Sigma UNINETT Sigma AS is a private company established by the Ministry of science and education

More information

IT Challenges and Initiatives in Scientific Research

IT Challenges and Initiatives in Scientific Research IT Challenges and Initiatives in Scientific Research Alberto Di Meglio CERN openlab Deputy Head DOI: 10.5281/zenodo.9809 LHC Schedule 2009 2010 2011 2011 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022

More information

Building a Dutch National Research Infrastructure IRODS UGM 2017

Building a Dutch National Research Infrastructure IRODS UGM 2017 Building a Dutch National Research Infrastructure IRODS UGM 2017 Frank Heere 15-06-2017 SURF: who are we? Not-for-profit cooperative for ICT in Dutch education and research Knowledge sharing Shared digital

More information

Inauguration Cartesius June 14, 2013

Inauguration Cartesius June 14, 2013 Inauguration Cartesius June 14, 2013 Hardware is Easy...but what about software/applications/implementation/? Dr. Peter Michielse Deputy Director 1 Agenda History Cartesius Hardware path to exascale: the

More information

Data Management Checklist

Data Management Checklist Data Management Checklist Managing research data throughout its lifecycle ensures its long-term value and prevents data from falling into digital obsolescence. Proper data management is a key prerequisite

More information

Dutch View on URN:NBN and Related PID Services

Dutch View on URN:NBN and Related PID Services Dutch View on URN:NBN and Related PID Services Arjan Hogenaar DANS PID-workshop Cologne 1 Dutch view DANS-view Agreement to some extent Minor differences Mainly the view of the institute DANS PID-workshop

More information

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice 2014 年 3 月 13 日星期四 From Big Data to Big Value Infrastructure Needs and Huawei Best Practice Data-driven insight Making better, more informed decisions, faster Raw Data Capture Store Process Insight 1 Data

More information

Co-existence: Can Big Data and Big Computation Co-exist on the Same Systems?

Co-existence: Can Big Data and Big Computation Co-exist on the Same Systems? Co-existence: Can Big Data and Big Computation Co-exist on the Same Systems? Dr. William Kramer National Center for Supercomputing Applications, University of Illinois Where these views come from Large

More information

EMC ISILON HARDWARE PLATFORM

EMC ISILON HARDWARE PLATFORM EMC ISILON HARDWARE PLATFORM Three flexible product lines that can be combined in a single file system tailored to specific business needs. S-SERIES Purpose-built for highly transactional & IOPSintensive

More information

CSCS CERN videoconference CFD applications

CSCS CERN videoconference CFD applications CSCS CERN videoconference CFD applications TS/CV/Detector Cooling - CFD Team CERN June 13 th 2006 Michele Battistin June 2006 CERN & CFD Presentation 1 TOPICS - Some feedback about already existing collaboration

More information

Deliverable D5.3. World-wide E-infrastructure for structural biology. Grant agreement no.: Prototype of the new VRE portal functionality

Deliverable D5.3. World-wide E-infrastructure for structural biology. Grant agreement no.: Prototype of the new VRE portal functionality Deliverable D5.3 Project Title: Project Acronym: World-wide E-infrastructure for structural biology West-Life Grant agreement no.: 675858 Deliverable title: Lead Beneficiary: Prototype of the new VRE portal

More information

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove

High Performance Data Analytics for Numerical Simulations. Bruno Raffin DataMove High Performance Data Analytics for Numerical Simulations Bruno Raffin DataMove bruno.raffin@inria.fr April 2016 About this Talk HPC for analyzing the results of large scale parallel numerical simulations

More information
