Clare Richards, Benjamin Evans, Kate Snow, Chris Allen, Jingbo Wang, Kelsey A Druken, Sean Pringle, Jon Smillie and Matt Nethery. nci.org.

The important role of HPC and data-intensive infrastructure facilities in supporting a diversity of Virtual Research Environments (VREs): working with Climate

Clare Richards, Benjamin Evans, Kate Snow, Chris Allen, Jingbo Wang, Kelsey A Druken, Sean Pringle, Jon Smillie and Matt Nethery

IN31A-03
Contact: clare.richards@anu.edu.au

The aim of this talk is to:
- Share the experience of the Australian National Computational Infrastructure (NCI) in establishing an HPC and data-intensive infrastructure facility for multiple research domains.
- Generate ideas for how to establish multi-domain computational research platforms underpinned by High Performance Data (HPD).

About NCI
- HPC and cloud infrastructure
- Computationally- and data-intensive science
- Big domains and long simulations
- 10+ PB of reference data collections: Climate and Weather, Environmental, Earth Observation, Geophysical, Optical Astronomy
- Other data: Genomics and Social Sciences

BoM

Understanding the needs for Climate research
- One of the most computationally demanding research areas in the environmental sciences.
- Volume and complexity of data are growing.
- International collaboration involves sharing a lot of data.
- Coupled Model Intercomparison Project (CMIP)
- Need a research platform for model development and data analysis.
- Must adapt as user needs change.

[Figure: Growth in CMIP Data, 2000 to 2020. CMIP3 (40TB), CMIP5 (2PB), CMIP6 (18PB+). Source: Balaji V, et al (2018) Requirements for a global data infrastructure in support of CMIP6]
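As a rough worked example of the growth shown on this slide, the sketch below computes the growth factor between successive CMIP phases from the quoted volumes. It assumes decimal units (1 PB = 1000 TB) and treats "18PB+" as a lower bound of 18 PB; the variable and function names are illustrative only.

```python
# Archive volumes quoted on the slide, converted to terabytes.
# Assumption: decimal units (1 PB = 1000 TB); "18PB+" is taken as 18 PB.
TB_PER_PB = 1000

cmip_volumes_tb = {
    "CMIP3": 40,
    "CMIP5": 2 * TB_PER_PB,
    "CMIP6": 18 * TB_PER_PB,
}


def growth_factors(volumes):
    """Return the ratio of each phase's volume to the previous phase's."""
    phases = list(volumes)
    return {
        f"{prev}->{curr}": volumes[curr] / volumes[prev]
        for prev, curr in zip(phases, phases[1:])
    }


factors = growth_factors(cmip_volumes_tb)
for step, factor in factors.items():
    print(f"{step}: about {factor:.0f}x")
```

Under these assumptions the archive grew roughly 50-fold from CMIP3 to CMIP5, and at least 9-fold from CMIP5 to CMIP6.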

Building a VRE for Climate and Weather Science
- Collaborative user-focused development
- Range of skills, experience.
- Climate and Weather Science Virtual Laboratory (2013)
- Integrated compute and data analysis platforms
- Climate Science Data Enhanced Virtual Laboratory
- New tools and data services
- Emphasis on reliable access to Climate and Weather data collections (local and international)

Climate Science Virtual Laboratory
- Modelling (resolution and complexity)
- Platform for model development
- HPC scaling
- Big data, data services and deep search capabilities
- FAIR data catalogues and data access
- Data curation/management
- Scalable data search*
- Virtual Desktop Infrastructure (VDI)
- Visualisation and analysis platform

*IN43A-15, Rm209A-C, 3:04pm Thu: NCI Deep Search

Similarities across domains

Open to uptake and additions by other users

The capability and principles developed for climate research have been applied to Earth observations and geophysics:

DATASETS
- Standardised HPD: common self-describing netCDF file format
- Compliance with data convention and metadata standards: CF and ACDD
- Test usability with a range of uses and data services (e.g., THREDDS, OPeNDAP)
- Transdisciplinary science access

VDI
- Usage has increased: EO and geophysics are now almost half the users.
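To make the metadata compliance point concrete, here is a minimal sketch of a presence check on a dataset's global attributes. It is not NCI's quality-control procedure: it only tests whether the four attributes that ACDD 1.3 lists as highly recommended are present, and whether the `Conventions` attribute declares a given convention. The function names and the example attributes are hypothetical, and a real compliance check would need a dedicated checker.

```python
# Global attributes that ACDD 1.3 lists as "highly recommended".
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def missing_acdd_attributes(global_attrs):
    """Return the highly recommended ACDD attributes that are absent or blank."""
    return [
        name
        for name in ACDD_HIGHLY_RECOMMENDED
        if not str(global_attrs.get(name, "")).strip()
    ]


def declares_convention(global_attrs, prefix):
    """Check whether the Conventions attribute names a convention, e.g. 'CF-'."""
    declared = str(global_attrs.get("Conventions", "")).replace(",", " ").split()
    return any(item.startswith(prefix) for item in declared)


# Hypothetical attributes, as they might be read from a netCDF file header.
example_attrs = {
    "title": "Example gridded temperature product",
    "summary": "",
    "Conventions": "CF-1.6, ACDD-1.3",
}

print(missing_acdd_attributes(example_attrs))
print(declares_convention(example_attrs, "CF-"))
print(declares_convention(example_attrs, "ACDD-"))
```

Working on a plain dictionary keeps the sketch independent of any particular netCDF library; the attributes could come from whichever reader is in use.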

Future development
- Supporting broad research and development of web services
- Compute and data
- Adopting good software/approaches from other domains
- GSKY* provides:
  - High performance scalable OGC data services
  - Data manipulation, coordinate transformations, aggregations
  - Potentially re-gridding
- Jupyter Hub driving Python and R
- Building in next step prototyping
- Open to updates based on feedback from users

*For more information: http://gsky.
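For readers unfamiliar with OGC data services, the sketch below builds a generic WMS 1.3.0 GetMap request URL. It illustrates the kind of standard request an OGC-compliant server accepts and is not taken from GSKY's documentation; the endpoint `https://example.org/ows` and the layer name `example_layer` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height, time=None):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses
    latitude-first axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    # TIME is an optional dimension parameter for time-enabled layers.
    if time is not None:
        params["TIME"] = time
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer; the bounding box roughly covers Australia.
url = wms_getmap_url(
    "https://example.org/ows",
    "example_layer",
    (-45.0, 110.0, -10.0, 155.0),
    512,
    512,
)
print(url)
```

Because the request is an ordinary URL, the same call can be issued from a notebook, a web map client or a command-line tool, which is part of what makes standards-based services reusable across domains.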

Overcoming the challenges
- Adopting new best practice has its challenges as well
- It takes time to make the transition
- Need to understand barriers to use
- Need to invest in training and outreach
- Make the transition as easy as possible
- Prototype and be flexible
- Be prepared that some things won't work
- Find community champions to help develop and transition
- Sustainability: Don't try to keep modifying everything to please everyone!
- Anticipate what users want next and validate/prioritise for future development

Benefits of using a multi-domain approach
- Best practice will evolve
- Access and cross-disciplinary use of data: Better for researchers and funders. Adherence to standards and quality can build trust in the data.
- Sustainability: Sharing infrastructure and adopting software can accelerate development and deliver cost benefits.
- Central coordination of domain repositories across organisational/state/national/continental scales reduces unnecessary duplication of effort.

Acknowledgements
The Team at NCI and all our collaborators