The important role of HPC and data-intensive infrastructure facilities in supporting a diversity of Virtual Research Environments (VREs): working with Climate
Clare Richards, Benjamin Evans, Kate Snow, Chris Allen, Jingbo Wang, Kelsey A Druken, Sean Pringle, Jon Smillie and Matt Nethery
IN31A-03
Contact: clare.richards@anu.edu.au
The aim of this talk is to:
- Share the experience of the Australian National Computational Infrastructure (NCI) in establishing an HPC and data-intensive infrastructure facility for multiple research domains.
- Generate ideas for how to establish multi-domain computational research platforms underpinned by High Performance Data (HPD).
About NCI
- HPC and cloud infrastructure
- Computationally- and data-intensive science
- Big domains and long simulations
- 10+PB of reference data collections:
  - Climate and Weather
  - Environmental
  - Earth Observation
  - Geophysical
  - Optical Astronomy
- Other data: Genomics and Social Sciences
BoM
Understanding the needs for Climate research
- One of the most computationally demanding research areas in the environmental sciences.
- Volume and complexity of data are growing.
- International collaboration involves sharing a lot of data.
- Need a research platform for model development and data analysis.
- Must adapt as user needs change.
[Figure: Growth in Coupled Model Intercomparison Project (CMIP) data, 2000 to 2020: CMIP3 (40TB), CMIP5 (2PB), CMIP6 (18PB+). Source: Balaji V, et al (2018) Requirements for a global data infrastructure in support of CMIP6]
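The CMIP archive sizes quoted on this slide can be turned into growth ratios with a little arithmetic. The sketch below is only illustrative: it assumes decimal units (1 PB = 1000 TB), treats the "18PB+" figure for CMIP6 as a lower bound, and uses a helper name (`growth_factors`) that is ours, not part of any NCI tooling.

```python
# Archive sizes as quoted on the slide, expressed in TB.
# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

cmip_sizes_tb = {
    "CMIP3": 40,
    "CMIP5": 2 * TB_PER_PB,
    "CMIP6": 18 * TB_PER_PB,  # quoted as "18PB+", so this is a lower bound
}


def growth_factors(sizes):
    """Return the size ratio between each consecutive pair of phases."""
    names = list(sizes)
    return {
        f"{prev}->{curr}": sizes[curr] / sizes[prev]
        for prev, curr in zip(names, names[1:])
    }


factors = growth_factors(cmip_sizes_tb)
for step, factor in factors.items():
    print(f"{step}: about {factor:g}x")
```

Under these assumptions the archive grew roughly 50-fold from CMIP3 to CMIP5 and at least 9-fold again from CMIP5 to CMIP6.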
Building a VRE for Climate and Weather Science
- Collaborative user-focused development
- Range of skills, experience
- Climate and Weather Science Virtual Laboratory (2013)
  - Integrated compute and data analysis platforms
- Climate Science Data Enhanced Virtual Laboratory
  - New tools and data services
  - Emphasis on reliable access to Climate and Weather data collections (local and international)
Climate Science Virtual Laboratory
- Modelling (resolution and complexity)
  - Platform for model development
  - HPC scaling
- Big data, data services and deep search capabilities
  - FAIR data catalogues and data access
  - Data curation/management
  - Scalable data search*
- Virtual Desktop Infrastructure (VDI)
  - Visualisation and analysis platform
*IN43A-15 Rm209A-C 3:04pm Thu NCI Deep Search
Similarities across domains
Open to uptake and additions by other users
The capability and principles developed for climate research have been applied to Earth observations and geophysics:
DATASETS
- Standardised HPD: common self-describing NetCDF file format
- Compliance with data convention and metadata standards: CF and ACDD
- Test usability with a range of uses and data services (e.g., THREDDS OPeNDAP)
- Transdisciplinary science access
VDI
- Usage has increased: EO and geophysics are now almost half the users.
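As a rough illustration of what "compliance with CF and ACDD" can mean in practice, the sketch below checks a file's global attributes for the four attributes that ACDD 1.3 lists as highly recommended (`title`, `summary`, `keywords`, `Conventions`) and for a declared convention in the `Conventions` string. It is not NCI's compliance tooling: the function names and the example attributes are hypothetical, and the attributes are passed in as a plain dictionary so the example runs without a NetCDF library.

```python
# Global attributes that ACDD 1.3 marks as "highly recommended".
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def missing_acdd_attributes(global_attrs):
    """List highly recommended ACDD attributes that are absent or blank."""
    return [
        name
        for name in ACDD_HIGHLY_RECOMMENDED
        if not str(global_attrs.get(name, "")).strip()
    ]


def declares_convention(global_attrs, prefix):
    """True if the Conventions attribute names a convention starting with prefix."""
    conventions = str(global_attrs.get("Conventions", ""))
    # Convention names may be separated by commas or by spaces.
    tokens = conventions.replace(",", " ").split()
    return any(token.startswith(prefix) for token in tokens)


# Hypothetical global attributes, as they might be read from a NetCDF file.
example_attrs = {
    "title": "Example gridded temperature product",
    "Conventions": "CF-1.6, ACDD-1.3",
}

print(missing_acdd_attributes(example_attrs))
print(declares_convention(example_attrs, "CF-"))
print(declares_convention(example_attrs, "ACDD-"))
```

With the netCDF4-python package, the dictionary could be built from `Dataset.ncattrs()` and `Dataset.getncattr()`. A check like this only covers the presence of metadata; full CF compliance also involves variable-level attributes such as units and standard names.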
Future development
- Supporting broad research and development of web services
- Compute and data
- Adopting good software/approaches from other domains
- GSKY* provides:
  - High performance scalable OGC data services
  - Data manipulation, coordinate transformations, aggregations
  - Potentially re-gridding
- JupyterHub driving Python and R
- Building in next step prototyping
- Open to updates based on feedback from users
*For more information: http://gsky.
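To make "OGC data services" concrete, the sketch below builds a standard OGC WMS 1.3.0 `GetMap` request URL. It assumes that a WMS endpoint is among the services offered; the endpoint URL, the layer name and the function name `wms_getmap_url` are hypothetical placeholders, not GSKY's actual addresses or API. Only the query parameter names come from the WMS 1.3.0 standard.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, bbox, width, height,
                   crs="EPSG:4326", image_format="image/png"):
    """Build an OGC WMS 1.3.0 GetMap URL.

    For EPSG:4326 in WMS 1.3.0 the BBOX axis order is
    (min_lat, min_lon, max_lat, max_lon).
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer; the bounding box roughly covers Australia.
url = wms_getmap_url(
    "https://example.org/ows",
    "hypothetical_layer",
    (-45.0, 110.0, -10.0, 155.0),
    width=512,
    height=512,
)
print(url)
```

Because the request is just a URL, the same call works from a browser, a notebook or a desktop GIS client, which is part of what makes standards-based services attractive across domains.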
Overcoming the challenges
- Adopting new best practice has its challenges as well
- It takes time to make the transition
- Need to understand barriers to use
- Need to invest in training and outreach
- Make the transition as easy as possible
- Prototype and be flexible
- Be prepared that some things won't work
- Find community champions to help develop and transition
- Sustainability: don't try to keep modifying everything to please everyone!
- Anticipate what users want next and validate/prioritise for future development
Benefits of using a multi-domain approach
- Best practice will evolve.
- Access and cross-disciplinary use of data: better for researchers and funders.
- Adherence to standards and quality can build trust in the data.
- Sustainability: sharing infrastructure and adopting software can accelerate development and deliver cost benefits.
- Central coordination of domain repositories across organisational/state/national/continental scales reduces unnecessary duplication of effort.
Acknowledgements
The Team at NCI and all our collaborators