CSD3: The Cambridge Service for Data Driven Discovery
A New National HPC Service for Data Intensive Science
Dr Paul Calleja, Director of Research Computing, University of Cambridge
Problem statement
Today we are seeing an explosion of data in research: experimental data, simulation data, sensor data and population data.
This impacts many science domains, such as:
  Observational astronomy
  High energy physics
  Medical informatics
  Medical imaging
  Large scale population genome studies
  Simulation science: weather, chemistry, engineering
  Social science: population data research
  IoT / smart cities
Traditional HPC systems are not designed to store and process large amounts of data efficiently. New system architectures are needed that focus on large data volumes, I/O throughput and connection to data analytics / machine learning capabilities.
Global leader in science & technology innovation
  One of the world's leading research-intensive universities in terms of research outputs and impact
  10,000 staff, 1.2B turnover
  Over 800 years old, with 92 Nobel Laureates
The Cambridge Cluster
  1535 technology companies in surrounding science parks
  27,000 staff, 13B turnover
Research computing @ Cambridge
  Research support
  External outreach: academic / industry
  HPC & data solution development
Driving discovery, innovation & impact
Cambridge research computing investment
Highly resilient HPC data centre
  200 cabinets, 30 kW water-cooled racks, 2000 kW IT load
People
  32 FTE technical team
  Skill focus in:
    HPC system integration
    Large scale storage
    OpenStack development & deployment
    Scientific support
Systems
  3 PF (2000 servers, x86 + GPU), 250-node Hadoop system
  30 PB storage + Intel Lustre & tape
Run rate budget
  5M per year
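The data centre figures above can be related with simple arithmetic. The sketch below is illustrative only: it assumes that the 2000 kW IT load is shared across all 200 cabinets and that 30 kW is a per-rack ceiling, neither of which the slide states explicitly. The function names are hypothetical.

```python
# Figures taken from the slide; their interpretation is an assumption.
CABINETS = 200          # number of cabinets in the data centre
RACK_LIMIT_KW = 30.0    # assumed per-rack power ceiling
IT_LOAD_KW = 2000.0     # total IT load


def average_kw_per_cabinet(it_load_kw: float, cabinets: int) -> float:
    """Average power per cabinet if the IT load were spread evenly."""
    return it_load_kw / cabinets


def cabinets_at_full_density(it_load_kw: float, rack_limit_kw: float) -> float:
    """Number of racks the IT load could feed if each drew its full limit."""
    return it_load_kw / rack_limit_kw


avg_kw = average_kw_per_cabinet(IT_LOAD_KW, CABINETS)
full_racks = cabinets_at_full_density(IT_LOAD_KW, RACK_LIMIT_KW)
print(f"Average load per cabinet: {avg_kw:.1f} kW")
print(f"Racks sustainable at the full per-rack limit: {full_racks:.1f}")
```

Under these assumptions the per-rack limit describes peak density for individual racks rather than the average across the hall.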
Research computing usage and outputs
  1016 active users from 272 research groups in 42 University departments
  80% system utilisation
  HPC has evolved into research computing and the long tail has arrived: significant usage by over 300 users who consumed 200 workstation days of usage in the last 12 months
  New user growth rate is 28% CAGR over the last 9 years; the growth rate is expected to increase with OpenStack usage models
  Research computing services support a current active grant portfolio of 120, which represents 8% of the University's annual grant income
  Underpinning 1400 publications over the last 9 years; current output is ~300 per year
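To give a feel for what a 28% compound annual growth rate means over 9 years, the sketch below computes the cumulative growth factor and the doubling time. It assumes the rate applied uniformly in every year, which the slide does not state; the function names are hypothetical.

```python
import math


def compound_growth_factor(rate: float, years: int) -> float:
    """Cumulative multiplier after `years` of growth at a constant annual `rate`."""
    return (1.0 + rate) ** years


def doubling_time_years(rate: float) -> float:
    """Years needed to double at a constant annual growth `rate`."""
    return math.log(2.0) / math.log(1.0 + rate)


CAGR = 0.28   # from the slide
YEARS = 9     # from the slide

factor = compound_growth_factor(CAGR, YEARS)
doubling = doubling_time_years(CAGR)
print(f"Cumulative growth over {YEARS} years: {factor:.2f}x")
print(f"Doubling time at {CAGR:.0%} per year: {doubling:.2f} years")
```

The same helper can be used to compare other growth assumptions, for example `compound_growth_factor(0.35, 5)`.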
New data-intensive system
CSD3 development work
  OpenStack for HPC / HPDA: co-design with StackHPC (ongoing)
  Tiered storage: co-design with Dell EMC (ongoing)
  High performance remote visualisation (complete)
  Hadoop integration with Lustre (complete)
  Machine learning framework configuration on KNL, GPU and Skylake (ongoing)
Why OpenStack in research computing
  Makes computing, data and applications more accessible, flexible and secure
  Makes research computing & data easier to use and easier to share
Science-as-a-Service on OpenStack
  Decreasing the time to science and increasing innovation
OpenStack development @ Cambridge
Development and deployment of OpenStack for research computing, covering both bare metal HPC via OpenStack and long tail scientific VMs on demand.
Cambridge OpenStack development work is jointly funded by the research computing service and the SKA, with a ~1.5 budget over a two year window.
OpenStack partners:
  StackHPC
  Dell
  Intel
  Red Hat
  Mellanox
  TACC
  CHPC in South Africa
We expect to be running CSD3 as a bare metal system provisioned by OpenStack.
Tiered storage solution
Requirements
  Large scale, reliable, cost effective storage
  I/O connection to large scale, multi-petabyte heterogeneous compute capability
  High, deterministic I/O rates on a per-application basis, for both bandwidth and IOPS
  High performance transfer between storage and HPDA / ML frameworks
  High performance data visualisation
  Enriched metadata tagging and search
  Strong multitenant security with accreditation
High level solution vision
  Multi-tier storage solution: SSD, disk, tape (automated data movement)
  Standardised commodity storage and server building blocks
  Software defined functionality
Storage co-design: Dell, Intel
  Tier 1a: 1 PB, bandwidth optimised, all-SSD Lustre
  Tier 1b: 0.2 PB, latency optimised, NVMe over fabrics
  Tier 2: 10 PB, balanced performance / capacity, spinning rust based Lustre
  Tier 3: 20 PB, capacity optimised, tape
Co-design work focusing on:
  Optimum SSD hardware configuration
  Slurm burst buffer implementation with LUN striping
  QoS
  iRODS Lustre integration for tape HSM
  iRODS metadata enhancement and audit for compliance
  NVMe over OPA implementation
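The tier layout above can be written down as a small data structure to sum capacities and look up a tier by what it is optimised for. This is a sketch only: `StorageTier`, `total_capacity_pb` and `pick_tier` are hypothetical names, and the lookup rule is an illustrative assumption rather than the data movement policy of the co-design.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageTier:
    name: str
    capacity_pb: float   # capacity in petabytes, from the slide
    optimised_for: str   # "bandwidth", "latency", "balanced" or "capacity"
    medium: str


TIERS: List[StorageTier] = [
    StorageTier("Tier 1a", 1.0, "bandwidth", "all-SSD Lustre"),
    StorageTier("Tier 1b", 0.2, "latency", "NVMe over fabrics"),
    StorageTier("Tier 2", 10.0, "balanced", "disk-based Lustre"),
    StorageTier("Tier 3", 20.0, "capacity", "tape"),
]


def total_capacity_pb(tiers: List[StorageTier]) -> float:
    """Sum the capacity of all tiers, in petabytes."""
    return sum(t.capacity_pb for t in tiers)


def pick_tier(tiers: List[StorageTier], goal: str) -> StorageTier:
    """Return the first tier optimised for `goal` (illustrative rule only)."""
    for tier in tiers:
        if tier.optimised_for == goal:
            return tier
    raise ValueError(f"no tier optimised for {goal!r}")


total = total_capacity_pb(TIERS)
for tier in TIERS:
    share = 100.0 * tier.capacity_pb / total
    print(f"{tier.name}: {tier.capacity_pb} PB ({share:.1f}% of total), {tier.medium}")
print(f"Total: {total:.1f} PB")
```

For example, `pick_tier(TIERS, "latency")` returns the Tier 1b entry. The summed figure shows that the two flash tiers make up only a small fraction of total capacity, with most data expected to sit on disk and tape.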
Medical Informatics
  Hospital patient data applications
  CSD3 OpenStack
  University research environment
  Medical analytics development
  Computational biomedical research
Surgical site infection reduction
Dr John Cromwell from Iowa University Hospital developed a new statistical model that takes patient medical records and live feeds from the operating room and runs in real time.
It cuts surgical site infection rates by 58%.
Large scale NGS sequencing & analytics
OpenCB is a next generation big data analytics platform for population scale genomics analysis.
It was developed in partnership with Genomics England for the UK 100K genome study, the largest study of its kind anywhere in the world.
OpenCB is already deployed on CSD3, driving the Bridge study to analyse the genomes of 10,000 rare disease patients.
Medical imaging @ Wolfson Brain Imaging Centre
A new state of the art brain scanning facility needed a step change in computational and data storage capability.
OpenStack image analysis VMs provide that step change.
SKA IT design: Cambridge led (Astrophysics)
Design work led by Prof Paul Alexander in Astrophysics
Cambridge is contracted to help with:
  HPC compute design
  HPC storage design
  HPC operations