CSD3 The Cambridge Service for Data Driven Discovery. A New National HPC Service for Data Intensive science

CSD3: The Cambridge Service for Data Driven Discovery
A New National HPC Service for Data Intensive Science
Dr Paul Calleja, Director of Research Computing, University of Cambridge

Problem statement
Today we are seeing an explosion of data in research: experimental data, simulation data, sensor data and population data.
This impacts many science domains, such as:
- Observational astronomy
- High energy physics
- Medical informatics
- Medical imaging
- Large scale population genome studies
- Simulation science: weather, chemistry, engineering
- Social science: population data research
- IoT / smart cities
Traditional HPC systems are not designed to store and process large amounts of data efficiently. New system architectures are needed that focus on large data volumes, I/O throughput and connection to data analytics / machine learning capabilities.

Global leader in science & technology innovation
- One of the world's leading research-intensive universities in terms of research outputs and impact
- 10,000 staff, 1.2B turnover
- Over 800 years old, with 92 Nobel Laureates
The Cambridge Cluster
- 1535 technology companies in surrounding science parks
- 27,000 staff, 13B turnover

Research computing @ Cambridge
Research Support, External outreach, Academic/Industry, HPC & Data Solution Development
Driving Discovery, Innovation & Impact

Cambridge research computing investment
Highly resilient HPC DC
- 200 cabinets, 30 kW water-cooled racks, 2000 kW IT load
People
- 32 FTE technical team
- Skill focus in: HPC system integration; large scale storage; OpenStack development & deployment; scientific support
Systems
- 3 PF (2000 servers, x86 + GPU), 250 node Hadoop system
- 30 PB storage + Intel Lustre & tape
Run rate budget
- 5M per year

Research computing usage and outputs
- 1016 active users from 272 research groups across 42 University departments
- 80% system utilisation
- HPC has evolved into research computing and the long tail has arrived: significant usage by over 300 users who consumed 200 workstation days of usage in the last 12 months
- New user growth rate is 28% CAGR over the last 9 years; the rate is expected to increase with OpenStack usage models
- Research computing services support a current active grant portfolio of 120, which represents 8% of the University's annual grant income
- Underpinning 1400 publications over the last 9 years; current output ~300 per year
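The 28% compound annual growth figure can be turned into a cumulative multiple with one line of arithmetic. The following is a minimal sketch, assuming the rate held constant in each of the nine years (the slide quotes only the compound average); the function name is illustrative and not part of the talk.

```python
def cumulative_growth(rate: float, years: int) -> float:
    """Overall growth multiple for a constant compound annual rate."""
    return (1.0 + rate) ** years


# Figures quoted on the slide: 28% CAGR over 9 years.
factor = cumulative_growth(0.28, 9)
print(f"User base multiple after 9 years: {factor:.1f}x")
```

Under that assumption the new-user intake grew roughly ninefold over the period.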

New data-intensive system

CSD3 development work
- OpenStack for HPC / HPDA: co-design with StackHPC (ongoing)
- Tiered storage: co-design with Dell EMC (ongoing)
- High performance remote visualisation (complete)
- Hadoop integration with Lustre (complete)
- Machine learning framework configuration on KNL, GPU and Skylake (ongoing)

Why OpenStack in research computing
- Makes computing, data and applications more accessible, flexible and secure
- Makes research computing & data easier to use and easier to share
- Science-as-a-Service on OpenStack
- Decreasing the time to science and increasing innovation

OpenStack development @ Cambridge
- Development and deployment of OpenStack for research computing: both bare metal HPC via OpenStack and long tail scientific VMs on demand
- Cambridge OpenStack development work is jointly funded by the research computing service and the SKA, with a ~1.5 budget over a two year window
- OpenStack partners: StackHPC, Dell, Intel, Red Hat, Mellanox, TACC, and CHPC in South Africa
- We expect to be running CSD3 as a bare metal system provisioned by OpenStack

Tiered storage solution
Requirements
- Large scale, reliable, cost effective storage
- I/O connection to large scale multi-petabyte heterogeneous compute capability
- High, deterministic I/O rates, seen on a per-application basis, for both bandwidth and IOPS
- High performance transfer between storage and HPDA / ML frameworks
- High performance data visualisation
- Enriched metadata tagging and search
- Strong multitenant security with accreditation
High level solution vision
- Multi-tier storage solution: SSD, disk, tape (automated data movement)
- Standardised commodity storage and server building blocks
- Software defined functionality
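The "automated data movement" idea across SSD, disk and tape can be pictured as a placement rule that maps each file to a tier. The sketch below is purely illustrative: the slides do not describe the actual policy, and the age thresholds, the `FileRecord` type and the `choose_tier` function are all hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration; the talk gives no policy values.
SSD_MAX_IDLE_DAYS = 7
DISK_MAX_IDLE_DAYS = 90


@dataclass
class FileRecord:
    path: str
    days_since_access: int


def choose_tier(record: FileRecord) -> str:
    """Pick a storage tier from how recently the file was accessed."""
    if record.days_since_access <= SSD_MAX_IDLE_DAYS:
        return "ssd"
    if record.days_since_access <= DISK_MAX_IDLE_DAYS:
        return "disk"
    return "tape"


# Example inventory (made-up paths and ages).
files = [
    FileRecord("/project/run42/output.h5", 1),
    FileRecord("/project/run17/output.h5", 30),
    FileRecord("/project/archive/2014.tar", 400),
]

for f in files:
    print(f"{f.path}: {choose_tier(f)}")
```

A real deployment would more likely drive movement from filesystem metadata and site policy than from a single idle-time cutoff; the point here is only the shape of a tiering decision.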

Storage co-design: Dell, Intel
- Tier 1a: 1 PB, bandwidth optimised, all-SSD Lustre
- Tier 1b: 0.2 PB, latency optimised, NVMe over fabrics
- Tier 2: 10 PB, balanced performance / capacity, spinning rust based Lustre
- Tier 3: 20 PB, capacity optimised, tape
Co-design work focusing on:
- Optimum SSD hardware configuration
- Slurm burst buffer implementation with LUN striping QoS
- iRODS Lustre integration for tape HSM
- iRODS metadata enhancement and audit for compliance
- NVMe over OPA implementation
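To see how the four tiers relate in size, the capacities quoted on the slide can be tabulated and summed. This is a small sketch using only the slide's figures; the `Tier` record and its field names are illustrative choices, not part of the design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_pb: float
    medium: str
    optimised_for: str


# Capacities and roles as quoted on the slide.
TIERS = [
    Tier("1a", 1.0, "all-SSD Lustre", "bandwidth"),
    Tier("1b", 0.2, "NVMe over fabrics", "latency"),
    Tier("2", 10.0, "spinning-disk Lustre", "balanced performance / capacity"),
    Tier("3", 20.0, "tape", "capacity"),
]

total_pb = sum(t.capacity_pb for t in TIERS)
# Tiers 1a and 1b are the solid-state tiers.
flash_pb = sum(t.capacity_pb for t in TIERS if t.name.startswith("1"))

for t in TIERS:
    share = 100.0 * t.capacity_pb / total_pb
    print(f"Tier {t.name:<2} {t.capacity_pb:>5.1f} PB ({share:4.1f}%)  {t.medium}, {t.optimised_for}")
print(f"Total: {total_pb:.1f} PB, of which solid state: {flash_pb:.1f} PB")
```

The solid-state tiers make up only a few percent of total capacity, which is consistent with using them as a fast front end to the larger disk and tape tiers.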

Medical Informatics
Hospital Patient Data, Applications, CSD3 OpenStack, University Research Environment, Medical Analytics Development, Computational Biomedical Research

Surgical site infection reduction
- Dr John Cromwell from Iowa University Hospital developed a new statistical model that takes patient medical records and live feeds from the operating room and runs in real time
- Cuts surgical site infection rates by 58%

Large scale NGS sequencing & analytics
- OpenCB: a next generation big data analytics platform for population scale genomics analysis
- Developed in partnership with Genomics England for the UK 100K genome study, the largest study of its kind anywhere in the world
- OpenCB is already deployed on CSD3, driving the Bridge study to analyse the genomes of 10,000 rare disease patients

Medical imaging @ Wolfson Brain Imaging Centre
- A new state-of-the-art brain scanning facility needed a step change in computational and data storage capability
- OpenStack image analysis VMs provide that step change

SKA IT design: Cambridge led (Astrophysics)
- Design work led by Prof Paul Alexander in Astrophysics
- Cambridge is contracted to help with: HPC compute design; HPC storage design; HPC operations