Imperial College London
Simon Burbidge, 29 Sept 2016

Imperial College London
- Premier UK university and research institution
- Ranked joint #2 (with Cambridge) in the QS World University Rankings (MIT #1); #9 worldwide and #3 in Europe in the THES rankings
- 15,000 students, 6,200 of them postgraduates
- 7,200 staff (2,600 of whom are dedicated to research)
- Science-based engineering, technology and medicine
- Lots of industrial research collaboration
- £350M per year research income

Main Campus: South Kensington, London
Located on the site of the 1851 Great Exhibition (image: Wikipedia, Dickinsons)

HPC Service
- University-wide service
- £2M annual budget for hardware and software
- Additional income directly from research groups
- 5 dedicated HPC staff
- Largest UK university HPC system (HPC-SIG)

3 HPC systems
- cx1: low-end cluster, mostly Ethernet
- cx2: high-end parallel MPI cluster (SGI ICE-X)
- ax3/4: large shared memory, dedicated to genomics (SGI UV)

ax3/4: Large memory SMP
- Two SGI UV systems: one with 160 processors and 4 TB of memory, one with 1,024 processors and 16 TB of memory
- Main user is bioscience: next-generation gene sequencing
- PBS topology-aware scheduling, cpusets (illustrated in the sketch below)
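On a large shared-memory machine like the UV, cpusets give each job its own slice of cores and memory so that jobs sharing the machine do not interfere with one another; PBS sets these up as part of its topology-aware scheduling. As a rough illustration of the underlying idea only (not the PBS mechanism itself, and with made-up core numbers), a process can be confined to a subset of cores through the Linux affinity API:

```c
/* Illustration only: confining a process to a few cores, similar in
 * spirit to the cpusets PBS creates on the UV (not the PBS mechanism
 * itself). Core IDs 0-3 are arbitrary examples. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int cpu = 0; cpu < 4; cpu++)   /* pin to cores 0-3 */
        CPU_SET(cpu, &mask);

    /* pid 0 means "the calling process" */
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("process now restricted to cores 0-3\n");
    return 0;
}
```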

cx2: High-end MPP
- Commissioned Autumn 2009, upgraded several times; system refresh Winter 2015
- Supports large, highly parallel jobs
- SGI Altix ICE 8400 EX + ICE-X, 12,288 cores
- Dedicated to large MPI parallel jobs
- PBS topology-aware scheduling, simple configuration (see the placement sketch below)
- Focus on capability: running jobs on thousands of cores
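Because cx2 jobs span many ICE nodes, where MPI ranks land relative to the interconnect topology matters for performance. A minimal, generic MPI sketch (not an Imperial-specific tool) of the kind of check a user might run to see where each rank was placed:

```c
/* Minimal MPI placement check: each rank reports the node it is
 * running on. Generic MPI, shown only to illustrate how placement
 * across nodes becomes visible to the application. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, namelen;
    char node[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(node, &namelen);

    printf("rank %d of %d on node %s\n", rank, size, node);

    MPI_Finalize();
    return 0;
}
```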

cx1: Low-end cluster
- Mostly gigabit Ethernet, with some InfiniBand islands
- Upgraded all the time; parts of the system are owned by particular research groups
- 23,026 cores, 1,352 nodes, 12 distinct hardware types, plus GPUs
- Serial and small parallel workloads of many shapes and sizes; tricky to schedule!
- 3 million jobs per year
- Focus on low cost and throughput

Why cx1 and cx2?
- Cost
- Flexibility
- Users are able to contribute funds and hardware to cx1

10 years of Service
- Celebrating 10 years of successful service this summer
- Where to next?

More of the same
- For sure, users like what they get! They want more
- We have rolling replacement programmes
- Finance and space constraints:
  - Users' grants may last for 5 years, which is a bit long in the tooth for a server, but they have no more funds within the 5-year period to refresh
  - Central funds are fixed and cannot replace older hardware on a 3-year lifespan
  - The computer room is full, which limits expansion

Expand?
- Demand is there for expansion: more resources for current users, plus newer areas such as bioscience and social science are growing fast
- Additional demand for higher-fidelity simulations from traditional computational science: larger molecules, more degrees of freedom, smaller timesteps, higher Reynolds numbers, more complex algorithms, etc.
- National resources are constrained
- Good cases for expansion, and likely funding, but how?

Possible Options?
- New computer room: where, and how big? Running costs (actually could be quite reasonable)
- Co-los: not really well suited to HPC hardware; prices not fixed, market driven? Readily available at the moment
- Cloud: for some work cloud gives good results, but for large-scale parallel simulations it doesn't; can be pricey, and variable costs are hard to fund
- Stay tuned.

Software Challenges
It's not just hardware, though: should we invest in software?
- New processors have more and more cores/threads, each running more slowly
- These processors need carefully optimised code to reach their rated FLOP rates
- 3 levels of optimisation are needed (see the sketch after this list):
  - at the thread level (vectorisation)
  - at the socket level (on-chip communications, e.g. OpenMP)
  - between nodes (via the interconnect, e.g. MPI)
- Memory access times are becoming more problematic, with more NUMA levels:
  - registers / vector registers / cache
  - on-socket cache
  - off-socket cache and user-controlled buffers
  - main memory
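As a concrete illustration of those three levels, here is a toy distributed vector sum (not code from the Imperial systems): MPI combines results between nodes, OpenMP shares the work across the cores of a socket, and the simple stride-1 inner loop is something the compiler can vectorise within each thread.

```c
/* Toy example of the three optimisation levels discussed above:
 *   - MPI between nodes,
 *   - OpenMP threads within a node/socket,
 *   - a vectorisable inner loop within each thread.
 * Illustrative only; not taken from the Imperial systems. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N_LOCAL 1000000   /* elements owned by each MPI rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *x = malloc(N_LOCAL * sizeof *x);
    for (int i = 0; i < N_LOCAL; i++)
        x[i] = 1.0;

    /* Socket level: OpenMP threads share the local sum.
     * Thread level: the stride-1 loop lets the compiler vectorise. */
    double local = 0.0;
    #pragma omp parallel for simd reduction(+:local)
    for (int i = 0; i < N_LOCAL; i++)
        local += x[i];

    /* Between nodes: combine the per-rank sums over the interconnect. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %.1f\n", global);

    free(x);
    MPI_Finalize();
    return 0;
}
```

Built with something like `mpicc -fopenmp -O3`, each level only pays off when data layout, thread placement and communication pattern all cooperate, which is exactly the hand-optimisation work the slide is describing.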

Phew!
- Although some of this optimisation can be automated, most needs careful programming
- There is a real shortage of computational scientists who can do this work
- It's not fashionable to be a computational scientist in a research group
- Programming skills are not being taught at lower levels
- Computing skills are not regarded as worthwhile (and not taught) in mainstream science and engineering degrees
- The skills shortage is a huge issue
- More training and recognition of programming and computational skills is badly needed across the board: the next generation of HPC won't succeed without it.

HPC is fragile!
HPC is disruptive, challenging, expensive and changeable. It needs a nurturing ecosystem in order to succeed, and universities need to take care not to break it.
- Long-term sustainability is needed: computer room infrastructure works on a 15-30 year horizon
- It is not IT, it is not commodity; HPC is special and needs a special place of its own in the organisation. Essentially it is a research instrument
- Industrial collaboration and exchange is vital: working with the real world focuses attention on real challenges and issues
- HPC could help in more areas than it does now by providing a safe place to conduct digital experiments, cheaply and effectively, but it needs to be driven!
- Specialist programming skills need to be fostered, encouraged and rewarded.

Thanks! Questions?
Simon Burbidge, s.burbidge@imperial.ac.uk