HECToR. UK National Supercomputing Service. Andy Turner & Chris Johnson


Outline
- EPCC
- HECToR: introduction
- HECToR Phase 3
- Introduction to the AMD Bulldozer architecture
- Performance
- Application placement: the hardware really matters
- CP2K in PRACE
- Simulation at exascale
- What software is used on HECToR?
- Exascale software challenges
- What does this mean for HPC users?

EPCC
- Founded in 1990
- Based at The University of Edinburgh, within the School of Physics and Astronomy
- A leading European centre of expertise in advanced research, training and technology transfer
- Provides supercomputer services to academia and business
- 95% of our funding comes from external sources
- Heavily involved in European projects such as PRACE and HPC-Europa

PRACE and HPC-Europa
- PRACE DECI: a resource exchange programme; projects can access several million CPU-hours of compute resource on machines across Europe (http://www.prace-ri.eu/call-announcements)
- HPC-Europa: a visitor programme; visitors can visit one of 7 countries: Italy, UK, Spain, Germany, France, The Netherlands or Finland (http://www.hpc-europa.eu/). Find a host in an academic department and HPC-Europa provides travel, subsistence and access to HPC resources.

HECToR

HECToR Partners
- RCUK: UK Research funding councils
- UoE HPCx Ltd./EPCC: system host and operator
- Cray Inc.: system provider
- NAG Ltd.: computational science and engineering support

HECToR Details
- UK National HPC Service; PRACE Tier-1 machine
- Currently a 30-cabinet Cray XE6 system: 2816 nodes, 90,112 cores
- Each node has 2 16-core AMD Opterons (2.3 GHz Interlagos) and 32 GB memory
- Peak performance of over 800 TF; 90 TB of memory in total

HECToR Service
[System diagram: Cray XE6 compute and login nodes, Lustre OSS and MDS servers, NFS server and boot/SDB node connected via a 1 GigE backbone and an InfiniBand switch to 10 GigE backup and archive servers, with the esFS Lustre high-performance, parallel filesystem.]

HECToR Compute Nodes
- All dies link to memory, the interconnect and each other by HyperTransport.
- Nodes are arranged in a 3D torus.
- The interconnect supports message passing and RDMA in hardware.
- The interconnect supports MPI, SHMEM, PGAS and ARMCI.
(Image courtesy of NAG.)

AMD Bulldozer Architecture (image courtesy of Wikipedia)

Dual-core Interlagos module (image courtesy of NAG)

Phase 3 Performance Comparison

Task placement matters

Task placement

CP2K: Improving scaling

CP2K: Overview
- CP2K is a freely available (GPL) Density Functional Theory code, with support for classical, empirical potentials.
- It can perform MD, MC, geometry optimisation and normal mode calculations.
- "The Swiss Army Knife of Molecular Simulation" (VandeVondele); c.f. CASTEP, VASP, CPMD etc.

CP2K: million-atom KS-DFT
- Focussing on CP2K on BlueGene/P: reducing memory usage and scaling to 1,000,000 atoms (estimated to need 200,000 cores)
- Led by Iain Bethune at EPCC
- Supported by Dr. Joost VandeVondele et al., CP2K developers at the Physical Chemistry Institute, University of Zurich
- Work done under dCSE and PRACE
- Improved scaling via increased use of OpenMP directives

CP2K: mixed-mode performance
Performance improvement is due to:
- Reduced impact of algorithms which scale poorly with the number of MPI tasks. E.g. when using T threads, the switchover point from the 1D-decomposed FFT (more efficient) to the 2D-decomposed FFT (less efficient) is increased by a factor of T.
- Improved load balancing. The existing MPI load-balancing algorithms do a coarser load balance, with fine-grained balancing done over OpenMP threads.
- A significantly reduced number of messages, especially on pre-Gemini networks. For all-to-all communications, the message count is reduced by a factor of T².
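As a quick sanity check of the T² figure (a back-of-the-envelope argument, not taken from the slides): suppose the pure-MPI run uses P tasks, while the mixed-mode run uses the same cores as P/T MPI tasks with T threads each. An all-to-all exchange requires every task to send one message to every other task, so the message counts compare as:

```latex
M_{\mathrm{MPI}} = P\,(P-1) \approx P^{2},
\qquad
M_{\mathrm{hybrid}} = \frac{P}{T}\left(\frac{P}{T} - 1\right) \approx \frac{P^{2}}{T^{2}},
\qquad
\frac{M_{\mathrm{MPI}}}{M_{\mathrm{hybrid}}} \approx T^{2}.
```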

CP2K: Functional Evaluation
- 93% efficiency with 6 threads, 74% with 24 threads
(From "Mixed Mode Parallelism in CP2K: A Case Study".)

CP2K: Fast Fourier Transforms
- CP2K uses a 3D Fourier transform to turn real-space data on the plane-wave grids into g-space data on the plane-wave grids.
- The grids may be distributed as planes or as rays (pencils), so the FFT may involve one or two transpose steps between the three sets of 1D FFT operations.
- The 1D FFTs are performed via an interface which supports many libraries, e.g. FFTW 2/3, ESSL, ACML, CUDA and FFTSG (built-in).

CP2K: Fast Fourier Transforms
We can parallelise two parts with OpenMP:
- 1D FFTs: assign each thread a subset of rows to FFT.
- Buffer packing: threads cooperatively pack the buffers which are passed to MPI.
Communication is still handled outside the parallel regions.
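As an illustration of the first point, here is a minimal sketch (not CP2K's actual code) of threading a batch of 1D FFTs over rows with OpenMP and FFTW 3; the function name, array layout and plan setup are assumptions made for the example.

```c
#include <complex.h>
#include <fftw3.h>
#include <omp.h>

/* Transform 'nrows' independent rows of length 'n' in place, one 1D FFT per
 * row. FFTW planning is not thread-safe, so the plan is made once on a single
 * thread; the new-array execute function fftw_execute_dft IS thread-safe and
 * may be called concurrently on disjoint rows. 'data' is assumed to have been
 * allocated with fftw_malloc so every row shares the plan's alignment. */
static void fft_rows(fftw_complex *data, int nrows, int n)
{
    fftw_plan plan = fftw_plan_dft_1d(n, data, data,
                                      FFTW_FORWARD, FFTW_ESTIMATE);

    /* Each thread takes a static subset of the rows, as on the slide. */
    #pragma omp parallel for schedule(static)
    for (int row = 0; row < nrows; ++row) {
        fftw_complex *r = data + (size_t)row * n;
        fftw_execute_dft(plan, r, r);
    }

    fftw_destroy_plan(plan);
}
```

Buffer packing can be threaded in the same spirit: each thread copies its share of the grid data into the MPI send buffer, and the all-to-all call itself is then made by a single thread outside the parallel region.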

Simulation at Exascale: Software
(Edinburgh/Tsukuba Workshop, February 2012)

Scientific Software
Chemistry, materials science, climate, oceanography, engineering, plasma physics, paleontology. Examples:
- Dye-sensitised solar cells: F. Schiffmann and J. VandeVondele, University of Zurich
- Modelling dinosaur gaits: Dr Bill Sellers, University of Manchester
- Fractal-based models of turbulent flows: Christos Vassilicos & Sylvain Laizet, Imperial College

Scientific Usage Profile (HECToR XT4), percentage of use by area:
- Other/Unknown: 38.23
- Chemistry/Materials Science: 37.26
- Earth Science/Climate: 16.6
- Physics: 6
- Engineering: 1.91

Usage by application code on HECToR XT4: Others 45.7%, VASP 17.5%, UM 6.4%, CASTEP 5.9%, CP2K (MPI) 4.5%, NAMD 3.3%, HELIUM 2.9%, Shelf 2.3%, Terra 1.9%, SENGA 1.0%, with the remaining small shares (roughly 0.1-2.7% each) spread across CP2K (Hybrid), NEMO, LAMMPS, Fluidity, Quantum Espresso, DL_POLY and ChemShell.

Future Look
What does the future hold for HPC and the national facility?

                   2012         2015             2018
System Perf.       20 PFlops    100-200 PFlops   1 EFlops
Memory             1 PB         5 PB             10 PB
Node Perf.         200 GFlops   400 GFlops       1-10 TFlops
Concurrency        32           O(100)           O(1000)
Interconnect BW    40 GB/s      100 GB/s         200-400 GB/s
Nodes              100,000      500,000          O(Million)
I/O                2 TB/s       10 TB/s          20 TB/s
MTTI               Days         Days             O(1 Day)
Power              10 MW        10 MW            20 MW

Accelerators: GPGPUs

Application sustainability
- National-scale HPC facilities provide a capability resource, for users who want to run calculations that are too large for other resources. In reality, in the UK, they also get used for smaller-scale calculations.
- The future of national-scale HPC (as for everyone else): lots of cores per node (CPU + co-processor), little memory per core, and lots of compute power per network interface.
- The balance of compute to communication power and of compute to memory will both be radically different from now.
- We need to ensure UK researchers have software that can exploit these resources effectively.

Application sustainability
Requirements for software on future capability HPC resources:
- Probably cannot be pure message-passing parallel: this will not scale on nodes with a high amount of compute.
- Must exploit parallelism at all levels: vectorisation, shared memory and message passing (see the sketch after this list).
- Must exploit the memory hierarchy efficiently.
- Must harness the co-processors/lightweight cores.
- Must be fault-tolerant.
None of today's large codes meet all these requirements.
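To make the "parallelism at all levels" point concrete, here is a minimal illustrative sketch (my own example, not from the slides) of a distributed dot product that combines message passing between nodes (MPI), shared memory within a node (OpenMP threads) and vectorisation within a core (omp simd); the problem size and structure are arbitrary.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;                 /* local problem size (example value) */
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; ++i) { x[i] = 1.0; y[i] = 2.0; }

    double local = 0.0;
    /* Threads split the loop; each thread's chunk is also vectorised. */
    #pragma omp parallel for simd reduction(+:local) schedule(static)
    for (int i = 0; i < n; ++i)
        local += x[i] * y[i];

    double global = 0.0;                   /* message passing across nodes */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("dot product = %f\n", global);

    free(x); free(y);
    MPI_Finalize();
    return 0;
}
```

Even this toy example shows the layering that the slide asks for; real codes additionally need the memory-hierarchy awareness, accelerator offload and fault tolerance listed above.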