Introduction to High-Performance Computing


Introduction to High-Performance Computing
Dr. Axel Kohlmeyer
Associate Dean for Scientific Computing, CST
Associate Director, Institute for Computational Science
Assistant Vice President for High-Performance Computing
Temple University, Philadelphia PA, USA
a.kohlmeyer@temple.edu

Why use Computers in Science?
- Use complex theories without a closed solution: solve equations or problems that can only be solved numerically, i.e. by inserting numbers into expressions and analyzing the results
- Do impossible experiments: study (virtual) experiments where the boundary conditions are inaccessible or not controllable
- Benchmark correctness of models and theories: the better a model/theory reproduces known experimental results, the better its predictions

What is High-Performance Computing (HPC)?
- The definition depends on whom you ask: "HPC is when I care how fast I get an answer"
- Thus HPC can happen on: a workstation, desktop, laptop, or smartphone; a supercomputer; a Linux/MacOS/Windows/... cluster; a grid or a cloud
- Cyberinfrastructure = any combination of the above
- HPC also means High-Productivity Computing

Parallel Workstation
- Most desktops today are parallel workstations => multi-core processors
- Running a Linux OS (or MacOS X) allows programming them like a traditional Unix workstation
- All processors have access to all memory
- Uniform memory access (UMA): one memory pool for all processors, same speed for all
- Non-uniform memory access (NUMA): multiple pools, access speed depends on distance

An HPC Cluster is...
A cluster needs:
- Several computers, often in special cases for easy mounting in a rack (one node ~= one mainboard)
- One or more networks (interconnects) to access the nodes and for inter-node communication
- Software that orchestrates communication between parallel processes on the nodes (e.g. MPI)
- Software that reserves resources for individual users
A cluster is: all of those components working together to form one big computer

What is Grid Computing?
- A loosely coupled network of compute resources
- Needs a middleware for transparent access to inhomogeneous resources
- Modeled after the power grid => share resources that are not needed right now
- Run a global authentication framework => Globus, Unicore, Condor, Boinc
- Run an application-specific client => SETI@home, Folding@home

What is Cloud Computing?
Simplified: grid computing made easy.
- Grid: use a job description to match a calculation request to a suitable available host; use a distinguished name to uniquely identify users; opportunistic resource management
- Cloud: provide a virtual server instance on a shared resource as needed, with a custom OS image; commercialized (cloud service providers, dedicated or spare server resources); physical location flexible

What is Supercomputing (SC)?
- The most visible manifestation of HPC (=> Top500 list)
- It is "super" due to large size and extreme technology
- Desktop vs. supercomputer in 2014 (peak):
  - Desktop processor (1 core): ~25 GigaFLOP/s
  - Tesla K40 GPU (2880 cores): >1.4 TeraFLOP/s
  - #1 supercomputer (Tianhe-2): >50 PetaFLOP/s
- Sustained vs. peak: K 93%, BG/Q 85%, Cray XK7 65%, Tianhe-2 61%, clusters 65-90%
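The sustained-vs-peak percentages above translate directly into absolute numbers; a minimal sketch, using the slide's 2014 Tianhe-2 figures (~50 PetaFLOP/s peak, ~61% sustained) as illustrative inputs:

```python
# Sustained performance = peak performance x achieved fraction.
# The values below are the slide's 2014 figures, used for illustration.

def sustained_pflops(peak_pflops, fraction):
    """Sustained PetaFLOP/s given peak and the achieved fraction."""
    return peak_pflops * fraction

# Tianhe-2: ~50 PetaFLOP/s peak at ~61% sustained on Linpack
print(f"{sustained_pflops(50.0, 0.61):.1f} PetaFLOP/s")  # 30.5 PetaFLOP/s
```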

Why would HPC matter to you?
- Scientific computing is becoming more important in many research disciplines
- Problems become more complex, and need teams of researchers with diverse expertise
- Scientific (HPC) application development is often limited by lack of training
- More knowledge about HPC leads to more effective use of HPC resources and better interactions with colleagues

Research Disciplines in HPC
[Pie chart of HPC usage by discipline: Molecular Biosciences 31%, Physics 17%, Chemistry 17%, Astronomical Sciences 12%, Materials Research 6%, Chemical/Thermal Systems 5%, All 19 Others 4%, Earth Sciences 3%, Atmospheric Sciences 3%, Advanced Scientific Computing 2%]

My Background
- Undergraduate training as a chemist (physical & organic); PhD in Theoretical Chemistry, University of Ulm, Germany
- Postdoctoral Research Associate, Center for Theoretical Chemistry, Ruhr-University Bochum, Germany
- Associate Director, Center for Molecular Modeling, University of Pennsylvania, Philadelphia, USA
- Associate Dean for Scientific Computing, CST, and Associate Director, Inst. for Comp. Molecular Science, Temple University, Philadelphia (2009-2012, since 2014)
- Scientific Computing Expert, International Centre for Theoretical Physics (2012/13); now external consultant
- Lecturer at the ICTP/SISSA International Master for HPC

Why Would I Care About HPC?
- My problem is big
- My problem is complex
- My computer is too small and too slow
- My software is not efficient and/or not parallel => often the scaling with system size is the problem

HPC vs. Computer Science
- Most people in HPC are not computer scientists
- Software has to be correct first and (then) efficient; packages can be over 30 years old
- Technology is a mix of high-end & stone age (extreme hardware, MPI, Fortran, C/C++)
So what skills do I need for HPC?
- Common sense and a cross-discipline perspective
- A good understanding of calculus and (some) physics
- Patience and creativity, and the ability to deal with jargon

HPC is a Pragmatic Discipline
Raw performance is not always what matters: how long does it take me to get an answer?
HPC is more like a craft than a science:
=> practical experience is most important
=> leveraging existing solutions is preferred over inventing new ones requiring rewrites
=> a good solution today is worth more than a better solution tomorrow
=> but a readable and maintainable solution is better than a complicated one

How to Get My Answers Faster?
- Work harder => get faster hardware (get more funding)
- Work smarter => use optimized algorithms (libraries!); write faster code (adapt to match the hardware); trade performance for convenience (e.g. compiled program vs. script)
- Delegate parts of the work => parallelize code (grid/batch computing); use accelerators (GPU/MIC, CUDA/OpenCL)

HPC Cluster in 2002 / The Good

HPC Cluster in 2002 / The Bad

HPC Cluster in 2012

A High-Performance Problem

Software Optimization
- Writing maximally efficient code is hard: most of the time it will not be executed exactly as programmed, not even assembly
- Maximally efficient code is not very portable: cache sizes, pipeline depth, registers, and instruction sets differ between CPUs
- Compilers are smart (but not too smart!) and can do the dirty work for us, but they can get fooled
- => modular programming: generic code for most of the work plus well-optimized kernels

Two Types of Parallelism
- Functional parallelism: different people perform different tasks at the same time
- Data parallelism: different people perform the same task, but on different, equivalent, and independent objects
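The data-parallel case can be made concrete with a short sketch (Python is used purely for illustration; the thread pool, worker count, and data are made up):

```python
# Data parallelism: the SAME task applied to independent, equivalent
# data items, here distributed over a small thread pool.
# (Functional parallelism would instead run DIFFERENT tasks concurrently.)
from concurrent.futures import ThreadPoolExecutor

def task(x):
    """The one task every worker performs."""
    return x * x

data = [1, 2, 3, 4, 5]  # independent work items

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, data))

print(results)  # [1, 4, 9, 16, 25]
```

Because the items are independent, the result is the same regardless of how the work is split across workers.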

How Do We Measure Performance?
For numerical operations: FLOP/s = floating-point operations per second
- Theoretical maximum (peak) performance: clock rate x number of double-precision additions and/or multiplications completed per clock
  => 2.5 GHz x 8 FLOP/clock = 20 GigaFLOP/s
  => can never be reached (data loads/stores)
- Real (sustained) performance:
  => very application dependent
  => Top500 uses Linpack (linear algebra)
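The peak-performance figure is just a product of hardware parameters; a minimal sketch, using the slide's 2.5 GHz / 8 FLOP-per-clock example (the optional core count is an added convenience, not from the slide):

```python
# Theoretical peak = clock rate x FLOPs completed per clock (x cores).
# Real codes stay below this because data must be loaded and stored.

def peak_flops(clock_hz, flops_per_cycle, cores=1):
    """Theoretical peak performance in FLOP/s."""
    return clock_hz * flops_per_cycle * cores

peak = peak_flops(2.5e9, 8)
print(f"{peak / 1e9:.0f} GigaFLOP/s")  # 20 GigaFLOP/s
```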

Performance of SC Applications
- Strong scaling: fixed data/problem set; measure the speedup with more processors
- Weak scaling: data/problem set grows with the processor count; measure whether the speed stays the same
- Linpack benchmark: weak scaling test, more efficient with more memory => 50-90% of peak
- Climate modeling (WRF): strong scaling test; limited work distribution, load balancing, and serial overhead => < 5% of peak (similar for MD)
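Strong-scaling results are usually reported as speedup and parallel efficiency relative to a baseline run; a sketch with made-up timings (the numbers are illustrative, not measurements from the slides):

```python
# Strong scaling: fixed problem size, increasing processor count.
# Speedup    = t_base / t_p
# Efficiency = speedup / (p / p_base); 100% would be ideal scaling.

def strong_scaling(t_base, p_base, timings):
    """timings: {processor_count: wall_time}.
    Returns {p: (speedup, efficiency)} relative to the baseline run."""
    return {p: (t_base / t, (t_base / t) / (p / p_base))
            for p, t in timings.items()}

# hypothetical wall times for one fixed problem
results = strong_scaling(t_base=100.0, p_base=1,
                         timings={1: 100.0, 2: 52.0, 4: 28.0})
for p, (s, e) in sorted(results.items()):
    print(f"{p} procs: speedup {s:.2f}, efficiency {e:.0%}")
```

Efficiency dropping below 100% as the processor count grows is exactly the behavior the scaling graphs on the following slides show.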

Strong Scaling Graph
[Figure: time per MD step (sec) vs. number of nodes, double-logarithmic plot, for a 1-vesicle CG system (3,862,854 CG beads) and an 8-vesicle CG system (30,902,832 CG beads); smaller values are faster]

Weak Scaling Graph
[Figure: weak scaling at 7,544 CG beads per node; time per MD step (sec) vs. number of nodes (512-4096) for 12 MPI / 1 OpenMP, 6 MPI / 2 OpenMP, 4 MPI / 3 OpenMP, 2 MPI / 6 OpenMP, and 2 MPI / 6 OpenMP (SP) configurations]

Performance within an Application
[Figure: Rhodopsin benchmark, 860k atoms, 64 nodes, Cray XT5; time in seconds broken down into Pair, Bond, Kspace, Comm, Neighbor, and Other for varying processing-element counts]

Amdahl's Law vs. Real Life
The speedup of a parallel program is limited by the sequential fraction of the program. This assumes perfect scaling and no overhead.
[Figure: percentage of time spent in Pair, Bond, Kspace, Neighbor, Comm, I/O, and Other vs. number of nodes (32-4096), for the 1-vesicle CG system with 2 MPI / 6 OpenMP (SP)]
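Amdahl's law itself is one line: with serial fraction s, the speedup on n processors is 1 / (s + (1 - s)/n), which saturates at 1/s no matter how many processors are added. A sketch (the 5% serial fraction is an illustrative assumption, not a value from the slides):

```python
# Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n) for serial fraction s.
# This idealized model ignores communication and load-balancing overhead,
# which is why real applications fall short of it.

def amdahl_speedup(serial_fraction, n):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

for n in (1, 16, 256, 4096):
    print(f"n = {n:5d}: speedup {amdahl_speedup(0.05, n):6.2f}")
# With a 5% serial fraction the speedup can never exceed 20x,
# however many nodes are used.
```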
