Programming Techniques for Supercomputers

1 Programming Techniques for Supercomputers
Prof. Dr. G. Wellein (a,b), Dr. G. Hager (a), Dr.-Ing. M. Wittmann (a)
(a) HPC Services, Regionales Rechenzentrum Erlangen
(b) Department für Informatik, University Erlangen-Nürnberg
Summer semester 2018

2 Audience & Contact
Audience: Computational Engineering, Computer Science, Computational & Applied Mathematics, Physics, Engineering, Materials Science, Chemistry
Contact: Gerhard Wellein, Georg Hager, Markus Wittmann

3 Organization & Format
- The lecture and tutorial are completely documented in our moodle LMS (see also the PTFS univis entry).
- Please enroll in the lecture and specify your matriculation number!
- Homework assignments, announcements etc. are all handled via moodle.

4 Organization & Format
4 hours of lecture:
- Monday 16:15-17:45 in H10 AND Thursday 16:15-17:45 in H4
- (Wednesday 12:15-13:45 cancelled?! Due to a conflict)
DON'T BE SHY AND ASK QUESTIONS!

5 Organization and Format
2 hours of tutorial: Monday 14:15-15:45 OR Wednesday 10:15-11:45
- Tutorial "sheets" (homework) are available every Monday in moodle.
- Tutorials start next week.
- You also need CIP pool accounts (ask the CIP admins!).
- First tutorials (next week): intro to systems handling of the RRZE cluster (logging in via SSH, X forwarding, using compilers, batch jobs).

6 Format of course
Lecture only: 5 ECTS
- Material covered in the lecture
- Register in meincampus
- Written exam: 60 minutes
Lecture & Exercises: ( ) ECTS
- Material covered in lecture AND tutorial
- Register for lecture AND exercise in meincampus
- Written exam: 90 minutes
No supporting material allowed in the exam.
PTFS-CAM students: Please contact me via / in person after the lecture.

7 Format of the course
Prerequisites for exercises:
- Basic programming knowledge in C/C++ or FORTRAN
- Using LINUX / UNIX OS environments (including ssh)
Recommended:
- First experiences with parallel programming, though we will introduce the necessary basics

8 Supporting material
Books:
- G. Hager and G. Wellein: Introduction to High Performance Computing for Scientists and Engineers. CRC Computational Science Series. See moodle for a very early version; 10 copies are available in the library; for discounted copies, ask us.
- J. Hennessy and D. Patterson: Computer Architecture. A Quantitative Approach. Morgan Kaufmann Publishers, Elsevier.
- W. Schönauer: Scientific Supercomputing.

9 Supporting material
Documentation: the big ones, and more useful HPC-related information.

10 Related teaching activities
Regular seminar on Efficient numerical simulation on multicore processors (MuCoSim)
- 5 ECTS, 2 hrs per week
- 2 talks + written summary
- Topics from code optimization, code parallelization and code benchmarking on the latest multicore / manycore CPUs and GPUs
- This semester: Tuesday 16:00-17:30, RRZE (2.049)

11 SCOPE OF THE LECTURE

12 Scope of the lecture
Goal: ability to write hardware-efficient serial and parallel programs for (super)computers
Hardware coverage:
- Single-core + multi-core: Intel Core i, Intel Xeon E5-2xyz
- Many-core / GPU: Intel Xeon Phi / NVIDIA
- Shared memory nodes: single node (RRZE)
- Distributed memory computers: compute clusters (RRZE) and MPP (IBM BlueGene, CRAY series)
Identify basic hardware concepts and how to use them efficiently:
- Shared memory parallel programming: OpenMP
- Distributed memory parallel programming: MPI
- (Hybrid programming: MPI+OpenMP)
Performance analysis & modeling throughout all topics
April 12, 2018 PTfS

13 Scope of the lecture
Performance analysis and modeling accompanies all topics:
- Introduction: performance (basics, measuring & reporting); benchmarks (kernels & more)
- Modern processors: single core (basics, pipelining, superscalarity, SIMD); memory hierarchy; multicore (technology & basics); manycore / GPU (*)
- Parallel computers, shared memory: shared-memory system architectures (UMA, ccNUMA); OpenMP basics
- Performance modelling / engineering: Roofline model; case studies (dense & sparse matrix-vector multiplication, stencils)
- Shared memory in depth: advanced OpenMP, pitfalls, data placement
- Parallel computers, distributed memory: architecture & communication networks; MPI in a nutshell
- Hardware performance monitoring and model validation (*)

14 Scope of the lecture
Steps shown around the example loop: single-core performance optimization, parallelize, establish a limit with a simple performance model.

    !$OMP PARALLEL DO
    do k = 1, Nk
      do j = 1, Nj; do i = 1, Ni
        y(i,j,k) = b*( x(i-1,j,k) + x(i+1,j,k) + x(i,j-1,k) + x(i,j+1,k) + x(i,j,k-1) + x(i,j,k+1) )
      enddo; enddo
    enddo
    !$OMP END PARALLEL DO
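As a point of reference for the loop nest on this slide, here is a minimal serial sketch in Python. It is not the lecture's code: the function name `stencil_sweep`, the nested-list layout `x[i][j][k]`, and the choice to treat the outermost grid layer as a fixed boundary are assumptions made for illustration. It also counts floating point operations (5 additions and 1 multiplication per updated point), which is the kind of number a simple performance model starts from.

```python
def stencil_sweep(x, b):
    """One sweep of the six-point stencil over the interior of a 3D grid.

    x is a nested list indexed as x[i][j][k] (assumed layout). The outermost
    layer of x is treated as boundary and is only read, never updated.
    Returns the new grid y and the number of floating point operations.
    """
    ni, nj, nk = len(x), len(x[0]), len(x[0][0])
    y = [[[0.0] * nk for _ in range(nj)] for _ in range(ni)]
    flops = 0
    # Same loop order as the slide: k outermost, i innermost.
    for k in range(1, nk - 1):
        for j in range(1, nj - 1):
            for i in range(1, ni - 1):
                y[i][j][k] = b * (x[i - 1][j][k] + x[i + 1][j][k]
                                  + x[i][j - 1][k] + x[i][j + 1][k]
                                  + x[i][j][k - 1] + x[i][j][k + 1])
                flops += 6  # 5 additions + 1 multiplication
    return y, flops


if __name__ == "__main__":
    grid = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
    result, flops = stencil_sweep(grid, 1.0 / 6.0)
    print("interior value:", result[1][1][1], "flops:", flops)
```

In the OpenMP version on the slide, the outer `k` loop is the one distributed across threads; the sketch above only mirrors the arithmetic, not the parallelization.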

15 Introduction
Supercomputers: The Big Ones and the workhorses

16 Supercomputer: a good definition?!
"Supercomputer is a computer that is only one generation behind what large-scale users want." (Neil Lincoln, architect for the CDC Cyber 205 and others)
A supercomputer does not fit under the desktop! (and you cannot plug it into a standard power line)
- Absolute, raw compute power is not a reasonable measure.
- Assume: the computer is being used for numerical simulation.
- The compute power of a system is measured by floating point operations (MULT, ADD) for a specific numeric benchmark: the TOP500 list.

17 Most powerful computers in the world: TOP500
- Top 500: survey of the 500 most powerful supercomputers
- Benchmark: solve a large dense system of linear equations, A x = b ("LINPACK")
- Published twice a year (ISC in Germany, SC in USA)
- Established in 1993 (CM5/1024): 60 GFlop/s (TOP1)
- Since Nov (Sunway/China): 93,000,000 GFlop/s (TOP1)
- Performance increase: 81 % p.a.
Performance measure: MFlop/s, GFlop/s, TFlop/s, PFlop/s, EFlop/s
- Number of FLOATING POINT operations per second
- FLOATING POINT operations: double precision (64 bit) Add & Mult ops
- 10^6: MFlop/s; 10^9: GFlop/s; 10^12: TFlop/s; 10^15: PFlop/s; 10^18: EFlop/s
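The growth figure on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 24-year span (1993 to 2017) is an assumption, since the transcript does not preserve the year range. With that assumption, the two TOP1 values quoted on the slide give an average increase of roughly 81 % per year.

```python
# Decimal prefixes for Flop/s rates.
FLOPS_PER_UNIT = {
    "MFlop/s": 1e6,
    "GFlop/s": 1e9,
    "TFlop/s": 1e12,
    "PFlop/s": 1e15,
    "EFlop/s": 1e18,
}


def convert(value, from_unit, to_unit):
    """Convert a Flop/s rate between two of the units above."""
    return value * FLOPS_PER_UNIT[from_unit] / FLOPS_PER_UNIT[to_unit]


def annual_growth(p_start, p_end, years):
    """Average yearly growth rate between two performance values."""
    return (p_end / p_start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    top1_1993 = 60.0          # GFlop/s, from the slide
    top1_now = 93_000_000.0   # GFlop/s, from the slide
    years = 24                # ASSUMPTION: 1993 to 2017
    print("TOP1 in PFlop/s:", convert(top1_now, "GFlop/s", "PFlop/s"))
    print("growth per year: {:.0%}".format(annual_growth(top1_1993, top1_now, years)))
```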

18 TOP5 as of November 2017
[Table not preserved. Columns: R_max (LINPACK performance), R_peak (peak performance), power at LINPACK]

19 TOP6-10 as of November 2017
[Table not preserved. Columns: R_max (LINPACK performance), R_peak (peak performance), power at LINPACK]

20 TOP16-20 as of November 2017
[Table not preserved. Columns: cores, peak, LINPACK, power]
Non-standard hardware continues until rank 15!
Trends:
- Extreme degree of parallelism (Top1: 10,000,000 cores)
- Many CPUs at low clock speed (1.3 GHz to 1.5 GHz)
- Use of non-standard CPUs: NVIDIA GPGPUs, Chinese / Japanese processors
- Power range of TOP20 systems: 1 MW to 18 MW

21 TOP5: Why GPUs & special purpose?
[Chart: energy efficiency R_max/Power in GF/J; plotted values 6.1, 8.6, 14.2, 4.0, 1.6, 2.0]

22 Performance Trend & Projection
[Chart: TOP500 performance development and projection]
- ExaFlop/s machine by the end of this decade?
- Basic trend: the slope changes, i.e. the performance increase slows down.

23 Question?
Current GPGPU (CPU) technology: approx. 20 GF/s per W (5 GF/s per W)
How much power does an ExaFlop/s (EF/s) machine consume?
- 1 EF/s = 10^9 GF/s
- ExaFlop GPGPU machine: (10^9 GF/s) / (20 GF/s per W) = 50 MW
- ExaFlop CPU machine: (10^9 GF/s) / (5 GF/s per W) = 200 MW
- Power at 15 ct/kWh: 1 MW costs approx. 1,300,000 p.a.
Energy consumption is a major issue for centers and users!
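The estimate on this slide is a two-step calculation: divide the target performance by the energy efficiency to get the power draw, then multiply by the hours in a year and the electricity price. The sketch below reproduces it; the function names are hypothetical, and a year is taken as 8,760 hours of continuous operation, which is an assumption not stated on the slide.

```python
EXAFLOP_IN_GFLOPS = 1e9  # 1 EF/s expressed in GF/s


def power_in_mw(perf_gflops, efficiency_gflops_per_watt):
    """Power draw in MW for a given performance and energy efficiency."""
    watts = perf_gflops / efficiency_gflops_per_watt
    return watts / 1e6


def annual_energy_cost(power_mw, price_per_kwh=0.15, hours_per_year=8760):
    """Yearly electricity cost, in the currency unit of price_per_kwh."""
    kilowatts = power_mw * 1000.0
    return kilowatts * hours_per_year * price_per_kwh


if __name__ == "__main__":
    print("GPGPU machine:", power_in_mw(EXAFLOP_IN_GFLOPS, 20.0), "MW")
    print("CPU machine:  ", power_in_mw(EXAFLOP_IN_GFLOPS, 5.0), "MW")
    print("cost of 1 MW for one year:", annual_energy_cost(1.0))
```

With these inputs, 1 MW of continuous load comes to about 1.3 million per year, so a 50 MW machine would cost on the order of 65 million per year in electricity alone.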

24 HPC Centers in Germany: A view from Erlangen
[Map of Germany with the following sites]
- Jülich Supercomputing Center (FZ Jülich): BlueGene/Q, 5.8 PFlop/s
- HLRS Stuttgart: 7.4 PF (CRAY XC 40)
- LRZ München: IBM Cluster, 2*3 PF
- Erlangen/Nürnberg: 0.5 PF/s
- Hannover, Berlin

25 SuperMUC, LRZ Garching: TOP 4 (June 2012)
Thin nodes:
- 18 islands with 512 nodes each
- 2 Intel Xeon E processors (8 cores & 2.7 GHz baseline) per node
- 147,456 cores
- 3.2 PF/s peak, 2.9 PF/s LINPACK
Fat nodes:
- 1 island: 205 nodes
- 4 Intel Xeon E C per node
- 256 GB/node
Total power consumption: 2.5 MW to 3 MW
Upgrade to 3+3 PF/s (peak) in 2014 (with Intel Haswell processors)
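The thin-node figures on this slide are consistent with each other, which a short calculation shows. The sketch below is illustrative: the function name is hypothetical, and the value of 8 double-precision floating point operations per cycle and core is an assumption (typical for AVX cores without fused multiply-add), not something stated on the slide.

```python
def peak_performance(nodes, sockets_per_node, cores_per_socket,
                     clock_hz, flops_per_cycle):
    """Return (total cores, theoretical peak in Flop/s) of a cluster."""
    cores = nodes * sockets_per_node * cores_per_socket
    return cores, cores * clock_hz * flops_per_cycle


if __name__ == "__main__":
    cores, peak = peak_performance(
        nodes=18 * 512,       # 18 islands with 512 nodes each
        sockets_per_node=2,
        cores_per_socket=8,
        clock_hz=2.7e9,       # baseline clock from the slide
        flops_per_cycle=8,    # ASSUMPTION: AVX, no FMA
    )
    print("cores:", cores)
    print("peak in PF/s:", peak / 1e15)
```

Under that assumption the result is close to the 3.2 PF/s peak quoted for the thin nodes, and the core count matches the 147,456 on the slide.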

26 RRZE: "Meggie" cluster
- 728 compute nodes
- 2 Intel Xeon E v4 (Broadwell) 2.2 GHz (10 cores) per node
- 20 cores/node + SMT cores
- 64 GB main memory, NO local disks
- Peak performance: R_peak = 0.5 PF/s (floating point ops/s)
- #346@TOP500 Nov: R_max = 0.48 PF/s
- Intel OmniPath network: up to 100 Gbit/s
- Price: 2,5 Mio.
- Power consumption: 120 KW KW (depending on workload)
HPC at RRZE, Gerhard Wellein
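The ratio of R_max to R_peak indicates how much of the theoretical peak the LINPACK run actually achieves. A minimal sketch using the values quoted in these slides follows; the function name `linpack_efficiency` is hypothetical.

```python
def linpack_efficiency(r_max, r_peak):
    """Fraction of theoretical peak reached in the LINPACK benchmark.

    Both arguments must use the same unit (here PF/s).
    """
    return r_max / r_peak


if __name__ == "__main__":
    # Values as quoted on the slides, in PF/s.
    systems = {
        "Meggie": (0.48, 0.5),
        "SuperMUC thin nodes": (2.9, 3.2),
    }
    for name, (r_max, r_peak) in systems.items():
        print("{}: {:.0%} of peak".format(name, linpack_efficiency(r_max, r_peak)))
```

Because the slide values are rounded, the resulting percentages are only approximate.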

27 RRZE: "Emmy" cluster
- 544 compute nodes with 2 Intel Xeon E5-2660v2 (Ivy Bridge) 2.2 GHz (10 cores)
- 20 cores/node + SMT cores
- 64 GB main memory, NO local disks
- 16 accelerator nodes with the same CPUs: 8 nodes with 2 x NVIDIA K20 GPGPUs, 8 nodes with 2 x Intel Xeon Phi
- Vendor: NEC (Dual-Twin Supermicro)
- Power consumption: ~160 KW (backdoor heat exchanger)
- Full QuadDataRate Infiniband fat tree: BW ~ 3 GB/s per direction and < 2 µs latency
- Parallel filesystem: 400 TB+ (max. 7 GB/s)
- Operating system: LINUX
- Peak performance: 234 TFlop/s (all devices); 191 TFlop/s LINPACK (CPUs)
- #210 in TOP500 / Nov

28 Power
[Chart residue: 260 GHz (2009), 540 GHz (2013), 720 GHz (2016)]
Trends:
- Clock speed reduces / stagnates
- (Minor) energy efficiency improvements
- Power consumption depends on workload
Not shown: electric power for cooling is approx. 50% of the power drawn by the clusters

29 Prepare computer access
- Send to containing name, IDM account, matriculation number
- Tour through the computer room


More information

Our Workshop Environment

Our Workshop Environment Our Workshop Environment John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center Copyright 2015 Our Environment Today Your laptops or workstations: only used for portal access Blue Waters

More information

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2

8/28/12. CSE 820 Graduate Computer Architecture. Richard Enbody. Dr. Enbody. 1 st Day 2 CSE 820 Graduate Computer Architecture Richard Enbody Dr. Enbody 1 st Day 2 1 Why Computer Architecture? Improve coding. Knowledge to make architectural choices. Ability to understand articles about architecture.

More information

ECE 574 Cluster Computing Lecture 1

ECE 574 Cluster Computing Lecture 1 ECE 574 Cluster Computing Lecture 1 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 22 January 2019 ECE574 Distribute and go over syllabus http://web.eece.maine.edu/~vweaver/classes/ece574/ece574_2019s.pdf

More information

Organizational issues (I)

Organizational issues (I) COSC 6385 Computer Architecture Introduction and Organizational Issues Fall 2009 Organizational issues (I) Classes: Monday, 1.00pm 2.30pm, SEC 202 Wednesday, 1.00pm 2.30pm, SEC 202 Evaluation 25% homework

More information

LRZ SuperMUC One year of Operation

LRZ SuperMUC One year of Operation LRZ SuperMUC One year of Operation IBM Deep Computing 13.03.2013 Klaus Gottschalk IBM HPC Architect Leibniz Computing Center s new HPC System is now installed and operational 2 SuperMUC Technical Highlights

More information

HPC Architectures past,present and emerging trends

HPC Architectures past,present and emerging trends HPC Architectures past,present and emerging trends Andrew Emerson, Cineca a.emerson@cineca.it 27/09/2016 High Performance Molecular 1 Dynamics - HPC architectures Agenda Computational Science Trends in

More information

Parallel Computer Architecture - Basics -

Parallel Computer Architecture - Basics - Parallel Computer Architecture - Basics - Christian Terboven 19.03.2012 / Aachen, Germany Stand: 15.03.2012 Version 2.3 Rechen- und Kommunikationszentrum (RZ) Agenda Processor

More information

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA

GPU COMPUTING AND THE FUTURE OF HPC. Timothy Lanfear, NVIDIA GPU COMPUTING AND THE FUTURE OF HPC Timothy Lanfear, NVIDIA ~1 W ~3 W ~100 W ~30 W 1 kw 100 kw 20 MW Power-constrained Computers 2 EXASCALE COMPUTING WILL ENABLE TRANSFORMATIONAL SCIENCE RESULTS First-principles

More information

Fra superdatamaskiner til grafikkprosessorer og

Fra superdatamaskiner til grafikkprosessorer og Fra superdatamaskiner til grafikkprosessorer og Brødtekst maskinlæring Prof. Anne C. Elster IDI HPC/Lab Parallel Computing: Personal perspective 1980 s: Concurrent and Parallel Pascal 1986: Intel ipsc

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

GPU computing at RZG overview & some early performance results. Markus Rampp

GPU computing at RZG overview & some early performance results. Markus Rampp GPU computing at RZG overview & some early performance results Markus Rampp Introduction Outline Hydra configuration overview GPU software environment Benchmarking and porting activities Team Renate Dohmen

More information

High-Performance Scientific Computing

High-Performance Scientific Computing High-Performance Scientific Computing Instructor: Randy LeVeque TA: Grady Lemoine Applied Mathematics 483/583, Spring 2011 http://www.amath.washington.edu/~rjl/am583 World s fastest computers http://top500.org

More information

Vectorisation and Portable Programming using OpenCL

Vectorisation and Portable Programming using OpenCL Vectorisation and Portable Programming using OpenCL Mitglied der Helmholtz-Gemeinschaft Jülich Supercomputing Centre (JSC) Andreas Beckmann, Ilya Zhukov, Willi Homberg, JSC Wolfram Schenck, FH Bielefeld

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

Preparing GPU-Accelerated Applications for the Summit Supercomputer

Preparing GPU-Accelerated Applications for the Summit Supercomputer Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership

More information

Update on LRZ Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities. 2 Oct 2018 Prof. Dr. Dieter Kranzlmüller

Update on LRZ Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities. 2 Oct 2018 Prof. Dr. Dieter Kranzlmüller Update on LRZ Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 2 Oct 2018 Prof. Dr. Dieter Kranzlmüller 1 Leibniz Supercomputing Centre Bavarian Academy of Sciences and

More information

Parallel Programming on Ranger and Stampede

Parallel Programming on Ranger and Stampede Parallel Programming on Ranger and Stampede Steve Lantz Senior Research Associate Cornell CAC Parallel Computing at TACC: Ranger to Stampede Transition December 11, 2012 What is Stampede? NSF-funded XSEDE

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Multi-core today: Intel Xeon 600v4 (016) Xeon E5-600v4 Broadwell

More information

The Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center

The Stampede is Coming Welcome to Stampede Introductory Training. Dan Stanzione Texas Advanced Computing Center The Stampede is Coming Welcome to Stampede Introductory Training Dan Stanzione Texas Advanced Computing Center dan@tacc.utexas.edu Thanks for Coming! Stampede is an exciting new system of incredible power.

More information

Gerald Schubert 1, Georg Hager 2, Holger Fehske 1, Gerhard Wellein 2,3 1

Gerald Schubert 1, Georg Hager 2, Holger Fehske 1, Gerhard Wellein 2,3 1 Parallel lsparse matrix-vector ti t multiplication li as a test case for hybrid MPI+OpenMP programming Gerald Schubert 1, Georg Hager 2, Holger Fehske 1, Gerhard Wellein 2,3 1 Institute of Physics, University

More information

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016

Moore s Law. CS 6534: Tech Trends / Intro. Good Ol Days: Frequency Scaling. The Power Wall. Charles Reiss. 24 August 2016 Moore s Law CS 6534: Tech Trends / Intro Microprocessor Transistor Counts 1971-211 & Moore's Law 2,6,, 1,,, Six-Core Core i7 Six-Core Xeon 74 Dual-Core Itanium 2 AMD K1 Itanium 2 with 9MB cache POWER6

More information

Overview. High Performance Computing - History of the Supercomputer. Modern Definitions (II)

Overview. High Performance Computing - History of the Supercomputer. Modern Definitions (II) Overview High Performance Computing - History of the Supercomputer Dr M. Probert Autumn Term 2017 Early systems with proprietary components, operating systems and tools Development of vector computing

More information

The Energy Challenge in HPC

The Energy Challenge in HPC ARNDT BODE Professor Arndt Bode is the Chair for Computer Architecture at the Leibniz-Supercomputing Center. He is Full Professor for Informatics at TU Mü nchen. His main research includes computer architecture,

More information

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 The New Generation of Supercomputers Hybrid

More information

represent parallel computers, so distributed systems such as Does not consider storage or I/O issues

represent parallel computers, so distributed systems such as Does not consider storage or I/O issues Top500 Supercomputer list represent parallel computers, so distributed systems such as SETI@Home are not considered Does not consider storage or I/O issues Both custom designed machines and commodity machines

More information

Steve Scott, Tesla CTO SC 11 November 15, 2011

Steve Scott, Tesla CTO SC 11 November 15, 2011 Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost

More information

Prototyping in PRACE PRACE Energy to Solution prototype at LRZ

Prototyping in PRACE PRACE Energy to Solution prototype at LRZ Prototyping in PRACE PRACE Energy to Solution prototype at LRZ Torsten Wilde 1IP-WP9 co-lead and 2IP-WP11 lead (GSC-LRZ) PRACE Industy Seminar, Bologna, April 16, 2012 Leibniz Supercomputing Center 2 Outline

More information

What have we learned from the TOP500 lists?

What have we learned from the TOP500 lists? What have we learned from the TOP500 lists? Hans Werner Meuer University of Mannheim and Prometeus GmbH Sun HPC Consortium Meeting Heidelberg, Germany June 19-20, 2001 Outlook TOP500 Approach Snapshots

More information

Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model

Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model ERLANGEN REGIONAL COMPUTING CENTER Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model Holger Stengel, J. Treibig, G. Hager, G. Wellein Erlangen Regional

More information

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D.

Resources Current and Future Systems. Timothy H. Kaiser, Ph.D. Resources Current and Future Systems Timothy H. Kaiser, Ph.D. tkaiser@mines.edu 1 Most likely talk to be out of date History of Top 500 Issues with building bigger machines Current and near future academic

More information

The STREAM Benchmark. John D. McCalpin, Ph.D. IBM eserver Performance ^ Performance

The STREAM Benchmark. John D. McCalpin, Ph.D. IBM eserver Performance ^ Performance The STREAM Benchmark John D. McCalpin, Ph.D. IBM eserver Performance 2005-01-27 History Scientific computing was largely based on the vector paradigm from the late 1970 s through the 1980 s E.g., the classic

More information