1 Programming Techniques for Supercomputers
Prof. Dr. G. Wellein (a,b), Dr. G. Hager (a), J. Habich (a)
(a) HPC Services, Regionales Rechenzentrum Erlangen (RRZE)
(b) Department für Informatik, University Erlangen-Nürnberg
Summer semester 2011
2 Audience
- Computational Engineering, Computer Science, Physics, Engineering, Materials Science, ...
- Contact: Gerhard Wellein, Georg Hager, Johannes Habich
- The lecture is completely documented in our Moodle LMS: erlangen de/moodle/course/view php?id=145
- Please enroll in the lecture and specify your matriculation number.
- Homework assignments, submissions, credits etc. are all handled in Moodle.
3 Format of course
- Lectures: 4 hours per week. Wednesday 12:15-13:45 in E1.11; Thursday 10:00-11:30.
- Slides of each lecture are available in Moodle.
- Tutorials: 2 hours per week. Thursday 12:15-13:45 (CIP pool); Wednesday 10:00-11:30 in 0.01.
- Exercise sheets (homework) are available every Wednesday in Moodle and are to be turned in one week later, before the tutorial.
- You also need CIP pool accounts (ask the CIP admins!).
- First tutorial (May 5th): introduction to handling the systems of the RRZE cluster (logging in via SSH, X forwarding, using compilers, batch jobs).
- Please interrupt and ask questions!
4 Format of course
- Credits (Schein), lecture only: 5 ECTS. Oral examination; register in meincampus.
- Credits (Schein), lecture and exercises: (5 + 2.5) ECTS. At least 50% of the credits in the exercises, plus oral examination; register for both in meincampus.
- Prerequisites for the exercises: basic programming knowledge in C/C++ or Fortran; experience with Linux/UNIX environments.
5 Scope of course
- Goal: the ability to write (simple and efficient) parallel programs for modern parallel supercomputers
- Introduction to the architecture of:
  - Single core/processor: x86_64 based architectures
  - Multi-core processors: x86_64 dual-/quad-/octo-core
  - Shared memory nodes: typical cluster nodes (RRZE)
  - Distributed memory computers: compute clusters (RRZE) and MPP (IBM BlueGene, CRAY XT)
  - GPU: NVIDIA
- Efficient programming and optimization strategies
- Concepts, potentials and pitfalls of parallel computing
- Shared memory parallel programming: OpenMP
- Distributed memory parallel programming: MPI
- Hybrid programming: MPI + OpenMP
- Performance analysis and modeling throughout all topics
6 Scope of the course
- Introduction: colored slides, performance (measuring and reporting), standard benchmarks (kernels and more)
- Single core: architecture and optimization strategies (pipelining, caches, blocking, ...)
- Foundations of parallel processing
- Parallel processing (1): multi-core, parallel processing for the masses. Shared-memory system architectures and programming techniques (multi-core, multi-socket, multi-everything, UMA, ccNUMA, ...)
- Parallel processing (2): distributed-memory system architectures and programming techniques (networks, clusters, MPI, ...)
- Parallel processing (3): hybrid programming techniques (MPI + OpenMP)
- Parallel processing (4): GPU (NVIDIA and CUDA)
- Across all topics: performance analysis and modeling
7 Supporting material: books
- G. Hager and G. Wellein: Introduction to High Performance Computing for Scientists and Engineers. CRC Computational Science Series. See Moodle for a very early version; 10 copies are available in the library; for discounted copies, ask us.
- J. Hennessy and D. Patterson: Computer Architecture. A Quantitative Approach. Morgan Kaufmann Publishers, Elsevier.
- W. Schönauer: Scientific Supercomputing.
8 Supporting material: documentation
- Documentation
- The big ones and more useful HPC-related information
9 Introduction (1): Why supercomputers? The big ones and the workhorses
10 Computational Science drives the need for supercomputing
The Three (formerly Two) Principles of Science:
- Theory: mathematical models, differential equations, Newton
- Experiments: observation and prototypes, empirical sciences
- Computational Science: simulation, optimization, (quantitative) virtual reality
11 Motivation: Supercomputers in everyday life
- Engineering (automotive): crash simulations, aerodynamics
- Meteorology: weather forecasts, hurricane warnings
- Did you know? Roasting Pringles, designing engines for chain saws, drug design.
12 Motivation: MD simulation of HIV protease dynamics
- Time step: 1 fs (10^-15 s)
- Simulated real time: 10 ns (10^-8 s)
- Compute time: CPU-hrs (8 CPUs, 90 days)
- Courtesy: Prof. Sticht, Bio-Informatics, Emil-Fischer Center, FAU
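The numbers on this slide can be turned into a small worked calculation. The sketch below uses only the values quoted above (1 fs time step, 10 ns simulated time, 8 CPUs, 90 days); the variable names are illustrative, and the resulting CPU-hour figure is derived here rather than taken from the slide.

```python
# Worked arithmetic for the MD example; all inputs are taken from the slide.
FS_PER_NS = 1_000_000          # 1 ns = 10^6 fs

time_step_fs = 1               # integration time step: 1 fs
simulated_ns = 10              # simulated ("real") time: 10 ns
cpus = 8
wall_days = 90

# Number of time steps needed to cover the simulated time span.
steps = simulated_ns * FS_PER_NS // time_step_fs

# Total CPU time and average wall-clock time spent per time step.
cpu_hours = cpus * wall_days * 24
wall_seconds_per_step = wall_days * 24 * 3600 / steps

print(f"time steps:         {steps:,}")
print(f"CPU hours:          {cpu_hours:,}")
print(f"wall time per step: {wall_seconds_per_step:.4f} s")
```

Ten million time steps for only 10 ns of simulated time illustrate why such simulations need months of compute time even on several CPUs.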
13 Motivation: Lattice Boltzmann flow solvers
Figures by courtesy of LS CAB-Braunschweig, Thomas Zeiser, N. Thuerey
14 Motivation: Industrial usage of HLRS systems
[Figure labels: costs, CPU hours, product cycle; others, Porsche, T-Systems; Porsche Panamera]
Michael M. Resch, High Performance Computing Center Stuttgart
15 Supercomputer: a good definition?!
- "Supercomputer is a computer that is only one generation behind what large-scale users want." (Neil Lincoln, architect for the CDC Cyber 205 and others)
- "A supercomputer does not fit under the desktop!"
- Absolute, raw compute power is not a reasonable measure.
- Assume: the computer is being used for numerical simulation.
- The compute power of a system is measured by floating point operations (MULT, ADD) for a specific numeric benchmark: the TOP500 list.
16 Most powerful computers in the world: TOP500
- TOP500: survey of the 500 most powerful supercomputers
- Benchmark: solve a large system of linear equations, A x = b (LINPACK)
- Published twice a year (ISC in Germany, SC in USA)
- Established in 1993 (CM5/1024): 60 GFlop/s (Top1)
- Nov 2010 (Tianhe-1A): GFlop/s (Top1)
- Performance increase: 87% p.a. over almost 2 decades!
- Figure annotation: today's laptop
- Performance measure: MFlop/s, GFlop/s, TFlop/s, PFlop/s, i.e. the number of floating point operations per second
- Floating point operations: double precision (64 bit) add and multiply operations
- 10^6: MFlop/s; 10^9: GFlop/s; 10^12: TFlop/s; 10^15: PFlop/s
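To make the performance measure concrete, here is a minimal sketch of a LINPACK-style measurement: solve a dense system A x = b, count floating point operations, and divide by the run time. It is not the actual TOP500 benchmark code. The solver, the problem size n = 120, and the operation count 2/3 n^3 + 2 n^2 (the count traditionally quoted for LINPACK; the leading term is what matters) are assumptions chosen for illustration, and interpreted Python will report far lower rates than compiled code.

```python
import random
import time

def solve_dense(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (in place)."""
    n = len(b)
    for k in range(n):
        # Partial pivoting: move the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        row_k = a[k]
        for i in range(k + 1, n):
            row_i = a[i]
            f = row_i[k] / row_k[k]
            for j in range(k + 1, n):
                row_i[j] -= f * row_k[j]
            b[i] -= f * b[k]
    # Back substitution on the resulting upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_flops(n):
    """Assumed operation count for an n x n solve: 2/3 n^3 + 2 n^2."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2

n = 120                                   # illustrative problem size
random.seed(0)
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]               # exact solution is x = (1, ..., 1)

t0 = time.perf_counter()
x = solve_dense([row[:] for row in a], b[:])
elapsed = time.perf_counter() - t0

max_err = max(abs(xi - 1.0) for xi in x)
mflops = linpack_flops(n) / elapsed / 1e6
print(f"n = {n}, max error = {max_err:.2e}, {mflops:.1f} MFlop/s")
```

The reported rate depends on the machine and on n, which is why a benchmark has to fix the algorithm and the operation count before results can be compared.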
17 TOP10 as of November 2010
[Table: TOP10 systems annotated with their power consumption (e.g. 4.0 MW) and electricity costs (1.75 Mio p.a.)]
18 Top500 list as of November 2010
Clusters, clusters, clusters with GBit interconnect
19 TOP500 is going massively parallel (Nov. 99/04/09)
[Figure annotations: 8x, 16x]
20 TOP500 is going massively parallel
21 Top500: Power becomes a real problem
[Figure: absolute power levels (kW) versus TOP500 ranking; electricity costs: ~1.5 Mio. p.a.]
By courtesy of H. Meuer
22 The next step? ExaFLOP dreams, visions and fears
[Figure: performance projections on a scale from 100 MFlop/s to 10 EFlop/s; series labels: SUM, N=1, N=, Notebook; annotations: 6-8 years, 8-10 years]
Source: Intel, 6th EMEA HPC Roundtable
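A rough feeling for such projections can be obtained from the 87% p.a. growth rate quoted on the TOP500 slide. The sketch below assumes that this rate stays constant, which is a strong assumption and not a claim made by the slides; the function name is illustrative.

```python
import math

growth_per_year = 0.87   # TOP500 performance increase quoted earlier in these slides

def years_for_factor(factor, growth):
    """Years needed to gain `factor` at a constant annual growth rate."""
    return math.log(factor) / math.log(1.0 + growth)

doubling = years_for_factor(2, growth_per_year)
thousandfold = years_for_factor(1000, growth_per_year)

print(f"doubling time:             {doubling:.2f} years")
print(f"factor 1000 (PF/s to EF/s): {thousandfold:.1f} years")
```

Under this assumption a factor of 1000, for example from PFlop/s to EFlop/s, takes roughly eleven years.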
23 TOP1: Tianhe-1A
- 7168 compute nodes, each with 2 x 6-core Intel Xeon/Westmere 2.93 GHz and 1 x NVIDIA Tesla M2050 GPU ("Fermi")
- High speed interconnect
- Overall compute capacity: Intel CPUs 1.0 PFlop/s; NVIDIA GPUs 3.6 PFlop/s
- Overall power consumption (LINPACK test): 4 MW
- Heating of a single home: ~ kW; heating power for > 200 homes
- National Supercomputing Center Tianjin
24 TOP2: CRAY XT5, PetaFLOPs and beyond
Cray-1 = 160 MF; 1 PF = 6,250,000 times as much!
(Cray Inc. Proprietary)
25 The TOP2 system: CRAY XT5
- 5th generation of CRAY MPP systems (1 node = 2 QC chips); successor of CRAY T3E
- System is designed to scale to large numbers of CPUs
- Original development: 40 TFlop/s Red Storm
- OS: Linux micro kernel
- Oak Ridge: Jaguar (TOP1)
- [Node diagram labels: 2-32 GB memory; 25.6 GB/s direct connect memory; 6.4 GB/s direct connect HyperTransport; Cray SeaStar2+ interconnect with 9.6 GB/s per link; ~6 μs MPI latency]
By courtesy of W. Oed, CRAY
26 Roadrunner at Los Alamos: first PetaFLOP system
- June 2008: 1 PFlop/s for the first time! (Nov. 1997: ASCI Red (Intel Paragon / P6) breaks the TFlop/s barrier!)
- Mankind (7 secs per FLOP per person): >1 year to do FLOP
- It's not only the first PetaFLOP system, it's heterogeneous! Opteron dual-core + IBM PowerXCell ("Playstation 3"!)
27 RoadRunner node ("triblade") structure
- Tri-blade characteristics: 2 x QS way host; 400 GFlop/s peak; single 4x DDR IB
- Host: dual-core Opteron
- ccNUMA within QS22 blade
- Each SPE operates on a 256 kByte local store holding data and instruction code!
- Data needs to be transferred explicitly between local store and memory via DMA transfers!
28 IBM Blue Gene/P
29 IBM Blue Gene/P: up to 1 Petaflop performance
Blue Gene/P continues Blue Gene's leadership performance in a space-saving, power-efficient package for the most demanding and scalable high-performance computing applications.
- Chip: 4 processors; 13.6 GF/s; 8 MB EDRAM
- Compute card: 1 chip, 20 DRAMs; 13.6 GF/s; 2.0 (or 4.0) GB DDR; supports 4-way SMP
- Node card: 32 chips (4x4x2); 32 compute cards, 0-1 IO cards; 435 GF/s; 64 GB
- Rack: 32 node cards; 1024 chips, 4096 procs; 14 TF/s; 2 TB
- System: 72 racks, cabled 8x8x16; 1 PF/s; 144 TB
- Front end node / service node: JS21 / Power5, Linux SLES10
- HPC software: compilers, GPFS, ESSL, Loadleveler
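The per-level figures on this slide follow from multiplying up the packaging hierarchy. The sketch below reproduces that arithmetic from the slide's per-chip values (13.6 GF/s and 2 GB per compute card); the variable names are illustrative, and the 4.0 GB memory option is ignored.

```python
# Per-chip values from the slide.
CHIP_GFLOPS = 13.6      # peak per chip (4 processors)
NODE_MEM_GB = 2.0       # memory per compute card (4.0 GB option not considered)

# Packaging hierarchy: (level name, number of units of the level below).
levels = [
    ("node card", 32),   # 32 chips per node card
    ("rack", 32),        # 32 node cards per rack
    ("system", 72),      # 72 racks in the full system
]

chips = 1
totals = {}
for name, factor in levels:
    chips *= factor
    totals[name] = (chips, chips * CHIP_GFLOPS, chips * NODE_MEM_GB)
    n_chips, gflops, mem_gb = totals[name]
    print(f"{name:10s} {n_chips:6d} chips  {gflops / 1000:9.2f} TF/s  {mem_gb / 1024:7.2f} TB")
```

The results (about 435 GF/s and 64 GB per node card, about 14 TF/s and 2 TB per rack, about 1 PF/s and 144 TB for 72 racks) agree with the rounded values on the slide.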
30 IBM BlueGene/P single node
[Figure labels: IBM PowerPC 450 (4 cores, 8 MB); Cu heatsink; SDRAM-DDR2 (1 of 20 sites)]
31 HPC Centers in Germany: A view from Erlangen
- [Map labels: Hannover, Berlin, FZ Jülich, Erlangen/Nürnberg, HLRS-Stuttgart, LRZ-München]
- Jülich Supercomputing Center: BlueGene/P
- HLRS Stuttgart: 20 TFlop/s NEC SX9; to be replaced: 1 PF (2011/12)
- LRZ München: SGI Altix (62 TFlop/s); to be replaced: 3 PF (2012)
32 HLRB Munich: SGI Altix 4700 / 9728 cores
By courtesy of LRZ
33 HLRB-II: 2D torus between 19 compute nodes
- 51.2 GByte/s per direction
- Each compute node: 512 processors, 2000 GByte main memory
By courtesy of LRZ
34 RRZE: "LiMa" cluster
- 500 compute nodes (6,000 cores), each with 2 Intel Westmere 2.66 GHz hexacores: 12 cores per node plus SMT cores
- 24 GB main memory per node; NO local disks
- Vendor: NEC (Dual-Twin Supermicro)
- Power consumption: ~160 kW
- Closed racks with cold water heat exchanger inside
- Full QDR (quad data rate) InfiniBand fat tree interconnect: bandwidth ~3 GB/s per direction and < 2 µs latency
- Parallel filesystem: 130 TB+, accessible with 3 GB/s
- Operating system: Linux
- Peak performance: 63.8 TFlop/s (at 2.66 GHz)
- LINPACK (Rmax): 57.3 TFlop/s (#130 in TOP500, Nov. 2010)
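The quoted peak performance can be reproduced from core count and clock frequency. The sketch below assumes 4 double precision floating point operations per core and cycle (one 2-wide SSE add plus one 2-wide SSE multiply); this factor is an assumption that is not stated on the slide, but it is consistent with the 63.8 TFlop/s given there. The function name is illustrative.

```python
def peak_tflops(cores, ghz, flops_per_cycle):
    """Theoretical peak in TFlop/s: cores x clock (GHz) x flops per cycle."""
    return cores * ghz * flops_per_cycle / 1000.0

# Values from the slide.
nodes, cores_per_node, ghz = 500, 12, 2.66
rmax_tflops = 57.3

# Assumption: 4 double precision flops per core and cycle (SSE add + multiply).
FLOPS_PER_CYCLE = 4

peak = peak_tflops(nodes * cores_per_node, ghz, FLOPS_PER_CYCLE)
efficiency = rmax_tflops / peak

print(f"peak performance:   {peak:.2f} TFlop/s")
print(f"LINPACK efficiency: {efficiency:.1%}")
```

LINPACK thus reaches roughly 90% of the theoretical peak on this system; most real applications achieve a much smaller fraction.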
35 RRZE: "Woody" cluster
- 860 Intel Xeon 5160 processor cores: Core2Duo architecture, 3.0 GHz, 12 GFlop/s per core
- 4 cores per compute node
- Installation: November 2006
- Peak performance: GFlop/s
- Main memory: 2 GByte per core, 1720 GByte in total
- InfiniBand network: Voltaire DDRx, 216 ports; GBit/s+ per node and direction
- Top500 rank (Nov 2007)
- OS: SuSE Linux SLES9
- Parallel filesystem: 15 TByte
- NFS filesystem: 15 TByte
- Power consumption > 100 kW
36 RRZE: "TinyBlue" cluster
- 84 nodes, each with:
  - Dual-socket Nehalem X5550, 2.66 GHz, SMT (8 physical cores + 8 SMT)
  - 12 GB RAM (DDR3-1333)
  - 250 GB disk
  - QDR InfiniBand (fully non-blocking)
- 7.1 TFlop/s peak (>65% of Woody!)
- Application performance vs. Woody
- [Node diagram: two sockets, each with 4 cores (P), caches (C) and a memory interface (MI) attached to its own memory: ccNUMA!]
37 RRZE: "TinyGPU" cluster
- 8 dual-socket, dual-GPU nodes, each with:
  - 2 x Intel Xeon X5550 (2.66 GHz)
  - 24 GB RAM (DDR3-1333)
  - 2 NVIDIA Tesla M1060 (passive cooling)
  - DDR InfiniBand
- Tesla M1060 GPU: 30 multiprocessors; 78 GFlop/s (dp), 933 GFlop/s (sp); 4 GB memory (102 GB/s)
- Overall compute power:
  - Double precision: 0.68 TFlop/s (Xeons), TFlop/s (Teslas)
  - Single precision: 1.36 TFlop/s (Xeons), TFlop/s (Teslas)
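The split between CPU and GPU compute power can be worked out from the per-device numbers on this slide. The sketch below assumes 4 cores per Xeon X5550 socket (matching the 8 physical cores per dual-socket node quoted for TinyBlue) and 4 double precision or 8 single precision flops per core and cycle; the flops-per-cycle factors are assumptions, chosen because they reproduce the Xeon totals given above. The GPU totals are derived here, not taken from the slide.

```python
# Values from the slide.
nodes = 8
sockets_per_node, ghz = 2, 2.66
gpus_per_node = 2
GPU_DP_GFLOPS, GPU_SP_GFLOPS = 78, 933      # per Tesla M1060

# Assumptions (not stated on this slide).
CORES_PER_SOCKET = 4                         # quad-core Xeon X5550
DP_FLOPS_PER_CYCLE, SP_FLOPS_PER_CYCLE = 4, 8

cores = nodes * sockets_per_node * CORES_PER_SOCKET
gpus = nodes * gpus_per_node

# Aggregate peak performance in TFlop/s.
cpu_dp = cores * ghz * DP_FLOPS_PER_CYCLE / 1000.0
cpu_sp = cores * ghz * SP_FLOPS_PER_CYCLE / 1000.0
gpu_dp = gpus * GPU_DP_GFLOPS / 1000.0
gpu_sp = gpus * GPU_SP_GFLOPS / 1000.0

print(f"double precision: {cpu_dp:.2f} TFlop/s (Xeons), {gpu_dp:.2f} TFlop/s (Teslas)")
print(f"single precision: {cpu_sp:.2f} TFlop/s (Xeons), {gpu_sp:.2f} TFlop/s (Teslas)")
```

Under these assumptions the GPUs contribute less than twice the CPU peak in double precision but about eleven times the CPU peak in single precision, which shows how strongly the benefit of this GPU generation depends on the required precision.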
38 RRZE: Windows cluster
- 16 nodes with dual-socket AMD Opteron 2435 (2.6 GHz, Istanbul hexacore)
- 32 GB RAM
- 160 GB HDD
- 2 TFlop/s peak
- GBit
- Operating system: Windows HPC Server
More informationAccelerating HPC. (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing
Accelerating HPC (Nash) Dr. Avinash Palaniswamy High Performance Computing Data Center Group Marketing SAAHPC, Knoxville, July 13, 2010 Legal Disclaimer Intel may make changes to specifications and product
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0)
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0) Contributing sites and the corresponding computer systems for this call are: BSC, Spain IBM System X idataplex CINECA, Italy The site selection
More informationIllinois Proposal Considerations Greg Bauer
- 2016 Greg Bauer Support model Blue Waters provides traditional Partner Consulting as part of its User Services. Standard service requests for assistance with porting, debugging, allocation issues, and