Future Trends in High Performance Computing

1 Future Trends in High Performance Computing Horst Simon Lawrence Berkeley National Laboratory and UC Berkeley Seminar at RWTH Aachen, Germany March 23, 2009

3 Berkeley Lab Mission Solve the most pressing and profound scientific problems facing humankind Basic science for a secure energy future Understand living systems to improve the environment, health, and energy supply Understand matter and energy in the universe Build and safely operate leading scientific facilities for the nation Train the next generation of scientists and engineers

4 Key Message Computing is changing more rapidly than ever before, and scientists have the unprecedented opportunity to change computing directions

5 Overview Turning point in 2004 Current trends and what to expect until 2014 Long term trends until 2019

6 Supercomputing Ecosystem (2005) Commercial Off The Shelf technology (COTS) Clusters 12 years of legacy MPI applications base From my presentation at ISC 2005

8 Traditional Sources of Performance Improvement are Flat-Lining (2004). New constraints: 15 years of exponential clock rate growth has ended. Moore's Law reinterpreted: how do we use all of those transistors to keep performance increasing at historical rates? Industry response: the number of cores per chip doubles every 18 months instead of clock frequency! Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith

9 Supercomputing Ecosystem (2005) Commercial Off The Shelf technology (COTS) - 2008: PCs and desktop systems are no longer the economic driver; architecture and programming model are about to change. Clusters. 12 years of legacy MPI applications base

10 Overview Turning point in 2004 Current trends and what to expect until 2014 Long term trends until 2019

11 Roadrunner Breaks the Pflop/s Barrier: 1,026 Tflop/s on LINPACK reported on June 9, 2008; 6,948 dual-core Opterons + 12,960 Cell BE; 80 TByte of memory; IBM built, installed at LANL

12 Cray XT5 at ORNL -- 1 Pflop/s in November 2008
Jaguar                          Total     XT5     XT4
Peak Performance (Tflop/s)      1,645     1,...   ...
AMD Opteron Cores               181,...   ...     ...,328
System Memory (TB)              ...       ...     ...
Disk Bandwidth (GB/s)           ...       ...     ...
Disk Space (TB)                 10,750    10,...  ...
Interconnect Bandwidth (TB/s)   ...       ...     ...
The systems will be combined after acceptance of the new XT5 upgrade. Each system will be linked to the file system through 4x-DDR InfiniBand.

13 Cores per Socket

14 Performance Development [Top500 performance development chart; the #1 system stands at 1.1 PFlop/s]

15 Performance Development Projection

16 Concurrency Levels [Top500 chart, June 1993 through a projected June 2015: maximum, average, and minimum number of processors per system, on a log scale from 1 to 1,000,000. ISC'08, Dresden; Jack's Notebook]

17 Moore's Law reinterpreted: Number of cores per chip will double every two years. Clock speed will not increase (possibly decrease). Need to deal with systems with millions of concurrent threads. Need to deal with inter-chip parallelism as well as intra-chip parallelism.
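To put numbers on this (a back-of-the-envelope sketch; the 2009 starting point of 4 cores per chip is an assumption, not from the slides), core counts grow as

    N(t) = N_0 * 2^((t - t_0) / 2)

so N(2019) = 4 * 2^((2019 - 2009)/2) = 4 * 2^5 = 128 cores per chip. A machine built from tens of thousands of such chips, with a few hardware threads per core, is already at O(10^6) concurrent threads, which is why both intra-chip and inter-chip parallelism have to be managed.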

18 Multicore comes in a wide variety: multiple parallel general-purpose processors (GPPs) and multiple application-specific processors (ASPs). Examples: Sun Niagara (8 GPP cores, 32 threads); Intel IXP2800 network processor (1 GPP core + 16 ASPs, 128 threads); IBM Cell (1 GPP, 2 threads + 8 ASPs); Cisco CRS-1 (188 Tensilica GPPs); Picochip DSP (1 GPP core + 248 ASPs). For perspective, the Intel 4004 (1971) was a 4-bit processor with 2312 transistors, ~100 KIPS, in 10-micron PMOS on an 11 mm² chip; today 1000s of processor cores fit per die. "The Processor is the new Transistor" [Rowen]

19 What's Next? Source: Jack Dongarra, ISC 2008

20 Likely Trajectory - Collision or Convergence? [Chart, after Justin Rattner, Intel, ISC 2008: with parallelism on one axis and programmability on the other, CPUs move from multi-threading to multi-core to many-core, while GPUs move from fixed function to partially programmable to fully programmable; the two trajectories converge on a future processor by 2012.]

21 Trends for the next five years, up to 2014: After a period of rapid architectural change we will likely settle on a future standard processor architecture. A good bet: Intel will continue to be a market leader. The impact of this disruptive change on software and systems architecture is not yet clear.

22 Impact on Software: We will need to rethink and redesign our software. A challenge similar to the 1990 to 1995 transition to clusters and MPI??

23 Likely Future Scenario (2014). System: cluster + many-core node. Programming model: MPI + ?, where the "?" is not message passing. Hybrid & many-core technologies will require new approaches: PGAS, autotuning, ? After Don Grice, IBM, Roadrunner Presentation, ISC 2008

24 Why MPI will persist: Obviously MPI will not disappear in five years. By 2014 there will be 20 years of legacy software in MPI. New systems are not sufficiently different to lead to a new programming model.

25 What will be the ? in MPI+? Likely candidates are: PGAS languages, autotuning, a wildcard from the commercial space.

27 What's Wrong with MPI Everywhere? One MPI process per core is wasteful of intra-chip latency and bandwidth. Weak scaling: the success model of the cluster era, but there is not enough memory per core. Heterogeneity: one MPI process per CUDA thread-block?
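A minimal sketch of the usual alternative to MPI everywhere, one MPI process per multicore chip with threads inside it, using only standard MPI-2 and OpenMP calls (the summed series is an arbitrary stand-in for real work):

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank;
        /* One MPI process per chip/node; request thread support for OpenMP. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local_sum = 0.0, global_sum = 0.0;

        /* Intra-chip parallelism: threads share the node's memory directly,
           so no per-core MPI buffers or intra-node messages are needed. */
        #pragma omp parallel for reduction(+:local_sum)
        for (int i = 0; i < 1000000; i++)
            local_sum += 1.0 / (1.0 + i);

        /* Inter-chip parallelism: one message per node instead of per core. */
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0) printf("sum = %f\n", global_sum);
        MPI_Finalize();
        return 0;
    }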

28 PGAS Languages. Global address space: any thread may directly read/write remote data. Partitioned: data is designated as local or global. [Diagram: processors p0, p1, ..., pn, each holding private variables l and a partition of the shared space with variables x and y.] Implementation issues: on distributed memory, reading a remote array or structure is explicit, not a cache fill; on shared memory, caches are allowed but not required. No less scalable than MPI! Permits sharing, whereas MPI rules it out!
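PGAS compilers typically lower these remote reads and writes to the one-sided put/get operations discussed on the next slide. As a rough C analogy (a sketch using MPI-2 one-sided windows rather than an actual PGAS language; the ring-neighbor read is invented for illustration), each process exposes its partition and any process can read it without the owner posting a receive:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Each process owns one element of the "global" array. */
        double mine = 100.0 + rank;
        MPI_Win win;
        MPI_Win_create(&mine, sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* One-sided remote read: fetch the ring neighbor's element with no
           matching receive (and no other action) on the owner's side. */
        double remote;
        MPI_Win_fence(0, win);
        MPI_Get(&remote, 1, MPI_DOUBLE, (rank + 1) % nprocs, 0, 1,
                MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        printf("rank %d read %.1f from rank %d\n", rank, remote,
               (rank + 1) % nprocs);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }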

29 Performance Advantage of One-Sided Communication. The put/get operations in PGAS languages (remote read/write) are one-sided: no required interaction from the remote processor. This makes them faster for pure data transfers than two-sided send/receive. [Charts: 8-byte roundtrip latency (usec) and flood bandwidth for 4 KB messages as percent of hardware peak, comparing MPI ping-pong with GASNet put+sync on Elan3/Alpha, Elan4/IA64, Myrinet/x86, IB/G5, IB/Opteron, and SP/Fed.]

30 Autotuning: Write programs that write programs. Automate search across a complex optimization space: generate a space of implementations and search it. Performance far beyond current compilers; performance portability for diverse architectures! Past successes: PHiPAC, ATLAS, FFTW, Spiral, OSKI. [Chart: Mflop/s for a finite element problem (Im, Yelick, Vuduc, 2005); the best variant, 4x2 register blocking, far outperforms the reference implementation.]
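A toy version of the generate-and-search loop (hypothetical kernel, block sizes, and timing harness; real autotuners such as ATLAS or OSKI emit and compile many code variants rather than varying a runtime parameter):

    #include <stdio.h>
    #include <time.h>

    #define N 1024
    static double A[N][N], x[N], y[N];

    /* One candidate implementation per blocking factor bs. */
    static void kernel(int bs) {
        for (int jb = 0; jb < N; jb += bs)
            for (int i = 0; i < N; i++)
                for (int j = jb; j < jb + bs; j++)
                    y[i] += A[i][j] * x[j];
    }

    int main(void) {
        const int space[] = {8, 16, 32, 64, 128, 256};  /* search space */
        int best = 0;
        double best_t = 1e30;

        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            for (int j = 0; j < N; j++) A[i][j] = 1.0;
        }
        /* Empirical search: run every variant, keep the fastest. */
        for (int k = 0; k < 6; k++) {
            clock_t t0 = clock();
            for (int rep = 0; rep < 20; rep++) kernel(space[k]);
            double t = (double)(clock() - t0) / CLOCKS_PER_SEC;
            printf("block %3d: %.3f s\n", space[k], t);
            if (t < best_t) { best_t = t; best = space[k]; }
        }
        printf("selected block size: %d\n", best);
        return 0;
    }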

31 Multiprocessor Efficiency and Scaling (auto-tuned stencil kernel; Oliker et al., paper in IPDPS'08). [Charts: performance scaling (total Gflop/s vs. core count, single and double precision) and power efficiency (Mflop/s/Watt) on AMD Opteron, Intel, Sun, Cell, and G80 platforms; stacked bars show the cumulative effect of optimizations (NUMA-aware placement, padding, reordering, thread/cache blocking, prefetch, SIMD, cache bypass) over the naive code, with gains of roughly 2.3x to 4.6x.]

32 Autotuning for Scalability and Performance Portability

33 The Likely HPC Ecosystem in 2014: GPU = future many-core driven by commercial applications; MPI + (autotuning, PGAS, ??); next-generation clusters with many-core or hybrid nodes

34 Overview Turning point in 2004 Current trends and what to expect until 2014 Long term trends until 2019

35 DARPA Exascale Study: commissioned by DARPA to explore the challenges for Exaflop computing (Kogge et al.). Two models for future performance growth. Simplistic: ITRS roadmap; power for memory grows linearly with the number of chips; power for interconnect stays constant. Fully scaled: same as simplistic, but memory and router power grow with peak flops per chip.
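Schematically, in my notation rather than the study's, with N chips of per-chip peak F relative to a base peak F_0:

    Simplistic:   P(N) = N * (p_proc + p_mem) + P_net            (p_mem and P_net fixed)
    Fully scaled: P(N) = N * p_proc + (F / F_0) * (N * p_mem + P_net)

In the fully scaled model, memory and router power track the growing per-chip peak, which is what produces the pessimistic totals on the next two slides.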

36 We won't reach Exaflops with this approach (chart from Peter Kogge, DARPA Exascale Study)

37 ...and the power costs will still be staggering [chart: system power in MW, reaching toward 1000 MW; from Peter Kogge, DARPA Exascale Study]

38 Extrapolating to Exaflop/s in 2018 Source: David Turek, IBM

39 Processor Technology Trend. 1990s: R&D in computing hardware dominated by desktop/COTS; had to learn how to use COTS technology for HPC. Now: R&D investments moving rapidly to consumer electronics/embedded processing; must learn how to leverage embedded processor technology for future HPC systems.

40 Consumer Electronics has Replaced PCs as the Dominant Market Force in Design!! [Timeline: Apple introduces the iPod; iPod+iTunes exceeds 50% of Apple's net profit; Apple introduces a cell phone (iPhone).]

41 Green Flash: Ultra-Efficient Climate Modeling. A project by Shalf, Oliker, Wehner and others at LBNL. An alternative route to exascale computing: target specific machine designs to answer a scientific question, using new technologies driven by the consumer market.

42 Green Flash: Ultra-Efficient Climate Modeling. We present an alternative route to exascale computing. Exascale science questions are already identified; our idea is to target specific machine designs to each of these questions. This is possible because of new technologies driven by the consumer market. We want to turn the process around: ask "What machine do we need to answer a question?", not "What can we answer with that machine?" Caveat: we present here a feasibility design study; the goal is to influence the HPC industry by evaluating a prototype design.

43 Design for Low Power: More Concurrency. Cubic power improvement with lower clock rate due to V^2F. Slower clock rates enable use of simpler cores. Simpler cores use less area (lower leakage) and reduce cost. Tailor the design to the application to reduce waste. This is how iPhones and MP3 players are designed to maximize battery life and minimize cost. [Scale of core power: Tensilica DP 0.09W; PPC450 3W; Intel Core2 15W; Power5 120W.]
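The cubic claim follows from the textbook dynamic-power relation (a general CMOS approximation, not specific to these cores):

    P_dyn ~ C * V^2 * f,   and since achievable clock f scales roughly with supply voltage V:   P_dyn ~ f^3

Halving the clock therefore cuts dynamic power by about 8x while only halving per-core throughput, a net ~4x gain in flops per watt, which is then recovered by instantiating many more simple, slower cores.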

44 Green Flash Strawman System Design. We examined three different approaches (in 2008 technology) to the computation, a .015° x .02° x 100L climate model requiring 10 Pflop/s sustained (~200 Pflop/s peak). AMD Opteron: commodity approach; lower efficiency for scientific applications offset by cost efficiencies of the mass market. BlueGene: generic embedded processor core, with a customized system-on-chip (SoC) to improve power efficiency for scientific applications. Tensilica XTensa: customized embedded core w/SoC provides further power-efficiency benefits while maintaining programmability.
Processor                        Clock    Peak/Core (Gflops)  Cores/Socket  Sockets  Cores  Power   Cost (2008)
AMD Opteron                      2.8 GHz  ...                 ...           ...K     1.7M   179 MW  $1B+
IBM BG/P                         850 MHz  ...                 ...           ...K     3.0M   20 MW   $1B+
Green Flash / Tensilica XTensa   650 MHz  ...                 ...           ...K     4.0M   3 MW    $75M

45 Climate System Design Concept: Strawman Design Study. Targets: 10 PF sustained, ~120 m2, <3 MWatts, <$75M. VLIW CPU: 128b load-store + 2 DP MUL/ADD + integer op/DMA per cycle; synthesizable at 650 MHz in commodity 65nm; 1 mm2 core, larger with instruction cache, data cache, data RAM, and DMA interface; 0.25 mW/MHz. Double-precision SIMD FP: 4 ops/cycle (2.7 GFLOPs). Vectorizing compiler, cycle-accurate simulator, debugger GUI (existing part of the Tensilica Tool Set). 8-channel DMA for streaming from on/off-chip DRAM. Nearest-neighbor 2D communications grid. [System diagram: 32 processors per 65nm chip (~7W including comms), with a 32K instruction cache per core, a master processor, external DRAM interfaces, optional 8 MB embedded DRAM, and a communications link controller; 8 DRAMs per processor chip, ~50 GB/s; 32 chip + memory clusters per board; 32 boards per rack at ~25 KW including power + comms.]

46 Summary on Green Flash: Exascale computing is vital for numerous key scientific areas. We propose a new approach to high-end computing that enables transformational changes for science. Research effort: study feasibility and share insight w/ community. This effort will augment high-end general-purpose HPC systems. Choose the science target first (climate in this case). Design systems for applications (rather than the reverse). Leverage power-efficient embedded technology. Design hardware, software, and scientific algorithms together using hardware emulation and auto-tuning. Achieve exascale computing sooner and more efficiently. Applicable to a broad range of exascale-class applications.

47 Summary: Major challenges are ahead for extreme computing: power, parallelism, and many others not discussed here. We will need completely new approaches and technologies to reach the Exascale level. This opens up a unique opportunity for science applications to lead extreme-scale systems development.
