An Overview of High Performance Computing
|
|
- Delphia Hood
- 6 years ago
- Views:
Transcription
1 IFIP Working Group 10.3 on Concurrent Systems An Overview of High Performance Computing Jack Dongarra University of Tennessee and Oak Ridge National Laboratory 1/3/2006 1
2 Overview Look at fastest computers From the Top500 Some of the changes that face us Hardware Software Algorithms 00 2
3 H. Meuer, H. Simon, E. Strohmaier, & JD - Listing of the 500 most powerful Computers in the World - Yardstick: Rmax from LINPACK MPP Ax=b, dense problem TPP performance - Updated twice a year SC xy in the States in November Meeting in Germany in June Started in 1993, 13 years of data 00 3 Size - All data available from Rate
4 Performance Development 2.3 PF/s 1 Pflop/s TF/s 100 Tflop/s 10 Tflop/s TF/s SUM N=1 NEC Earth Simulator IBM BlueGene/L 1 Tflop/s 100 Gflop/s 10 Gflop/s 1 Gflop/s 59.7 GF/s 0.4 GF/s Fujitsu 'NWT' NAL Intel ASCI Red Sandia N=500 IBM ASCI White LLNL My Laptop TF/s 100 Mflop/s
5 Architecture/Systems Continuum Tightly Coupled Custom processor Best processor performance for with custom interconnect codes that are not cache friendly Cray X1 NEC SX-8 Good communication performance IBM Regatta Simpler programming model IBM Blue Gene/L Most expensive Loosely Coupled Commodity processor with custom interconnect SGI Altix Intel Itanium 2 Cray XT3, XD1 AMD Opteron Commodity processor with commodity interconnect Clusters Pentium, Itanium, Opteron, Alpha GigE, Infiniband, Myrinet, Quadrics NEC TX7 IBM eserver 00 Dawning 5 Good communication performance Good scalability Best price/performance (for codes that work well with caches and are latency tolerant) More complex programming model
6 Architecture/Systems Continuum Tightly Coupled Custom processor Best processor performance for with custom interconnect codes Custom that are not cache 80% friendly Cray X1 NEC SX-8 Good communication performance IBM Regatta Simpler programming model IBM Blue Gene/L 60% Most expensive Loosely Coupled Commodity processor with custom interconnect SGI Altix Intel Itanium 2 Cray XT3, XD1 AMD Opteron Commodity processor with commodity interconnect Clusters Pentium, Itanium, Opteron, Alpha GigE, Infiniband, Myrinet, Quadrics NEC TX7 IBM eserver 00 Dawning 6 100% 40% 20% 0% Good communication Hybrid performance Good scalability Best price/performance (for codes that work well with caches Commod and are latency tolerant) More complex programming model Jun-93 Dec-93 Jun-94 Dec-94 Jun-95 Dec-95 Jun-96 Dec-96 Jun-97 Dec-97 Jun-98 Dec-98 Jun-99 Dec-99 Jun-00 Dec-00 Jun-01 Dec-01 Jun-02 Dec-02 Jun-03 Dec-03 Jun- 04
7 Commodity Processors Intel Pentium Nocona 3.6 GHz, peak = 7.2 Gflop/s Linpack 100 = 1.8 Gflop/s Linpack 1000 = 4.2 Gflop/s Intel Itanium GHz, peak = 6.4 Gflop/s Linpack 100 = 1.7 Gflop/s Linpack 1000 = 5.7 Gflop/s AMD Opteron 2.6 GHz, peak = 5.2 Gflop/s Linpack 100 = 1.6 Gflop/s 00 7 Linpack 1000 = 3.9 Gflop/s
8 Architectures / Systems SIMD Single Proc. Cluster (360) Constellations SMP 0 MPP Cluster: Commodity processors & Commodity interconnect 00 8 Constellation: # of procs/node nodes in the system
9 Interconnects / Systems Others Cray Interconnect SP Switch Crossbar Quadrics Infiniband Myrinet (101) Gigabit Ethernet N/A (249)
10 Processor Types 500 SIMD 400 Sparc Vector 300 MIPS 200 Alpha HP 100 AMD IBM Power 0 Intel % 00 10
11 Processors Used in Each of the 500 Systems Hitachi SR8000 0% Sun Sparc 1% NEC 1% HP Alpha 1% Cray 2% Intel IA-32 41% 91% = 66% Intel 15% IBM 11% AMD HP PA-RISC 3% Intel IA-64 9% Intel EM64T 16% AMD x86_64 IBM Power 00 11% 15% 11
12 26th List: The TOP10 Manufacturer Computer Rmax [TF/s] Installation Site Country Year #Proc 1 IBM BlueGene/L eserver Blue Gene DOE/NNSA/LLNL USA IBM BGW eserver Blue Gene IBM Thomas Watson USA IBM ASC Purple Power5 p DOE/NNSA/LLNL USA SGI Columbia Altix, Itanium/Infiniband NASA Ames USA Dell Thunderbird Pentium/Infiniband Sandia USA Cray Red Storm Cray XT3 AMD Sandia USA NEC Earth-Simulator SX Earth Simulator Center Japan IBM MareNostrum PPC 970/Myrinet Barcelona Supercomputer Center Spain IBM eserver Blue Gene ASTRON University Groningen Netherlands Jaguar 10 Cray Oak Ridge National Lab USA Cray XT3 AMD
13 Customer Segments / Performance 100% 90% 80% 70% 60% 50% Classified Vendor Academic Industry Government 50% 40% 30% 20% Research 10% 0%
14 Countries / Performance 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Others France China Germany UK Japan 14% 1.0% 2.6% 3.0% 5.4% 6.0% US 68%
15 Asian Countries / Systems Others 17 India 4 Korea 7 China 17 Japan
16 Concurrency Levels of the Top k-128k 32k-64k 16k-32k 8k-16k 4k-8k
17 Concurrency Levels of the Top500 1,000, ,000 Maximum 131K 10,000 1, Average Minimum # processors 10 1 Jun-93 Jun-94 Jun-95 Jun-96 Jun-97 Jun-98 Jun-99 Jun-00 Jun-01 Jun-02 Jun-03 Jun-04 Jun
18 IBM BlueGene/L /L #1 131,072 Processors Total of 18 systems all in the Top MWatts (1600 homes) 43,000 ops/s/person (64 racks, 64x32x32) Rack 131,072 procs (32 Node boards, 8x8x16) 2048 processors BlueGene/L Compute ASIC Node Board (32 chips, 4x4x2) 16 Compute Cards 64 processors Compute Card (2 chips, 2x1x1) 4 processors Chip (2 processors) 2.8/5.6 GF/s 4 MB (cache) 5.6/11.2 GF/s 1 GB DDR 90/180 GF/s 16 GB DDR 2.9/5.7 TF/s 0.5 TB DDR 180/360 TF/s 32 TB DDR Fastest Computer BG/L 700 MHz 131K proc The compute node ASICs include all networking and processor functionality. 64 racks Each compute ASIC includes two 32-bit superscalar PowerPC 440 embedded 00 Peak: 367 Tflop/s cores (note that L1 cache coherence is not maintained between these cores). 18 (13K sec about 3.6 hours; n=1.8m) Linpack: 281 Tflop/s 77% of peak Full system total of 131,072 processors
19 Performance Projection 1 Eflop/s 100 Pflop/s 10 Pflop/s 1 Pflop/s 100 Tflop/s 10 Tflop/s 1 Tflop/s SUM DARPA HPCS 100 Gflop/s 10 Gflop/s 1 Gflop/s 100 Mflop/s N=1 N=500 My Laptop
20 A PetaFlop Computer by the End of the Decade 10 Companies working on a building a Petaflop system by the end of the decade. Cray } IBM Sun Dawning Galactic Lenovo Hitachi NEC Fujitsu Bull } Chinese } Chinese Companies Japanese Life Simulator (10 Pflop/s) 00 20
21 Flops per Gross Domestic Product Based on the November 2005 Top500 Israel USA New Zealand Switzerland Russia UK Netherlands Australian South Korea Saudi Arabia China Japan Ireland Spain Germany Taiwan Canada Brazil 00 21
22 KFlop/s per Capita (Flops/Pop) Based on the November 2005 Top Hint: Peter Jackson had something to do with this United States Switzerland Israel New Zealand United Kingdom Netherlands Australia Japan Germany Spain Canada Korea, South Sweden Saudia Arabia France Taiwan Italy Brazil Mexico China Russia India Has nothing to do with the 47.2 million sheep in NZ
23 KFlop/s per Capita (Flops/Pop) Based on the November 2005 Top Hint: Peter Jackson had something to do with this WETA Digital (Lord of the Rings) United States Switzerland Israel New Zealand United Kingdom Netherlands Australia Japan Germany Spain Canada Korea, South Sweden Saudia Arabia France Taiwan Italy Brazil Mexico China Russia India Has nothing to do with the 47.2 million sheep in NZ
24 Fuel Efficiency: GFlops/Watt BlueGene/L Blue Gene ASC Purple p5 1.9 GHz Columbia - SGI Altix 1.5 GHz Thunderbird - Pentium 3.6 GHz Red Storm Cray XT3, 2.0 GHz Earth-Simulator MareNostrum PPC 970, 2.2 GHz Blue Gene Jaguar - Cray XT3, 2.4 GHz Thunder - Intel Itanium2 1.4GHz Blue Gene Blue Gene Cray XT3, 2.6 GHz Apple XServe, 2.0 GHz Cray X1E (4GB) Cray X1E (2GB) ASCI Q - Alpha 1.25 GHz IBM p GHz System X 2.3 GHz Apple XServe/ Top 20 systems Based on processor power rating only (3,>100,>800) GFlops/Watt
25 10,000 Sun s s Surface Power Density (W/cm 2 ) 1, Hot Plate 486 Nuclear Reactor Rocket Nozzle Pentium Intel Developer Forum, Spring Pat Gelsinger (Pentium at 90 W) Cube relationship between the cycle time and power.
26 Lower Voltage Increase Clock Rate & Transistor Density We have seen increasing number of gates on a chip and increasing clock speed. Heat becoming an unmanageable problem, Intel Processors > 100 Watts We will not see the dramatic increases in clock speeds in the future. However, the number of gates on a chip will continue to increase Intel Yonah doubles the processing power on a per watt basis.
27 1 Core Operations per second for serial code Free Lunch For Traditional Software (It just runs twice as fast every 18 months with no change to the code!) 3GHz 1 Core 6 GHz 1 Core 12 GHz, 1 Core 24 GHz, 1 Core Additional operations per second if code can take advantage of concurrency From Craig Mundie, Microsoft
28 1 Core No Free Lunch For Traditional Software (Without highly concurrent software it won t get any faster!) 2 Cores 3 GHz, 4 Cores 4 Cores 3 GHz, 8 Cores 8 Cores Operations per second for serial code Free Lunch For Traditional Software (It just runs twice as fast every 18 months with no change to the code!) 6 GHz 1 Core 12 GHz, 1 Core 24 GHz, 1 Core 3 GHz 2 Cores 3GHz 1 Core Additional operations per second if code can take advantage of concurrency From Craig Mundie, Microsoft
29 CPU Desktop Trends Relative processing power will continue to double every 18 months 256 logical processors per chip in late Year Hardware Threads Per Chip Cores Per Processor Chip
30 Commodity Processor Trends Bandwidth/Latency is the Critical Issue, not FLOPS Annual increase Typical value in 2005 Typical value in 2010 Got Bandwidth? Typical value in 2020 Single-chip floating-point performance 59% 4 GFLOP/s 32 GFLOP/s 3300 GFLOP/s Front-side bus bandwidth 23% 1 GWord/s = 0.25 word/flop 3.5 GWord/s = 0.11 word/flop 27 GWord/s = word/flop DRAM latency (5.5%) 70 ns = 280 FP ops = 70 loads 50 ns = 1600 FP ops = 170 loads 28 ns = 94,000 FP ops = 780 loads 00 Source: Getting Up to Speed: The Future of Supercomputing, National Research Council, pages, 2004, National Academies Press, Washington DC, ISBN
31 Commodity Processor Trends Bandwidth/Latency is the Critical Issue, not FLOPS Annual increase Typical value in 2005 Typical value in 2010 Got Bandwidth? Typical value in 2020 Single-chip floating-point performance 59% 4 GFLOP/s 32 GFLOP/s 3300 GFLOP/s Front-side bus bandwidth 23% 1 GWord/s = 0.25 word/flop 3.5 GWord/s = 0.11 word/flop 27 GWord/s = word/flop DRAM latency (5.5%) 70 ns = 280 FP ops = 70 loads 50 ns = 1600 FP ops = 170 loads 28 ns = 94,000 FP ops = 780 loads 00 Source: Getting Up to Speed: The Future of Supercomputing, National Research Council, pages, 2004, National Academies Press, Washington DC, ISBN
32 Future Challenge: Developing the Ecosystem for HPC From the NRC Report on The Future of Supercomputing : Hardware, software, algorithms, tools, networks, institutions, applications, and people who solve supercomputing applications can be thought of collectively as a multifaceted ecosystem Research investment in HPC should be informed by the ecosystem point of view - progress must come on a broad front of interrelated technologies, rather than in the form of individual breakthroughs. A supercomputer ecosystem is a continuum of computing platforms, system software, algorithms, tools, networks, and the people who know how to exploit them to solve computational science applications
33 Real Crisis With HPC Is With The Software Our ability to configure a hardware system capable of 1 PetaFlop (10 15 ops/s) is without question just a matter of time and $$. A supercomputer application and software are usually much more long-lived than a hardware Hardware life typically five years at most. Apps years Fortran and C are the main programming models (still!!) The REAL CHALLENGE is Software Programming hasn t changed since the 70 s HUGE manpower investment MPI is that all there is? Often requires HERO programming Investments in the entire software stack is required (OS, libs, etc.) Software is a major cost component of modern technologies. The tradition in HPC system procurement is to assume that the software is free SOFTWARE COSTS (over and over) 00 33
34 Summary of Current Unmet Needs Performance / Portability Fault tolerance Memory bandwidth/latency Adaptability: Some degree of autonomy to self optimize, test, or monitor. Able to change mode of operation: static or dynamic Better programming models Global shared address space Visible locality Maybe coming soon (incremental, yet offering real benefits): Global Address Space (GAS) languages: UPC, Co-Array Fortran, Titanium, Chapel) Minor extensions to existing languages More convenient than MPI Have performance transparency via explicit remote memory references What s needed is a long-term, balanced investment in hardware, software, algorithms and applications
35 Collaborators / Support Top500 Team Erich Strohmaier, NERSC Hans Meuer, Mannheim Horst Simon, NERSC Slides are available at my website
An Overview of High Performance Computing. Jack Dongarra University of Tennessee and Oak Ridge National Laboratory 11/29/2005 1
An Overview of High Performance Computing Jack Dongarra University of Tennessee and Oak Ridge National Laboratory 11/29/ 1 Overview Look at fastest computers From the Top5 Some of the changes that face
More informationSelf Adapting Numerical Software. Self Adapting Numerical Software (SANS) Effort and Fault Tolerance in Linear Algebra Algorithms
Self Adapting Numerical Software (SANS) Effort and Fault Tolerance in Linear Algebra Algorithms Jack Dongarra University of Tennessee and Oak Ridge National Laboratory 9/19/2005 1 Overview Quick look at
More informationHigh Performance Computing in Europe and USA: A Comparison
High Performance Computing in Europe and USA: A Comparison Hans Werner Meuer University of Mannheim and Prometeus GmbH 2nd European Stochastic Experts Forum Baden-Baden, June 28-29, 2001 Outlook Introduction
More informationWhat have we learned from the TOP500 lists?
What have we learned from the TOP500 lists? Hans Werner Meuer University of Mannheim and Prometeus GmbH Sun HPC Consortium Meeting Heidelberg, Germany June 19-20, 2001 Outlook TOP500 Approach Snapshots
More informationPresentation of the 16th List
Presentation of the 16th List Hans- Werner Meuer, University of Mannheim Erich Strohmaier, University of Tennessee Jack J. Dongarra, University of Tennesse Horst D. Simon, NERSC/LBNL SC2000, Dallas, TX,
More informationTOP500 List s Twice-Yearly Snapshots of World s Fastest Supercomputers Develop Into Big Picture of Changing Technology
TOP500 List s Twice-Yearly Snapshots of World s Fastest Supercomputers Develop Into Big Picture of Changing Technology BY ERICH STROHMAIER COMPUTER SCIENTIST, FUTURE TECHNOLOGIES GROUP, LAWRENCE BERKELEY
More informationTOP500 Listen und industrielle/kommerzielle Anwendungen
TOP500 Listen und industrielle/kommerzielle Anwendungen Hans Werner Meuer Universität Mannheim Gesprächsrunde Nichtnumerische Anwendungen im Bereich des Höchstleistungsrechnens des BMBF Berlin, 16./ 17.
More informationThe TOP500 list. Hans-Werner Meuer University of Mannheim. SPEC Workshop, University of Wuppertal, Germany September 13, 1999
The TOP500 list Hans-Werner Meuer University of Mannheim SPEC Workshop, University of Wuppertal, Germany September 13, 1999 Outline TOP500 Approach HPC-Market as of 6/99 Market Trends, Architecture Trends,
More informationReal Parallel Computers
Real Parallel Computers Modular data centers Overview Short history of parallel machines Cluster computing Blue Gene supercomputer Performance development, top-500 DAS: Distributed supercomputing Short
More informationThe TOP500 Project of the Universities Mannheim and Tennessee
The TOP500 Project of the Universities Mannheim and Tennessee Hans Werner Meuer University of Mannheim EURO-PAR 2000 29. August - 01. September 2000 Munich/Germany Outline TOP500 Approach HPC-Market as
More informationChapter 5b: top500. Top 500 Blades Google PC cluster. Computer Architecture Summer b.1
Chapter 5b: top500 Top 500 Blades Google PC cluster Computer Architecture Summer 2005 5b.1 top500: top 10 Rank Site Country/Year Computer / Processors Manufacturer Computer Family Model Inst. type Installation
More informationOutline. Execution Environments for Parallel Applications. Supercomputers. Supercomputers
Outline Execution Environments for Parallel Applications Master CANS 2007/2008 Departament d Arquitectura de Computadors Universitat Politècnica de Catalunya Supercomputers OS abstractions Extended OS
More informationStockholm Brain Institute Blue Gene/L
Stockholm Brain Institute Blue Gene/L 1 Stockholm Brain Institute Blue Gene/L 2 IBM Systems & Technology Group and IBM Research IBM Blue Gene /P - An Overview of a Petaflop Capable System Carl G. Tengwall
More informationPresentations: Jack Dongarra, University of Tennessee & ORNL. The HPL Benchmark: Past, Present & Future. Mike Heroux, Sandia National Laboratories
HPC Benchmarking Presentations: Jack Dongarra, University of Tennessee & ORNL The HPL Benchmark: Past, Present & Future Mike Heroux, Sandia National Laboratories The HPCG Benchmark: Challenges It Presents
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester
Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester 12/24/09 1 Take a look at high performance computing What s driving HPC Future Trends 2 Traditional scientific
More informationReal Parallel Computers
Real Parallel Computers Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel Computing 2005 Short history
More informationLeistungsanalyse von Rechnersystemen
Center for Information Services and High Performance Computing (ZIH) Leistungsanalyse von Rechnersystemen 10. November 2010 Nöthnitzer Straße 46 Raum 1026 Tel. +49 351-463 - 35048 Holger Brunst (holger.brunst@tu-dresden.de)
More informationHigh Performance Computing in Europe and USA: A Comparison
High Performance Computing in Europe and USA: A Comparison Erich Strohmaier 1 and Hans W. Meuer 2 1 NERSC, Lawrence Berkeley National Laboratory, USA 2 University of Mannheim, Germany 1 Introduction In
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationMaking a Case for a Green500 List
Making a Case for a Green500 List S. Sharma, C. Hsu, and W. Feng Los Alamos National Laboratory Virginia Tech Outline Introduction What Is Performance? Motivation: The Need for a Green500 List Challenges
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Waiting for Moore s Law to save your serial code start getting bleak in 2004 Source: published SPECInt data Moore s Law is not at all
More informationJack Dongarra INNOVATIVE COMP ING LABORATORY. University i of Tennessee Oak Ridge National Laboratory
Computational Science, High Performance Computing, and the IGMCS Program Jack Dongarra INNOVATIVE COMP ING LABORATORY University i of Tennessee Oak Ridge National Laboratory 1 The Third Pillar of 21st
More informationSupercomputing im Jahr eine Analyse mit Hilfe der TOP500 Listen
Supercomputing im Jahr 2000 - eine Analyse mit Hilfe der TOP500 Listen Hans Werner Meuer Universität Mannheim Feierliche Inbetriebnahme von CLIC TU Chemnitz 11. Oktober 2000 TOP500 CLIC TU Chemnitz View
More informationSupercomputers. Alex Reid & James O'Donoghue
Supercomputers Alex Reid & James O'Donoghue The Need for Supercomputers Supercomputers allow large amounts of processing to be dedicated to calculation-heavy problems Supercomputers are centralized in
More informationEE 4683/5683: COMPUTER ARCHITECTURE
3/3/205 EE 4683/5683: COMPUTER ARCHITECTURE Lecture 8: Interconnection Networks Avinash Kodi, kodi@ohio.edu Agenda 2 Interconnection Networks Performance Metrics Topology 3/3/205 IN Performance Metrics
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester
Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester 12/3/09 1 ! Take a look at high performance computing! What s driving HPC! Issues with power consumption! Future
More informationHigh Performance Computing: Blue-Gene and Road Runner. Ravi Patel
High Performance Computing: Blue-Gene and Road Runner Ravi Patel 1 HPC General Information 2 HPC Considerations Criterion Performance Speed Power Scalability Number of nodes Latency bottlenecks Reliability
More informationCS 5803 Introduction to High Performance Computer Architecture: Performance Metrics
CS 5803 Introduction to High Performance Computer Architecture: Performance Metrics A.R. Hurson 323 Computer Science Building, Missouri S&T hurson@mst.edu 1 Instructor: Ali R. Hurson 323 CS Building hurson@mst.edu
More informationArchitetture di calcolo e di gestione dati a alte prestazioni in HEP IFAE 2006, Pavia
Architetture di calcolo e di gestione dati a alte prestazioni in HEP IFAE 2006, Pavia Marco Briscolini Deep Computing Sales Marco_briscolini@it.ibm.com IBM Pathways to Deep Computing Single Integrated
More informationRecent Trends in the Marketplace of High Performance Computing
Recent Trends in the Marketplace of High Performance Computing Erich Strohmaier 1, Jack J. Dongarra 2, Hans W. Meuer 3, and Horst D. Simon 4 High Performance Computing, HPC Market, Supercomputer Market,
More informationLawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory
Lawrence Berkeley National Laboratory Lawrence Berkeley National Laboratory Title TOP500 Supercomputers for June 2005 Permalink https://escholarship.org/uc/item/4h84j873 Authors Strohmaier, Erich Meuer,
More informationBlue Gene: A Next Generation Supercomputer (BlueGene/P)
Blue Gene: A Next Generation Supercomputer (BlueGene/P) Presented by Alan Gara (chief architect) representing the Blue Gene team. 2007 IBM Corporation Outline of Talk A brief sampling of applications on
More informationCS 267: Introduction to Parallel Machines and Programming Models Lecture 3 "
CS 267: Introduction to Parallel Machines and Lecture 3 " James Demmel www.cs.berkeley.edu/~demmel/cs267_spr15/!!! Outline Overview of parallel machines (~hardware) and programming models (~software) Shared
More informationOverview. CS 472 Concurrent & Parallel Programming University of Evansville
Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University
More informationTop500
Top500 www.top500.org Salvatore Orlando (from a presentation by J. Dongarra, and top500 website) 1 2 MPPs Performance on massively parallel machines Larger problem sizes, i.e. sizes that make sense Performance
More informationRecent trends in the marketplace of high performance computing
Parallel Computing 31 (2005) 261 273 www.elsevier.com/locate/parco Recent trends in the marketplace of high performance computing Erich Strohmaier a, *, Jack J. Dongarra b,c, Hans W. Meuer d, Horst D.
More informationParallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationWhat are Clusters? Why Clusters? - a Short History
What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by
More informationEuropean energy efficient supercomputer project
http://www.montblanc-project.eu European energy efficient supercomputer project Simon McIntosh-Smith University of Bristol (Based on slides from Alex Ramirez, BSC) Disclaimer: Speaking for myself... All
More informationAim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group
Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationCS 267: Introduction to Parallel Machines and Programming Models Lecture 3 "
CS 267: Introduction to Parallel Machines and Programming Models Lecture 3 " James Demmel www.cs.berkeley.edu/~demmel/cs267_spr16/!!! Outline Overview of parallel machines (~hardware) and programming models
More informationTrends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
More informationBlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt
More informationrepresent parallel computers, so distributed systems such as Does not consider storage or I/O issues
Top500 Supercomputer list represent parallel computers, so distributed systems such as SETI@Home are not considered Does not consider storage or I/O issues Both custom designed machines and commodity machines
More informationFabio AFFINITO.
Introduction to High Performance Computing Fabio AFFINITO What is the meaning of High Performance Computing? What does HIGH PERFORMANCE mean??? 1976... Cray-1 supercomputer First commercial successful
More informationParallel computer architecture classification
Parallel computer architecture classification Hardware Parallelism Computing: execute instructions that operate on data. Computer Instructions Data Flynn s taxonomy (Michael Flynn, 1967) classifies computer
More informationNode Hardware. Performance Convergence
Node Hardware Improved microprocessor performance means availability of desktop PCs with performance of workstations (and of supercomputers of 10 years ago) at significanty lower cost Parallel supercomputers
More informationParallel Computer Architecture II
Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de
More informationRoadmapping of HPC interconnects
Roadmapping of HPC interconnects MIT Microphotonics Center, Fall Meeting Nov. 21, 2008 Alan Benner, bennera@us.ibm.com Outline Top500 Systems, Nov. 2008 - Review of most recent list & implications on interconnect
More informationAlgorithms and Architecture. William D. Gropp Mathematics and Computer Science
Algorithms and Architecture William D. Gropp Mathematics and Computer Science www.mcs.anl.gov/~gropp Algorithms What is an algorithm? A set of instructions to perform a task How do we evaluate an algorithm?
More informationParallel Computing: From Inexpensive Servers to Supercomputers
Parallel Computing: From Inexpensive Servers to Supercomputers Lyle N. Long The Pennsylvania State University & The California Institute of Technology Seminar to the Koch Lab http://www.personal.psu.edu/lnl
More informationInfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment. TOP500 Supercomputers, June 2014
InfiniBand Strengthens Leadership as the Interconnect Of Choice By Providing Best Return on Investment TOP500 Supercomputers, June 2014 TOP500 Performance Trends 38% CAGR 78% CAGR Explosive high-performance
More informationArchitecture of the IBM Blue Gene Supercomputer. Dr. George Chiu IEEE Fellow IBM T.J. Watson Research Center Yorktown Heights, NY
Architecture of the IBM Blue Gene Supercomputer Dr. George Chiu IEEE Fellow IBM T.J. Watson Research Center Yorktown Heights, NY President Obama Honors IBM's Blue Gene Supercomputer With National Medal
More informationJack Dongarra INNOVATIVE COMP ING LABORATORY. University of Tennessee Oak Ridge National Laboratory University of Manchester 1/17/2008 1
Planned Developments of High End Systems Around the World Jack Dongarra INNOVATIVE COMP ING LABORATORY University of Tennessee Oak Ridge National Laboratory University of Manchester 1/17/2008 1 Planned
More informationParallel Languages: Past, Present and Future
Parallel Languages: Past, Present and Future Katherine Yelick U.C. Berkeley and Lawrence Berkeley National Lab 1 Kathy Yelick Internal Outline Two components: control and data (communication/sharing) One
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt
More informationSupercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?
Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationDas TOP500-Projekt der Universitäten Mannheim und Tennessee zur Evaluierung des Supercomputer Marktes
Das TOP500-Projekt der Universitäten Mannheim und Tennessee zur Evaluierung des Supercomputer Marktes Hans-Werner Meuer Universität Mannheim RUM - Kolloquium 11. Januar 1999, 11:00 Uhr Outline TOP500 Approach
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory
Jack Dongarra University of Tennessee Oak Ridge National Laboratory 3/9/11 1 TPP performance Rate Size 2 100 Pflop/s 100000000 10 Pflop/s 10000000 1 Pflop/s 1000000 100 Tflop/s 100000 10 Tflop/s 10000
More informationCCS HPC. Interconnection Network. PC MPP (Massively Parallel Processor) MPP IBM
CCS HC taisuke@cs.tsukuba.ac.jp 1 2 CU memoryi/o 2 2 4single chipmulti-core CU 10 C CM (Massively arallel rocessor) M IBM BlueGene/L 65536 Interconnection Network 3 4 (distributed memory system) (shared
More informationCustomers want to transform their datacenter 80% 28% global IT budgets spent on maintenance. time spent on administrative tasks
Customers want to transform their datacenter 80% global IT budgets spent on maintenance 28% time spent on administrative tasks Cloud is a new way to think about your datacenter Traditional model Dedicated
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationHPCS HPCchallenge Benchmark Suite
HPCS HPCchallenge Benchmark Suite David Koester, Ph.D. () Jack Dongarra (UTK) Piotr Luszczek () 28 September 2004 Slide-1 Outline Brief DARPA HPCS Overview Architecture/Application Characterization Preliminary
More informationPractical Scientific Computing
Practical Scientific Computing Performance-optimized Programming Preliminary discussion: July 11, 2008 Dr. Ralf-Peter Mundani, mundani@tum.de Dipl.-Ing. Ioan Lucian Muntean, muntean@in.tum.de MSc. Csaba
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationIntel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins
Intel Many Integrated Core (MIC) Matt Kelly & Ryan Rawlins Outline History & Motivation Architecture Core architecture Network Topology Memory hierarchy Brief comparison to GPU & Tilera Programming Applications
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More informationTitan - Early Experience with the Titan System at Oak Ridge National Laboratory
Office of Science Titan - Early Experience with the Titan System at Oak Ridge National Laboratory Buddy Bland Project Director Oak Ridge Leadership Computing Facility November 13, 2012 ORNL s Titan Hybrid
More informationParallel Programming with MPI
Parallel Programming with MPI Science and Technology Support Ohio Supercomputer Center 1224 Kinnear Road. Columbus, OH 43212 (614) 292-1800 oschelp@osc.edu http://www.osc.edu/supercomputing/ Functions
More informationAn Overview of High Performance Computing and Challenges for the Future
An Overview of High Performance Computing and Challenges for the Future Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester 6/15/2009 1 H. Meuer, H. Simon, E. Strohmaier,
More informationCSE5351: Parallel Procesisng. Part 1B. UTA Copyright (c) Slide No 1
Slide No 1 CSE5351: Parallel Procesisng Part 1B Slide No 2 State of the Art In Supercomputing Several of the next slides (or modified) are the courtesy of Dr. Jack Dongarra, a distinguished professor of
More informationLecture 7: Parallel Processing
Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction
More informationSteve Scott, Tesla CTO SC 11 November 15, 2011
Steve Scott, Tesla CTO SC 11 November 15, 2011 What goal do these products have in common? Performance / W Exaflop Expectations First Exaflop Computer K Computer ~10 MW CM5 ~200 KW Not constant size, cost
More informationHigh Performance Computing
CSC630/CSC730: Parallel & Distributed Computing Trends in HPC 1 High Performance Computing High-performance computing (HPC) is the use of supercomputers and parallel processing techniques for solving complex
More informationINTERNATIONAL ADVANCED RESEARCH WORKSHOP ON HIGH PERFORMANCE COMPUTING AND GRIDS Cetraro (Italy), July 3-6, 2006
INTERNATIONAL ADVANCED RESEARCH WORKSHOP ON HIGH PERFORMANCE COMPUTING AND GRIDS Cetraro (Italy), July 3-6, 2006 The Challenges of Multicore and Specialized Accelerators Jack Dongarra University of Tennessee
More informationChapter 1. Introduction
Chapter 1 Introduction Why High Performance Computing? Quote: It is hard to understand an ocean because it is too big. It is hard to understand a molecule because it is too small. It is hard to understand
More informationCSC 447: Parallel Programming for Multi- Core and Cluster Systems
CSC 447: Parallel Programming for Multi- Core and Cluster Systems Why Parallel Computing? Haidar M. Harmanani Spring 2017 Definitions What is parallel? Webster: An arrangement or state that permits several
More informationHigh Performance Computing - Parallel Computers and Networks. Prof Matt Probert
High Performance Computing - Parallel Computers and Networks Prof Matt Probert http://www-users.york.ac.uk/~mijp1 Overview Parallel on a chip? Shared vs. distributed memory Latency & bandwidth Topology
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationOverview. High Performance Computing - History of the Supercomputer. Modern Definitions (II)
Overview High Performance Computing - History of the Supercomputer Dr M. Probert Autumn Term 2017 Early systems with proprietary components, operating systems and tools Development of vector computing
More informationWhy Parallel Architecture
Why Parallel Architecture and Programming? Todd C. Mowry 15-418 January 11, 2011 What is Parallel Programming? Software with multiple threads? Multiple threads for: convenience: concurrent programming
More informationAn Overview of High Performance Computing and Future Requirements
An Overview of High Performance Computing and Future Requirements Jack Dongarra University of Tennessee Oak Ridge National Laboratory 1 H. Meuer, H. Simon, E. Strohmaier, & JD - Listing of the 500 most
More informationGodson Processor and its Application in High Performance Computers
Godson Processor and its Application in High Performance Computers Weiwu Hu Institute of Computing Technology, Chinese Academy of Sciences Loongson Technologies Corporation Limited hww@ict.ac.cn 1 Contents
More informationDheeraj Bhardwaj May 12, 2003
HPC Systems and Models Dheeraj Bhardwaj Department of Computer Science & Engineering Indian Institute of Technology, Delhi 110 016 India http://www.cse.iitd.ac.in/~dheerajb 1 Sequential Computers Traditional
More informationHigh-Performance Computing & Simulations in Quantum Many-Body Systems PART I. Thomas Schulthess
High-Performance Computing & Simulations in Quantum Many-Body Systems PART I Thomas Schulthess schulthess@phys.ethz.ch What exactly is high-performance computing? 1E10 1E9 1E8 1E7 relative performance
More informationTechnology Trends Presentation For Power Symposium
Technology Trends Presentation For Power Symposium 2006 8-23-06 Darryl Solie, Distinguished Engineer, Chief System Architect IBM Systems & Technology Group From Ingenuity to Impact Copyright IBM Corporation
More informationParallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer?
Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationHETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA
HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS
More informationTowards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems
Towards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems Robert Birke, German Rodriguez, Cyriel Minkenberg IBM Research Zurich Outline High-performance computing:
More informationCluster Network Products
Cluster Network Products Cluster interconnects include, among others: Gigabit Ethernet Myrinet Quadrics InfiniBand 1 Interconnects in Top500 list 11/2009 2 Interconnects in Top500 list 11/2008 3 Cluster
More informationTop500 Supercomputer list
Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity
More informationCRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar
CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION
More informationThe Center for Computational Research & Grid Computing
The Center for Computational Research & Grid Computing Russ Miller Center for Computational Research Computer Science & Engineering SUNY-Buffalo Hauptman-Woodward Medical Inst NSF, NIH, DOE NIMA, NYS,
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationBrand-New Vector Supercomputer
Brand-New Vector Supercomputer NEC Corporation IT Platform Division Shintaro MOMOSE SC13 1 New Product NEC Released A Brand-New Vector Supercomputer, SX-ACE Just Now. Vector Supercomputer for Memory Bandwidth
More information