Search for Optimal Network Topologies for Supercomputers 寻找超级计算机优化的网络拓扑结构

1 Search for Optimal Network Topologies for Supercomputers 寻找超级计算机优化的网络拓扑结构 GUO, Meng 郭猛 Shandong Computer Science Center (National Supercomputer Center in Jinan) 山东省计算中心 ( 国家超级计算济南中心 ) 2014/11/5 Guangzhou 广州

2 Acknowledgements
Y.F. Deng of Stony Brook, USA & National Supercomputer Center in Jinan, China
M. Michalewicz and L. Orlowski of A*CRC, Singapore and Stony Brook
T. Mayer, Z. Ye, and L. Zhang of Stony Brook, USA
C. C. Hwang, Y. T. Chen, C. H. Liang, and S. W. Liou of NCKU, Taiwan
Joint work with Prof. Deng's students, postdocs, and other colleagues
Early work was done on Sunway BlueLight at the National Supercomputer Center in Jinan, China

3 Search for Optimal Network Topologies for Supercomputers 01 Motivations 02 Reviews of Network Topologies 03 Search for Optimal Topologies 04 Summaries

4 Development of calculator / computer
Human: ~0.01 Flops
Mechanical Calculator: ~0.1 Flops
ENIAC (1946): ~300 Flops
Tianhe-2 (2013): ~50 PFlops

5 TOP1: Tianhe-2
System: TH-IVB FEP Cluster, Xeon E C 2.2GHz with Phi 31S1p
Cores: 3,120,000
Nodes: 16,000
Flops/node: Tflops / node
RAM: 1,024,000 GB
Rmax: 33.9 Pflop/s
Rpeak: 54.9 Pflop/s
Power: 17.8 MW
Network: TH Express-2
OS: Kylin Linux

6 Communication Costly in Time and Energy
[Power costs per operation, today]
Operation | Approximate energy cost
DP FMADD flop | 100 pJ
DP DRAM read-to-register | 4800 pJ
DP word transmit-to-neighbor | 7500 pJ
DP word transmit-across-system | 9000 pJ
[Power costs per operation, 2019]
Local calculation FMAdd costs: 11 pJ
Cross-die a double costs: 96 pJ
Communication vs. Computation: 96/11 = 8.73
Source: DARPA report of P. Kogge (ND) et al. and T. Schulthess (ETH), and David Keyes PPT
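The ratios behind this slide can be checked with a few lines of arithmetic. The sketch below only restates the quoted energy figures; the dictionary and function names are illustrative and not from the talk.

```python
# Energy per operation in picojoules, as quoted on the slide ("today" column).
today_pj = {
    "DP FMADD flop": 100,
    "DP DRAM read-to-register": 4800,
    "DP word transmit-to-neighbor": 7500,
    "DP word transmit-across-system": 9000,
}


def cost_relative_to_flop(costs_pj, flop_key="DP FMADD flop"):
    """Return each operation's energy as a multiple of one flop."""
    flop = costs_pj[flop_key]
    return {name: pj / flop for name, pj in costs_pj.items()}


ratios_today = cost_relative_to_flop(today_pj)
ratio_2019 = 96 / 11  # cross-die double vs. local FMAdd, 2019 projection

for name, ratio in ratios_today.items():
    print(f"{name}: {ratio:.0f}x the energy of one flop")
print(f"2019 projection, communication vs. computation: {ratio_2019:.2f}x")
```

With today's figures, moving one word across the system costs 90 times the energy of the flop that consumes it.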

7 Interconnection Network: Topologies & Technologies
[Topologies] Bus, ring, grid/mesh, torus, hypercube, tree, fat tree, omega, crossbar, etc.
[Technologies] Device: Ethernet, Myrinet, Infiniband, etc. Protocol: TCP/IP, UDP, VMMC, U-net, BIP, etc.
NETWORK: Latency, Bandwidth
SYSTEM: Performance, Cost

8 02 Reviews of Network Topologies

9 Source: Diagram of different BASIC network topologies

10 Source: B. Parhami An Ocean of Networks

11 The Top Supercomputer in TOP500 Lists in The Last 20 Years
System | Site | Topology | Date
TMC CM-5 | Los Alamos National Lab | Fat Tree | 6/93~11/93
Fujitsu Numerical Wind Tunnel | National Aerospace Laboratory of Japan | Crossbar | 11/93~6/96
Intel XP/S 140 Paragon | Sandia National Labs | 2D Mesh | 6/94~11/94
Hitachi SR2201 | University of Tokyo | 3D Crossbar | 6/96~11/96
Hitachi CP-PACS | University of Tsukuba | 3D Hyper-crossbar | 11/96~6/97
Intel ASCI Red | Sandia National Laboratory | Mesh | 6/97~11/00
IBM ASCI White | Lawrence Livermore National Laboratory | Omega | 11/00~6/02
NEC The Earth Simulator | Earth Simulator Center | Crossbar | 6/02~11/04
IBM BlueGene/L | Lawrence Livermore National Laboratory | 3D Torus | 11/04~6/08
IBM Roadrunner | Los Alamos National Laboratory | Fat-Tree hierarchy of crossbars | 6/08~11/09
Cray Jaguar | Oak Ridge National Laboratory | 3D Torus | 11/09~11/10
NUDT Tianhe-1A | National Supercomputing Center in Tianjin | Fat Tree | 11/10~6/11
Fujitsu K Computer | RIKEN Advanced Institute for Computational Science | Tofu: 6D Mesh / Torus | 6/11~6/12
IBM Sequoia Blue Gene/Q | Lawrence Livermore National Laboratory | 5D Torus | 6/12~11/12
Cray Titan | Oak Ridge National Laboratory | 3D Torus | 11/12~6/13
NUDT Tianhe-2 | National Super Computer Center in Guangzhou | Fat Tree | 6/13~
Source:

12 Source: Interconnect Family System Share of TOP 500 (June 2014)

13 Popular Networks: Butterfly (Monsoon), Dragonfly (Cray XC30), Hypercube (SGI Origin), 3D Torus (Cray Gemini), 5D Torus (IBM), Tofu: 6D Mesh/Torus (K)

14 Popular Network: Fat Tree
The networks for Tianhe-2 (GZ), Sunway (JN), Dawning Nebulae (SZ), etc.

15 Returning to Square One
1960s: Mesh-based (ILLIAC IV)
-> Direct to indirect, shared memory
1970s: Butterfly, other MINs
-> Lower diameter, message passing
1980s: Hypercube, bus-based
-> Greater bandwidth
1990s: Fat tree, LAN-based
-> Scalability, local wires
2000s
So, only a small portion of the networks has been explored in practical parallel computers.

16 Comparison of Common Topologies
Topology | Node degree | Diameter | Bisection bandwidth
Fully Connected | N - 1 | 1 | N^2/4
Ring | 2 | N/2 | 2
2D Torus | 4 | sqrt(N) - 1 | 2 sqrt(N)
Tree | - | 2 log_(d-1) N | 1
Fat Tree | - | 2 log_2 N | N/2
Hypercube | log_2 N | log_2 N | N/2
Butterfly | 4 | 2l | N/(l + 1)
de Bruijn | d | log_d N | 2dN/log_d N
DCell | k + 1 | < log_n N - 1 | N/(4 log_n N)
[Figure: diameter vs. degree. Diameter scale: N - 1 (Linear Array), N / 2 (Ring), sqrt N (Torus), log N (Binary tree, Hypercube), 1 (Full Connected)]
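As a quick illustration of how such closed-form entries are used, the sketch below evaluates three of the unambiguous rows (fully connected, ring, hypercube) for a concrete N. The function names are illustrative, and bisection is counted in links, which is an assumption about the unit used on the slide.

```python
def fully_connected(n):
    """Every node links to every other node (n assumed even for the bisection)."""
    return {"degree": n - 1, "diameter": 1, "bisection": n * n // 4}


def ring(n):
    """Nodes on a single cycle; cutting it in half severs two links."""
    return {"degree": 2, "diameter": n // 2, "bisection": 2}


def hypercube(n):
    """Binary hypercube; n must be a power of two."""
    dim = n.bit_length() - 1
    if n < 2 or (1 << dim) != n:
        raise ValueError("hypercube size must be a power of two")
    return {"degree": dim, "diameter": dim, "bisection": n // 2}


for name, formula in (
    ("fully connected", fully_connected),
    ("ring", ring),
    ("hypercube", hypercube),
):
    print(name, formula(64))
```

For N=64 the trade-off is visible at a glance: the fully connected network needs 63 ports per node for diameter 1, while the ring needs 2 ports but has diameter 32.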

17 Supercomputer Interconnects?!

18 Data Traffic in a Computer Is Similar to This: New York

19 Search for Optimal Network Topologies for Supercomputers
[Our Goals] Design the state-of-the-art interconnection networks.
[Challenges] The entire ecosystem of network design is too big.
[Our Focus] On discovering the optimal network topologies.

20 03 Search for Optimal Topologies

21 Interconnection Networks
Number of nodes N
Diameter D
Bisection bandwidth B
Node degree k
Longest wire
Heterogeneous nodes
Other attributes: Regularity, Scalability, Packageability, Robustness
Adapted from B. Parhami

22 Strategy to Search for Optimal Topologies
Add bypass links on known topologies
Add links on a Hamiltonian Cycle
Remove links from full-connected network
Successive construction
Exhaustion and embedding
Diameter scale: N - 1 (Linear Array), N / 2 (Ring), sqrt N (Torus), log N (Binary tree, Hypercube), 1 (Full Connected)
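One of the listed strategies, adding links on a Hamiltonian cycle, can be sketched on a small case. The example below starts from a 16-node ring and adds one chord per node to the opposite side of the ring. The chord pattern is an illustrative choice, not the construction used in the talk.

```python
from collections import deque


def ring_with_chords(n, offsets=()):
    """Adjacency sets for an n-node ring (a Hamiltonian cycle) plus chords.

    Each offset s adds a link from node i to node (i + s) % n.
    """
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for s in (1, *offsets):
            j = (i + s) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def diameter(adj):
    """Longest shortest-path distance, found by BFS from every node.

    Assumes the graph is connected, which the ring guarantees here.
    """
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


plain = ring_with_chords(16)
chorded = ring_with_chords(16, offsets=(8,))
print("ring diameter:", diameter(plain))
print("ring + opposite chords diameter:", diameter(chorded))
```

Each node gains only one extra port (degree 3 instead of 2), yet the diameter halves from 8 to 4 in this small case.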

23 Add Bypass Links on Torus ibt (8 8; b=<4>) ibt (9 3 ; b=<3>) Torus Link X-Axis Bypass Link Y-Axis Bypass Link Torus Link Source: P. Zhang, R. Powell, and Y. Deng, IEEE Trans. Parallel and Distributed Systems Vol. 22 Issue 2 (2011) pp

24 3D ibt vs. 4D Torus
[Figure: two panels, Diameter and Mean Path Length, each plotted against Network Size (# of nodes)]

25 Add Links on A Hamiltonian Cycle
[Figure: graphs N8k4, N16k6, N16k6, N64k6, compared by Mean Path Length, Diameter, Number of cables]
Network | HPL with CPU stability | HPL with CPU Turbo Boost | HPL with CPU Turbo & HT
N8k | Done | Pending | Pending
N16k | Done | Pending | Pending
N32k | Done | Pending | Pending
N64k | Pending | Pending | Pending
Tests at 10/14 at NCKU

26 Add Links on A Hamiltonian Cycle
[Figure: HPL Efficiency and Parallel Efficiency vs. number of nodes (up to 64), for 64G RAM and 96G RAM]
HPL Best Efficiency (96G RAM), in table order: 91.31%, 87.49%, 84.44%, 81.94%
Parallel Efficiency (96G RAM), in table order: 100%, 95.82%, 92.48%, 89.74%
Tests at 10/14 at NCKU

27 Remove Links from Full-connected Network M. Michalewicz, L. Orlowski and Y.F. Deng, Constructing graphs by algorithmic edge removal (in preparation)

28 Successive Construction M. Michalewicz, L. Orlowski and Y.F. Deng, Constructing graphs by algorithmic edge removal (in preparation)

29 What is The Best Network Topology?
Wires vs. Diameter?
Diameter scale: N - 1 (Linear Array), N / 2 (Ring), sqrt N (Torus), log N (Binary tree, Hypercube), 1 (Full Connected)

30 The Degree/Diameter Graph Problem
Suppose you have an unlimited supply of degree-d nodes. How many can be connected into a network of diameter D?
Petersen graph: d = 3, D = 2, N = 10
Hoffman-Singleton graph: d = 7, D = 2, N = 50
Source: Singleton_graph
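Both example graphs on this slide are as large as any graph with their degree and diameter can be. The standard upper limit is the Moore bound, which the slide does not name; the sketch below evaluates it for the two quoted (d, D) pairs.

```python
def moore_bound(d, D):
    """Upper bound on the node count of a graph with maximum degree d and diameter D.

    A root reaches at most d nodes in one hop, and each further hop
    multiplies the frontier by at most (d - 1).
    """
    return 1 + d * sum((d - 1) ** i for i in range(D))


print("d=3, D=2:", moore_bound(3, 2))  # Petersen graph has N = 10
print("d=7, D=2:", moore_bound(7, 2))  # Hoffman-Singleton graph has N = 50
```

The bound gives 10 and 50, matching the node counts quoted for the Petersen and Hoffman-Singleton graphs. For most other (d, D) pairs no graph reaches the bound.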

31 Graph Theory Topology Network
[Figure: E8 picture (E8 Lie group: 240 points in 8 dimensions). Source:]
Problem Statement (N, k): Given N vertices, find a graph for which the diameter, defined as the longest of the geodesic distances between all pairs of nodes, is minimal for a fixed vertex degree k (defined as the number of edges incident to the vertex). The mean path length should also be minimal.
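The problem statement can be written compactly. In the sketch below, d(u, v) is the geodesic (shortest-path) distance between nodes u and v. The symbols D, A, and L follow the labels used on later slides (for example "D=3; A=2.33; L=192" for N64k6); reading A as the mean path length and L as the number of links is an assumption.

```latex
\begin{align}
D &= \max_{u \neq v} d(u, v)
  && \text{(diameter)} \\
A &= \frac{1}{N(N-1)} \sum_{u \neq v} d(u, v)
  && \text{(mean path length)} \\
L &= \frac{N k}{2}
  && \text{(links in a $k$-regular graph)}
\end{align}
```

For N=64 and k=6 the last line gives L = 64 x 6 / 2 = 192, matching the figure quoted for N64k6.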

32 Why Do Exhaustive Search?
[Figure: 2D and 3D views of a graph before and after rearranging a couple of links, with Diameter and Mean path length]
Rearranging a couple of links reduces the diameter by 33.3% and the mean path length by 8.3%!
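The effect of rearranging links can be explored with a small local search. The sketch below is not the exhaustive search from the talk: it is a hill climb that repeatedly swaps the endpoints of two links (which keeps every node's degree unchanged) and keeps a swap only if the pair (diameter, mean path length) improves. The starting graph, step count, and seed are illustrative assumptions.

```python
import random
from collections import deque


def path_metrics(adj):
    """Return (diameter, mean path length); both are inf if disconnected."""
    n = len(adj)
    total, worst = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return float("inf"), float("inf")
        total += sum(dist.values())
        worst = max(worst, max(dist.values()))
    return worst, total / (n * (n - 1))


def circulant(n, offsets):
    """Ring-like regular graph: node i links to (i + s) % n for each offset s."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for s in offsets:
            adj[i].add((i + s) % n)
            adj[(i + s) % n].add(i)
    return adj


def swap_search(adj, steps=500, seed=0):
    """Hill-climb with degree-preserving swaps: (a-b, c-d) becomes (a-c, b-d)."""
    rng = random.Random(seed)
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    best = path_metrics(adj)
    for _ in range(steps):
        edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
        (a, b), (c, d) = rng.sample(edges, 2)
        if len({a, b, c, d}) < 4 or c in adj[a] or d in adj[b]:
            continue  # swap would create a self-loop or a duplicate link
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(c); adj[c].add(a)
        adj[b].add(d); adj[d].add(b)
        trial = path_metrics(adj)
        if trial < best:
            best = trial
        else:  # undo the swap
            adj[a].remove(c); adj[c].remove(a)
            adj[b].remove(d); adj[d].remove(b)
            adj[a].add(b); adj[b].add(a)
            adj[c].add(d); adj[d].add(c)
    return adj, best


start = circulant(16, (1, 2))  # 16 nodes, degree 4
start_metrics = path_metrics(start)
found, found_metrics = swap_search(start)
print("start (diameter, mean path length):", start_metrics)
print("after swaps:", found_metrics)
```

A hill climb like this can stall in a local optimum, which is one reason an exhaustive search is attractive when the graph is small enough to allow it.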

33 Comparison of Topologies for N=16
Networks compared: Hypercube, 4x4 Mesh, 4x4 Torus, Optimal N16k3, Optimal N16k4
Metrics: Degree, Diameter, Mean path length, Number of edges
Using 25% fewer wires to keep a similar mean path length with 25% less diameter.
Using the same amount of wires to get 25% less diameter and mean path length.
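The table values for the three classical networks can be recomputed directly from their definitions. The sketch below builds the 16-node hypercube, 4x4 mesh, and 4x4 torus and measures the four metrics by breadth-first search. It does not attempt to reproduce the optimal N16k3 and N16k4 graphs, whose link lists are not given in the transcript; all function names are illustrative.

```python
from collections import deque


def hypercube(dim):
    """Nodes are integers; neighbors differ in exactly one bit."""
    return {i: {i ^ (1 << b) for b in range(dim)} for i in range(1 << dim)}


def grid(rows, cols, wrap):
    """2D mesh (wrap=False) or 2D torus (wrap=True)."""
    adj = {(r, c): set() for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((1, 0), (0, 1)):
            nr, nc = r + dr, c + dc
            if wrap:
                nr, nc = nr % rows, nc % cols
            if nr < rows and nc < cols:
                adj[(r, c)].add((nr, nc))
                adj[(nr, nc)].add((r, c))
    return adj


def summarize(adj):
    """Degree, diameter, mean path length, and edge count of a connected graph."""
    n = len(adj)
    total, diameter = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        diameter = max(diameter, max(dist.values()))
    return {
        "degree": max(len(vs) for vs in adj.values()),
        "diameter": diameter,
        "mean_path_length": total / (n * (n - 1)),
        "edges": sum(len(vs) for vs in adj.values()) // 2,
    }


networks = {
    "Hypercube": hypercube(4),
    "4x4 Mesh": grid(4, 4, wrap=False),
    "4x4 Torus": grid(4, 4, wrap=True),
}
for name, adj in networks.items():
    print(name, summarize(adj))
```

The 4x4 torus and the 4-dimensional hypercube come out identical on all four metrics, which is expected because a 4-node ring is itself a 2-dimensional hypercube.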

34 Comparison of Topologies for N=32
Networks compared: Hypercube, 4x8 Torus, Optimal N32k3, Optimal N32k4, Optimal N32k5
Metrics: Degree, Diameter, Mean path length, Number of edges

35 Graph for N=64 N64k6: D=3; A=2.33; L=192 How to find topologies for N=1,024 or even 3,000,000???

36 Possible to Generate Massive Graphs?
Exhaustive searches for top topologies are possible for N64k6, i.e., N=64 and k=6.
The search space for 256k8 is ~10^1,760. For comparison, consider the number of stars in the universe: it is probably impossible to do an exhaustive search for larger graphs.
Therefore, we must invent techniques to search for top (quasi-optimal) topologies.
McKay, B. D., & Wormald, N. C. (1990). Asymptotic enumeration by degree sequence of graphs of high degree. European Journal of Combinatorics, 11, Retrieved from
Deng, Y. et al (2014, in preparation), The first-principle discovery of k-degree optimal graphs and engineering validations of optimality
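The size of the search space can be estimated with the classical asymptotic count of labeled k-regular graphs: the number of ways to pair up the N·k link endpoints, divided by the orderings of endpoints within each node, times the probability exp(-(k^2 - 1)/4) that a random pairing has no self-loops or duplicate links. Whether the slide used this particular estimate, and whether it counts labeled graphs, are assumptions; the sketch works in logarithms to avoid overflow.

```python
import math


def log10_regular_graph_count(n, k):
    """Asymptotic log10 of the number of labeled k-regular graphs on n nodes."""
    endpoints = n * k
    if endpoints % 2:
        raise ValueError("n * k must be even")
    ln_count = (
        math.lgamma(endpoints + 1)           # ln (nk)!
        - math.lgamma(endpoints // 2 + 1)    # ln (nk/2)!
        - (endpoints // 2) * math.log(2)     # ln 2^(nk/2)
        - n * math.lgamma(k + 1)             # ln (k!)^n
        - (k * k - 1) / 4                    # ln P(pairing is a simple graph)
    )
    return ln_count / math.log(10)


exponent = log10_regular_graph_count(256, 8)
print(f"N=256, k=8: about 10^{exponent:.0f} graphs")
```

For N=256 and k=8 the exponent lands close to the 1,760 quoted on the slide, which suggests the slide's figure comes from an estimate of this kind.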

37 Method 2: Graph Embedding (N8k3)x(N8k3(a))(M=64) =

38 Best Way to Connect M=32 Nodes
Networks compared: Hypercube, 4x8 Torus, N4k2 x N8k3 (a), N4k2 x N8k3 (b)
Metrics: Degree, Diameter, Mean path length, Number of edges

39 Graph Embedding N8k3xN8k3(a) (M=64)
For hypercube 2^6: M=64, k=6; A = , D = 6; L = 192 = (64x6/2)
For 2D torus 8x8: M=64, k=4; A = , D = 8; L = 128 = (64x4/2)
N8k3 x N8k3(a): M=64, k=3 or 4; A = ???, D = ???; L = 108 = (12x8+8+4)

40 Graph Embedding N8k3 x N16k3 (M=128)
[Figure: Hop Distributions]
For hypercube 2^7: M=128, k=7; A = , D = 7; L = 448
For 16x8 Torus: M=128, k=4; A = , D = 12; L = 256
N8k3 x N16k3: M=128, k=3 or 4; A = , D = 13; L = 216

41 Graph Embedding (N16k3)^2 (M=256)
[Figure: Hop Distributions]
For hypercube 2^8: M=256, k=8; A = , D = 8; L = 1024
For 16x16 Torus: M=256, k=4; A = , D = 16; L = 512
N16k3 x N16k3: M=256, k=3 or 4; A = 9.23, D = 15; L = 408

42 Graph Embedding (N8k3)^3 (M=512)
[Figure: Hop Distributions]
For hypercube 2^9: M=512, k=9; A = , D = 9; L = 2304
For 32x16 Torus: M=512, k=4; A = , D = 24; L =
N8k3xN8k3xN8k3: M=512, k=3 or 4; A = , D = 20; L = 876

43 Graph Embedding (M=4096): (N16k3)^3 & (N8k3)^4

44 Hop Distributions for M=4096
For hypercube 2^12: M=4096, k=12; A = , D = 12; L = 24,576
For 64x64 Torus: M=4096, k=4; A = , D = 64; L = 8,192
(N16k3)^3: M=4096, k=3 or 4; A = 34.72, D = 60; L = 6,552
(N8k3)^4: M=4096, k=3 or 4; A = , D = 55; L = 7,020
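The link counts L quoted for the embedded graphs follow a simple pattern, inferred here from the numbers rather than stated in the slides: every group of nodes at the innermost level is wired as one copy of the base graph, those groups are in turn wired as one copy of the next base graph with one link per base-graph edge, and so on. The sketch below counts links under that assumption; `embedded_link_count`, `N8K3`, and `N16K3` are illustrative names.

```python
def embedded_link_count(levels):
    """Count links in a hierarchical embedding.

    levels: list of (nodes, edges) base graphs, innermost level first.
    Assumes one link per base-graph edge at every level.
    Returns (total_nodes, total_links).
    """
    total_nodes = 1
    for nodes, _ in levels:
        total_nodes *= nodes
    links, group_size = 0, 1
    for nodes, base_edges in levels:
        group_size *= nodes
        copies = total_nodes // group_size  # copies of this level's base graph
        links += base_edges * copies
    return total_nodes, links


def regular_link_count(n, k):
    """Links in any k-regular graph on n nodes (hypercube, torus, ...)."""
    return n * k // 2


N8K3 = (8, regular_link_count(8, 3))     # 8 nodes, 12 links
N16K3 = (16, regular_link_count(16, 3))  # 16 nodes, 24 links

for label, levels in (
    ("N8k3 x N16k3", [N8K3, N16K3]),
    ("(N16k3)^2", [N16K3, N16K3]),
    ("(N8k3)^3", [N8K3] * 3),
    ("(N16k3)^3", [N16K3] * 3),
    ("(N8k3)^4", [N8K3] * 4),
):
    print(label, embedded_link_count(levels))
print("hypercube 2^12:", regular_link_count(4096, 12))
print("64x64 torus:", regular_link_count(4096, 4))
```

Under this rule the counts for M=128 through M=4096 match the L values on the slides, and both M=4096 embeddings use fewer links than the 64x64 torus, with a slightly shorter diameter according to the quoted D values.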

45 Prototype 1: ibt(8^2,b=2) vs. T(8^2) vs. T(4^3)
Benchmarks: NAMD, NAS Parallel Benchmarks, HPC Challenge Benchmarks, LINPACK Benchmarks

46 Prototype 2: Optimal N8k3 at NCKU

47 One Prototype with N=1024 at NCKU CK Pflops, 1.1 MWs 5,120 Fiber links

48 04 Summaries

49 Search for Optimal Network Topologies for Supercomputers
[Summaries]
Next-generation supercomputers need better interconnection networks: technologies and topologies.
Optimal topologies show better performance: diameter, mean path length, utilization of wires, etc.
There is a long way to go to find and use optimal topologies:
Other optimization metrics: bandwidth, applications, etc.
Massive networks: optimization algorithms, embedding, packaging, etc.
Routing, mapping; scalability, robustness, etc.
Engineering


More information

COMP 322: Principles of Parallel Programming. Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009

COMP 322: Principles of Parallel Programming. Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009 COMP 322: Principles of Parallel Programming Lecture 17: Understanding Parallel Computers (Chapter 2) Fall 2009 http://www.cs.rice.edu/~vsarkar/comp322 Vivek Sarkar Department of Computer Science Rice

More information

Stockholm Brain Institute Blue Gene/L

Stockholm Brain Institute Blue Gene/L Stockholm Brain Institute Blue Gene/L 1 Stockholm Brain Institute Blue Gene/L 2 IBM Systems & Technology Group and IBM Research IBM Blue Gene /P - An Overview of a Petaflop Capable System Carl G. Tengwall

More information

What is Parallel Computing?

What is Parallel Computing? What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #5 1/29/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class Outline

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group

More information

Why we need Exascale and why we won t get there by 2020 Horst Simon Lawrence Berkeley National Laboratory

Why we need Exascale and why we won t get there by 2020 Horst Simon Lawrence Berkeley National Laboratory Why we need Exascale and why we won t get there by 2020 Horst Simon Lawrence Berkeley National Laboratory 2013 International Workshop on Computational Science and Engineering National University of Taiwan

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Interconnection Networks. Issues for Networks

Interconnection Networks. Issues for Networks Interconnection Networks Communications Among Processors Chris Nevison, Colgate University Issues for Networks Total Bandwidth amount of data which can be moved from somewhere to somewhere per unit time

More information

High Performance Computing

High Performance Computing CSC630/CSC730: Parallel & Distributed Computing Trends in HPC 1 High Performance Computing High-performance computing (HPC) is the use of supercomputers and parallel processing techniques for solving complex

More information

Fabio AFFINITO.

Fabio AFFINITO. Introduction to High Performance Computing Fabio AFFINITO What is the meaning of High Performance Computing? What does HIGH PERFORMANCE mean??? 1976... Cray-1 supercomputer First commercial successful

More information

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Parallel Architectures

Parallel Architectures Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

Chapter 9 Multiprocessors

Chapter 9 Multiprocessors ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University

More information

IS TOPOLOGY IMPORTANT AGAIN? Effects of Contention on Message Latencies in Large Supercomputers

IS TOPOLOGY IMPORTANT AGAIN? Effects of Contention on Message Latencies in Large Supercomputers IS TOPOLOGY IMPORTANT AGAIN? Effects of Contention on Message Latencies in Large Supercomputers Abhinav S Bhatele and Laxmikant V Kale ACM Research Competition, SC 08 Outline Why should we consider topology

More information

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS

More information

Overview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo

Overview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo Overview of Supercomputer Systems Supercomputing Division Information Technology Center The University of Tokyo Supercomputers at ITC, U. of Tokyo Oakleaf-fx (Fujitsu PRIMEHPC FX10) Total Peak performance

More information

Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester

Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester 12/24/09 1 Take a look at high performance computing What s driving HPC Future Trends 2 Traditional scientific

More information

Delivering HPC Performance at Scale

Delivering HPC Performance at Scale Delivering HPC Performance at Scale October 2011 Joseph Yaworski QLogic Director HPC Product Marketing Office: 610-233-4854 Joseph.Yaworski@QLogic.com Agenda QLogic Overview TrueScale Performance Design

More information

TOP500 Listen und industrielle/kommerzielle Anwendungen

TOP500 Listen und industrielle/kommerzielle Anwendungen TOP500 Listen und industrielle/kommerzielle Anwendungen Hans Werner Meuer Universität Mannheim Gesprächsrunde Nichtnumerische Anwendungen im Bereich des Höchstleistungsrechnens des BMBF Berlin, 16./ 17.

More information

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group

More information

INSPUR and HPC Innovation

INSPUR and HPC Innovation INSPUR and HPC Innovation Dong Qi (Forrest) Product manager Inspur dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community

More information

The way toward peta-flops

The way toward peta-flops The way toward peta-flops ISC-2011 Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Where things started from DESIGN CONCEPTS 2 New challenges and requirements! Optimal sustained flops

More information

CS575 Parallel Processing

CS575 Parallel Processing CS575 Parallel Processing Lecture three: Interconnection Networks Wim Bohm, CSU Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.

More information

The TOP500 list. Hans-Werner Meuer University of Mannheim. SPEC Workshop, University of Wuppertal, Germany September 13, 1999

The TOP500 list. Hans-Werner Meuer University of Mannheim. SPEC Workshop, University of Wuppertal, Germany September 13, 1999 The TOP500 list Hans-Werner Meuer University of Mannheim SPEC Workshop, University of Wuppertal, Germany September 13, 1999 Outline TOP500 Approach HPC-Market as of 6/99 Market Trends, Architecture Trends,

More information

Exploring Hardware Overprovisioning in Power-Constrained, High Performance Computing

Exploring Hardware Overprovisioning in Power-Constrained, High Performance Computing Exploring Hardware Overprovisioning in Power-Constrained, High Performance Computing Tapasya Patki 1 David Lowenthal 1 Barry Rountree 2 Martin Schulz 2 Bronis de Supinski 2 1 The University of Arizona

More information

Communication Performance in Network-on-Chips

Communication Performance in Network-on-Chips Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In

More information

Jack Dongarra University of Tennessee Oak Ridge National Laboratory

Jack Dongarra University of Tennessee Oak Ridge National Laboratory Jack Dongarra University of Tennessee Oak Ridge National Laboratory 3/9/11 1 TPP performance Rate Size 2 100 Pflop/s 100000000 10 Pflop/s 10000000 1 Pflop/s 1000000 100 Tflop/s 100000 10 Tflop/s 10000

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

The State and Opportunities of HPC Applications in China. Ruibo Wang National University of Defense Technology

The State and Opportunities of HPC Applications in China. Ruibo Wang National University of Defense Technology The State and Opportunities of HPC Applications in China Ruibo Wang National University of Defense Technology Outline Brief introduction to the Sites Applications Fusion Development of HPC, Cloud & Big

More information

Overview. CS 472 Concurrent & Parallel Programming University of Evansville

Overview. CS 472 Concurrent & Parallel Programming University of Evansville Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University

More information

Fujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future

Fujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future Fujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future November 16 th, 2011 Motoi Okuda Technical Computing Solution Unit Fujitsu Limited Agenda Achievements

More information

HPC Technology Update Challenges or Chances?

HPC Technology Update Challenges or Chances? HPC Technology Update Challenges or Chances? Swiss Distributed Computing Day Thomas Schoenemeyer, Technology Integration, CSCS 1 Move in Feb-April 2012 1500m2 16 MW Lake-water cooling PUE 1.2 New Datacenter

More information

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2

CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2 Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99

More information

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information