Search for Optimal Network Topologies for Supercomputers
1 Search for Optimal Network Topologies for Supercomputers. GUO, Meng (郭猛), Shandong Computer Science Center (National Supercomputer Center in Jinan). 2014/11/5, Guangzhou
2 Acknowledgements: Y.F. Deng of Stony Brook, USA, and the National Supercomputer Center in Jinan, China; M. Michalewicz and L. Orlowski of A*CRC, Singapore, and Stony Brook; T. Mayer, Z. Ye, and L. Zhang of Stony Brook, USA; C. C. Hwang, Y. T. Chen, C. H. Liang, and S. W. Liou of NCKU, Taiwan. Joint work with Prof. Deng's students, postdocs, and other colleagues. Early work was done on Shenway Bluelight at the National Supercomputer Center in Jinan, China.
3 Search for Optimal Network Topologies for Supercomputers 01 Motivations 02 Reviews of Network Topologies 03 Search for Optimal Topologies 04 Summaries
4 Development of the calculator / computer: Human ~0.01 Flops; mechanical calculator ~0.1 Flops; ENIAC (1946) ~300 Flops; Tianhe-2 (2013) ~50 PFlops
5 TOP1: The Tianhe-2 System
System: TH-IVB FEP Cluster, Xeon E5-2692 12C 2.2 GHz with Xeon Phi 31S1P
Cores: 3,120,000 | Nodes: 16,000 | Flops/node: … Tflops | RAM: 1,024,000 GB | Rmax: 33.9 Pflop/s | Rpeak: 54.9 Pflop/s | Power: 17.8 MW | Network: TH Express-2 | OS: Kylin Linux
6 Communication Is Costly in Time and Energy
Power costs per operation, today:
- DP FMADD flop: ~100 pJ
- DP DRAM read-to-register: ~4800 pJ
- DP word transmit-to-neighbor: ~7500 pJ
- DP word transmit-across-system: ~9000 pJ
Power costs per operation, 2019 projection:
- Local FMAdd: 11 pJ
- Moving a double across the die: 96 pJ
- Communication vs. computation: 96/11 = 8.73
Sources: DARPA report of P. Kogge (ND) et al. and T. Schulthess (ETH), and David Keyes' slides
7 Interconnection Network: Topologies & Technologies
Topologies: bus, ring, grid/mesh, torus, hypercube, tree, fat tree, omega, crossbar, etc.
Technologies: devices (Ethernet, Myrinet, InfiniBand, etc.) and protocols (TCP/IP, UDP, VMMC, U-Net, BIP, etc.)
Network metrics: latency, bandwidth. System metrics: performance, cost.
8 Search for Optimal Network Topologies for Supercomputers 01 Motivations 02 Reviews of Network Topologies 03 Search for Optimal Topologies 04 Summaries
9 [Figure] Diagram of different basic network topologies
10 [Figure] "An Ocean of Networks". Source: B. Parhami
11 The Top Supercomputers in the TOP500 Lists over the Last 20 Years (system | site | topology | dates)
- TMC CM-5 | Los Alamos National Lab | Fat tree | 6/93-11/93
- Fujitsu Numerical Wind Tunnel | National Aerospace Laboratory of Japan | Crossbar | 11/93-6/96
- Intel XP/S 140 Paragon | Sandia National Labs | 2D mesh | 6/94-11/94
- Hitachi SR2201 | University of Tokyo | 3D crossbar | 6/96-11/96
- Hitachi CP-PACS | University of Tsukuba | 3D hyper-crossbar | 11/96-6/97
- Intel ASCI Red | Sandia National Laboratory | Mesh | 6/97-11/00
- IBM ASCI White | Lawrence Livermore National Laboratory | Omega | 11/00-6/02
- NEC Earth Simulator | Earth Simulator Center | Crossbar | 6/02-11/04
- IBM BlueGene/L | Lawrence Livermore National Laboratory | 3D torus | 11/04-6/08
- IBM Roadrunner | Los Alamos National Laboratory | Fat-tree hierarchy of crossbars | 6/08-11/09
- Cray Jaguar | Oak Ridge National Laboratory | 3D torus | 11/09-11/10
- NUDT Tianhe-1A | National Supercomputing Center in Tianjin | Fat tree | 11/10-6/11
- Fujitsu K Computer | RIKEN Advanced Institute for Computational Science | Tofu: 6D mesh/torus | 6/11-6/12
- IBM Sequoia Blue Gene/Q | Lawrence Livermore National Laboratory | 5D torus | 6/12-11/12
- Cray Titan | Oak Ridge National Laboratory | 3D torus | 11/12-6/13
- NUDT Tianhe-2 | National Super Computer Center in Guangzhou | Fat tree | 6/13-
Source: TOP500 lists
12 [Figure] Interconnect family system share of the TOP500 (June 2014)
13 Popular Networks: butterfly (Monsoon), dragonfly (Cray XC30), hypercube (SGI Origin), 3D torus (Cray Gemini), 5D torus (IBM), Tofu 6D mesh/torus (K computer)
14 Popular Network: Fat Tree. The network of Tianhe-2 (Guangzhou), Shenway (Jinan), Dawning Nebulae (Shenzhen), and others.
15 Returning to Square One
- 1960s: mesh-based (ILLIAC IV)
- 1970s: butterfly and other MINs; direct to indirect networks, shared memory
- 1980s: hypercube, bus-based; lower diameter, message passing
- 1990s: fat tree, LAN-based; greater bandwidth
- 2000s: back toward mesh and torus; scalability, local wires
So only a small portion of the space of networks has been explored in practical parallel computers.
16 Comparison of Common Topologies (network topology | node degree | network diameter | bisection bandwidth)
- Fully connected | N-1 | 1 | N^2/4
- Ring | 2 | N/2 | 2
- 2D torus | 4 | ~sqrt(N) | 2*sqrt(N)
- Tree (d-ary) | - | ~2 log_d N | 1
- Fat tree | - | 2 log_2 N | N/2
- Hypercube | log_2 N | log_2 N | N/2
- Butterfly | 4 | 2l | N/(l+1), with N = (l+1)*2^l
- de Bruijn | d | log_d N | 2dN / log_d N
- DCell | k+1 | < 2 log_n N - 1 | > N / (4 log_n N)
Diameter at a glance: linear array N-1; ring N/2; torus sqrt(N); binary tree and hypercube log N; fully connected 1.
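The degree and diameter columns of the table above can be checked directly with breadth-first search. The sketch below (illustrative only, not code from the talk) verifies the ring, 2D torus, and hypercube rows for N = 16.

```python
# Verify degree/diameter table entries for ring, 2D torus, and hypercube
# using plain breadth-first search over adjacency lists.
from collections import deque

def bfs_ecc(adj, s):
    """Eccentricity of node s (longest shortest path from s)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

def diameter(adj):
    return max(bfs_ecc(adj, s) for s in adj)

def ring(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def torus2d(a, b):
    return {(x, y): [((x - 1) % a, y), ((x + 1) % a, y),
                     (x, (y - 1) % b), (x, (y + 1) % b)]
            for x in range(a) for y in range(b)}

def hypercube(d):
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(1 << d)}

print(diameter(ring(16)))       # → 8, i.e. N/2
print(diameter(torus2d(4, 4)))  # → 4, i.e. ~sqrt(N)
print(diameter(hypercube(4)))   # → 4, i.e. log2(N)
```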
17 Supercomputer Interconnects?!
18 Data traffic inside a computer is similar to this: New York City traffic.
19 Search for Optimal Network Topologies for Supercomputers
[Our Goal] Design state-of-the-art interconnection networks.
[Challenge] The entire ecosystem of network design is too big to attack at once.
[Our Focus] Discovering the optimal network topologies.
20 Search for Optimal Network Topologies for Supercomputers 01 Motivations 02 Reviews of Network Topologies 03 Search for Optimal Topologies 04 Summaries
21 Interconnection Networks
Primary parameters: number of nodes N; diameter D; bisection bandwidth B; node degree k.
Also: heterogeneous nodes, longest wire.
Other attributes: regularity, scalability, packageability, robustness.
(Adapted from B. Parhami)
22 Strategies to Search for Optimal Topologies
- Add bypass links on known topologies
- Add links on a Hamiltonian cycle
- Remove links from a fully connected network
- Successive construction
- Exhaustive search and embedding
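The second strategy can be sketched in a few lines: start from a Hamiltonian cycle and greedily add chords under a degree budget k, each time picking the chord that most reduces the mean path length. This is an illustrative sketch only; the talk does not specify its selection rule, and N = 16, k = 4 are hypothetical parameters.

```python
# Sketch of "add links on a Hamiltonian cycle": begin with a ring (the
# Hamiltonian cycle) and greedily add chords while respecting a degree
# budget k, choosing the chord that minimizes the mean path length.
from collections import deque
from itertools import combinations

def mean_path_length(adj):
    n, total = len(adj), 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

def build(n, k):
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # the cycle
    while True:
        cands = [(u, v) for u, v in combinations(range(n), 2)
                 if v not in adj[u] and len(adj[u]) < k and len(adj[v]) < k]
        if not cands:
            return adj
        # Evaluate each candidate chord on a temporary copy of the graph.
        best = min(cands, key=lambda e: mean_path_length(
            {w: adj[w] | ({e[1]} if w == e[0] else {e[0]} if w == e[1] else set())
             for w in adj}))
        adj[best[0]].add(best[1])
        adj[best[1]].add(best[0])

g = build(16, 4)
print(round(mean_path_length(g), 3))  # well below the plain ring's 64/15 ≈ 4.267
```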
23 Add Bypass Links on a Torus
Examples: ibt(8x8; b=<4>) and ibt(9x3; b=<3>), combining torus links with x-axis and y-axis bypass links.
Source: P. Zhang, R. Powell, and Y. Deng, IEEE Trans. Parallel and Distributed Systems 22(2) (2011).
24 [Plots] 3D ibt(8) vs. 4D torus: diameter and mean path length as functions of network size (number of nodes).
25 Add Links on a Hamiltonian Cycle
Test configurations: N8k4, N16k6, N32k6, N64k6. Recorded metrics: mean path length, diameter, number of cables.
HPL test status (tests of 10/14 at NCKU):
- N8k4 | CPU stability: done | CPU Turbo Boost: pending | CPU Turbo & HT: pending
- N16k6 | CPU stability: done | CPU Turbo Boost: pending | CPU Turbo & HT: pending
- N32k6 | CPU stability: done | CPU Turbo Boost: pending | CPU Turbo & HT: pending
- N64k6 | CPU stability: pending | CPU Turbo Boost: pending | CPU Turbo & HT: pending
26 Add Links on a Hamiltonian Cycle (HPL results)
[Charts] HPL efficiency and parallel efficiency vs. node count, with 64 GB and 96 GB RAM per node (tests of 10/14 at NCKU).
- HPL best efficiency declines as the system grows to 64 nodes; recorded values: 91.31%, 87.49%, 84.44%, 81.94%.
- Parallel efficiency declines from 100% at 1 node through 95.82% and 92.48% to 89.74% at 64 nodes.
27 Remove Links from a Fully Connected Network. M. Michalewicz, L. Orlowski, and Y.F. Deng, Constructing graphs by algorithmic edge removal (in preparation).
28 Successive Construction. M. Michalewicz, L. Orlowski, and Y.F. Deng, Constructing graphs by algorithmic edge removal (in preparation).
29 What Is the Best Network Topology? Wires vs. diameter?
30 The Degree/Diameter Graph Problem
Suppose you have an unlimited supply of degree-d nodes. How many can be connected into a network of diameter D?
Examples: the Petersen graph (d=3, D=2, N=10) and the Hoffman-Singleton graph (d=7, D=2, N=50).
Source: Wikipedia, Hoffman-Singleton graph.
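These two graphs are special because they attain the Moore bound: for degree d and diameter D = 2, at most 1 + d + d(d-1) = d^2 + 1 nodes fit, and the Petersen graph reaches it with d = 3, N = 10. A quick check (a sketch, not code from the talk):

```python
# The Petersen graph: outer 5-cycle, inner pentagram, plus spokes.
# Check that it is 3-regular, has diameter 2, and attains the Moore
# bound d^2 + 1 = 10 nodes for degree d = 3 and diameter D = 2.
from collections import deque

adj = {i: set() for i in range(10)}
for i in range(5):
    adj[i] |= {(i + 1) % 5, (i - 1) % 5, i + 5}          # outer cycle + spoke
    adj[i + 5] |= {5 + (i + 2) % 5, 5 + (i - 2) % 5, i}  # inner pentagram + spoke

def diameter(adj):
    worst = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

d = 3
print(all(len(v) == d for v in adj.values()))  # → True: 3-regular
print(diameter(adj))                           # → 2
print(len(adj) == d * d + 1)                   # → True: Moore bound attained
```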
31 [Figure] The E8 Lie group root system: 240 points in 8 dimensions.
Graph Theory / Topology / Network Problem Statement (N, k): given N vertices, find a graph for which the diameter, defined as the longest of the geodesic distances between all pairs of nodes, is minimal for a fixed vertex degree k (the number of edges incident to each vertex), and for which the mean path length is also minimal.
32 Why Do an Exhaustive Search?
[Figures] 2D and 3D layouts: rearrange a couple of links. Rearranging the order of the links reduces the diameter by 33.3% and the mean path length by 8.3%!
33 Comparison of Topologies for N=16
Networks compared: hypercube, 4x4 mesh, 4x4 torus, optimal N16k3, optimal N16k4. Metrics: degree, diameter, mean path length, number of edges.
Using the same amount of wires, the optimal graphs get about 25% less diameter and 25% less mean path length; using 25% less wire, they keep a similar diameter and mean path length.
34 Comparison of Topologies for N=32
Networks compared: hypercube, 4x8 torus, optimal N32k3, optimal N32k4, optimal N32k5. Metrics: degree, diameter, mean path length, number of edges.
35 Graph for N=64
N64k6: D=3, A=2.33, L=192. How do we find topologies for N=1,024, or even 3,000,000?
36 Is It Possible to Generate Massive Graphs?
Exhaustive searches for top topologies are feasible for N64k6, i.e., N=64 and k=6. The search space for N256k8, however, is ~10^1760; next to that, the number of stars in the universe is negligible, so exhaustive search for larger graphs is probably impossible. Therefore we must invent techniques to search for quasi-optimal topologies.
References:
- McKay, B. D., & Wormald, N. C. (1990). Asymptotic enumeration by degree sequence of graphs of high degree. European Journal of Combinatorics, 11.
- Deng, Y., et al. (2014, in preparation). The first-principle discovery of k-degree optimal graphs and engineering validations of optimality.
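The quoted search-space size can be reproduced, to order of magnitude, by a standard back-of-the-envelope count of labeled k-regular graphs via the configuration (pairing) model. This formula slightly over-counts (it includes multigraphs), so it illustrates the scale rather than the exact figure behind the slide's ~10^1760.

```python
# Estimate the number of labeled k-regular graphs on N vertices via the
# configuration (pairing) model:
#   count ≈ (N*k)! / ((N*k/2)! * 2^(N*k/2) * (k!)^N)
# Each vertex gets k "stubs"; pairings of stubs are counted, then divided
# by the orderings of the stubs at each vertex.
import math

def log10_regular_graph_count(N, k):
    m = N * k                      # total number of stubs; must be even
    assert m % 2 == 0
    log_count = (math.lgamma(m + 1)
                 - math.lgamma(m // 2 + 1)
                 - (m // 2) * math.log(2)
                 - N * math.lgamma(k + 1))
    return log_count / math.log(10)

# For N=256, k=8 this lands near 10^1767 — the ~10^1760 scale quoted above.
print(round(log10_regular_graph_count(256, 8)))
```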
37 Method 2: Graph Embedding
Example: (N8k3) x (N8k3(a)) gives M = 64.
38 Best Way to Connect M=32 Nodes
Networks compared: hypercube, 4x8 torus, N4k2 x N8k3 (a), N4k2 x N8k3 (b). Metrics: degree, diameter, mean path length, number of edges.
39 Graph Embedding: N8k3 x N8k3(a) (M=64)
- Hypercube 2^6: M=64, k=6, A = ?, D = 6, L = 192 (= 64x6/2)
- 2D torus 8x8: M=64, k=4, A = ?, D = 8, L = 128 (= 64x4/2)
- N8k3 x N8k3(a): M=64, k=3 or 4, A = ?, D = ?, L = 76
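The transcription does not specify the degree-preserving composition behind the N8k3-based embeddings (their k stays 3 or 4, so it is not a plain product). The comparison rows, however, are standard constructions: the 8x8 torus is exactly the graph Cartesian product of two 8-cycles, and its numbers above can be reproduced as a sanity check (a sketch, not code from the talk).

```python
# Graph Cartesian product: (a, b) ~ (c, d) iff a == c and b ~ d in H,
# or b == d and a ~ c in G. The product of two 8-cycles is the 8x8 torus.
from collections import deque

def cycle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def cartesian_product(g, h):
    return {(a, b): {(a, d) for d in h[b]} | {(c, b) for c in g[a]}
            for a in g for b in h}

def diameter(adj):
    worst = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        worst = max(worst, max(dist.values()))
    return worst

t = cartesian_product(cycle(8), cycle(8))
edges = sum(len(v) for v in t.values()) // 2
print(len(t), edges, diameter(t))  # → 64 128 8: matches M=64, L=128, D=8
```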
40 Graph Embedding: N8k3 x N16k3 (M=128)
- Hypercube 2^7: M=128, k=7, A = ?, D = 7, L = 448
- 16x8 torus: M=128, k=4, A = ?, D = 12, L = 256
- N8k3 x N16k3: M=128, k=3 or 4, A = ?, D = 13, L = 216
[Chart] Hop distributions
41 Graph Embedding: (N16k3)^2 (M=256)
- Hypercube 2^8: M=256, k=8, A = ?, D = 8, L = 1024
- 16x16 torus: M=256, k=4, A = ?, D = 16, L = 512
- N16k3 x N16k3: M=256, k=3 or 4, A = 9.23, D = 15, L = 408
[Chart] Hop distributions
42 Graph Embedding: (N8k3)^3 (M=512)
- Hypercube 2^9: M=512, k=9, A = ?, D = 9, L = 2304
- 32x16 torus: M=512, k=4, A = ?, D = 24, L = 1024 (= 512x4/2)
- N8k3 x N8k3 x N8k3: M=512, k=3 or 4, A = ?, D = 20, L = 876
[Chart] Hop distributions
43 Graph Embedding (M=4096): (N16k3)^3 & (N8k3)^4
44 Hop Distributions for M=4096
- Hypercube 2^12: M=4096, k=12, A = ?, D = 12, L = 24,576
- 64x64 torus: M=4096, k=4, A = ?, D = 64, L = 8,192
- (N16k3)^3: M=4096, k=3 or 4, A = 34.72, D = 60, L = 6,552
- (N8k3)^4: M=4096, k=3 or 4, A = ?, D = 55, L = 7,020
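The hop distribution used in these slides is simply the histogram of shortest-path lengths over all node pairs. A sketch for the hypercube case, where the counts follow binomial coefficients (the number of unordered pairs at distance h is C(n, h) * 2^n / 2):

```python
# Hop distribution: histogram of shortest-path distances over all node
# pairs. In an n-cube the distance between two labels is their Hamming
# distance, so the count at distance h is C(n, h) * 2^n / 2.
from collections import Counter, deque
from math import comb

def hop_distribution(adj):
    hist = Counter()
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        hist.update(d for d in dist.values() if d > 0)
    return {h: c // 2 for h, c in sorted(hist.items())}  # unordered pairs

n = 4
cube = {i: [i ^ (1 << b) for b in range(n)] for i in range(1 << n)}
hist = hop_distribution(cube)
assert hist == {h: comb(n, h) * 2**n // 2 for h in range(1, n + 1)}
print(hist)  # → {1: 32, 2: 48, 3: 32, 4: 8}
mean = sum(h * c for h, c in hist.items()) / sum(hist.values())
print(round(mean, 3))  # → 2.133, the mean path length A of the 4-cube
```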
45 Prototype 1: ibt(8^2, b=2) vs. T(8^2) vs. T(4^3)
Benchmarks: NAMD, NAS Parallel Benchmarks, HPC Challenge Benchmarks, LINPACK Benchmarks.
46 Prototype 2: Optimal N8k3 at NCKU
47 One Prototype with N=1024 at NCKU: 1.1 MW, 5,120 fiber links.
48 Search for Optimal Network Topologies for Supercomputers 01 Motivations 02 Reviews of Network Topologies 03 Search for Optimal Topologies 04 Summaries
49 Search for Optimal Network Topologies for Supercomputers
[Summaries]
- Next-generation supercomputers need better interconnection networks: both technologies and topologies.
- An optimal topology shows better performance: diameter, mean path length, utilization of wires, etc.
- There is a long way to go to find and use optimal topologies:
  - Other optimization metrics: bandwidth, applications, etc.
  - Massive networks: optimization algorithms, embedding, packaging, etc.
  - Engineering: routing, mapping, scalability, robustness, etc.
50 Presentations: Jack Dongarra, University of Tennessee & ORNL, The HPL Benchmark: Past, Present & Future; Mike Heroux, Sandia National Laboratories, The HPCG Benchmark: Challenges It Presents
More informationHETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA
HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA STATE OF THE ART 2012 18,688 Tesla K20X GPUs 27 PetaFLOPS FLAGSHIP SCIENTIFIC APPLICATIONS
More informationOverview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo
Overview of Supercomputer Systems Supercomputing Division Information Technology Center The University of Tokyo Supercomputers at ITC, U. of Tokyo Oakleaf-fx (Fujitsu PRIMEHPC FX10) Total Peak performance
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester
Jack Dongarra University of Tennessee Oak Ridge National Laboratory University of Manchester 12/24/09 1 Take a look at high performance computing What s driving HPC Future Trends 2 Traditional scientific
More informationDelivering HPC Performance at Scale
Delivering HPC Performance at Scale October 2011 Joseph Yaworski QLogic Director HPC Product Marketing Office: 610-233-4854 Joseph.Yaworski@QLogic.com Agenda QLogic Overview TrueScale Performance Design
More informationTOP500 Listen und industrielle/kommerzielle Anwendungen
TOP500 Listen und industrielle/kommerzielle Anwendungen Hans Werner Meuer Universität Mannheim Gesprächsrunde Nichtnumerische Anwendungen im Bereich des Höchstleistungsrechnens des BMBF Berlin, 16./ 17.
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group
More informationINSPUR and HPC Innovation
INSPUR and HPC Innovation Dong Qi (Forrest) Product manager Inspur dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community
More informationThe way toward peta-flops
The way toward peta-flops ISC-2011 Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Where things started from DESIGN CONCEPTS 2 New challenges and requirements! Optimal sustained flops
More informationCS575 Parallel Processing
CS575 Parallel Processing Lecture three: Interconnection Networks Wim Bohm, CSU Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.
More informationThe TOP500 list. Hans-Werner Meuer University of Mannheim. SPEC Workshop, University of Wuppertal, Germany September 13, 1999
The TOP500 list Hans-Werner Meuer University of Mannheim SPEC Workshop, University of Wuppertal, Germany September 13, 1999 Outline TOP500 Approach HPC-Market as of 6/99 Market Trends, Architecture Trends,
More informationExploring Hardware Overprovisioning in Power-Constrained, High Performance Computing
Exploring Hardware Overprovisioning in Power-Constrained, High Performance Computing Tapasya Patki 1 David Lowenthal 1 Barry Rountree 2 Martin Schulz 2 Bronis de Supinski 2 1 The University of Arizona
More informationCommunication Performance in Network-on-Chips
Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In
More informationJack Dongarra University of Tennessee Oak Ridge National Laboratory
Jack Dongarra University of Tennessee Oak Ridge National Laboratory 3/9/11 1 TPP performance Rate Size 2 100 Pflop/s 100000000 10 Pflop/s 10000000 1 Pflop/s 1000000 100 Tflop/s 100000 10 Tflop/s 10000
More informationLecture 2: Topology - I
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and
More informationThe State and Opportunities of HPC Applications in China. Ruibo Wang National University of Defense Technology
The State and Opportunities of HPC Applications in China Ruibo Wang National University of Defense Technology Outline Brief introduction to the Sites Applications Fusion Development of HPC, Cloud & Big
More informationOverview. CS 472 Concurrent & Parallel Programming University of Evansville
Overview CS 472 Concurrent & Parallel Programming University of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information Science, University
More informationFujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future
Fujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future November 16 th, 2011 Motoi Okuda Technical Computing Solution Unit Fujitsu Limited Agenda Achievements
More informationHPC Technology Update Challenges or Chances?
HPC Technology Update Challenges or Chances? Swiss Distributed Computing Day Thomas Schoenemeyer, Technology Integration, CSCS 1 Move in Feb-April 2012 1500m2 16 MW Lake-water cooling PUE 1.2 New Datacenter
More informationCS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2
Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99
More informationParallel and Distributed Systems. Hardware Trends. Why Parallel or Distributed Computing? What is a parallel computer?
Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and
More information