Parallel Computer Architecture II
|
|
- Karen Dalton
- 5 years ago
- Views:
Transcription
1 Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/ WS 5/6 Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 / 37
2 Parallel Computer Architecture II Multiprocessor architectures Message passing Network topologies Example architectures Routing TOP 5 TOP2 Architectures Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 2 / 37
3 Classificaton of MIMD Architectures Physical memory arrangement shared memory distributed memory Address space global local Programming model shared address space message passing Communication structure Memory coupling Message coupling Synchronization semaphores barriers Latency treatment latency hiding latency minimization Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 3 / 37
4 Distributed Memory: MP P C M P C M Connection Network CN Multi processors have a local address space: Each processor can access only its associated memory. Interaction with other processors exclusively by sending of messages. Processors, memory and cache are standard components: Full utilization of the price advantage of high quantity of units. Connection network ranging from Fast Ethernet to Infiniband. Approach with highes scalability: IBM BlueGene > K processors Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 4 / 37
5 Distributed Memory: Message Passing Processes communicate data between distributed address spaces Explicit message passing is necessary Send-/Receive operations Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 5 / 37
6 A Generic Parallel Computer Architecture Compute Node Compute Node CPU MEM MEM CPU I/O NI Connection NI I/O Network R Data R I/O Node R Wiring R Compute Node I/O NI NI MEM I/O I/O I/O CPU Generic approach of a scalable parallel computer with distributed memory Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 6 / 37
7 Scalability Parameters Parameters, that drive the scalability of a computing system: Bandwidth [MB/s] Latency [µs] Costs [$] Physical size [m 2,m 3 ] Power consumption [W ] Fault tolerance / recovery abilities A truely scalable architecture should avoid any hard limits! Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 7 / 37
8 Influence of Parallel Software? from Culler, Singh, Gupta: Parallel Computer Architecture Although the parallel architecture scales scalable software is prerequisite for scalable numerical computation. Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 8 / 37
9 Message Passing Memory block (of variable length) shall be copied from one memory to another More precise: From the address space of one process to the address space of another process (running on a different processor) The connection network is packet oriented. Each message is subdivided in packets of fixed length (e. g. 32 byte to 4 kbyte) Tail Usage Data Head Head: target processor, tail: check sum Communication protocol: acknowledgment whether packet has arrived valid, flow control Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 9 / 37
10 Message Passing Layer model (Hierarchy of protocols): Pi_i Pi_j Operating System Network Driver Operating System Network Driver Network Interface Network Interface Connection Network Model of transmission time: t mess (n) = t s + n t b. t s : setup time (latency), t b : time per byte, /t b : bandwidth, dependent on protocol Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 / 37
11 Network Topology I (a) (b) (c) (d) (e) (a) full connected, (b) star, (c) array (d) hypercube, (e) torus, (f) folded torus (f) Hypercube: of dimension d has 2 d processors. Processor p is connected to q if their binary representation differs in exactly one bit. Network node: Earlier (before 99) this was the processor itself, nowadays it is a dedicated communication processor. Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 / 37
12 Network Topology II Reference parameters: Network topology Nodedegree K Wirecount L Diameter D Bisectionbandwidth B Symmetry Full Connectivity N N(N (N/2) 2 yes )/2 Star N N 2 N/2 no 2D Grid 4 2N 2 2( N N no N ) 3D Torus 6 3N 3 N/2 2 N yes Hypercube log 2 N nn log 2 N n N/2 yes k-ary n-cube (N = k n ) 3N n k/2 nn 2k n yes Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 2 / 37
13 Store & Forward Routing Store-and-forward routing: Message of length n is subdivided into packets of length N. Pipelining on packet level: Packet is stored completely in the network node. M M NN NN NN2 NNd NNd Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 3 / 37
14 Store & Forward Routing Transmission of a packet: Source Target Time Routing Routing Routing Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 4 / 37
15 Store & Forward Routing Run time: ( n ) t SF (n, N, d) = t s + d(t h + Nt b )+ N (t h + Nt b ) ( = t s + t h d + n )+t N b (n+n(d )). t s : time, that is needed on source and target computer until the network is instructed with the message transmission, respectively until the receiving process is informed. This is the software share of the protocol. t h : time that is necessary to transmit the first byte of a message from a network node to another one (node latency, hop time). t b : time for transmission of a byte from network node to network node d: Hops to reach the target node. Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 5 / 37
16 Cut-Through Routing Cut-through routing or wormhole routing: Packets are not buffered, each word (so called flit) is routed immediately to the next network node. Transmission of a packet: Source Target Routing Routing Routing Time 3 2 Run time: t CT (n, N, d) = t s + t h d + t b n Time for short message (n = N): t CT = t s + dt h + Nt b. Because of dt h t s (Hardware!) nearly distance independent Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 6 / 37
17 Deadlock In packet transmiting network the danger of a store-and-forward deadlock exists: D to A C to D to B A to C B Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 7 / 37
18 Deadlock Together with cut-through routing: Deadlock free dimension order routing. Example 2D grid: Partition network into +x, x, +y and y networks each with individual buffers. Message is routed first in the sender row, then in the receiver column. Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 8 / 37
19 Multi-Processor Performance from Culler, Singh, Gupta: Parallel Computer Architecture Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 9 / 37
20 Message Passing Architectures I First machine with parallel Unix Intel Paragon: Process migration, gang scheduling Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 2 / 37
21 Message Passing Architectures II IBM SP2: Compute nodes are RS 6 workstations Switching network Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 2 / 37
22 Message Passing Architectures III high package density a single system wide clock virtual shared memory Cray T3E: Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 22 / 37
23 Top5 Top5 Benchmark: LINPACK benchmark is used for evaluation of the systems Benchmark performance does not reflect the overall performance of the system Benchmark indicates performance during solution of dense linear equation sytem Very regular problem: high acchievable performance (near peak performance) Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 23 / 37
24 Top of Top5 Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 24 / 37
25 Top5 key facts Entry barrier is a performance of 6.8 TeraFlop/s Mean energy consumption of Top is 4.9 MW:.8-2 GFlops/W 4 systems need more than MW Accumulated performance is 23.4 PFlops/s (74.2 PFlop/s) Top minimal performance is 72.6 TFlop/s (5.9 TFlop/s) 2 Petaflops systems Top5 minial performance is 2.97 TFlop/s (9.29 TFlop/s) Processor type: Intel SandyBridge, AMD Opteron, IBM Power % of the systems have processors with 6 or more cores Infiniband (28) and Gigabit Ethernet (27) networks dominate Architecture: 8% Cluster, 2% MPP, % SIMD/SMP Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 25 / 37
26 Top5 Continent + Archtecture Type Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 26 / 37
27 Top5 CoresPerSocket + Interconnect Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 27 / 37
28 IBM Blue Gene/Q Architecture Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 28 / 37
29 IBM Blue Gene/Q Networks 9834 node are connected by three distinct, integrated networks 3 Dimensional Torus Virtual cut-through hardware routing to maximize efficiency 2.8 Gb/s on all 2 node links (total of 4.2 GB/s per node) Communication backbone 34 TB/s total torus interconnect bandwidth Global Tree One-to-all or all-all broadcast functionality Arithmetic operations implemented in tree ~.4 GB/s of bandwidth from any node to all other nodes Latency of tree traversal less than usec Ethernet Incorporated into every node ASIC Disk I/O Host control, booting and diagnostics Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 29 / 37
30 Cray RedStorm Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 3 / 37
31 Cray RedStorm Configuration Normally Unclassified Switchable Nodes Normally Classified Disconnect Cabinets Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 3 / 37
32 Cray RedStorm Building Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 32 / 37
33 Cray RedStorm Compute Node DRAM (or 2) Gbyte or more ASIC = Application Specific Integrated Circuit, or a custom chip CPUs AMD Opteron ASIC NIC + Router Six Links To Other Nodes in X, Y, and Z Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 33 / 37
34 Cray RedStorm Cabinet Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 34 / 37
35 Blue Gene L vs Red Storm BGL 36 TF version, Red Storm TF version Blue Gene L Red Storm Node speed 5.6 GF 5.6 GF (x) Node memory GB 2 (-8 GB) (4x) Network latency 7 us 2 us (2/7x) Network link bw.28 GB/s 6. GB/s (22x) BW Bytes/Flops.5. (22x) Bi-Section B/F.6.38 (24x) #nodes/problem 4,, (/4x) Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 35 / 37
36 Cray XT-5 Jaguar Architecture Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 36 / 37
37 Cray XT-5 Jaguar IO-Configuration Stefan Lang (IWR) Simulation on High-Performance Computers WS 5/6 37 / 37
BlueGene/L. Computer Science, University of Warwick. Source: IBM
BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours
More informationCommunication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.
Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance
More informationLecture 2 Parallel Programming Platforms
Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple
More informationReal Parallel Computers
Real Parallel Computers Modular data centers Overview Short history of parallel machines Cluster computing Blue Gene supercomputer Performance development, top-500 DAS: Distributed supercomputing Short
More informationMIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer
MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware
More informationThe Red Storm System: Architecture, System Update and Performance Analysis
The Red Storm System: Architecture, System Update and Performance Analysis Douglas Doerfler, Jim Tomkins Sandia National Laboratories Center for Computation, Computers, Information and Mathematics LACSI
More informationInterconnection Network
Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network
More information4. Networks. in parallel computers. Advances in Computer Architecture
4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors
More informationNetwork-on-chip (NOC) Topologies
Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance
More informationLecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996
Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: ABCs of Networks Starting Point: Send bits between 2 computers Queue
More informationReal Parallel Computers
Real Parallel Computers Modular data centers Background Information Recent trends in the marketplace of high performance computing Strohmaier, Dongarra, Meuer, Simon Parallel Computing 2005 Short history
More informationCommunication Performance in Network-on-Chips
Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Spring 2010 Flynn s Taxonomy SISD:
More informationTDT Appendix E Interconnection Networks
TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages
More informationCOSC 6374 Parallel Computation. Parallel Computer Architectures
OS 6374 Parallel omputation Parallel omputer Architectures Some slides on network topologies based on a similar presentation by Michael Resch, University of Stuttgart Edgar Gabriel Fall 2015 Flynn s Taxonomy
More informationOutline. Execution Environments for Parallel Applications. Supercomputers. Supercomputers
Outline Execution Environments for Parallel Applications Master CANS 2007/2008 Departament d Arquitectura de Computadors Universitat Politècnica de Catalunya Supercomputers OS abstractions Extended OS
More informationLecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control
Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection
More informationNetworks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin
Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized
More informationINTERCONNECTION NETWORKS LECTURE 4
INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1 Topology Specifies way switches are wired Affects routing, reliability, throughput, latency, building ease Routing How does a message get from source
More informationInterconnection Networks: Topology. Prof. Natalie Enright Jerger
Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design
More informationLecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)
Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew
More informationCOSC 6385 Computer Architecture - Multi Processor Systems
COSC 6385 Computer Architecture - Multi Processor Systems Fall 2006 Classification of Parallel Architectures Flynn s Taxonomy SISD: Single instruction single data Classical von Neumann architecture SIMD:
More informationStockholm Brain Institute Blue Gene/L
Stockholm Brain Institute Blue Gene/L 1 Stockholm Brain Institute Blue Gene/L 2 IBM Systems & Technology Group and IBM Research IBM Blue Gene /P - An Overview of a Petaflop Capable System Carl G. Tengwall
More informationInterconnect Technology and Computational Speed
Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented
More informationCS 770G - Parallel Algorithms in Scientific Computing Parallel Architectures. May 7, 2001 Lecture 2
CS 770G - arallel Algorithms in Scientific Computing arallel Architectures May 7, 2001 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan Kaufmann
More informationWhat are Clusters? Why Clusters? - a Short History
What are Clusters? Our definition : A parallel machine built of commodity components and running commodity software Cluster consists of nodes with one or more processors (CPUs), memory that is shared by
More informationLecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control
Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,
More informationLecture: Interconnection Networks
Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD
More informationChapter 9 Multiprocessors
ECE200 Computer Organization Chapter 9 Multiprocessors David H. lbonesi and the University of Rochester Henk Corporaal, TU Eindhoven, Netherlands Jari Nurmi, Tampere University of Technology, Finland University
More informationCluster Network Products
Cluster Network Products Cluster interconnects include, among others: Gigabit Ethernet Myrinet Quadrics InfiniBand 1 Interconnects in Top500 list 11/2009 2 Interconnects in Top500 list 11/2008 3 Cluster
More informationInterconnect Routing
Interconnect Routing store-and-forward routing switch buffers entire message before passing it on latency = [(message length / bandwidth) + fixed overhead] * # hops wormhole routing pipeline message through
More informationCRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar
CRAY XK6 REDEFINING SUPERCOMPUTING - Sanjana Rakhecha - Nishad Nerurkar CONTENTS Introduction History Specifications Cray XK6 Architecture Performance Industry acceptance and applications Summary INTRODUCTION
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico September 26, 2011 CPD
More informationScalability and Classifications
Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static
More informationHigh Performance Computing: Blue-Gene and Road Runner. Ravi Patel
High Performance Computing: Blue-Gene and Road Runner Ravi Patel 1 HPC General Information 2 HPC Considerations Criterion Performance Speed Power Scalability Number of nodes Latency bottlenecks Reliability
More information6.1 Multiprocessor Computing Environment
6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,
More informationCS Parallel Algorithms in Scientific Computing
CS 775 - arallel Algorithms in Scientific Computing arallel Architectures January 2, 2004 Lecture 2 References arallel Computer Architecture: A Hardware / Software Approach Culler, Singh, Gupta, Morgan
More informationTopologies. Maurizio Palesi. Maurizio Palesi 1
Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and
More informationOverview. Processor organizations Types of parallel machines. Real machines
Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments
More informationNon-Uniform Memory Access (NUMA) Architecture and Multicomputers
Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing MSc in Information Systems and Computer Engineering DEA in Computational Engineering Department of Computer
More informationLecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels
Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a
More informationThe Impact of Optics on HPC System Interconnects
The Impact of Optics on HPC System Interconnects Mike Parker and Steve Scott Hot Interconnects 2009 Manhattan, NYC Will cost-effective optics fundamentally change the landscape of networking? Yes. Changes
More informationUnderstanding the Routing Requirements for FPGA Array Computing Platform. Hayden So EE228a Project Presentation Dec 2 nd, 2003
Understanding the Routing Requirements for FPGA Array Computing Platform Hayden So EE228a Project Presentation Dec 2 nd, 2003 What is FPGA Array Computing? Aka: Reconfigurable Computing Aka: Spatial computing,
More informationInterconnection Networks
Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact
More informationLecture 15: Networks & Interconnect Interface, Switches, Routing, Examples Professor David A. Patterson Computer Science 252 Fall 1996
Lecture 15: Networks & Interconnect Interface, Switches, Routing, Examples Professor David A. Patterson Computer Science 252 Fall 1996 DAP.F96 1 Review: Memory Hierarchies File Cache Hard Disk Tapes On-Line
More informationInitial Performance Evaluation of the Cray SeaStar Interconnect
Initial Performance Evaluation of the Cray SeaStar Interconnect Ron Brightwell Kevin Pedretti Keith Underwood Sandia National Laboratories Scalable Computing Systems Department 13 th IEEE Symposium on
More informationScalable Distributed Memory Machines
Scalable Distributed Memory Machines Goal: Parallel machines that can be scaled to hundreds or thousands of processors. Design Choices: Custom-designed or commodity nodes? Network scalability. Capability
More informationLecture 7: Parallel Processing
Lecture 7: Parallel Processing Introduction and motivation Architecture classification Performance evaluation Interconnection network Zebo Peng, IDA, LiTH 1 Performance Improvement Reduction of instruction
More informationCCS HPC. Interconnection Network. PC MPP (Massively Parallel Processor) MPP IBM
CCS HC taisuke@cs.tsukuba.ac.jp 1 2 CU memoryi/o 2 2 4single chipmulti-core CU 10 C CM (Massively arallel rocessor) M IBM BlueGene/L 65536 Interconnection Network 3 4 (distributed memory system) (shared
More informationEE382C Lecture 1. Bill Dally 3/29/11. EE 382C - S11 - Lecture 1 1
EE382C Lecture 1 Bill Dally 3/29/11 EE 382C - S11 - Lecture 1 1 Logistics Handouts Course policy sheet Course schedule Assignments Homework Research Paper Project Midterm EE 382C - S11 - Lecture 1 2 What
More informationDistributed-Memory Programming Models I
Distributed-Memory Programming Models I Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-69120 Heidelberg phone: 06221/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de
More informationWhat is Parallel Computing?
What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing
More informationRoadmapping of HPC interconnects
Roadmapping of HPC interconnects MIT Microphotonics Center, Fall Meeting Nov. 21, 2008 Alan Benner, bennera@us.ibm.com Outline Top500 Systems, Nov. 2008 - Review of most recent list & implications on interconnect
More informationBlueGene/L (No. 4 in the Latest Top500 List)
BlueGene/L (No. 4 in the Latest Top500 List) first supercomputer in the Blue Gene project architecture. Individual PowerPC 440 processors at 700Mhz Two processors reside in a single chip. Two chips reside
More informationParallel Computing Platforms. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Elements of a Parallel Computer Hardware Multiple processors Multiple
More informationEN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University
EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University Material from: The Datacenter as a Computer: An Introduction to
More informationCSC630/CSC730: Parallel Computing
CSC630/CSC730: Parallel Computing Parallel Computing Platforms Chapter 2 (2.4.1 2.4.4) Dr. Joe Zhang PDC-4: Topology 1 Content Parallel computing platforms Logical organization (a programmer s view) Control
More informationLecture 18: Communication Models and Architectures: Interconnection Networks
Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi
More informationSMP and ccnuma Multiprocessor Systems. Sharing of Resources in Parallel and Distributed Computing Systems
Reference Papers on SMP/NUMA Systems: EE 657, Lecture 5 September 14, 2007 SMP and ccnuma Multiprocessor Systems Professor Kai Hwang USC Internet and Grid Computing Laboratory Email: kaihwang@usc.edu [1]
More information1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects
Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects
More informationTopologies. Maurizio Palesi. Maurizio Palesi 1
Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and
More informationRecall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms
CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationParallel Computing Platforms
Parallel Computing Platforms Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong (jinkyu@skku.edu)
More informationComputing architectures Part 2 TMA4280 Introduction to Supercomputing
Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:
More informationLecture 9: MIMD Architecture
Lecture 9: MIMD Architecture Introduction and classification Symmetric multiprocessors NUMA architecture Cluster machines Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationLecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University
18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel
More informationParallel Architectures
Parallel Architectures CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Parallel Architectures Spring 2018 1 / 36 Outline 1 Parallel Computer Classification Flynn s
More informationCS252 Graduate Computer Architecture Lecture 14. Multiprocessor Networks March 9 th, 2011
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252
More informationLecture 20: Distributed Memory Parallelism. William Gropp
Lecture 20: Distributed Parallelism William Gropp www.cs.illinois.edu/~wgropp A Very Short, Very Introductory Introduction We start with a short introduction to parallel computing from scratch in order
More informationLecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance
Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,
More informationProcessor Architecture and Interconnect
Processor Architecture and Interconnect What is Parallelism? Parallel processing is a term used to denote simultaneous computation in CPU for the purpose of measuring its computation speeds. Parallel Processing
More informationNetwork on Chip Architecture: An Overview
Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how
More informationScaling to Petaflop. Ola Torudbakken Distinguished Engineer. Sun Microsystems, Inc
Scaling to Petaflop Ola Torudbakken Distinguished Engineer Sun Microsystems, Inc HPC Market growth is strong CAGR increased from 9.2% (2006) to 15.5% (2007) Market in 2007 doubled from 2003 (Source: IDC
More informationBlue Gene/Q. Hardware Overview Michael Stephan. Mitglied der Helmholtz-Gemeinschaft
Blue Gene/Q Hardware Overview 02.02.2015 Michael Stephan Blue Gene/Q: Design goals System-on-Chip (SoC) design Processor comprises both processing cores and network Optimal performance / watt ratio Small
More informationLecture 2: Topology - I
ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and
More informationCS575 Parallel Processing
CS575 Parallel Processing Lecture three: Interconnection Networks Wim Bohm, CSU Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 license.
More informationCS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99 2
Real Machines Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley Wide links, smaller routing delay Tremendous variation 3/19/99
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2011/12 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2011/12 1 2
More informationScalable Multiprocessors
Topics Scalable Multiprocessors Scaling issues Supporting programming models Network interface Interconnection network Considerations in Bluegene/L design 2 Limited Scaling of a Bus Comparing with a LAN
More informationrepresent parallel computers, so distributed systems such as Does not consider storage or I/O issues
Top500 Supercomputer list represent parallel computers, so distributed systems such as SETI@Home are not considered Does not consider storage or I/O issues Both custom designed machines and commodity machines
More informationLecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background
Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation
More informationMemory Hierarchy Computing Systems & Performance MSc Informatics Eng. Memory Hierarchy (most slides are borrowed)
Computing Systems & Performance Memory Hierarchy MSc Informatics Eng. 2012/13 A.J.Proença Memory Hierarchy (most slides are borrowed) AJProença, Computer Systems & Performance, MEI, UMinho, 2012/13 1 2
More informationThe Tofu Interconnect D
The Tofu Interconnect D 11 September 2018 Yuichiro Ajima, Takahiro Kawashima, Takayuki Okamoto, Naoyuki Shida, Kouichi Hirai, Toshiyuki Shimizu, Shinya Hiramoto, Yoshiro Ikeda, Takahide Yoshikawa, Kenji
More informationFujitsu s Approach to Application Centric Petascale Computing
Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview
More informationSupercomputers. Alex Reid & James O'Donoghue
Supercomputers Alex Reid & James O'Donoghue The Need for Supercomputers Supercomputers allow large amounts of processing to be dedicated to calculation-heavy problems Supercomputers are centralized in
More informationLecture 16: On-Chip Networks. Topics: Cache networks, NoC basics
Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality
More informationLarge Scale Multiprocessors and Scientific Applications. By Pushkar Ratnalikar Namrata Lele
Large Scale Multiprocessors and Scientific Applications By Pushkar Ratnalikar Namrata Lele Agenda Introduction Interprocessor Communication Characteristics of Scientific Applications Synchronization: Scaling
More informationInterconnection Networks
Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially
More informationMultiprocessor Interconnection Networks
Multiprocessor Interconnection Networks Todd C. Mowry CS 740 November 19, 1998 Topics Network design space Contention Active messages Networks Design Options: Topology Routing Direct vs. Indirect Physical
More informationCMSC 611: Advanced. Parallel Systems
CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems
More informationEE382 Processor Design. Illinois
EE382 Processor Design Winter 1998 Chapter 8 Lectures Multiprocessors Part II EE 382 Processor Design Winter 98/99 Michael Flynn 1 Illinois EE 382 Processor Design Winter 98/99 Michael Flynn 2 1 Write-invalidate
More informationParallel Computer Architecture I
Parallel Computer Architecture I Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-69120 Heidelberg phone: 06221/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de
More informationNOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms
Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines
More informationModule 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth
Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012
More informationOutline. Distributed Shared Memory. Shared Memory. ECE574 Cluster Computing. Dichotomy of Parallel Computing Platforms (Continued)
Cluster Computing Dichotomy of Parallel Computing Platforms (Continued) Lecturer: Dr Yifeng Zhu Class Review Interconnections Crossbar» Example: myrinet Multistage» Example: Omega network Outline Flynn
More informationConvergence of Parallel Architecture
Parallel Computing Convergence of Parallel Architecture Hwansoo Han History Parallel architectures tied closely to programming models Divergent architectures, with no predictable pattern of growth Uncertainty
More informationParallel Architecture. Sathish Vadhiyar
Parallel Architecture Sathish Vadhiyar Motivations of Parallel Computing Faster execution times From days or months to hours or seconds E.g., climate modelling, bioinformatics Large amount of data dictate
More information