INTERCONNECTION NETWORKS LECTURE 4


Dr. Samman H. Ameen

Topology: specifies the way the switches are wired. Topology affects routing, reliability, throughput, latency, and ease of building.
Routing: how does a message get from source to destination? Routing can be static or adaptive.
Buffering and flow control: what do we store within the network (entire packets, parts of packets, etc.)? How do we manage and negotiate buffer space? Buffering is tightly coupled with the routing strategy.

Network interface: connects endpoints (e.g. cores) to the network and decouples computation from communication.
Link: a bundle of wires that carries a signal.
Switch/router: connects a fixed number of input channels to a fixed number of output channels.
Channel: a single logical connection between routers/switches.

These are fundamental decisions in determining the appropriate architecture of an interconnection network (IN) for a parallel machine. The decisions are made along four dimensions:
- Mode of operation
- Control strategy
- Switching methodology
- Network topology

INs are classified as synchronous versus asynchronous. In the synchronous mode of operation, a single global clock is used by all components in the system, so the whole system operates in lock step. The asynchronous mode of operation, on the other hand, does not require a global clock; handshaking signals are used instead to coordinate the operation of asynchronous systems. While synchronous systems tend to be slower than asynchronous systems, they are race- and hazard-free.

A typical interconnection network consists of a number of switching elements and interconnecting links. Interconnection functions are realized by properly setting the control of the switching elements. The control-setting function can be managed by a centralized controller or by the individual switching elements; the latter strategy is called distributed control, while the former corresponds to centralized control. Most existing SIMD interconnection networks use centralized control, with the control unit setting all switch elements.

The two major switching methodologies are circuit switching and packet switching.

Circuit switching sets up a full path (acquires all resources) between sender and receiver prior to sending a message: reserve the links, then send the data.
- Higher-bandwidth transmission (no link management overhead)
- Overhead to set up the path
- Reserving links can result in low utilization

In packet switching, data is put into packets and routed through the interconnection network without establishing a physical connection path.
- Packets are routed individually (possibly over different network links)
- Opportunity to use a link whenever it is idle
- Overhead due to dynamic switching

In general, circuit switching is much more suitable for bulk data transmission, while packet switching is more efficient for many short data messages. Most SIMD interconnection networks are hardwired to use circuit switching; packet-switched networks have been suggested mainly for MIMD machines.
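To make the trade-off concrete, the sketch below compares first-order latency models for the two methodologies. This is a minimal illustration, not a model from the slides; the parameter names (setup_time, per_hop_delay, bandwidth, pkt_bits) are assumptions chosen for this example.

```python
def circuit_switched_latency(msg_bits, hops, setup_time, per_hop_delay, bandwidth):
    """Path is reserved once, then the whole message streams through it."""
    # setup: a probe travels to the receiver and an acknowledgement returns
    setup = setup_time + 2 * hops * per_hop_delay
    return setup + msg_bits / bandwidth

def packet_switched_latency(msg_bits, hops, pkt_bits, per_hop_delay, bandwidth):
    """Store-and-forward: each packet is fully received before being forwarded."""
    n_pkts = -(-msg_bits // pkt_bits)  # ceiling division
    first_pkt = hops * (per_hop_delay + pkt_bits / bandwidth)
    # remaining packets pipeline behind the first one
    return first_pkt + (n_pkts - 1) * pkt_bits / bandwidth

# a long message favours circuit switching; many short ones favour packets
print(circuit_switched_latency(1_000_000, 8, 50, 5, 1000))
print(packet_switched_latency(1_000_000, 8, 1024, 5, 1000))
```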

A network can be depicted by a graph in which nodes represent switching points and edges represent communication links. The topologies tend to be regular and can be grouped into two categories: static and dynamic. In a static topology, links between two processors are passive, and dedicated buses cannot be reconfigured for direct connections to other processors. In the dynamic category, on the other hand, links can be reconfigured by setting the network's active switching elements. The choice of a particular interconnection network depends on application demands, technology support, and cost-effectiveness.

Regular or irregular: a network is regular if its topology is a regular graph (e.g. ring, mesh).
Routing distance: the number of links/hops along a route.
Diameter: the maximum routing distance between any two nodes.
Average distance: the average number of hops across all valid routes.
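These metrics can be computed for any topology from its graph. The sketch below, a minimal illustration, derives the diameter and average distance of a small ring by breadth-first search; the function names are chosen for this example.

```python
from collections import deque

def hop_counts(adj, src):
    """BFS from src; returns the minimum hop count to every node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter_and_average(adj):
    all_dists = [d for s in adj for d in hop_counts(adj, s).values() if d > 0]
    return max(all_dists), sum(all_dists) / len(all_dists)

# 8-node bidirectional ring: node i connects to its two neighbours
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(diameter_and_average(ring))  # (4, ~2.29): D = N/2 for a bidirectional ring
```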


STATIC NETWORKS
The inter-PE communication network can be specified by a set of data-routing functions. Topologies in static networks can be classified according to the number of dimensions required for layout. For illustration, one-dimensional, two-dimensional, three-dimensional, and hypercube topologies are shown on the next slide.

[Figure: static network topologies laid out in one, two, and three dimensions, and a hypercube]

We consider two classes of dynamic networks, single-stage versus multistage, described separately below.

Single-stage networks
A single-stage network is a switching network with N input selectors (IS) and N output selectors (OS). The single-stage network is also called a recirculating network: data items may have to recirculate through the single stage several times before reaching their final destinations.
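As an illustration of recirculation, the sketch below routes a data item through a single shuffle-exchange stage, a classic example of a single-stage network. This is a minimal sketch assuming destination-tag routing; the slides do not fix a particular stage, and the function names are chosen for this example.

```python
def shuffle(addr, n_bits):
    """Perfect shuffle: rotate the address bits one position to the left."""
    msb = (addr >> (n_bits - 1)) & 1
    return ((addr << 1) & ((1 << n_bits) - 1)) | msb

def recirculate(src, dst, n_bits):
    """Pass through the single stage up to n_bits times to reach dst."""
    cur, path = src, [src]
    for k in range(n_bits):
        cur = shuffle(cur, n_bits)
        # the destination bit that must appear in the LSB after this pass
        want = (dst >> (n_bits - 1 - k)) & 1
        if (cur & 1) != want:
            cur ^= 1  # exchange: flip the low-order bit
        path.append(cur)
    return path

print(recirculate(0b000, 0b101, 3))  # [0, 1, 2, 5]: three passes to reach 101
```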

Many stages of interconnected switches form a multistage SIMD network. Multistage networks are described by three characterizing features: the switch box, the network topology, and the control structure. Many switch boxes are used in a multistage network; each box is essentially an interchange device with two inputs and two outputs. A switch box has four possible states: straight, exchange, upper broadcast, and lower broadcast. A two-function switch box can assume only the straight or exchange state; a four-function switch box can be in any one of the four legitimate states.
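The four states map directly onto a small model. The sketch below is a minimal illustration of a 2 x 2 switch box; the function and state names are chosen for this example.

```python
def switch_box(in0, in1, state):
    """Return (out0, out1) of a 2x2 switch box for a given state."""
    if state == "straight":          # in0 -> out0, in1 -> out1
        return in0, in1
    if state == "exchange":          # in0 -> out1, in1 -> out0
        return in1, in0
    if state == "upper_broadcast":   # upper input copied to both outputs
        return in0, in0
    if state == "lower_broadcast":   # lower input copied to both outputs
        return in1, in1
    raise ValueError("unknown state")

for s in ("straight", "exchange", "upper_broadcast", "lower_broadcast"):
    print(s, switch_box("A", "B", s))
```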

A network is called a rearrangeable network if it can perform all possible connections between inputs and outputs by rearranging its existing connections, so that a connection path for a new input-output pair can always be established.

Static networks are the opposite of dynamic networks in terms of network status: the links in a static network are fixed, and can be unidirectional or bidirectional between processors. There are two types of static networks:
- Completely Connected Networks (CCN)
- Limited Connection Networks (LCN): linear arrays, rings (loops), two-dimensional arrays and tori, tree networks, and n-cube networks

Node degree d: the number of edges incident on a node.
Diameter D: the maximum shortest path between any two nodes.
Bisection width b: the minimum number of links that must be cut to divide the network into two equal halves (a low bisection width means low data transfer capacity; a high bisection width means a high level of data transfer can happen).
Network latency: the worst-case time for a unit message to be transferred.
Hardware complexity: implementation costs for wires, logic, switches, connectors, etc.
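For reference, the sketch below collects the standard values of these metrics for the topologies discussed in the following slides. It is a minimal summary assuming bidirectional links; the function name and topology labels are chosen for this example.

```python
import math

def metrics(topology, n):
    """(degree, diameter, bisection width) for an n-node network."""
    if topology == "completely_connected":
        return n - 1, 1, (n * n) // 4
    if topology == "linear_array":
        return 2, n - 1, 1          # interior degree; end nodes have degree 1
    if topology == "ring":
        return 2, n // 2, 2         # bidirectional ring
    if topology == "mesh_2d":       # k x k mesh, n = k*k
        k = math.isqrt(n)
        return 4, 2 * (k - 1), k    # interior degree; boundaries are lower
    if topology == "torus_2d":
        k = math.isqrt(n)
        return 4, 2 * (k // 2), 2 * k
    if topology == "hypercube":     # n = 2**d
        d = n.bit_length() - 1
        return d, d, n // 2
    raise ValueError(topology)

for t in ("completely_connected", "ring", "mesh_2d", "torus_2d", "hypercube"):
    print(t, metrics(t, 16))
```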

A CCN consists of N nodes, where every node is connected to every other node. The network diameter is therefore D = 1, and the node degree is d = N - 1 (a node is connected to all other nodes except itself). The bisection width of a CCN is b = N^2/4. A CCN needs N(N-1)/2 links to connect N processor nodes. Example: N = 16 requires 120 links; N = 1,024 requires 523,776 links.
D = 1, d = N - 1, b = N^2/4
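A quick check of the link count (a minimal sketch; the function name is chosen for this example):

```python
def ccn_links(n):
    # one link per unordered pair of nodes
    return n * (n - 1) // 2

print(ccn_links(16))    # 120
print(ccn_links(1024))  # 523776
```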

Linear Arrays
In a linear array, the nodes are connected to each other in a straight line. This is an asymmetric network: all nodes have a degree of 2, except the two end nodes, which have a degree of 1. The network has a bisection width of 1.
Asymmetric network; d = 2 (interior nodes), D = N - 1, b = 1
A linear array allows different sources to use different sections of the channel concurrently. One serious disadvantage, however, is that the network diameter grows linearly with the number of nodes, so this topology is not scalable.

Rings
The ring topology attempts to solve the large-diameter problem inherent in linear arrays. A ring is simply a linear array with its end nodes connected together. This has the effect of making the network symmetric: all nodes have a degree of 2. A ring can also be physically arranged (folded) to use short wires.
d = 2; D = N - 1 for a unidirectional ring, or D = N/2 for a bidirectional ring

Tree Networks
In a tree network, the processors are located at the leaves; all other nodes are switches. A k-ary tree with N leaf processors has height h = log_k N (rounded up), and its diameter is D = 2h. The bisection width of the tree is b = 1, resulting in poor bandwidth at the root level. One solution to this problem is the fat tree (discussed below).

The fat tree solves the bandwidth problem by doubling the number of connections at each level in the tree; each processor, however, still has a degree of 1, as shown in the figure below. PAGE 22

In an n-dimensional cube (n-cube, or hypercube) network, there are N = 2^n processors, and each processor is connected to n other processors.
d = log2 N = n, D = log2 N = n, b = N/2 = 2^(n-1)
The binary labels of neighbouring PEs differ in exactly one bit. An n-dimensional hypercube can be partitioned into two (n-1)-dimensional hypercubes. The distance between Pi and Pj in a hypercube is the number of bit positions in which i and j differ (i.e. the Hamming distance).
Example: 10011 XOR 01001 = 11010, so the distance between PE19 and PE9 is 3.
[Figure: hypercubes of dimension 0 through 4, with nodes labelled in binary]
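The Hamming-distance rule translates directly into code. The sketch below also derives a dimension-ordered path by correcting one differing bit at a time, in the style of e-cube routing; this is a minimal sketch, and the function names are chosen for this example.

```python
def hypercube_distance(i, j):
    """Hops between PEs i and j = Hamming distance of their labels."""
    return bin(i ^ j).count("1")

def dimension_ordered_path(src, dst):
    """Route by correcting differing address bits in a fixed order."""
    path, cur = [src], src
    diff, bit = src ^ dst, 0
    while diff:
        if diff & 1:
            cur ^= 1 << bit   # traverse the link in this dimension
            path.append(cur)
        diff >>= 1
        bit += 1
    return path

print(hypercube_distance(0b10011, 0b01001))          # 3
print([bin(p) for p in dimension_ordered_path(0b10011, 0b01001)])
```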

Meshes and Tori
A mesh is an asymmetric network: the corner nodes have d = 2, the side nodes d = 3, and the centre nodes d = 4. A k-dimensional mesh with n nodes per dimension has N = n^k nodes, and d = 2k except at boundary nodes. Like the ring topology, the torus topology attempts to decrease the network diameter for a given number of nodes. The diameter of an n x n two-dimensional mesh is 2(n - 1); the torus, on the other hand, has a diameter of 2(n/2), effectively cutting the diameter roughly in half. The wrap-around links also double the bisection width, from n to 2n. Furthermore, the torus network is symmetric, since all nodes now have a degree d = 4.
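The wrap-around links are what shrink the routing distance. The sketch below computes the minimal hop count between two nodes of a 2D mesh versus a 2D torus; a minimal sketch, with function names chosen for this example.

```python
def mesh_distance(a, b):
    """Manhattan distance on a 2D mesh; a and b are (row, col) pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def torus_distance(a, b, n):
    """Per dimension, take the shorter of the direct and wrap-around paths."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y in zip(a, b))

# opposite corners of a 4x4 network
print(mesh_distance((0, 0), (3, 3)))      # 6 = 2*(n-1)
print(torus_distance((0, 0), (3, 3), 4))  # 2: wrap links shorten both hops
```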

Single Bus Systems
A single bus is the simplest way to connect a multiprocessor system. Such a system consists of N processors, each with its own cache, connected by a shared bus. Although simple and easy to expand, single-bus multiprocessors are inherently limited by the bandwidth of the bus and by the fact that only one processor can access the bus, and hence only one memory access can take place, at any given time.

Advantages:
- Simple
- Cost-effective
- Easy to implement
Disadvantages:
- High contention: all nodes contend for the shared bus
- Limited bandwidth: all nodes communicate over the same wires

The use of multiple buses to connect multiple processors is a natural extension of the single shared bus system. A multiple-bus multiprocessor system uses several parallel buses to interconnect multiple processors and multiple memory modules. A number of connection schemes are possible in this case, among them:
- Multiple bus with full bus-memory connection (MBFBMC)
- Multiple bus with single bus-memory connection (MBSBMC)
- Multiple bus with partial bus-memory connection (MBPBMC)
- Multiple bus with class-based memory connection (MBCBMC)

MULTIPLE BUS WITH FULL BUS-MEMORY CONNECTION (MBFBMC)

MULTIPLE BUS WITH SINGLE BUS-MEMORY CONNECTION (MBSBMC)

MULTIPLE BUS WITH PARTIAL BUS-MEMORY CONNECTION (MBPBMC)

MULTIPLE BUS WITH CLASS-BASED MEMORY CONNECTION (MBCBMC)