INTERCONNECTION NETWORKS LECTURE 4 DR. SAMMAN H. AMEEN 1
Topology
- Specifies the way switches are wired
- Affects routing, reliability, throughput, latency, and ease of building
Routing
- How does a message get from source to destination?
- Static or adaptive
Buffering and Flow Control
- What do we store within the network (entire packets, parts of packets, etc.)?
- How do we manage and negotiate buffer space?
- Tightly coupled with routing strategy
PAGE 2
Network interface
- Connects endpoints (e.g. cores) to the network
- Decouples computation from communication
Links
- Bundle of wires that carries a signal
Switch/router
- Connects a fixed number of input channels to a fixed number of output channels
Channel
- A single logical connection between routers/switches
PAGE 3
Four fundamental design decisions determine the appropriate architecture of an interconnection network (IN) for parallel machines:
- Mode of operation
- Control strategy
- Switching methodology
- Network topology
PAGE 4
INs are classified as synchronous or asynchronous. In the synchronous mode of operation, a single global clock is used by all components, so that the whole system operates in lockstep. The asynchronous mode of operation, on the other hand, does not require a global clock; handshaking signals are used instead to coordinate the operation of the system. While synchronous systems tend to be slower than asynchronous systems, they are free of races and hazards. PAGE 5
A typical interconnection network consists of a number of switching elements and interconnecting links. Interconnection functions are realized by properly setting the control of the switching elements. The control-setting function can be managed by a centralized controller (centralized control) or by the individual switching elements (distributed control). Most existing SIMD interconnection networks use centralized control, in which the control unit sets all switching elements. PAGE 6
The two major switching methodologies are circuit switching and packet switching.
Circuit switching sets up a full path (acquires all resources) between sender and receiver prior to sending a message.
- Reserve the links, then send the data
- Higher-bandwidth transmission (no link management overhead)
- Overhead to set up a path
- Reserving links can result in low utilization
In packet switching, data is put in a packet and routed through the interconnection network without establishing a physical connection path.
- Route packets individually (possibly on different network links)
- Opportunity to use a link whenever it is idle
- Overhead due to dynamic switching
In general, circuit switching is much more suitable for bulk data transmission, and packet switching is more efficient for many short data messages. Most SIMD interconnection networks are hardwired to assume circuit-switching operations. Packet-switched networks have been suggested mainly for MIMD machines.
PAGE 7
A network can be depicted by a graph in which nodes represent switching points and edges represent communication links. The topologies tend to be regular and can be grouped into two categories: static and dynamic. In a static topology, the links between two processors are passive: these dedicated buses cannot be reconfigured for direct connections to other processors. Links in the dynamic category, on the other hand, can be reconfigured by setting the network's active switching elements. The choice of a particular interconnection network depends on application demands, technology support, and cost-effectiveness. PAGE 8
- Regular or irregular: a topology is regular if its graph is regular (e.g. ring, mesh)
- Routing distance: number of links (hops) along a route
- Diameter: maximum routing distance
- Average distance: average number of hops across all valid routes
PAGE 9
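The distance metrics above can be computed for any small topology with a breadth-first search. The following is a minimal sketch, assuming the network is connected and is given as an adjacency list; the helper names `hop_counts` and `diameter_and_average`, and the 6-node bidirectional ring used as input, are illustrative choices and not part of the lecture.

```python
from collections import deque

def hop_counts(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter_and_average(adj):
    """Return (diameter, average distance) over all ordered pairs of distinct nodes.

    Assumes the graph is connected, so every pair has a shortest route.
    """
    all_d = [d for s in adj for t, d in hop_counts(adj, s).items() if t != s]
    return max(all_d), sum(all_d) / len(all_d)

# Illustrative input: a bidirectional ring of 6 nodes.
ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
diameter, average = diameter_and_average(ring6)
print(diameter, average)
```

For the 6-node ring, every node sees the other five at 1, 1, 2, 2 and 3 hops, so the diameter is 3 and the average distance is 9/5 = 1.8.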
PAGE 10
STATIC NETWORKS
The inter-PE communication network can be specified by a set of data-routing functions.
Static networks
Topologies in static networks can be classified according to the dimensions required for layout. For illustration, one-dimensional, two-dimensional, three-dimensional, and hypercube topologies are shown in the next slide.
PAGE 11
PAGE 12
We consider two classes of dynamic networks, single-stage and multistage, described separately below.
Single-stage networks
A single-stage network is a switching network with N input selectors (IS) and N output selectors (OS). It is also called a recirculating network, because data items may have to recirculate through the single stage several times before reaching their final destinations.
PAGE 13
Many stages of interconnected switches form a multistage SIMD network. Multistage networks are described by three characterizing features: the switch box, the network topology, and the control structure.
Many switch boxes are used in a multistage network. Each box is essentially an interchange device with two inputs and two outputs. A switch box has four states: straight, exchange, upper broadcast, and lower broadcast. A two-function switch box can assume either the straight or the exchange state; a four-function switch box can be in any one of the four legitimate states.
PAGE 14
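The four switch-box states can be modeled as a small function from two inputs to two outputs. This is only a sketch: the function name `switch_box` and the state strings are hypothetical, and it assumes the usual reading that "upper broadcast" copies the upper input to both outputs and "lower broadcast" copies the lower input to both outputs.

```python
def switch_box(state, upper, lower):
    """Return (upper_out, lower_out) of a 2x2 switch box in the given state."""
    if state == "straight":
        return upper, lower          # inputs pass straight through
    if state == "exchange":
        return lower, upper          # inputs are swapped
    if state == "upper_broadcast":
        return upper, upper          # upper input copied to both outputs
    if state == "lower_broadcast":
        return lower, lower          # lower input copied to both outputs
    raise ValueError(f"unknown state: {state}")

# A two-function box supports only the first two states;
# a four-function box supports all four.
TWO_FUNCTION = ("straight", "exchange")
FOUR_FUNCTION = TWO_FUNCTION + ("upper_broadcast", "lower_broadcast")

for state in FOUR_FUNCTION:
    print(state, switch_box(state, "a", "b"))
```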
A network is called a re-arrangeable network if it can perform all possible connections between inputs and outputs by rearranging its existing connections so that a connection path for a new input-output pair can always be established. PAGE 15
In contrast to dynamic networks, static networks have fixed links between processors; these links can be unidirectional or bidirectional. There are two types of static networks:
- Completely Connected Networks (CCN)
- Limited Connection Networks (LCN)
  - Linear arrays
  - Rings (loops)
  - Two-dimensional arrays and tori
  - Tree networks
  - N-cube networks
PAGE 16
- Node degree (d): the number of edges incident on a node.
- Diameter (D): the maximum, over all pairs of nodes, of the shortest-path length between them.
- Bisection width (b): the minimum number of links that must be cut to divide the network into two equal halves. A low bisection width limits the data that can move between the halves; a high bisection width allows a high level of data transfer.
- Network latency: the worst-case time for a unit message to be transferred through the network.
- Hardware complexity: implementation costs for wires, logic, switches, connectors, etc.
PAGE 17
A completely connected network (CCN) consists of any number of nodes, where every node is connected to every other node.
- Diameter: D = 1
- Node degree: d = N - 1 (a node is connected to all other nodes, except itself)
- Bisection width: b = N^2/4
- Links: N(N-1)/2 are needed to connect N processor nodes. For example, N = 16 needs 120 links and N = 1,024 needs 523,776 links.
PAGE 18
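The closed-form CCN metrics are easy to tabulate. The sketch below uses a hypothetical helper, `ccn_metrics`, that evaluates the formulas N(N-1)/2, N-1 and N^2/4 from the slide; for odd N it assumes the bisection splits the nodes as evenly as possible, which reduces to N^2/4 when N is even.

```python
def ccn_metrics(n):
    """Closed-form metrics of a completely connected network with n >= 2 nodes."""
    half = n // 2
    return {
        "links": n * (n - 1) // 2,            # one link per pair of nodes
        "degree": n - 1,                      # every node links to all others
        "diameter": 1,                        # any two nodes are directly connected
        "bisection_width": half * (n - half)  # equals n^2/4 when n is even
    }

for n in (4, 16, 1024):
    print(n, ccn_metrics(n))
```

The quadratic growth in links is the reason this topology is rarely practical: `ccn_metrics(16)["links"]` is 120, while `ccn_metrics(1024)["links"]` is 523776.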
Linear Arrays and Rings
In a linear array, the nodes are connected to each other to form a straight line. This is an asymmetric network: all nodes have a degree of 2, with the exception of the end nodes, which have a degree of 1.
- Degree: d = 2 (d = 1 at the end nodes)
- Diameter: D = N - 1
- Bisection width: b = 1
Different sections of the channel can be used by different sources concurrently. One serious disadvantage is that the network's diameter increases proportionally with the number of nodes; as a result, this topology is not scalable.
PAGE 19
The ring topology attempts to solve the large-diameter problem inherent in linear arrays. A ring is simply a linear array with its end nodes connected together. This makes the network symmetric: all nodes have a degree of 2.
[Figure: linear array; ring; ring arranged to use short wires]
- Degree: d = 2
- Diameter: D = N - 1 for a unidirectional ring, or D = floor(N/2) for a bidirectional ring
PAGE 20
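To see how much the wraparound link helps, the diameters of the linear array and the ring can be compared for a few network sizes. This is a small sketch using the diameter formulas quoted on these slides; the function names are illustrative, and integer division stands in for floor(N/2).

```python
def linear_array_diameter(n):
    """Diameter of a linear array of n nodes: end to end, n - 1 hops."""
    return n - 1

def ring_diameter(n, bidirectional=True):
    """Diameter of an n-node ring.

    A bidirectional ring can route either way round, giving n // 2 hops;
    a unidirectional ring may have to travel almost the full loop, n - 1 hops.
    """
    return n // 2 if bidirectional else n - 1

# Columns: N, linear array, unidirectional ring, bidirectional ring.
for n in (8, 64, 1024):
    print(n, linear_array_diameter(n), ring_diameter(n, False), ring_diameter(n, True))
```

In every case the diameter still grows linearly with N; the bidirectional ring only halves the constant.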
In a tree network, the processors are located at the leaves; all other nodes are switches.
- A k-ary tree of height h has diameter D = 2h.
- Bisection width: b = 1, resulting in poor bandwidth at the root level.
One solution to this problem is the fat tree (discussed below).
PAGE 21
The fat tree solves the bandwidth problem by doubling the number of connections at each level in the tree; each processor, however, still has a degree of 1, as shown in the figure below. PAGE 22
In an n-dimensional cube (n-cube) network, there are N = 2^n processors, and each processor (PE) is connected to n = log_2 N other PEs.
- Degree: d = n = log_2 N
- Diameter: D = n = log_2 N
- Bisection width: b = N/2 = 2^(n-1)
- The binary labels of neighboring PEs differ in exactly one bit.
- An n-dimensional hypercube can be partitioned into two (n-1)-dimensional hypercubes.
- The distance between Pi and Pj is the number of bit positions in which i and j differ (i.e. the Hamming distance).
Example: 10011 XOR 01001 = 11010, which has three 1 bits, so the distance between PE19 (10011) and PE9 (01001) is 3.
[Figure: hypercubes of dimension 0 through 4, with nodes labeled 000 to 111 in the 3-D case]
PAGE 23
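The neighbor rule and the Hamming-distance rule for hypercubes both reduce to bit operations on the node labels. The sketch below is illustrative; the names `hypercube_neighbors` and `hypercube_distance` are assumptions and do not come from the lecture.

```python
def hypercube_neighbors(node, n):
    """Labels of the n neighbors of `node` in an n-cube: flip one bit at a time."""
    return [node ^ (1 << bit) for bit in range(n)]

def hypercube_distance(i, j):
    """Hamming distance: number of bit positions in which labels i and j differ."""
    return bin(i ^ j).count("1")

# Node 000 in a 3-cube is adjacent to 001, 010 and 100.
print([format(v, "03b") for v in hypercube_neighbors(0b000, 3)])

# Labels 10011 and 01001 differ in three bit positions.
print(hypercube_distance(0b10011, 0b01001))
```

Because each hop can fix one differing bit, the Hamming distance is also the minimum number of hops between the two nodes, and the largest possible value, n, is the diameter.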
A mesh is an asymmetric network: in two dimensions, the corner nodes have d = 2, the side nodes d = 3, and the centre nodes d = 4. A k-dimensional mesh with n nodes per dimension has N = n^k nodes, and d = 2k except at boundary nodes.
Like the ring topology, the torus topology attempts to decrease the network diameter for a given number of nodes. An n x n two-dimensional mesh has a diameter of 2(n - 1). The corresponding torus, which adds wraparound links, has a diameter of 2 floor(n/2), roughly halving the diameter. The wraparound links also double the bisection width, from n to 2n (for even n). Furthermore, the torus network is symmetric, since all nodes now have a degree of d = 4.
[Figure: mesh; torus]
PAGE 24
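The degree claims for the two-dimensional mesh and torus can be checked by enumerating neighbors on a small grid. This is a minimal sketch assuming an n x n grid with nodes addressed as (x, y); `grid_neighbors` and `degree_histogram` are hypothetical helper names, and the 4 x 4 size is an arbitrary example.

```python
from collections import Counter

def grid_neighbors(x, y, n, wraparound):
    """Neighbors of node (x, y) in an n x n mesh (wraparound=False) or torus (True)."""
    result = set()
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if wraparound:
            result.add((nx % n, ny % n))      # torus: edges wrap to the far side
        elif 0 <= nx < n and 0 <= ny < n:
            result.add((nx, ny))              # mesh: links stop at the boundary
    result.discard((x, y))                    # guard against self-links when n == 1
    return result

def degree_histogram(n, wraparound):
    """Map node degree -> number of nodes with that degree."""
    return Counter(len(grid_neighbors(x, y, n, wraparound))
                   for x in range(n) for y in range(n))

mesh_degrees = degree_histogram(4, wraparound=False)
torus_degrees = degree_histogram(4, wraparound=True)
print(dict(mesh_degrees))
print(dict(torus_degrees))
```

On the 4 x 4 mesh, 4 corner nodes have degree 2, 8 side nodes have degree 3 and 4 centre nodes have degree 4, while all 16 torus nodes have degree 4.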
Single Bus Systems
A single bus is considered the simplest way to connect the processors of a multiprocessor system. Such a system consists of N processors, each having its own cache, connected by a shared bus. Although simple and easy to expand, single-bus multiprocessors are inherently limited by the bandwidth of the bus and by the fact that only one processor can access the bus, and in turn only one memory access can take place, at any given time.
PAGE 25
Advantages
- Simple
- Cost effective
- Easy to implement
Disadvantages
- High contention: all nodes contend for the shared bus
- Limited bandwidth: all nodes communicate over the same wires
PAGE 26
The use of multiple buses to connect multiple processors is a natural extension of the single shared bus system. A multiple-bus multiprocessor system uses several parallel buses to interconnect multiple processors and multiple memory modules. A number of connection schemes are possible in this case, among them:
- multiple bus with full bus-memory connection (MBFBMC)
- multiple bus with single bus-memory connection (MBSBMC)
- multiple bus with partial bus-memory connection (MBPBMC)
- multiple bus with class-based memory connection (MBCBMC)
PAGE 27
MULTIPLE BUS WITH FULL BUS MEMORY CONNECTION (MBFBMC) PAGE 28
MULTIPLE BUS WITH SINGLE BUS MEMORY CONNECTION (MBSBMC) PAGE 29
MULTIPLE BUS WITH PARTIAL BUS-MEMORY CONNECTION (MBPBMC) PAGE 30
MULTIPLE BUS WITH CLASS-BASED MEMORY CONNECTION (MBCBMC) PAGE 31