Supporting Fully Adaptive Routing in InfiniBand Networks

XIV Jornadas de Paralelismo, Leganés, September 2003

J.C. Martínez, J. Flich, A. Robles, P. López and J. Duato

Dept. of Computer Engineering (DISCA), Universidad Politécnica de Valencia, Camino de Vera, 14, 46071 Valencia, Spain. E-mail: jc@disca.upv.es

This work was supported by the Spanish Ministry of Science and Technology under Grants TIC2000-1151-C07 and 1FD97-2129, by Generalitat Valenciana under Grant GV00-11-14, and by JJ.CC. de Castilla-La Mancha under Grant PBC-02-008.

Abstract

InfiniBand is a new standard for communication between processing nodes and I/O devices as well as for interprocessor communication. The InfiniBand Architecture (IBA) supports distributed routing. However, routing in IBA is deterministic because forwarding tables store a single output port per destination ID. This prevents packets from using alternative paths when the requested output port is busy. Although alternative paths could be selected at the source node to reach the same destination node, this is not effective enough to improve network performance. Adaptive routing, in contrast, could help to circumvent the congested areas in the network, leading to an increase in performance. In this paper, we propose a simple strategy to implement forwarding tables for IBA switches that support adaptive routing while still maintaining compatibility with the IBA specs. Adaptive routing can be enabled or disabled individually for each packet at the source node. Also, the proposed strategy enables the use in IBA of fully adaptive routing algorithms without using additional network resources to improve network performance. Evaluation results show that extending IBA switch capabilities with fully adaptive routing noticeably increases network performance. In particular, network throughput increases up to an average factor of 3.9.

Keywords: SANs, InfiniBand, adaptive routing, virtual addressing.

I. INTRODUCTION

InfiniBand [11] is a new interconnect standard for communication between processing nodes and I/O devices, as well as for interprocessor communication (IPC). The InfiniBand Architecture (IBA) [12] is designed around a switch-based interconnect technology with high-speed serial point-to-point links. IBA supports any topology defined by the user in order to provide wiring flexibility and incremental expansion capability.

More than 180 companies, including the leading computer manufacturers, support the InfiniBand initiative. It was designed to solve the lack of high bandwidth, reliability, availability and scalability of existing server I/O technologies. However, the spectrum of possible application domains for InfiniBand is wider, including I/O interconnect, system area networks (SAN), storage area networks (STAN), and local area networks (LAN). InfiniBand could be used as a SAN fabric for high performance server clusters or commodity clusters formed by PCs or workstations. In particular, InfiniBand can provide the high bandwidth and low end-to-end latency required for enhancing commodity clusters that can be used as a cost-effective alternative to parallel computers.

Routing in IBA subnets is distributed and deterministic, based on forwarding tables stored in each switch, which only consider the packet destination ID for routing packets [12]. The routing tables only store one output link per destination. Deterministic routing is used in many network interconnects due to its simplicity [1], [14]. However, deterministic routing algorithms do not make effective use of network links because a unique path is provided for each source-destination pair. IBA allows the use of alternative paths between any source-destination pair by means of its virtual addressing scheme [12]. The final path can be selected at each source node according to a certain criterion (random, round-robin, etc.). However, using alternative paths selected at the source node hardly improves network performance [5], [6].

On the other hand, adaptive routing [3] dynamically builds the path used by a packet, selecting the channels along the path considering network status information. With adaptive routing, each switch selects the output port to forward a packet from a set of routing options, avoiding congested areas and increasing network performance. Several adaptive routing algorithms have been proposed in the literature [2], [16], [10], showing that network performance can be strongly increased. However, adaptive routing has some drawbacks. The first one is that it increases switch complexity. The second one is that it cannot guarantee in-order packet delivery. However, in most MPI-based parallel applications, there exists a certain percentage of traffic that may be delivered out of order. Moreover, in-order packets could also use adaptive routing if packets were reordered at the destination host before being delivered.

II. MOTIVATION

At first glance, InfiniBand specifications [12] do not support distributed adaptive routing. In effect, IBA specs state that forwarding tables must provide only one physical output port per destination node. However, IBA specs do not define the internal architecture of a switch. Indeed, IBA switches already support distributed routing. Thus, enhancing switch capabilities to support adaptive routing could be feasible. In [13], we proposed a simple strategy to allow adaptive routing in IBA while still maintaining compatibility with the IBA specs. Indeed, this proposal can be considered as an extension of IBA specs, adding more functionality to them.

In this paper, we provide support for fully adaptive routing algorithms [4] in IBA switches. In particular, we implement an extension to virtual cut-through of the MA routing algorithm [16] for networks with irregular topology. The mechanism proposed in this paper does not need to add resources to the IBA switches; only the existing resources must be appropriately arranged.

The rest of the paper is organized as follows. Section III describes the proposed mechanism. In Section IV some performance evaluation results are shown. Finally, in Section V some conclusions are drawn.

III. SUPPORTING ADAPTIVE ROUTING IN IBA

This section contains a short description of the proposed mechanism. A more detailed description can be found in [13].

An IBA network is composed of end nodes interconnected by switches. Each end node contains one or more channel adapters (CA). Each channel adapter contains one or more ports. Each port has a local identifier (LID) assigned by the local subnet manager, which is unique within the subnet. IBA switches route packets based on forwarding tables stored at each switch, addressed by the packet destination local identifier (which is referred to as the destination LID or DLID). As this table only returns one output port to forward the packet towards its destination, IBA switches do not support adaptive routing.

A. Providing Multiple Routing Options

In order to support adaptivity, each switch should supply several feasible output ports when a packet is routed. We can use a trick to provide multiple routing options in IBA switches. IBA allows a single destination to be assigned not only a unique address but a range of them by defining a LID Mask Control or LMC [12]. The LMC specifies the number of least significant bits of the LID that a physical port masks (ignores) when it validates that a packet DLID matches its assigned LID. As these bits are not ignored by the switches, from the subnet point of view each CA port has been assigned up to 2^LMC consecutive addresses. Each CA port will accept all packets destined for any valid address within its range. Notice that IBA specifications allow for a maximum of 128 different addresses per destination port.
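The LMC rule described above can be illustrated with a minimal sketch. The function names `lid_range` and `port_accepts` are hypothetical, and the alignment of each range to a multiple of 2^LMC is an assumption of this sketch rather than something stated in the text.

```python
MAX_LMC = 7  # 2**7 = 128 addresses, the maximum quoted in the text


def lid_range(base_lid: int, lmc: int) -> range:
    """Return the consecutive LIDs that a CA port with this base LID accepts."""
    if not 0 <= lmc <= MAX_LMC:
        raise ValueError("LMC must be between 0 and 7")
    count = 1 << lmc                  # 2**LMC addresses per port
    first = base_lid & ~(count - 1)   # assumption: range aligned to 2**LMC
    return range(first, first + count)


def port_accepts(dlid: int, base_lid: int, lmc: int) -> bool:
    """Port-side check: compare DLID and LID with the low LMC bits masked."""
    mask = ~((1 << lmc) - 1)
    return (dlid & mask) == (base_lid & mask)
```

With `lmc = 1`, a port whose base LID is 8 accepts DLIDs 8 and 9, while the switches still see two distinct table indices.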
Thus, a maximum of 128 different routing options could be provided with the proposed scheme, which seems more than required. The idea is to assign to each port the same number of addresses as the number of routing options at each switch. In this case, there will be a range of consecutive addresses assigned to the same host. When the switch has to select the output port for any packet, it selects all the output ports that are assigned to the packet destination host, instead of selecting only one. In this way, the forwarding table will provide more than one output port for each packet.

This mechanism allows source hosts to select deterministic or adaptive routing on a per-packet basis. When a source host selects the lowest virtual address of the destination host, only one output port will be selected at each switch. This output port will always be the same for each destination host in order to provide deterministic routing (in-order delivery). When a source host selects a different address of the range, all the output ports will be supplied in order to provide adaptive routing.

In order to access several entries in the forwarding table, they can be accessed sequentially or a multi-port memory can be used. However, if a linear forwarding table [12] is used at the switches, a simple implementation can be obtained by organizing the forwarding table as an interleaved memory composed of several modules that are selected by the least significant bits of the address.

To sum up, Figure 1 shows the implementation of the mechanism when two routing options are provided at each switch. The destination field of the packet (the DLID) is used to access the forwarding table, obtaining two output ports simultaneously. To allow these two simultaneous accesses, the forwarding table is organized as two interleaved modules. In order to select the switch output port that will finally be used, the least significant bit of the DLID is first checked. If it is set to zero, deterministic routing is required for the packet, so the output port that corresponds to the first address is selected. Otherwise, two routing options are selected.
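The lookup just described can be sketched as follows. This is only an illustrative model: the class `InterleavedForwardingTable` and its methods are hypothetical names, and treating every address other than the lowest one as a request for adaptive routing is an assumption when more than two options are used (for two options it matches the least-significant-bit check of Figure 1).

```python
from typing import List, Optional


class InterleavedForwardingTable:
    """Linear forwarding table split into 2**lmc interleaved modules.

    Module k stores the entries of all LIDs whose low `lmc` bits equal k,
    so all entries of one destination's address range share the same row
    and could be read in parallel, one per module.
    """

    def __init__(self, num_lids: int, lmc: int = 1) -> None:
        self.lmc = lmc
        self.ways = 1 << lmc
        rows = (num_lids + self.ways - 1) // self.ways
        self.modules: List[List[Optional[int]]] = [
            [None] * rows for _ in range(self.ways)
        ]

    def set_port(self, lid: int, port: int) -> None:
        """Program the output port stored for one LID."""
        self.modules[lid & (self.ways - 1)][lid >> self.lmc] = port

    def lookup(self, dlid: int) -> List[int]:
        """Return the routing options offered for a packet with this DLID."""
        row = dlid >> self.lmc
        candidates = [module[row] for module in self.modules]
        if dlid & (self.ways - 1) == 0:
            # Lowest address of the range: deterministic routing, one port.
            return [p for p in candidates[:1] if p is not None]
        # Any other address: offer every distinct programmed port.
        options: List[int] = []
        for port in candidates:
            if port is not None and port not in options:
                options.append(port)
        return options
```

For a destination that owns LIDs 8 and 9, programming ports 3 and 5 makes `lookup(8)` return only port 3, whereas `lookup(9)` returns both ports and leaves the final choice to the selection logic.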
The final selection can be done either immediately or at the internal switch arbitration time, and either considering the status of the output ports or performing a static selection.

(The linear forwarding table provides a simple map from LID to output port. In other words, the table contains only output ports and the LID acts as an index into the table.)

Fig. 1. Providing multiple routing options in IBA switches.

B. Support for Adaptive Routing with Escape Paths

The mechanism allows the use of adaptive routing algorithms with escape paths. Therefore, the fully adaptive routing algorithm [4] can be used in IBA. We could split each IBA VL into two queues, the adaptive and the escape queues. Both queues are multiplexed onto the corresponding VL. However, there is no support to identify from which queue data is being transferred or to manage the buffer space available at each queue. To solve this problem, we propose (see Figure 2) to divide the physical buffer assigned to each VL into two logical queues that will implement the adaptive and the escape queue, respectively. The first part of the physical buffer corresponds to the adaptive queue, whereas the second part is the escape queue. Notice that the entire VL is treated as a unique queue. Therefore, the escape queue will be used only when the adaptive queue is full. However, the packets stored in the escape queue must be able to be routed and forwarded independently of the ones stored in the adaptive one. This can be accomplished in the proposed organization by arranging two connections from each VL to the internal switch, one located at the head of the adaptive queue and another located at the head of the escape queue. In Figure 2, a multiplexer is used to select a packet either from the adaptive or from the escape queue. Since the escape and adaptive queues are parts of the same queue, a packet may be transferred from the escape queue to the adaptive queue. As can be seen in [4], this does not lead to deadlock.

Once the forwarding table offers the adaptive and escape routing options (leading to the two output ports shown in Figure 2), the switch, based on the information about the number of credits (C_XY, C_XYE, and C_XYA in Figure 2), can properly select the output port to use. The adaptive routing option will be used only if there are enough credits available at the adaptive queue (C_XYA). The escape routing option can be used at any time. Notice that packets forwarded through this routing option will be stored either in the adaptive or in the escape queue depending on the number of available credits (C_XYA and C_XYE). Finally, if neither routing option has credits available for the packet, the switch can either decide to forward the packet through the escape queue (once there are enough available credits) or take the decision later.

As the buffer associated with each VL is now divided into two logical queues (adaptive and escape) and virtual cut-through is used, each of them should be able to store an entire packet. This can be accomplished either by increasing the buffer size accordingly or by reducing the Maximum Transfer Unit (MTU).

As switches may extract a packet from the head of the escape queue, a packet may overtake another packet that has been placed in the adaptive queue of the same VL. This may produce out-of-order delivery even when always using the same routing option per destination. Switches may avoid this problem by using a pointer to the first deterministic packet in the VL. The packet pointed to by this pointer is considered the head of the escape queue. Therefore, deterministic packets cannot bypass other deterministic ones.

IV. PERFORMANCE EVALUATION

In this section we will evaluate the impact of the proposed strategy on network performance. For this purpose, we have developed a detailed simulator that allows us to model the network at the register transfer level following the IBA specifications [12].
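As an illustration of the kind of logic such a model has to capture, the credit-based selection rule of Section III.B can be sketched as below. This is a simplified behavioural sketch, not the simulator itself: the names `VLCredits` and `select_output_port` are hypothetical, credits are counted in 64-byte units as in the evaluation, and the condition used for the escape option (the packet must fit in either logical queue of the next switch) is one possible reading of the text.

```python
from dataclasses import dataclass
from typing import Optional

CREDIT_BYTES = 64  # credit granularity used in the evaluation


@dataclass
class VLCredits:
    """Credits available for one VL at the next switch."""
    adaptive: int  # C_XYA: credits at the adaptive queue
    escape: int    # C_XYE: credits at the escape queue


def credits_needed(packet_bytes: int) -> int:
    """Whole credits required to store the entire packet (ceiling division)."""
    return -(-packet_bytes // CREDIT_BYTES)


def select_output_port(
    packet_bytes: int,
    adaptive_port: Optional[int],
    adaptive_credits: VLCredits,
    escape_port: int,
    escape_credits: VLCredits,
) -> Optional[int]:
    """Pick an output port, or None to postpone the decision."""
    need = credits_needed(packet_bytes)
    # The adaptive option requires room in the adaptive queue.
    if adaptive_port is not None and adaptive_credits.adaptive >= need:
        return adaptive_port
    # Assumption: a packet sent through the escape option lands in whichever
    # logical queue of the next switch can hold it entirely.
    if escape_credits.adaptive >= need or escape_credits.escape >= need:
        return escape_port
    return None  # no credits on either option: take the decision later
```

A deterministic packet would simply be offered with `adaptive_port=None`, so that only the escape option is considered.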
First, we will describe the IBA subnet model defined in the specs together with the simulator parameters and the modeling considerations we have used in all the evaluations. Then, we will evaluate the proposed adaptive technique under different topologies and different traffic patterns.

A. The IBA Subnet Model

The IBA specification defines a switch-based network with point-to-point links, allowing the user to define any topology. The network allows communication between end nodes. The end nodes are attached to switches using the same kind of links used between switches. Packets are routed at each switch by accessing the forwarding table, which contains the output port to be used at the switch for each possible destination. Several routing options are provided based on the strategy proposed in this paper. In particular, the routing options will be computed by using the FA routing algorithm proposed in [4], which uses up*/down* routing for computing the escape paths. The output port is selected at arbitration time considering the status of the requested output ports and the number of credits available.

Switches can support up to 16 virtual lanes (VLs). VLs can be used to form separate virtual networks. We will use a non-multiplexed crossbar on each switch. This crossbar supplies separate ports for each VL. Buffers will be used both at the input and the output side of the crossbar. Buffer size will be fixed in both cases to 1 KB. Hence, the adaptive and escape queues implemented on each buffer will be 512 bytes deep. The switch routing time will be set to 100 ns, including the time to access the forwarding tables, the crossbar arbiter time, and the time to set up the crossbar connections.

Links in InfiniBand are serial, and 10/8 coding [12] is used. In the simulator, the link rate will be fixed to the 1X and 4X configurations [12] (2.5 and 10 Gbps). We will model 20 m copper cables with a propagation delay of 5 ns/m. The IBA specification defines a credit-based flow control scheme for each virtual lane with independent buffer resources.
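A few derived quantities follow directly from these parameters; the short sketch below works them out. The helper names are hypothetical, and the only fact used beyond the stated parameters is that 10/8 coding spends 10 signalled bits per data byte.

```python
SIGNALLING_GBPS = {"1X": 2.5, "4X": 10.0}  # link rates used in the simulator
CABLE_LENGTH_M = 20
PROPAGATION_NS_PER_M = 5
BUFFER_BYTES = 1024


def data_bytes_per_ns(link: str) -> float:
    """Useful data rate: 10 signalled bits carry one byte; 1 Gbit/s = 1 bit/ns."""
    return SIGNALLING_GBPS[link] / 10


def cable_delay_ns() -> int:
    """Propagation delay of one cable."""
    return CABLE_LENGTH_M * PROPAGATION_NS_PER_M


def queue_bytes() -> int:
    """Depth of each logical queue when a buffer is split in two halves."""
    return BUFFER_BYTES // 2


def serialization_ns(packet_bytes: int, link: str) -> float:
    """Time needed to put a packet of the given size on the link."""
    return packet_bytes / data_bytes_per_ns(link)
```

Under these assumptions a 1X link delivers 0.25 bytes/ns and a 4X link 1 byte/ns, each cable adds 100 ns of propagation delay, and each logical queue holds 512 bytes.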
A packet will be transmitted over the link if there is enough buffer space (measured in credits of 64 bytes) to store the entire packet. IBA allows the definition of different MTU values for packets, ranging from 256 to 4096 bytes. We use an MTU of 256 bytes. Additionally, the virtual cut-through switching technique is used.

Several packet destination distributions will be used: uniform, bit-reversal, and hot-spot. In the latter case, a node is randomly selected and a percentage (we use 5%, 10%, and 20%) of the traffic is sent to this host. Packets of 32 and 256 bytes will be used. In all the presented results, we will plot the average packet latency, measured in nanoseconds, versus the average accepted traffic, measured in bytes/ns/switch. Latency is the elapsed time between the generation of a packet at the source host and its delivery at the destination end node.

Fig. 2. Supporting adaptive routing with escape paths. C_XY stands for the credits at VL "Y" of channel "X", C_XYE for the credits at the escape channel of VL "Y" of channel "X", and C_XYA for the credits at the adaptive channel of VL "Y" of channel "X".

We will analyze irregular networks of 8, 16, 32, and 64 switches, randomly generated following some restrictions. First, we will assume that every switch in the network has the same number of ports (we use 8 or 10) and the same number of nodes connected to every switch (4 in our simulations). Second, neighboring switches will be interconnected by just one link. Ten different topologies will be randomly generated for each network size. Thus, we will also show minimum, maximum and average results. In addition, we will also plot the results for some representative topologies for every network size.

B. Evaluation Results

In this section we analyze the influence on network performance of using IBA switches with adaptive routing capabilities. First, we analyze the influence of the percentage of adaptive traffic on network performance. Then, we analyze how network connectivity affects network performance.

B.1 Influence of the Percentage of Adaptive Traffic
Figures 3.a, 3.b, 3.c, and 3.d show the simulation results for the FA routing when varying the percentage of adaptive traffic from 0% (deterministic traffic) up to 100% for network sizes of 8, 16, 32, and 64 switches, respectively. In this case, forwarding tables provide two routing options at most and four links (4X each) are used in each switch to connect to other switches. A uniform packet destination distribution and 32-byte packets are used. (Accepted traffic is the amount of information delivered by the network per time unit.) As can be seen, the performance improvement achieved by using IBA switches with support for adaptive routing increases linearly with the percentage of adaptive traffic. However, for the 8-switch network, the FA routing algorithm obtains almost the same network throughput with a partial share of adaptive traffic as with 100%. On the contrary, for the 64-switch network, the difference in network throughput between a partial share and 100% of adaptive traffic is greater.

Fig. 3. Average packet latency vs. traffic. 1 virtual lane and 4 links connecting switches. 4X links. Uniform traffic pattern. Network size is (a) 8, (b) 16, (c) 32, and (d) 64 switches. Packet size is 32 bytes.

Table I shows minimum, maximum, and average factors of throughput increase for different network sizes and different packet sizes when using 100% adaptive traffic and 4X links. (Table II shows results when using 1X links. As can be observed, the same conclusions can be deduced.) As we can see, the throughput benefit increases as network size increases. In particular, for a uniform traffic pattern with 32-byte packets, when using networks with two routing options and 4 links connecting switches, network throughput is increased, on average, by a factor ranging from 1.3 to 3.28, depending on network size. The higher throughput increase observed for large networks with respect to small networks is due to the fact that up*/down* routing does not scale well. Therefore, with 0% of adaptive traffic, as network size increases, up*/down* routing tends to use longer non-minimal paths and also to unbalance the traffic, congesting the switches near the root switch [7]. Hence, packets benefit more from using adaptive routing.

Table I also shows results for other traffic patterns. The higher the percentage of hot-spot traffic, the lower the factors of throughput improvement. This is because traffic around the hot-spot tends to concentrate, spreading the congestion through the network. Better results are obtained for other traffic patterns (see Table I). Indeed, for the bit-reversal traffic pattern (which creates some local congestion areas), results similar to those of the uniform traffic pattern are obtained, as shown in Table I. We can observe that the use of adaptive routing causes throughput to increase, on average, by a factor of 1.58 for 8-switch networks and 2.8 for 64-switch networks. Finally, we can observe that qualitatively similar results were obtained for long packets.

TABLE I. Minimum, maximum, and average factors of throughput increase when using 100% adaptive traffic vs. 100% deterministic traffic. 4X links. 1 VL. In the MR/LS column, MR stands for the maximum number of routing options at each switch for each destination port and LS stands for the number of links connecting switches.

Sw   MR/LS  Traffic   32-byte packets   256-byte packets
                      Min/Max/Avg       Min/Max/Avg
8    2/4    Unif.     1.15/1.42/1.0     1.27/1.52/1.8
16   2/4    Unif.     1.41/2.0/1.71     1.25/1.62/1.45
32   2/4    Unif.     1.77/.16/2.8      1.44/2.2/1.86
64   2/4    Unif.     2.82/4.2/.28      2.00/2.60/2.9
8    2/4    HS 5%     1.2/1.76/1.40     1.27/1.69/1.42
16   2/4    HS 5%     1.25/2.01/1.61    1.26/1.77/1.47
32   2/4    HS 5%     1.50/2.21/1.87    1.47/2.08/1.76
64   2/4    HS 5%     1.70/2.28/1.97    1.71/2.28/1.98
8    2/4    HS 10%    1.09/1.55/1.28    1.09/1.46/1.26
16   2/4    HS 10%    1.12/1.4/1.0      1.22/1.48/1.
32   2/4    HS 10%    1.14/1.57/1.8     1.15/1.62/1.40
64   2/4    HS 10%    1.27/1.57/1.5     1.24/1.56/1.6
8    2/4    BR        1.06/1.69/1.7     1.14/1.82/1.49
16   2/4    BR        1.25/1.91/1.56    1.22/1.61/1.9
32   2/4    BR        1.51/2.95/1.2     1.48/2.20/1.85
64   2/4    BR        1.69/.5/2.65      2.05/2.71/2.27
8    3/4    Unif.     1.15/1.52/1.      1.27/1.60/1.44
16   3/4    Unif.     1.2/2.04/1.71     1.25/1.82/1.52
32   3/4    Unif.     1.77/2.26/2.45    1.56/2.51/1.99
64   3/4    Unif.     2.97/4.47/.50     2.00/2.79/2.52
16   2/6    Unif.     1.18/1.68/1.45    1.6/2.08/1.69
32   2/6    Unif.     1.51/2.4/2.02     1.69/2.40/2.00
64   2/6    Unif.     2.49/.11/2.88     2.8/2.91/2.67
16   3/6    Unif.     1.20/1.74/1.49    1.7/2.2/1.82
32   3/6    Unif.     1.56/2.4/2.01     1.75/2.62/2.16
64   3/6    Unif.     2.56/.47/.07      2.54/.20/2.9
16   4/6    Unif.     1.20/1.74/1.49    1.7/2./1.8
32   4/6    Unif.     1.56/2.57/2.05    1.75/1.67/2.21
64   4/6    Unif.     2.88/2.48/.17     2.70/.28/.02

B.2 Influence of Increasing the Number of Routing Options and Network Connectivity

With a connectivity of 4 links per switch, it is not worth providing more than two routing options per switch (see Table III). For instance, only 17.48% of the destinations provide more than two routing options in a 64-switch network. However, when using 6 links to connect to other switches we can observe that this percentage is increased. Table I also shows the throughput improvement results when switches have 6 ports available to connect to other switches and forwarding tables provide up to four routing options for uniform traffic. With 4 links connecting switches and up to three routing options per destination at each switch, throughput is slightly increased (3.28 vs. 3.50 for 64-switch networks and 32-byte packets). However, as the network becomes more connected (with 6 links connecting switches), Tables I and II show some differences for 32-byte packets. The higher the link bandwidth, the lower the factor of throughput increase. This is due to the better behaviour of deterministic traffic with 4X links. In this case, contention is reduced because of the higher connectivity and the higher bandwidth. However, when using 256-byte packets and deterministic traffic, contention is increased. Therefore, higher benefits are achieved when using adaptive traffic.

V. CONCLUSIONS

We have proposed a simple mechanism to enhance IBA switch capabilities to support adaptive routing while maintaining compatibility with IBA specs. To this end, forwarding tables are arranged in such a way that they can provide several routing options at the same time. Also, the logic circuitry necessary to support fully adaptive routing algorithms (adaptive and escape queues and the proper use of credits to avoid deadlock) has been proposed. In addition, adaptive routing can be enabled or disabled on a per-packet basis by the running application. The proposed mechanism has been evaluated using the fully adaptive routing scheme proposed in [4]. Results show that enhancing IBA switches with adaptive routing noticeably increases network performance. This is especially significant for large networks. As network connectivity increases, higher throughput improvement is obtained. In particular, network throughput can be improved up to a factor of 3.9.

J.C. Martínez et al.

TABLE II
MINIMUM, MAXIMUM, AND AVERAGE FACTORS OF THROUGHPUT INCREASE WHEN USING 100% ADAPTIVE TRAFFIC VS. 100% DETERMINISTIC TRAFFIC. 1X LINKS. 1 VL. IN THE MR/LS COLUMN, MR STANDS FOR MAXIMUM NUMBER OF ROUTING OPTIONS AT EACH SWITCH FOR EACH DESTINATION PORT. LS STANDS FOR CONNECTING LINKS BETWEEN SWITCHES.

Sw   MR/LS  Traffic  32-byte packets   256-byte packets
                     Min/Max/Avg       Min/Max/Avg
8    2/4    Unif.    1.0/1.69/1.50     1.24/1.51/1.7
16   2/4    Unif.    1.26/2.04/1.70    1.22/1.79/1.50
32   2/4    Unif.    1.77/.11/2.8      1.64/2.24/1.99
64   2/4    Unif.    2.75/.94/.27      2.19/2.87/2.5
8    2/4    HS 5%    1.15/1.71/1.40    1.14/1.50/1.
16   2/4    HS 5%    1.2/1.91/1.55     1./1.82/1.51
32   2/4    HS 5%    1.8/2.17/1.78     1.6/2.00/1.67
64   2/4    HS 5%    1.62/2.20/1.85    1.45/2.00/1.69
8    2/4    HS 10%   1.0/1.45/1.21     1.09/1.5/1.17
16   2/4    HS 10%   1.07/1.7/1.22     1.11/1.8/1.20
32   2/4    HS 10%   1.09/1.50/1.1     1.05/1.41/1.26
64   2/4    HS 10%   1.19/1.48/1.27    1.14/1./1.22
8    2/4    BR       1.21/1.90/1.58    1.20/1.89/1.59
16   2/4    BR       1.12/1.81/1.51    1.24/1.66/1.47
32   2/4    BR       1.41/.1/2.18      1.46/2.46/1.98
64   2/4    BR       2.16/4.09/2.8     2.5/2.99/2.52
8    3/4    Unif.    1.0/1.77/1.55     1.24/1.58/1.41
16   3/4    Unif.    1.0/2.16/1.76     1./1.88/1.58
32   3/4    Unif.    1.77/.9/2.49      1.18/2.5/2.01
64   3/4    Unif.    2.87/4.28/.47     2.28/.25/2.69
16   2/6    Unif.    1.40/2.0/1.8      1.9/2.05/1.65
32   2/6    Unif.    1.76/2.98/2.4     1.20/2.47/1.98
64   2/6    Unif.    2.58/4.00/.57     2.56/.19/2.92
16   3/6    Unif.    1.42/2.52/1.94    1.40/2.2/1.74
32   3/6    Unif.    1.98/.27/2.5      1.75/2.69/2.20
64   3/6    Unif.    2.85/4.1/.86      2.85/.52/.21
16   4/6    Unif.    1.42/2.55/1.96    1.41/2.27/1.76
32   4/6    Unif.    1.94/.40/2.56     1.79/2.69/2.24
64   4/6    Unif.    2.99/4.45/.90     2.85/.52/.25

TABLE III
AVERAGE PERCENTAGE OF ROUTING OPTIONS AT EACH SWITCH FOR EACH DESTINATION PORT. MR STANDS FOR MAXIMUM NUMBER OF ROUTING OPTIONS AT EACH SWITCH FOR EACH DESTINATION.

           4 links, routing options       6 links, routing options
Sw   MR    1      2      3      4         1      2      3      4
8    2     57.14  42.86  -      -         -      -      -      -
8    3     57.14  22.14  20.72  -         -      -      -      -
8    4     57.14  22.14  19.46  1.26      -      -      -      -
16   2     54.79  45.21  -      -         49.33  50.67  -      -
16   3     54.79  31.75  13.46  -         49.33  24.33  26.34  -
16   4     54.79  31.75  11.83  1.62      49.33  24.33  20.29  6.05
32   2     48.07  51.93  -      -         45.39  54.61  -      -
32   3     48.07  35.86  16.07  -         45.39  26.49  28.12  -
32   4     48.07  35.86  13.21  2.86      45.39  26.49  13.50  14.62
64   2     41.32  58.68  -      -         37.29  62.71  -      -
64   3     41.32  41.20  17.48  -         37.29  32.65  30.06  -
64   4     41.32  41.20  14.09  3.39      37.29  32.65  19.13  10.93

Although the proposed mechanism consumes a virtual address per routing option, the number of required addresses remains low and addresses are not a scarce resource. Also, evaluation results show that by using only two routing options per destination port at each switch, roughly 90% of the maximum throughput improvement is achieved.

We recently proposed some effective strategies to improve IBA network performance [8], [9] by allowing most packets to be routed through minimal paths and providing better traffic balance. These strategies make efficient use of virtual lanes that are not used for QoS purposes. As future work, we plan to combine this mechanism with these strategies in order to boost network performance further.

REFERENCES

[1] N.J. Boden, D. Cohen, R.E. Felderman, A.E. Kulawik, C.L. Seitz, J. Seizovic, and W. Su, Myrinet - A gigabit per second local area network, IEEE Micro, pp. 29-36, February 1995.
[2] W.J. Dally and H. Aoki, Deadlock-free Adaptive Routing in Multiprocessor Networks Using Virtual Channels, IEEE Trans. Parallel and Distributed Systems, vol. 4, no. 4, pp. 466-475, April 1993.
[3] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks, An Engineering Approach, IEEE Computer Society Press, 1997.
[4] J. Duato, A. Robles, F. Silla, and R. Beivide, A comparison of router architecture for virtual cut-through and wormhole switching in a NOW environment, in Jour. of Parallel and Distributed Computing 61, 224-253 (2001).
[5] J. Flich, M.P. Malumbres, P. López, and J. Duato, Improving Routing Performance in Myrinet Networks, in Proc. of Int. Parallel and Distributed Processing Symp., May 2000.
[6] J. Flich, Improving performance of networks of workstations with source routing, Ph.D. Thesis, March 2001.
[7] J. Flich, P. López, M.P. Malumbres, and J. Duato, Boosting the Performance of Myrinet Networks, IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 7, July 2002.
[8] J. Flich, P. López, J.C. Sancho, A. Robles, and J. Duato, Improving InfiniBand Routing through Multiple Virtual Networks, in Int. Symp. High Performance Computing, May 2002.
[9] J.C. Sancho, A. Robles, J. Flich, P. López, and J. Duato, Effective Methodology for Deadlock-Free Minimal Routing in InfiniBand Networks, to be presented in Int. Conf. on Parallel Processing, August 2002.
[10] C.J. Glass and L.M. Ni, A Turn Model for Adaptive Routing, in Proc. Int. Symp. on Computer Architecture, May 1992.
[11] InfiniBand(SM) Trade Association. www.infinibandta.com.
[12] InfiniBand(SM) Trade Association, InfiniBand(TM) Architecture Specification Volume 1, Release 1.0. Available at www.infinibandta.com.
[13] J. Martínez, J. Flich, A. Robles, P. López, and J. Duato, Supporting Fully Adaptive Routing in InfiniBand Networks, in Proc. of the International Parallel and Distributed Processing Symposium, April 2003.
[14] M.D. Schroeder et al., Autonet: A high-speed, self-configuring local area network using point-to-point links, IEEE Jour. on Selected Areas in Communications, 9(8):1318-1335, October 1991.
[15] F. Silla, M.P. Malumbres, A. Robles, P. López, and J. Duato, Efficient adaptive routing in networks of workstations with irregular topology, in Proc. of the Workshop on Communications and Architectural Support for Network-based Parallel Computing, 1997.
[16] F. Silla and J. Duato, Improving the Efficiency of Adaptive Routing in Networks with Irregular Topology, 1997 Int. Conf. on High Performance Computing, December 1997.
[17] Y. Tamir and H.C. Chi, Symmetric Crossbar Arbiters for VLSI Communication Switches, in IEEE Trans. on Parallel and Distributed Systems, vol. 2, no. 1, January 1993.