CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 10. Routing and Flow Control

Size: px
Start display at page:

Download "CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 10. Routing and Flow Control"

Transcription

1 CS 498 Hot Topics in High Performance Computing Networks and Fault Tolerance 10. Routing and Flow Control

2 Intro What did we learn in the last lecture Some more topologies Routing (schemes and metrics) What will we learn today Routing practical examples Flow control Blue Waters topology and routing (if time) Torsten Hoefler: CS 498 Hot Topics in HPC 267

3 Deterministic Routing: InfiniBand Full bisection bandwidth fat-tree Communications 1 6 and 4 14 Torsten Hoefler: CS 498 Hot Topics in HPC 268

4 Deterministic Routing: InfiniBand Bisection (band)width is not a good metric! It is only an upper bound on worst-case performance! Routing can lower effective bandwidth Deterministic routing *WILL* (see Valiant s proof) Introduce Effective Bisection Bandwidth Expected bandwidth for a random permutation pattern (observable empirically) Optimal: full bandwidth, check impact by simulation Torsten Hoefler: CS 498 Hot Topics in HPC 269

5 Deterministic Routing: InfiniBand Sandia 4096 compute nodes dual Xeon EM64T 3.6 Ghz CPUs 6 GiB RAM ½ bisection bandwidth fat tree 4390 active LIDs while queried Torsten Hoefler: CS 498 Hot Topics in HPC 270

6 Deterministic Routing: InfiniBand LLNL 1152 compute nodes dual 4-core 2.4 GHz Opteron 16 GiB RAM full bisection bandwidth fat tree 1142 active LIDs while queried Torsten Hoefler: CS 498 Hot Topics in HPC 271

7 Deterministic Routing: InfiniBand TACC 3936 compute nodes quad 4-core 2.3 GHz Opteron 32 GiB RAM full bisection bandwidth fat tree 3908 active LIDs while queried Torsten Hoefler: CS 498 Hot Topics in HPC 272

8 Deterministic Routing: InfiniBand TUC 542 compute nodes dual 2-core 2.6 GHz Opteron 4 GiB RAM full bisection bandwidth fat tree 566 active LIDs while queried Torsten Hoefler: CS 498 Hot Topics in HPC 273

9 Deterministic Routing: InfiniBand Impact of backend congestion on bandwidth Provoked congestion 24-port switches Fat-tree Max. cong: 12! Torsten Hoefler: CS 498 Hot Topics in HPC 274

10 Deterministic Routing: InfiniBand Effective bisection bandwidth: Ranger: 57.6% Atlas: 55.6% Thunderbird: 40.6% (1/2 bisection bandwidth) Obvious questions: Does it make sense to build networks with full bisection bandwidth? What is the tradeoff between cost and bandwidth? Torsten Hoefler: CS 498 Hot Topics in HPC 275

11 Adaptive Routing: Myrinet Implemented and benchmarked optimal deterministic routing (will be defined later), random routing, and local and global adaptive routing in Firmware 512 Myrinet MX nodes connected as fullbisection bandwidth Clos (similar to fat-tree) Running effective bisection bandwidth benchmark, reporting min, avg, max bandwidth Torsten Hoefler: CS 498 Hot Topics in HPC 276

12 Deterministic Routing: Myrinet Torsten Hoefler: CS 498 Hot Topics in HPC 277

13 Random Routing: Myrinet Torsten Hoefler: CS 498 Hot Topics in HPC 278

14 Local Adaptive Routing: Myrinet Torsten Hoefler: CS 498 Hot Topics in HPC 279

15 Global (Probing) Adaptive Routing: Myrinet Torsten Hoefler: CS 498 Hot Topics in HPC 280

16 Overview of Techniques: Myrinet Torsten Hoefler: CS 498 Hot Topics in HPC 281

17 Routing Implementations Source routing All routing decisions are made entirely at the source node and encoded in packet Distributed (table) routing Routes are distributed across the switches (each switch only stores the parts it needs) Algorithmic routing Routes can be determined algorithmically (e.g., butterfly, DOR, ) Torsten Hoefler: CS 498 Hot Topics in HPC 282

18 Routing on Arbitrary Topologies General routing schemes needed for general topologies (e.g., InfiniBand, Ethernet) Standard Ethernet uses spanning tree (bad) What metric to optimize for? Application-optimized routing (possible if communication topology is known) Arbitrary permutations seem reasonable for general case! effective bisection bandwidth Torsten Hoefler: CS 498 Hot Topics in HPC 283

19 Routing Metrics Model network as G=(V P [V C, E) Path r(u,v) is a path between u,v 2 V P Routing R consists of P(P-1) paths Edge load l(e) = number of paths on e 2 E Edge forwarding index ¼(G,R)=max e2e l(e) ¼(G,R) is a upper bound to congestion of permutation routing! Goal is to find R that minimizes ¼(G,R) shown to be NP-hard in the general case Torsten Hoefler: CS 498 Hot Topics in HPC 284

20 A Greedy Heuristic P-SSSP routing starts a SSSP run at each node finds paths with minimal edge-load l(e) updates routing tables in reverse essentially SDSP updates l(e) between runs let s discuss an example Torsten Hoefler: CS 498 Hot Topics in HPC 285

21 P-SSSP Routing (1/3) Step 1: Source-node 0: 286

22 P-SSSP Routing (2/3) Step 2: Source-node 1: 287

23 P-SSSP Routing (3/3) Step 3: Source-node 2: ¼(G,R)=2 288

24 Routing Example: InfiniBand Example: InfiniBand InfiniBand uses deterministic distributed routing OpenSM configures the local network (subnet) implements different routing schemes Linear forwarding table (LFT) at each switch Lid mask control (LMC) enables multiple addresses per port Torsten Hoefler: CS 498 Hot Topics in HPC 289

25 Routing Example: InfiniBand Different routing algorithms: MINHOP (finds minimal paths, balances number of routes local at each switch) UPDN (uses Up*/Down* turn-control, very similar to MINHOP) FTREE (fat-tree optimized routing) DOR (dimension order routing for k-ary n-cubes) LASH (uses DOR and breaks credit-loops with virtual lanes) Torsten Hoefler: CS 498 Hot Topics in HPC 290

26 Benchmark Results Odin Simulation Benchmark (Netgauge Pattern ebb) Simulation predicts 5% improvement Benchmark shows 18% improvement! 291

27 Benchmark Results Deimos Simulation Benchmark (Netgauge Pattern ebb) Simulation predicts 23% improvement Benchmark shows 40% improvement! 292

28 Flow Control Determines how network resources are allocated Can either be viewed as resource allocation or contention resolution problem Mechanisms: bufferless: drop or misroute if channel is blocked circuit switching: only headers are buffered and reserves path, waits if path is not available buffered: decouples channel allocation in time Torsten Hoefler: CS 498 Hot Topics in HPC 293

29 Flits Packets are MTU-sized Typically several 1000 bytes Switch buffering and forwarding often works at smaller granularity (several bytes) Flit FLow Control UnIT Packets are divided into flits by the hardware Typically no extra headers Torsten Hoefler: CS 498 Hot Topics in HPC 294

30 Buffered Flow Control Store-and-forward Receives full packets into buffer and forwards them after they have been received High latency (each switch waits for full packet) Cut-through Forwards packet as soon as first (header) flit arrives and outgoing resources are available Low latency but blocks a channel for the duration of a whole packet transmission Torsten Hoefler: CS 498 Hot Topics in HPC 295

31 Buffered Flow Control Wormhole Like cut-through but channels and buffers are allocated on a per-flit basis Header flit allocates virtual channel, data and tail flits follow, tail flit frees channel VC might have pointers to a packet s flits Most flexible, high priority packets can intercept lower-priority packets in the middle Most efficient use of buffer space Also allows for buffers much smaller than MTU! Torsten Hoefler: CS 498 Hot Topics in HPC 296

32 Buffer Management and Backpressure Flow control methods need to communicate availability of buffers to downstream nodes Common types in use today: Credit-based Router keeps count of number of free flit buffers of peer routers, decrements when sending, stops when zero reached, peers send credits back On/Off Binary decision, neighbor routers send on or off flag to start or stop incoming flit streams Torsten Hoefler: CS 498 Hot Topics in HPC 297

33 Buffer Management and Backpressure Common types in use today (continued): Ack/Nack No state at routers, each flit will be ack d if it fits and nack d if not (resend) Drop No real flow control, packets are dropped if they do not fit and upper layer will re-send (common in Internet) Torsten Hoefler: CS 498 Hot Topics in HPC 298

34 Livelock If packets can take non-minimal routes, they could be routed in circles forever Good example: Internet Simple workaround: TTL Is not typical for HPC settings though Will not be further discussed here Torsten Hoefler: CS 498 Hot Topics in HPC 299

35 Deadlock A group of packets cannot progress because they wait for each other to free resources Cyclic dependency (deadlock criterion) Deadlock is catastrophic (networks clogs up) Deadlock avoidance E.g., loop-free routing Deadlock resolution E.g., packet timeouts Torsten Hoefler: CS 498 Hot Topics in HPC 300

36 Complex Deadlock Example Source Network and Routes Buffer Dependency Graph 301

37 Deadlock Avoidance Limit route selection function Up*/Down* routing Identify a tree (root) in the topology Allow only a single turn (relative to root) Lower bandwidth due to limited routes Virtual Channels VCs are independent, different block each other Can be used to create cycle-free layers Used in OpenSM s LASH algorithm Torsten Hoefler: CS 498 Hot Topics in HPC 302

38 Layered Routing Each VL requires physical resources Only 16 in InfiniBand (8 in practice) Layered routing needs to minimize needed VLs Minimization is NP-complete for arbitrary channel dependency graphs [see Domke, Hoefler, 11] Torsten Hoefler: CS 498 Hot Topics in HPC 303

39 Topology Example: Blue Waters P7-IH Quad Block Diagram 32w SMP w/ 2 Tier SMP Fabric, 4 chip Processor MCM + Hub SCM w/ On-Board Optics Four POWER 7 chips One IBM Hup Chip per node DIMM 2 DIMM 3 DIMM 11 DIMM 10 DIMM 6 DIMM 7 DIMM 0 DIMM 1 A Clk Grp B Clk Grp C Clk Grp D Clk Grp A Clk Grp B Clk Grp C Clk Grp D Clk Grp A Clk Grp B Clk Grp C Clk Grp D Clk Grp A Clk Grp B Clk Grp C Clk Grp D Clk Grp MC 0 MC11 W Z MC0 0 MC1 1 8c up 8c up P7-0 P7-1 B A Y X C Z C Y A B X W B A W X C Y C Z A B X Y P7-3 P7-2 8c up 8c up MC00 MC11 Z W MC0 0 MC1 1 A Clk Grp B Clk Grp C Clk Grp D Clk Grp A Clk Grp B Clk Grp C Clk Grp D Clk Grp A Clk Grp B Clk Grp C Clk Grp D Clk Grp A Clk Grp B Clk Grp C Clk Grp D Clk Grp DIMM 5 DIMM 4 DIMM 12 DIMM 13 DIMM 14 DIMM 15 DIMM 8 DIMM 9 28x XMIT/RCV 10 Gb/s 832 Z W 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x D0-D x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x 12x Lr0-Lr GB/s 164 Ll GB/s 164 Ll1 Hub Chip Module 22+22GB/s 164 Ll GB/s 164 Ll GB/s 164 Ll GB/s 164 Ll GB/s 164 Ll6 7+7GB/s 7+7GB/s 7+7GB/s EG2 EG1 EG2 Y X 10+10GB/s (12x=10+2) 5+5GB/s (6x=5+1) 320 GB/s 240 GB/s 7 Inter-Hub Board Level L-Buses 8B+8B, 90% sus. peak PCIe 61x PCIe 16x PCIe 8x 304 Torsten Hoefler: CS 498 Hot Topics in HPC Source: IBM

40 1.1 TB/s HUB 192 GB/s Host Connection PX0 Bus Hot Plug Ctl PX1 Bus Hot Plug Ctl PX2 Bus Hot Plug Ctl FSP1-A FSP1-B TPMD-A, TMPD-B MDC-A MDC-B SEEPROM 1 SEEPROM 2 I2C I2C SVIC SVIC I2C FSI FSI 8x 8x 16x 16x 16x 16x 336 GB/s to 7 other local nodes 240 GB/s to local-remote nodes 320 GB/s to remote nodes 40 GB/s to general purpose I/O L local HUB To HUB Copper Board Wiring cf. The PERCS 10 LL0 Bus Copper LL1 Bus Copper LL2 Bus Copper LL3 Bus Copper LL4 Bus Copper LL5 Bus Copper LL6 Bus Copper 8B 8B 8B 8B 8B 8B 8B 8B 8B 8B 8B 8B 8B 8B EI-3 PHYs 6x PCI-E IO PHY 6x PCI-E IO PHY Diff PHYs PCI-E IO PHY Hub Chip Torrent 6x 6x 8B W-Bus 8B W-Bus TOD Sync 8B X-Bus 8B X-Bus TOD Sync 8B Y-Bus 8B Y-Bus TOD Sync Diff PHYs 8B Z-Bus 8B Z-Bus TOD Sync 28 I2C 12x 12x 12x 12x 16 D Buses I2C_0 + Int I2C_27 + Int D0 Bus Optical D15 Bus Optical I2C To Optical Modules D Bus Interconnect of Supernodes Optical LR0 Bus Optical 24 L remote Buses LR23 Bus Optical HUB to QCM Connections Address/Data L remote 4 Drawer Interconnect to Create a Supernode Optical 305 Torsten Hoefler: CS 498 Hot Topics in HPC Source: IBM

41 Drawer 8 nodes 64/40 Optical 'D-Link' P C I e 17 P1-C17-C1 P C I e 16 P1-C16-C1 P C I e 15 P1-C15-C1 P C I e 14 P1-C14-C1 P C I e 13 P1-C13-C1 P C I e 12 P1-C12-C1 P C I e 11 P1-C11-C1 P C I e 10 P1-C10-C1 P C I e 9 P1-C9-C1 Optical Fan-out from HUB Modules 2,304 Fiber 'L-Link' P C I e 8 P1-C8-C1 P C I e 7 P1-C7-C1 P C I e 6 P1-C6-C1 P C I e 5 P1-C5-C1 P C I e 4 P1-C4-C1 P C I e 3 P1-C3-C1 P C I e 2 P1-C2-C1 P C I e 1 P1-C1-C1 64/40 Optical 'D-Link' 32 chips 256 cores HUB 0 HUB 1 HUB 2 HUB 3 HUB 4 HUB 5 HUB 6 HUB 7 First Level Interconnect L-Local HUB to HUB Copper Wiring 256 Cores N0-DIMM15 N0-DIMM14 N0-DIMM13 N0-DIMM12 N0-DIMM11 N0-DIMM10 N0-DIMM09 N0-DIMM08 P7-1 QCM 0 P7-2 P7-0 P7-3 U-P1-M1 N0-DIMM07 N0-DIMM06 N0-DIMM05 N0-DIMM04 N0-DIMM03 N0-DIMM02 N0-DIMM01 N0-DIMM00 N1-DIMM15 N1-DIMM14 N1-DIMM13 N1-DIMM12 N1-DIMM11 N1-DIMM10 N1-DIMM09 N1-DIMM08 P7-1 QCM 1 P7-2 P7-0 P7-3 U-P1-M2 N1-DIMM07 N1-DIMM06 N1-DIMM05 N1-DIMM04 N1-DIMM03 N1-DIMM02 N1-DIMM01 N1-DIMM00 N2-DIMM15 N2-DIMM14 N2-DIMM13 N2-DIMM12 N2-DIMM11 N2-DIMM10 N2-DIMM09 N2-DIMM08 P7-1 QCM 2 P7-2 P7-0 P7-3 U-P1-M3 N2-DIMM07 N2-DIMM06 N2-DIMM05 N2-DIMM04 N2-DIMM03 N2-DIMM02 N2-DIMM01 N2-DIMM00 N3-DIMM15 N3-DIMM14 N3-DIMM13 N3-DIMM12 N3-DIMM11 N3-DIMM10 N3-DIMM09 P7-1 QCM 3 P7-2 P7-0 P7-3 U-P1-M4 N3-DIMM07 N3-DIMM06 N3-DIMM05 N3-DIMM04 N3-DIMM03 N3-DIMM02 N3-DIMM01 N3-DIMM08 N4-DIMM15 N4-DIMM14 N4-DIMM13 N4-DIMM12 N4-DIMM11 N4-DIMM10 N4-DIMM09 N4-DIMM08 P7-1 QCM 4 P7-2 P7-0 P7-3 U-P1-M5 N3-DIMM00 N4-DIMM07 N4-DIMM06 N4-DIMM05 N4-DIMM04 N4-DIMM03 N4-DIMM02 N4-DIMM01 N4-DIMM00 N5-DIMM15 N5-DIMM14 N5-DIMM13 N5-DIMM12 N5-DIMM11 N5-DIMM10 N5-DIMM09 N5-DIMM08 P7-1 QCM 5 P7-2 P7-0 P7-3 U-P1-M6 N5-DIMM07 N5-DIMM06 N5-DIMM05 N5-DIMM04 N5-DIMM03 N5-DIMM02 N5-DIMM01 N5-DIMM00 N6-DIMM15 N6-DIMM14 N6-DIMM13 N6-DIMM12 N6-DIMM11 N6-DIMM10 N6-DIMM09 N6-DIMM08 P7-1 QCM 6 P7-2 P7-0 P7-3 U-P1-M7 N6-DIMM07 N6-DIMM06 N6-DIMM05 N6-DIMM04 N6-DIMM03 N6-DIMM02 N6-DIMM01 N6-DIMM00 N7-DIMM15 N7-DIMM14 N7-DIMM13 N7-DIMM12 N7-DIMM11 N7-DIMM10 N7-DIMM09 N7-DIMM08 P7-1 QCM 7 P7-2 P7-0 P7-3 U-P1-M8 N7-DIMM07 N7-DIMM06 N7-DIMM05 N7-DIMM04 N7-DIMM03 N7-DIMM02 N7-DIMM01 N7-DIMM00 DCA-0 Connector (Top DCA) DCA-1 Connector (Bottom DCA) Torsten Hoefler: CS 498 Hot Topics in HPC Source: IBM 1st Level Local Interconnect (256 cores) 306

42 Topology Example: Blue Waters Torsten Hoefler: CS 498 Hot Topics in HPC 307

43 LL Topology 24 GB/s 7 links/hub Fully connected 8 Hubs Torsten Hoefler: CS 498 Hot Topics in HPC Source: B. Arimilli et al. The PERCS High- Performance Interconnect 308

44 LR Topology 5 GB/s 24 links/hub Fully connected 4 Drawers 32 Hubs Torsten Hoefler: CS 498 Hot Topics in HPC Source: B. Arimilli et al. The PERCS High- Performance Interconnect 309

45 D Topology 10 GB/s 16 links/hub Fully connected 512 SNs 2048 Drawers Hubs Torsten Hoefler: CS 498 Hot Topics in HPC Source: B. Arimilli et al. The PERCS High- Performance Interconnect 310

46 Routing Deterministic routing Hits Borodin-Hopcroft bound ( ) Bad worst-case performance Indirect (random) routing Increases latency Effectively halves (global) bandwidth Worst-case becomes very unlikely 311 Torsten Hoefler: CS 498 Hot Topics in HPC

47 Direct Routing SuperNode A LR12 SN-SN Direct Routing L-Hops: 2 D-Hops: 1 Total: 3 hops SuperNode B D3 LR21 Torsten Hoefler: CS 498 Hot Source: Topics B. in Arimilli HPC et al. The PERCS High- Performance Interconnect 312

48 SuperNode A SuperNode B SN-SN Indirect Routing LL7 L-Hops: 3 D-Hops: 2 Total: 5 hops Total paths = # SNs 2 LR21 D5 SuperNode x D12 LR30 Torsten Hoefler: CS 498 Hot Topics in HPC 313 Source: B. Arimilli et al. The PERCS High- Performance Interconnect

49 Example Model: BW Global Alltoall Bandwidth Assume all communications happen simultaneously Derive upper (expected) bound for direct routing Simple counting argument Each CN can be reached through series of LL, LR, D Not more than one D link or LL-LR, LR-LL, LL-LL, LR-LR Denote e(p) as number of nodes reachable through path P e(ll) =7, e(lr)=24, e(d)=d (variable number of D-links) # nodes reachable in two hops: e(ll-d)=e(d-ll)=7d e(lr-d)=e(d-lr) = 24d 314 Torsten Hoefler: CS 498 Hot Topics in HPC Source: B. Arimilli et al. The PERCS High- Performance Interconnect

50 Example: Alltoall Bandwidth Continued # nodes reachable in three hops: e(ll-d-ll) = 49d e(ll-d-lr)=e(lr-d-ll) = 168d e(lr-d-lr)=576d Number of paths from each source: d Now count the number of paths (i.e., congestion) through each LL, LR, D link c(l): (omitted uninteresting parts of summation) c(ll) = 1+64d c(lr) = 1+64d c(d) = Torsten Hoefler: CS 498 Hot Topics in HPC Source: B. Arimilli et al. The PERCS High- Performance Interconnect

51 Example: Alltoall Bandwidth Continued Effective (expected) bandwidths for each link-type: bandwidth = link bandwidth / congestion b(ll) = 24 GB/s / (1+64d) b(lr) = 5 GB/s / (1+64d) b(d) = 10 GB/s / 1024 If d<8, D is the bottleneck, otherwise LR Bandwidth per PE: 5 GB/s / 1+64d * total # paths d=9 ( cores): GB/s d=10 ( cores): GB/s total ~ 0.8 PB/s Connection to P7 chips: 4*24 GB/s = 96 GB/s ~ 83% 316 Torsten Hoefler: CS 498 Hot Topics in HPC Source: B. Arimilli et al. The PERCS High- Performance Interconnect

Optimized Routing for Large- Scale InfiniBand Networks

Optimized Routing for Large- Scale InfiniBand Networks Optimized Routing for Large- Scale InfiniBand Networks Torsten Hoefler, Timo Schneider, and Andrew Lumsdaine Open Systems Lab Indiana University 1 Effect of Network Congestion CHiC Supercomputer: 566 nodes,

More information

FAST AND DEADLOCK-FREE ROUTING FOR ARBITRARY INFINIBAND NETWORKS

FAST AND DEADLOCK-FREE ROUTING FOR ARBITRARY INFINIBAND NETWORKS FAST AND DEADLOCK-FREE ROUTING FOR ARBITRARY INFINIBAND NETWORKS AKA. DFSSSP ROUTING Torsten Hoefler and Jens Domke All used images belong to the producers. MOTIVATION FOR BETTER ROUTING New challenging

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Routing, Network Embedding John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 14-15 4,11 October 2018 Topics for Today

More information

The effects of common communication patterns in large scale networks with switch based static routing

The effects of common communication patterns in large scale networks with switch based static routing The effects of common communication patterns in large scale networks with switch based static routing Torsten Hoefler Indiana University talk at: Cisco Systems San Jose, CA Nerd Lunch 21st August 2008

More information

Improving Parallel Computing Platforms - programming, topology and routing -

Improving Parallel Computing Platforms - programming, topology and routing - Improving Parallel Computing Platforms - programming, topology and routing - Torsten Hoefler Open Systems Lab Indiana University Bloomington, IN, USA 1 State of the Art (what we all know) Moores law mandates

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control

CS 498 Hot Topics in High Performance Computing. Networks and Fault Tolerance. 9. Routing and Flow Control CS 498 Hot Topics in High Performance Computing Networks and Fault Tolerance 9. Routing and Flow Control Intro What did we learn in the last lecture Topology metrics Including minimum diameter of directed

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi, Qin Yin, Timothy Roscoe Spring Semester 2015 Oriana Riva, Department of Computer Science ETH Zürich 1 Today Flow Control Store-and-forward,

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Advanced Computer Networks. Flow Control

Advanced Computer Networks. Flow Control Advanced Computer Networks 263 3501 00 Flow Control Patrick Stuedi Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Last week TCP in Datacenters Avoid incast problem - Reduce

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

Adaptive Routing Strategies for Modern High Performance Networks

Adaptive Routing Strategies for Modern High Performance Networks Adaptive Routing Strategies for Modern High Performance Networks Patrick Geoffray Myricom patrick@myri.com Torsten Hoefler Indiana University htor@cs.indiana.edu 28 August 2008 Hot Interconnect Stanford,

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Lecture 20: Distributed Memory Parallelism. William Gropp

Lecture 20: Distributed Memory Parallelism. William Gropp Lecture 20: Distributed Parallelism William Gropp www.cs.illinois.edu/~wgropp A Very Short, Very Introductory Introduction We start with a short introduction to parallel computing from scratch in order

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Flow Control can be viewed as a problem of

Flow Control can be viewed as a problem of NOC Flow Control 1 Flow Control Flow Control determines how the resources of a network, such as channel bandwidth and buffer capacity are allocated to packets traversing a network Goal is to use resources

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Adaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636

Adaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636 1 Adaptive Routing Adaptive Routing Basics Minimal Adaptive Routing Fully Adaptive Routing Load-Balanced Adaptive Routing Search-Based Routing Case Study: Adapted Routing in the Thinking Machines CM-5

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996

Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 Lecture 28: Networks & Interconnect Architectural Issues Professor Randy H. Katz Computer Science 252 Spring 1996 RHK.S96 1 Review: ABCs of Networks Starting Point: Send bits between 2 computers Queue

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Lecture 7: Flow Control - I

Lecture 7: Flow Control - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical

More information

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms CS252 Graduate Computer Architecture Lecture 16 Multiprocessor Networks (con t) March 14 th, 212 John Kubiatowicz Electrical Engineering and Computer Sciences University of California, Berkeley http://www.eecs.berkeley.edu/~kubitron/cs252

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters

Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Hari Subramoni, Ping Lai, Sayantan Sur and Dhabhaleswar. K. Panda Department of

More information

Interconnection Networks

Interconnection Networks Lecture 15: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Credit: some slides created by Michael Papamichael, others based on slides from Onur Mutlu

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

OceanStor 9000 InfiniBand Technical White Paper. Issue V1.01 Date HUAWEI TECHNOLOGIES CO., LTD.

OceanStor 9000 InfiniBand Technical White Paper. Issue V1.01 Date HUAWEI TECHNOLOGIES CO., LTD. OceanStor 9000 Issue V1.01 Date 2014-03-29 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Ltd. 2014. All rights reserved. No part of this document may be reproduced or transmitted in

More information

Parallel Computing Interconnection Networks

Parallel Computing Interconnection Networks Parallel Computing Interconnection Networks Readings: Hager s book (4.5) Pacheco s book (chapter 2.3.3) http://pages.cs.wisc.edu/~tvrdik/5/html/section5.html#aaaaatre e-based topologies Slides credit:

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

CS 43: Computer Networks Switches and LANs. Kevin Webb Swarthmore College December 5, 2017

CS 43: Computer Networks Switches and LANs. Kevin Webb Swarthmore College December 5, 2017 CS 43: Computer Networks Switches and LANs Kevin Webb Swarthmore College December 5, 2017 Ethernet Metcalfe s Ethernet sketch Dominant wired LAN technology: cheap $20 for NIC first widely used LAN technology

More information

Lecture 14: Large Cache Design III. Topics: Replacement policies, associativity, cache networks, networking basics

Lecture 14: Large Cache Design III. Topics: Replacement policies, associativity, cache networks, networking basics Lecture 14: Large Cache Design III Topics: Replacement policies, associativity, cache networks, networking basics 1 LIN Qureshi et al., ISCA 06 Memory level parallelism (MLP): number of misses that simultaneously

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

Deadlock and Router Micro-Architecture

Deadlock and Router Micro-Architecture 1 EE482: Advanced Computer Organization Lecture #8 Interconnection Network Architecture and Design Stanford University 22 April 1999 Deadlock and Router Micro-Architecture Lecture #8: 22 April 1999 Lecturer:

More information

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management

OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management Marina Garcia 22 August 2013 OFAR-CM: Efficient Dragonfly Networks with Simple Congestion Management M. Garcia, E. Vallejo, R. Beivide, M. Valero and G. Rodríguez Document number OFAR-CM: Efficient Dragonfly

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N.

Interconnection topologies (cont.) [ ] In meshes and hypercubes, the average distance increases with the dth root of N. Interconnection topologies (cont.) [ 10.4.4] In meshes and hypercubes, the average distance increases with the dth root of N. In a tree, the average distance grows only logarithmically. A simple tree structure,

More information

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks

Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks HPI-DC 09 Fast-Response Multipath Routing Policy for High-Speed Interconnection Networks Diego Lugones, Daniel Franco, and Emilio Luque Leonardo Fialho Cluster 09 August 31 New Orleans, USA Outline Scope

More information

Lecture 15: Networks & Interconnect Interface, Switches, Routing, Examples Professor David A. Patterson Computer Science 252 Fall 1996

Lecture 15: Networks & Interconnect Interface, Switches, Routing, Examples Professor David A. Patterson Computer Science 252 Fall 1996 Lecture 15: Networks & Interconnect Interface, Switches, Routing, Examples Professor David A. Patterson Computer Science 252 Fall 1996 DAP.F96 1 Review: Memory Hierarchies File Cache Hard Disk Tapes On-Line

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

The Network Layer and Routers

The Network Layer and Routers The Network Layer and Routers Daniel Zappala CS 460 Computer Networking Brigham Young University 2/18 Network Layer deliver packets from sending host to receiving host must be on every host, router in

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

The Tofu Interconnect D

The Tofu Interconnect D The Tofu Interconnect D 11 September 2018 Yuichiro Ajima, Takahiro Kawashima, Takayuki Okamoto, Naoyuki Shida, Kouichi Hirai, Toshiyuki Shimizu, Shinya Hiramoto, Yoshiro Ikeda, Takahide Yoshikawa, Kenji

More information

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 18

CMPE 150/L : Introduction to Computer Networks. Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 18 CMPE 150/L : Introduction to Computer Networks Chen Qian Computer Engineering UCSC Baskin Engineering Lecture 18 1 Final project demo Please do the demo THIS week to the TAs. Or you are allowed to use

More information

Parallel Computer Architecture II

Parallel Computer Architecture II Parallel Computer Architecture II Stefan Lang Interdisciplinary Center for Scientific Computing (IWR) University of Heidelberg INF 368, Room 532 D-692 Heidelberg phone: 622/54-8264 email: Stefan.Lang@iwr.uni-heidelberg.de

More information

CMSC 611: Advanced. Interconnection Networks

CMSC 611: Advanced. Interconnection Networks CMSC 611: Advanced Computer Architecture Interconnection Networks Interconnection Networks Massively parallel processor networks (MPP) Thousands of nodes Short distance (

More information

Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs.

Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs. Internetworking Multiple networks are a fact of life: Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs. Fault isolation,

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Parallel Computing Platforms

Parallel Computing Platforms Parallel Computing Platforms Network Topologies John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 14 28 February 2017 Topics for Today Taxonomy Metrics

More information

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects

More information

InfiniBand SDR, DDR, and QDR Technology Guide

InfiniBand SDR, DDR, and QDR Technology Guide White Paper InfiniBand SDR, DDR, and QDR Technology Guide The InfiniBand standard supports single, double, and quadruple data rate that enables an InfiniBand link to transmit more data. This paper discusses

More information

Link layer: introduction

Link layer: introduction Link layer: introduction terminology: hosts and routers: nodes communication channels that connect adjacent nodes along communication path: links wired links wireless links LANs layer-2 packet: frame,

More information

Generic Topology Mapping Strategies for Large-scale Parallel Architectures

Generic Topology Mapping Strategies for Large-scale Parallel Architectures Generic Topology Mapping Strategies for Large-scale Parallel Architectures Torsten Hoefler and Marc Snir Scientific talk at ICS 11, Tucson, AZ, USA, June 1 st 2011, Hierarchical Sparse Networks are Ubiquitous

More information

Deadlock and Livelock. Maurizio Palesi

Deadlock and Livelock. Maurizio Palesi Deadlock and Livelock 1 Deadlock (When?) Deadlock can occur in an interconnection network, when a group of packets cannot make progress, because they are waiting on each other to release resource (buffers,

More information

Interconnection networks

Interconnection networks Interconnection networks When more than one processor needs to access a memory structure, interconnection networks are needed to route data from processors to memories (concurrent access to a shared memory

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

ES1 An Introduction to On-chip Networks

ES1 An Introduction to On-chip Networks December 17th, 2015 ES1 An Introduction to On-chip Networks Davide Zoni PhD mail: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Sources Main Reference Book (for the examination) Designing Network-on-Chip

More information

Chapter 3 : Topology basics

Chapter 3 : Topology basics 1 Chapter 3 : Topology basics What is the network topology Nomenclature Traffic pattern Performance Packaging cost Case study: the SGI Origin 2000 2 Network topology (1) It corresponds to the static arrangement

More information

Introduction to High-Speed InfiniBand Interconnect

Introduction to High-Speed InfiniBand Interconnect Introduction to High-Speed InfiniBand Interconnect 2 What is InfiniBand? Industry standard defined by the InfiniBand Trade Association Originated in 1999 InfiniBand specification defines an input/output

More information

Outline: Connecting Many Computers

Outline: Connecting Many Computers Outline: Connecting Many Computers Last lecture: sending data between two computers This lecture: link-level network protocols (from last lecture) sending data among many computers 1 Review: A simple point-to-point

More information

Topology basics. Constraints and measures. Butterfly networks.

Topology basics. Constraints and measures. Butterfly networks. EE48: Advanced Computer Organization Lecture # Interconnection Networks Architecture and Design Stanford University Topology basics. Constraints and measures. Butterfly networks. Lecture #: Monday, 7 April

More information

Overview. Internetworking and Reliable Transmission. CSE 561 Lecture 3, Spring David Wetherall. Internetworking. Reliable Transmission

Overview. Internetworking and Reliable Transmission. CSE 561 Lecture 3, Spring David Wetherall. Internetworking. Reliable Transmission Internetworking and Reliable Transmission CSE 561 Lecture 3, Spring 2002. David Wetherall Overview Internetworking Addressing Packet size Error detection Gateway services Reliable Transmission Stop and

More information

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing? J. Flich 1,P.López 1, M. P. Malumbres 1, J. Duato 1, and T. Rokicki 2 1 Dpto. Informática

More information

Lessons learned from MPI

Lessons learned from MPI Lessons learned from MPI Patrick Geoffray Opinionated Senior Software Architect patrick@myri.com 1 GM design Written by hardware people, pre-date MPI. 2-sided and 1-sided operations: All asynchronous.

More information

Dr. Gengbin Zheng and Ehsan Totoni. Parallel Programming Laboratory University of Illinois at Urbana-Champaign

Dr. Gengbin Zheng and Ehsan Totoni. Parallel Programming Laboratory University of Illinois at Urbana-Champaign Dr. Gengbin Zheng and Ehsan Totoni Parallel Programming Laboratory University of Illinois at Urbana-Champaign April 18, 2011 A function level simulator for parallel applications on peta scale machine An

More information

EE 382C Interconnection Networks

EE 382C Interconnection Networks EE 8C Interconnection Networks Deadlock and Livelock Stanford University - EE8C - Spring 6 Deadlock and Livelock: Terminology Deadlock: A condition in which an agent waits indefinitely trying to acquire

More information

Cray XC Scalability and the Aries Network Tony Ford

Cray XC Scalability and the Aries Network Tony Ford Cray XC Scalability and the Aries Network Tony Ford June 29, 2017 Exascale Scalability Which scalability metrics are important for Exascale? Performance (obviously!) What are the contributing factors?

More information

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson Overview NoC: Future generation of many core processor on a single chip

More information

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1,

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1, Topics for Today Network Layer Introduction Addressing Address Resolution Readings Sections 5.1, 5.6.1-5.6.2 1 Network Layer: Introduction A network-wide concern! Transport layer Between two end hosts

More information

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Interconnection Network. Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University Interconnection Network Jinkyu Jeong (jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Topics Taxonomy Metric Topologies Characteristics Cost Performance 2 Interconnection

More information

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching. Switching/Flow Control Overview Interconnection Networks: Flow Control and Microarchitecture Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine

More information

Distributed Queue Dual Bus

Distributed Queue Dual Bus Distributed Queue Dual Bus IEEE 802.3 to 802.5 protocols are only suited for small LANs. They cannot be used for very large but non-wide area networks. IEEE 802.6 DQDB is designed for MANs It can cover

More information

Slim Fly: A Cost Effective Low-Diameter Network Topology

Slim Fly: A Cost Effective Low-Diameter Network Topology TORSTEN HOEFLER, MACIEJ BESTA Slim Fly: A Cost Effective Low-Diameter Network Topology Images belong to their creator! NETWORKS, LIMITS, AND DESIGN SPACE Networks cost 25-30% of a large supercomputer Hard

More information

The Impact of Optics on HPC System Interconnects

The Impact of Optics on HPC System Interconnects The Impact of Optics on HPC System Interconnects Mike Parker and Steve Scott Hot Interconnects 2009 Manhattan, NYC Will cost-effective optics fundamentally change the landscape of networking? Yes. Changes

More information