An introduction on the on-chip networks (NoC)

Size: px
Start display at page:

Download "An introduction on the on-chip networks (NoC)"

Transcription

1 Friday, October 12th, 2012 An introduction on the on-chip networks (NoC) Davide Zoni PhD Student webpage: home.dei.polimi.it/zoni

2 Outline Introduction to Network-on-Chip New challenges Scenario Cache implications Topologies and abstract metrics Routing algorithms Types Deadlock free property Limitations Router microarchitecture Flit based Optimization dimensions 2

3 Tiled multi-core architecture with shared memory Source: Natalie Jerger, ACACES Summer School,

4 Some slides adapted from... Specific References Timothy M. Pinkston, University of Southern California, On-Chip Networks, Natalie E. Jerger and Li-Shiuan Peh Principles and Practices of Interconnection Networks, William J. Dally and Brian Towles Other people Chita R. Das Penn State NoC Research Group Li-Shiuan-Peh, MIT Onur Mutlu, CMU Karen Bergman, Columbia Bill Dally, Stanford Rajeev Balasubramoniam, Utah Steve Keckler, UT Austin Valeria Bertacco, University of Michigan 4

5 What about an interconnection network? Applications: low-latency, high-bandwidth, dedicated channels between logic and memory Technology: Dedicated channels too expensive in terms of area, power and reliability 5

6 What about an interconnection network? An Interconnection Network is a programmable system that transports data between terminals Technology: Interconnection network helps efficiently utilize scarce resources Application: Managing communication can be critical to performance 6

7 What about a classification? Interconnection networks can be grouped into four domains depending on number and proximity of devices to be connected Networks on Chip (NoCs or OCNs) Devices include: microarchitectural elements (functional units, register files), caches, directories, processors Current/Future systems: dozens, hundreds of devices Ex: Intel TeraFLOPS research prototypes 80 cores Intel Single-chip Cloud Computer 48 cores Proximity: millimeters 7

8 System/Storage Area Networks (SANs) Multiprocessor and multicomputer systems Interprocessor and processor-memory interconnections Server and data center environments Storage and I/O components Hundreds to thousands of devices interconnected IBM Blue Gene/L supercomputer (64K nodes, each with 2 processors) Maximum interconnect distance tens of meters (typical) to a few hundred meters Examples (standards and proprietary): InfiniBand, Myrinet, Quadrics, Advanced Switching Interconnect 8

9 LANs and WANs Local Area Networks (LANs) Interconnect autonomous computer systems Machine room or throughout a building or campus Hundreds of devices interconnected (1,000s with bridging) Maximum interconnect distance few kilometers to few tens of kilometers Example (most popular): Ethernet, with 10 Gbps over 40Km Wide Area Networks (WANs) Interconnect systems distributed across globe Internet-working support required Many millions of devices interconnected Max distance: many thousands of kilometers Example: ATM (asynchronous transfer mode) 9

10 Network scenario 10

11 Network scenario 11

12 Why networks? 12

13 What about computing demands? 13

14 The energy-performance wall 14

15 The energy performance wall 15

16 The energy-performance wall 16

17 The energy-performance wall 17

18 Why on-chip networks? They provide external connectivity from system to outside world Also, connectivity within a single computer system at many levels I/O units, boards, chips, modules and blocks inside chips Trends: high demand on communication bandwidth Increased computing power and storage capacity Switched networks are replacing buses Integral part of many-core architectures Energy consumed by communication will exceed that of computation in future systems Lots of innovation needed! Computer architects/engineers must understand interconnect problems and solutions in order to more effectively design and evaluate systems 18

19 On-chip vs off-chip Significant research in multi-chassis interconnection networks (off-chip) Supercomputers and Clusters of workstations Internet routers Leverage research and insight but... Constraints are different Pin-limited bandwidth Mix of short and long packets on-chip Inherent overheads of off-chip I/O transmission New research area to meet performance, area, thermal, power and reliability needs (On-chip) Wiring constraints and metal layer limitations Horizontal and vertical layout Short, fixed length Repeater insertion limits routing of wires Avoid routing over dense logic Impact wiring density 19

20 20 Some examples BLUEGENE/L Mellanox Server Blade IP Routers - Huge power consumption - One million Watts - Complicated network structure - Total power budget Constrained by packaging and cooling costs <= 30W - Network power consumption ~10 to 15 W - Constrained by costs + regulatory limits - ~200W line card - ~60W interconnection network IB 4X CPU System logic Alpha & its Thermal Profile Intel SCC 48-core Alpha Packaging and cooling costs Dell s law <= $25 - Router+link ~25W MIT Raw CMP - Complicated communication networks - On-chip network consumes about 36% of total chip power MIT Raw CMP

21 On-chip Networks PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE 21

22 On-chip Networks: outline Topology Routing Properties Deadlock avoidance Router microarchitecture Baseline model Optimizations Metrics Power Performance PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE 22

23 On-chip Network: Where we are... General Purpose Multi-cores Shared Memory Distributed memory (or Message Passing) 23

24 On-chip Network: Where we are... Here we are Shared Memory General Purpose Multi-cores Distributed memory (or Message Passing) 24

25 25 Shared memory multi-core

26 Memory Model in CMPs Message Passing Explicit movement of data between nodes and address spaces Programmers manage communication Shared Memory Communication occurs implicitly through loads/stores and accessing instructions Will focus on shared memory Look at optimization for cache coherence protocols 26

27 Memory Model in CMPs Logically Practically... All processors access some shared memory cache hierarchies reduce access latency to improve performance Requires cache coherence protocol to maintain coherent view in presence of multiple shared copies Consistency model: the behaviour of the memory model in multi-core environment, i.e. what is allowed and what is not allowed Coherence: shadow the cache hierarchy to the programmer (without lose performance improvement) 27

28 28 Tiled multi-core architecture with shared memory Source: Natalie Jerger, ACACES Summer School, 2012

29 Intel SCC 2D mesh State of the art VC routers 2Cores per each tiles Multiple voltage islands 1 Vdd per each tile 1 NoC Vdd island Source: Natalie Jerger, ACACES Summer School,

30 Coherence Protocol on Network Performance Coherence protocol shapes communication needed by system Single writer, multiple reader invariant Requires: Data requests Data responses Coherence permissions Suggested reading for a quick review of coherence: A Primer on Memory Consistency and Cache Coherence, Daniel Sorin, Mark Hill and David Wood. Morgan Claypool Publishers,

31 Hardware cache coherence Rough goal: all caches have same data at all times Minimal flushing, maximum caches best performance Two solutions: Broadcast-based protocol: All processors see all requests at the same time, same order. Often relies on bus But can broadcast on unordered interconnect Directory-based protocol: Order of the requests relies on a different mechanism than bus Maybe better flexibility and scalability Maybe higher latency 31

32 Broadcast-based coherence Source: Natalie Jerger, ACACES Summer School,

33 Coherence Bandwidth Requirements How much address bus bandwidth does snooping need? Well, coherence events generated on... Misses (only in L2, not so bad) Dirty replacements Some parameters: 2 GHz CPUs, 2 IPC 33% memory operations, 2% of which miss in L2 50% of evictions are dirty Some results: (0.33 * 0.02) + (0.33 * 0.02 * 0.50)) = 0.01 events/insn 0.01 events/insns * 2 insn/cycle * 2 cycle/ns = 0.04 events/ns Request: 0.04 events/ns * 4B/event = 0.16 GB/s = 160 MB/s Data response: 0.04 events/ns * 64 B/event = 2.56 GB/s What about scalability? That s 2.5 GB/s... per processor With 16 processors, that 40 GB/s! With 128 processors, that s 320 GB/s!! 33

34 Scalable Cache Coherence Two parts solution: Bus-based interconnect: Replace non-scalable bandwidth substrate (bus) with scalable bandwidth substrate (point-to-point network, e.g. mesh) Processor'snooping'bandwidth: Interesting most snoops result in no actions Replace non scalable broadcast protocol (it spam everyone)...with scalable directory protocol (it only spams processors that care) NOTE: physical address space statically partitioned (Still shared!!) Can easily determine which memory module holds a given line That memory module sometimes called home Can t easily determine which processors have line in their caches Bus-based protocol: broadcast events to all processors/caches Simple and fast, but non-scalable 34

35 Scalable Cache Coherence Source: Natalie Jerger, ACACES Summer School,

36 Coherence Protocol Requirements Different message types Unicast, multicast, broadcast Directory protocol Majority of requests: Unicast Lower bandwidth demands on network More scalable due to point-to-point communication Broadcast protocol Majority of requests: Broadcast Higher bandwidth demands Often rely on network ordering 36

37 Impact of Cache Hierarchy Sharing of injection/ejection port among cores and caches Caches reduce average memory latency Private caches Multiple L2 copies Data can be replicated to be close to processor Shared caches Data can only exist in one L2 to bank Addresses striped across banks (Lots of different ways to do this) Aside: lots of research on cache block placement, replication and migration Serve as filter for interconnect traffic 37

38 Private vs. Shared Caches Private caches Reduce latency of L2 cache hits keep frequently accessed data close to processor Increase off-chip pressure Shared caches Better use of storage Non-uniform L2 hit latency More on-chip network pressure all L1 misses go onto network 38

39 On-chip Network: Private L2 Cache Hit Private L2 Cache Hit A 3Tag s Router A Logic Data Controller L1 I/D Cache 2 Core 1 Miss A LD A Memory Controller Source: Chita Das, ACACES Summer School,

40 On-chip Network: Private L2 Cache Miss Format message to memory controller 4 Miss A Private L2 3 Cache Tag Data s (off-chip) Router Logic 6 Data received, sent to L2 Controller L1 I/D Cache 2 Miss A Core 1 LD A Source: Chita Das, ACACES Summer School, Memory Controller Request sent offchip 40

41 On-chip Network: Shared L2 Local Cache Miss Receive data, send to L1 and core Format request message 3 and sent to L2 Bank that A maps to Router 41 (on-chip) 7 Shared L2 Cache Tags Data Logic Send data to 6 requestor Receive message and sent to L2 4 Shared L2 Cache L2 Hit Controller Tags L1 I/D Cache 2 Core 5 Data Controller 1 LD A Miss A A Memory Controller Source: Chita Das, ACACES Summer School, 2011 Router L1 I/D Cache Logic A Core

42 42 Network-on-Chip details

43 43 Topology nomenclature 1 Two broad classes: Direct and Indirect Networks Direct Networks: Every node is both a terminal and a switch Examples: Mesh, Torus, k-ary-n-cubes Indirect Networks: The network is basically composed of switches that connect the end nodes Examples: MIN, Crossbar, etc Direct Source: Natalie Jerger, ACACES Summer School, 2012 Indirect

44 44 Topology abstract metrics 1 Switch Degree: Number of links/edges incident on a node Proxy for estimating cost Higher degree requires more links and port counts at each router 2 Source: Natalie Jerger, ACACES Summer School, ,3,4 4

45 45 Topology abstract metrics 2 Hop Count: Number of hops a message takes from source to destination Proxy for network latency Every node, link incurs some propagation delay even when no contention Network diameter: large min hop count in network Average minimum hop count: average across all source/destination pairs Minimal hop count: smallest hop count connecting two nodes Implementation may incorporate non-minimal paths (increase avg hop count) Max=4 Avg=2.2 Source: Natalie Jerger, ACACES Summer School, 2012 Max=4 Avg=1.77 Max=2 Avg=1.33

46 Topology abstract metrics implications Abstract metrics are just proxies: Does not always correlate with the real metric they represent Example: Network A with 2 hops, 5 stage pipeline, 4 cycle link traversal vs. Network B with 3 hops, 1 stage pipeline, 1 cycle link traversal Hop Count says A is better than B But A has 18 cycle latency vs. 6 cycle latency for B Topologies typically trade-off hop count and node degree 46

47 Traffic patterns How to stress a NoC? Synthetic traffic patterns Uniform random Optimistic, it allows to view a bad network as a good one Matrix transpose Many others based on probabilistic distributions and pattern selection algorithms Real traffic patterns Real benchmarks executed on the simulated architecture More accurate Complete evaluation of the system performance Time consuming simulation Is the selected traffic suitable for my application? 47

48 Routing, Arbitration, and Switching Routing Defines the allowed path(s) for each packet (Which paths?) Problems Livelock and Deadlock Arbitration Determines use of paths supplied to packets (When allocated?) Problems Starvation Switching Establishes the connection of paths for packets (How allocated?) Switching techniques Circuit switching, Packet switching 48

49 49 Until now old wine in a new bottle...but for caches Deadlock Packets Routing algorithm Flow control Router/switch Throughtput Where is the difference? Latency

50 50 Until now old wine in a new bottle...but for caches Low power Limited resources High performance High reliability Thermal issues On-chip network criticalities

51 NoC granulatity overview Messages: composed of one or more packets (NOTE:If message size is maximum packet size only one packet created) Packets: composed of one or more flits Flit: flow control digit Phit: physical digit (Subdivides flit into chunks = to link width) Off-chip: channel width limited by pins On-chip: abundant wiring means phit size == flit size 51

52 Routing overview Usually topology discussion assumes ideal routing, while routing algorithm are not ideal in practice Once topology is fixed routing determines the path from source to destination GOAL: distribute traffic evenly among paths Avoid hot spots, contention The more balanced algorithm is the closer to ideal throughput is Keep complexity in mind 52

53 Routing algorithm attributes Types Oblivious: random without adaptiveness routing, that is very efficiently implementable Adaptive: the algorithm uses the network state to modify the routing path for each packet even under the same source,destination pair Routing path Deterministic: all the packets from each couple (source,destination) uses always the same path regardless the network state Minimal: all packets uses the shortest path from source to destination Non-minimal: packets may be routed to a longer path depending for example on network state Number of destinations Unicast: typical and easy solution in NoC Multicast: useful with cache coherence messages Broadcast: typical in bus-based architectures 53

54 The deadlock avoidance property Each packet is occupying a link and waiting for a link Without routing restrictions, a resource cycle can occur Leads to deadlock This is because resource are shared 54

55 Deterministic routing All messages from Source to Destination traverse the same path Common example: Dimension Order Routing (DOR) Message traverses network dimension by dimension Aka XY routing Cons: Eliminates any path diversity provided by topology Poor load balancing Simple and inexpensive to implement Deadlock-free (why???) Pros: 55

56 Deterministic routing aka X-Y Routing Traverse network dimension by dimension Can only turn to Y dimension after finished X It removes a lot of turns to ensure deadlock free property 56

57 Adaptive routing Exploits path diversity Uses network state to make routing decisions Buffer occupancies often used Coupled with flow control mechanism Local information readily available Global information more costly to obtain Network state can change rapidly Use of local information can lead to non-optimal choices Can be minimal or non-minimal 57

58 Minimal adaptive routing Local information can result in sub-optimal choices 58

59 Non-minimal adaptive routing Fully adaptive Not restricted to take shortest path Misrouting: directing packet along non-productive channel Priority given to productive output Some algorithms forbid U-turns Livelock potential: traversing network without ever reaching destination Limit number of misroutings What about power consumption? 59

60 Turn model for adaptive routing DOR eliminates 4 turns in a 2d-mesh topology with two cycles N to E, N to W, S to E, S to W No adaptivity It is possible to do better? Hint: some models relax to eliminate 2 turns instead of 4 in 2d-mesh Turn model 60

61 Turn model for adaptive routing 1 Basic steps Partition channels according to the direction in which they route packets Identify possible turns Identify the cycles combining turns, i.e. the most single cycles Break each simple cycle 61 Check if the combination of simple cycle allows the formation of complex cycles Example on a 2D-mesh 2 simple cycles

62 Turn model for adaptive routing 2 The DOR algorithm avoid 4 turns to ensure deadlock free property What about removing just 1 turn per cycle? Maybe the deadlock property is still valid 62

63 Turn model for adaptive routing 3 Not all turns are valid to remove cycles and preserve deadlock free property Theorem: The minimum number of turns that must be prohibited to prevent deadlock in an n-dimensional mesh is n*(n-1) or a quarter of the possible turns NOTE: However you have to choose carefully the prohibited turns 63

64 Turn model: west-first routing algorithm The first direction to take is west, if any Never possible to go west, after a while!!! An example 64

65 Turn model: north-last routing algorithm Going north is the last thing to do Never possible to go north, at the beginning!!! An example 65

66 Turn model: negative-first routing algorithm y Travel from negative start from negative Never possible to go negative from positive!!! An example x 66

67 Issues in routing algorithms Unbalanced traffic in DOR North: top-right West: top-left South: bottom-left East: bottom-right 67

68 NoC granulatity overview Messages: composed of one or more packets (NOTE:If message size is maximum packet size only one packet created) Packets: composed of one or more flits Flit: flow control digit Phit: physical digit (Subdivides flit into chunks = to link width) Off-chip: channel width limited by pins On-chip: abundant wiring means phit size == flit size 68

69 NoC microarchitecture based on granulatiry Message-based: allocation made at message granularity circuit switching Packet-based: allocation made to whole packets Store and forward (SaF) Large latency and buffer required Virtual Cut Through (VCT) Improves SaF but still large buffers and latency Flit-based: allocation made on a flit-by-flit basis Wormhole Efficient buffer utilization, low latency Suffers Head of Line (HoL) Virtual channels Primary to face deadlock Then face HoL 69

70 Switch/Router Wormhole Microarchitecture Flit-based,i.e. Packet divided in flits Pipelined in 4 stages BW,RC,SA,ST,LT Buffers organized on a flit basis Single buffer per port Buffer states: G idle,routing,active waiting, R output port (route) C credit count P pointers to data 70

71 71 Switch/Router Virtual Channel Microarchitecture

72 Router components Router components Input buffers, route computation logic, virtual channel allocator, switch allocator, crossbar switch Most OCN routers are input buffered Use single-ported memories Buffer store flits for duration in router Contrast with processor pipeline that latches between stages Basic router pipeline (Canonical 5-stage pipeline) BW: Buffer Write RC: Routing computation VA:Virtual Channel Allocation SA: Switch Allocation ST: Switch Traversal LT: Link Traversal 72

73 Router components Routing computation performed once per packet Virtual channel allocated once per packet Body and tail flits inherit this info from head flit Router performance Baseline (no load) dealy: 5 cycles + link delay x Hop + tserialization How to reduce latency? 73

74 Pipeline optimization: lookahead router Overlap with BW Precomputing route allows flits to compete for Vcs immediately after BW RC decodes route header Routing computation needed at next hop Can be computed in parallel with VA 74

75 Pipeline optimization: speculation Assume that Virtual Channel Allocation stage will be successful Valid under low to moderate loads Entire VA and SA in parallel If VA unsuccessful (no virtual channel returned) Must repeat VA/SA in next cycle Prioritize non-speculative requests 75

76 Router Pipeline: module dipendencies Dependence between output of one module and input of another Determine critical path through router Cannot bid for switch port until routing performed Li-Shiuan Peh and William J. Dally A Delay Model and Speculative Architecture for Pipelined Routers 76

77 Router Pipeline: delay model Li-Shiuan Peh and William J. Dally A Delay Model and Speculative Architecture for Pipelined Routers 77

78 Switch/Router Flow Control Flow control 78 determines how a network resources, such as bandwidth, buffer capacity and control state are allocated to packets traversing the network Resource allocation problem: from the resources point of view Contention resolution: from the packet point of view Bufferless, buffered

79 Switch/Router Bufferless Flow Control No buffers Allocate channels and bandwidth to competing packets Two modes Dropping flow control Circuit switching flow control William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 79

80 Bufferless Dropping Flow Control 1 Simplest flow control form Allocate channel and bandwidth to competing packets In case of collisions we experience packet drops Collision can be signaled or not using ack-nack messages William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 80

81 Bufferless Dropping Flow Control 2 With no ack messages the only viable way is timeout timers Ack messages can reduce latency William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 81

82 Bufferless Circuit switching Flow Control 1 It allocates all needed resources before send the message When no further packets must be sent, the circuit is deallocated Head flit arbitrates for resources, and if stalled no resend needed William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 82

83 Switch/Router Buffered Flow Control Buffers More flexibility, with the possibility to decouple resource allocation in steps Two modes Wormhole flow control Virtual channel flow control William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 83

84 Switch/Router Buffered Wormhole Flow Control Allocate on a per flit basis More efficient in buffer consumption Head of Line (HOL) blocking issues Buffered solutions allow to decouple resource allocation U uppuer outport, L lower outport In port States (I,W,A) (idle, waiting, allocated) Flits (H,B,T) (head, body, tail) William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 84

85 Switch/Router Virtual Channel Flow Control Multiple buffers on the same input port Need for a state on each virtual channel More complex wormhole to manage than Allows to manage different flows at the same time Solves the HoL issues Deadlock avoidance property William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 85

86 Wormhole HoL issues William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 86

87 Buffer Management and Backpressure 87 How to manage buffers between neighbors (i.e. how can I know the downstream destination router buffer is full?) Three ways: Credit based The upstream router keeps track of the available flit slots available in the downstream router Upstream router decreases counter when sends a flit while downstream router increases the couter (backward) when a flit leave the router Accurate fine grain control on flow control, but a lot of messages On/off Threshold mechanism with single bit low overhead to signal upstream router the permission to send Ack/nack No state in the upstream node Sends and wait for ack/nack, no net gain Waist of bandwitdh, sending without ack guarantee

88 Credit-based flow control William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 88

89 On-off flow control William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 89

90 Ack-nack flow control William Dally and Brian Towles Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. 90

91 Evaluation metrics for NOCs Performance Network centric Latency Throughput Application Centric System throughput (Weighted Speedup) Application throughput (IPC) Power/Energy Watts/Joules Energy Delay Product (EDP) Fault-Tolerance Process variation/reliability Thermal Temperature 91

92 Network-on-Chip power consumption Network power breakdown - Buffer power, crossbar power and link power are comparable - Arbiter power is negligible Source: Chita Das, ACACES summer school

93 Bibliography 2 Dally, W. J., and B. Towles [2004]. Principles and Practices of Interconnection Networks, Morgan Kaufmann Publishers, San Francisco. C.A. Nicopoulos, N. Vijaykrishnan, and C.R. Das, Network-on-Chip Architectures: A Holistic Design Exploration, Lecture Notes in Electrical Engineering Book Series, Springer, October G. De Micheli, L. Benini, Networks on Chips: Technology and Tools, Morgan Kaufmann, J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach, Morgan Kaufmann, R. Marculescu, U. Y. Ogras, L.-S. Peh, N. E. Jerger, Y. Hoskote, 'Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspectives', IEEE Trans. on ComputerAided Design of Integrated Circuits and Systems (TCAD), vol. 28, pp. 3-21, Jan T. Bjerregaard and S. Mahadevan, A survey of research and practices of network-onchip, ACM Comput. Surv., vol. 38, no. 1, pp. 1 51, Mar Natalie Enright-Jerger and Li-Shiuan Peh, "On-Chip Networks", Synthesis Lecture, Morgan-Claypool Publishers, Aug Agarwal, A. [1991]. Limits on interconnection network performance, IEEE Trans. on Parallel and Distributed Systems 2:4 (April), Dally, W. J., and B. Towles [2001]. Route packets, not wires: On-chip interconnection networks, Proc. of the Design Automation Conference, Las Vegas (June). Ho, R., K. W. Mai, and M. A. Horowitz [2001]. The future of wires, Proc. of the IEEE 89:4 (April). Hangsheng Wang, Xinping Zhu, Li-Shiuan Peh and Sharad Malik, "Orion: A Power-Performance Simulator for Interconnection Networks", In Proceedings of MICRO 35, Istanbul, November D. Brooks, R. Dick, R. Joseph, and L. Shang, "Power, thermal, and reliability modeling in nanometer-scale microprocessors, " IEEE Micro,

94 94 Thank you Any questions?

ES1 An Introduction to On-chip Networks

ES1 An Introduction to On-chip Networks December 17th, 2015 ES1 An Introduction to On-chip Networks Davide Zoni PhD mail: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Sources Main Reference Book (for the examination) Designing Network-on-Chip

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal

EECS 570. Lecture 19 Interconnects: Flow Control. Winter 2018 Subhankar Pal Lecture 19 Interconnects: Flow Control Winter 2018 Subhankar Pal http://www.eecs.umich.edu/courses/eecs570/ Slides developed in part by Profs. Adve, Falsafi, Hill, Lebeck, Martin, Narayanasamy, Nowatzyk,

More information

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching. Switching/Flow Control Overview Interconnection Networks: Flow Control and Microarchitecture Topology: determines connectivity of network Routing: determines paths through network Flow Control: determine

More information

Lecture 3: Flow-Control

Lecture 3: Flow-Control High-Performance On-Chip Interconnects for Emerging SoCs http://tusharkrishna.ece.gatech.edu/teaching/nocs_acaces17/ ACACES Summer School 2017 Lecture 3: Flow-Control Tushar Krishna Assistant Professor

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Interconnection Networks: Routing. Prof. Natalie Enright Jerger

Interconnection Networks: Routing. Prof. Natalie Enright Jerger Interconnection Networks: Routing Prof. Natalie Enright Jerger Routing Overview Discussion of topologies assumed ideal routing In practice Routing algorithms are not ideal Goal: distribute traffic evenly

More information

TDT Appendix E Interconnection Networks

TDT Appendix E Interconnection Networks TDT 4260 Appendix E Interconnection Networks Review Advantages of a snooping coherency protocol? Disadvantages of a snooping coherency protocol? Advantages of a directory coherency protocol? Disadvantages

More information

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU Thomas Moscibroda Microsoft Research Onur Mutlu CMU CPU+L1 CPU+L1 CPU+L1 CPU+L1 Multi-core Chip Cache -Bank Cache -Bank Cache -Bank Cache -Bank CPU+L1 CPU+L1 CPU+L1 CPU+L1 Accelerator, etc Cache -Bank

More information

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics Lecture 16: On-Chip Networks Topics: Cache networks, NoC basics 1 Traditional Networks Huh et al. ICS 05, Beckmann MICRO 04 Example designs for contiguous L2 cache regions 2 Explorations for Optimality

More information

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Lecture 12: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) 1 Topologies Internet topologies are not very regular they grew

More information

Lecture: Interconnection Networks

Lecture: Interconnection Networks Lecture: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm 1 Packets/Flits A message is broken into multiple packets (each packet

More information

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger

Interconnection Networks: Flow Control. Prof. Natalie Enright Jerger Interconnection Networks: Flow Control Prof. Natalie Enright Jerger Switching/Flow Control Overview Topology: determines connectivity of network Routing: determines paths through network Flow Control:

More information

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,

More information

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Interconnection Networks: Topology. Prof. Natalie Enright Jerger Interconnection Networks: Topology Prof. Natalie Enright Jerger Topology Overview Definition: determines arrangement of channels and nodes in network Analogous to road map Often first step in network design

More information

Interconnection Networks

Interconnection Networks Interconnection Networks Interconnection Networks Introduction How to connect individual devices together into a group of communicating devices? Device: r r r Component within a computer Single computer

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures. Introduc1on. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures. Introduc1on. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures Introduc1on Prof. Natalie Enright Jerger Winter 2011 ECE 1749H: Interconnec1on Networks (Enright Jerger) 1 Interconnec1on Networks

More information

Low-Power Interconnection Networks

Low-Power Interconnection Networks Low-Power Interconnection Networks Li-Shiuan Peh Associate Professor EECS, CSAIL & MTL MIT 1 Moore s Law: Double the number of transistors on chip every 2 years 1970: Clock speed: 108kHz No. transistors:

More information

1/12/11. ECE 1749H: Interconnec3on Networks for Parallel Computer Architectures. Introduc3on. Interconnec3on Networks Introduc3on

1/12/11. ECE 1749H: Interconnec3on Networks for Parallel Computer Architectures. Introduc3on. Interconnec3on Networks Introduc3on ECE 1749H: Interconnec3on Networks for Parallel Computer Architectures Introduc3on Prof. Natalie Enright Jerger Winter 2011 ECE 1749H: Interconnec3on Networks (Enright Jerger) 1 Interconnec3on Networks

More information

Lecture 2: Topology - I

Lecture 2: Topology - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 2: Topology - I Tushar Krishna Assistant Professor School of Electrical and

More information

The Multi-core revolution: Design Issues and Bus-based Interconnect

The Multi-core revolution: Design Issues and Bus-based Interconnect Friday, October, 2013 The Multi-core revolution: Design Issues and Bus-based Interconnect Davide Zoni PhD Student email: zoni@elet.polimi.it webpage: home.dei.polimi.it/zoni Outline Power wall and singlecore

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID Lecture 25: Interconnection Networks, Disks Topics: flow control, router microarchitecture, RAID 1 Virtual Channel Flow Control Each switch has multiple virtual channels per phys. channel Each virtual

More information

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks Lecture: Transactional Memory, Networks Topics: TM implementations, on-chip networks 1 Summary of TM Benefits As easy to program as coarse-grain locks Performance similar to fine-grain locks Avoids deadlock

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Lecture 7: Flow Control - I

Lecture 7: Flow Control - I ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 7: Flow Control - I Tushar Krishna Assistant Professor School of Electrical

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs -A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs Pejman Lotfi-Kamran, Masoud Daneshtalab *, Caro Lucas, and Zainalabedin Navabi School of Electrical and Computer Engineering, The

More information

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels Lecture: Interconnection Networks Topics: TM wrap-up, routing, deadlock, flow control, virtual channels 1 TM wrap-up Eager versioning: create a log of old values Handling problematic situations with a

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng Prof. Natalie Enright Jerger Announcements Feedback on your project proposals This week Scheduled extended 1 week Next week:

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees, butterflies,

More information

Lecture 22: Router Design

Lecture 22: Router Design Lecture 22: Router Design Papers: Power-Driven Design of Router Microarchitectures in On-Chip Networks, MICRO 03, Princeton A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip

More information

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Technical Report #2012-2-1, Department of Computer Science and Engineering, Texas A&M University Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Minseon Ahn,

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Rou1ng Prof. Natalie Enright Jerger Rou1ng Overview Discussion of topologies assumed ideal rou1ng In prac1ce Rou1ng algorithms are

More information

Interconnection Networks

Interconnection Networks Lecture 15: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2016 Credit: some slides created by Michael Papamichael, others based on slides from Onur Mutlu

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

Routing Algorithms. Review

Routing Algorithms. Review Routing Algorithms Today s topics: Deterministic, Oblivious Adaptive, & Adaptive models Problems: efficiency livelock deadlock 1 CS6810 Review Network properties are a combination topology topology dependent

More information

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling Bhavya K. Daya, Li-Shiuan Peh, Anantha P. Chandrakasan Dept. of Electrical Engineering and Computer

More information

Deadlock and Router Micro-Architecture

Deadlock and Router Micro-Architecture 1 EE482: Advanced Computer Organization Lecture #8 Interconnection Network Architecture and Design Stanford University 22 April 1999 Deadlock and Router Micro-Architecture Lecture #8: 22 April 1999 Lecturer:

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS Chandrika D.N 1, Nirmala. L 2 1 M.Tech Scholar, 2 Sr. Asst. Prof, Department of electronics and communication engineering, REVA Institute

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin, Greg Nazario, Xiangyao Yu*, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu Carnegie Mellon University *CMU

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

Input Buffering (IB): Message data is received into the input buffer.

Input Buffering (IB): Message data is received into the input buffer. TITLE Switching Techniques BYLINE Sudhakar Yalamanchili School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA. 30332 sudha@ece.gatech.edu SYNONYMS Flow Control DEFITION

More information

Network on Chip Architecture: An Overview

Network on Chip Architecture: An Overview Network on Chip Architecture: An Overview Md Shahriar Shamim & Naseef Mansoor 12/5/2014 1 Overview Introduction Multi core chip Challenges Network on Chip Architecture Regular Topology Irregular Topology

More information

Lecture 14: Large Cache Design III. Topics: Replacement policies, associativity, cache networks, networking basics

Lecture 14: Large Cache Design III. Topics: Replacement policies, associativity, cache networks, networking basics Lecture 14: Large Cache Design III Topics: Replacement policies, associativity, cache networks, networking basics 1 LIN Qureshi et al., ISCA 06 Memory level parallelism (MLP): number of misses that simultaneously

More information

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip

Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing Power and Performance Efficient Partial Circuits in Packet-Switched Networks-on-Chip Nasibeh Teimouri

More information

Communication Performance in Network-on-Chips

Communication Performance in Network-on-Chips Communication Performance in Network-on-Chips Axel Jantsch Royal Institute of Technology, Stockholm November 24, 2004 Network on Chip Seminar, Linköping, November 25, 2004 Communication Performance In

More information

Cache Coherence Protocols for Chip Multiprocessors - I

Cache Coherence Protocols for Chip Multiprocessors - I Cache Coherence Protocols for Chip Multiprocessors - I John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 522 Lecture 5 6 September 2016 Context Thus far chip multiprocessors

More information

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks

Pseudo-Circuit: Accelerating Communication for On-Chip Interconnection Networks Department of Computer Science and Engineering, Texas A&M University Technical eport #2010-3-1 seudo-circuit: Accelerating Communication for On-Chip Interconnection Networks Minseon Ahn, Eun Jung Kim Department

More information

Message Passing Models and Multicomputer distributed system LECTURE 7

Message Passing Models and Multicomputer distributed system LECTURE 7 Message Passing Models and Multicomputer distributed system LECTURE 7 DR SAMMAN H AMEEN 1 Node Node Node Node Node Node Message-passing direct network interconnection Node Node Node Node Node Node PAGE

More information

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals Philipp Gorski, Tim Wegner, Dirk Timmermann University

More information

Lecture 11: Large Cache Design

Lecture 11: Large Cache Design Lecture 11: Large Cache Design Topics: large cache basics and An Adaptive, Non-Uniform Cache Structure for Wire-Dominated On-Chip Caches, Kim et al., ASPLOS 02 Distance Associativity for High-Performance

More information

18-740/640 Computer Architecture Lecture 16: Interconnection Networks. Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015

18-740/640 Computer Architecture Lecture 16: Interconnection Networks. Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015 18-740/640 Computer Architecture Lecture 16: Interconnection Networks Prof. Onur Mutlu Carnegie Mellon University Fall 2015, 11/4/2015 Required Readings Required Reading Assignment: Dubois, Annavaram,

More information

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 727 A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing 1 Bharati B. Sayankar, 2 Pankaj Agrawal 1 Electronics Department, Rashtrasant Tukdoji Maharaj Nagpur University, G.H. Raisoni

More information

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols

More information

Multicomputer distributed system LECTURE 8

Multicomputer distributed system LECTURE 8 Multicomputer distributed system LECTURE 8 DR. SAMMAN H. AMEEN 1 Wide area network (WAN); A WAN connects a large number of computers that are spread over large geographic distances. It can span sites in

More information

Embedded Systems 1: On Chip Bus

Embedded Systems 1: On Chip Bus October 2016 Embedded Systems 1: On Chip Bus Davide Zoni PhD email: davide.zoni@polimi.it webpage: home.deib.polimi.it/zoni Additional Material and Reference Book 2 Reference Book Chapter Principles and

More information

CMSC 611: Advanced. Interconnection Networks

CMSC 611: Advanced. Interconnection Networks CMSC 611: Advanced Computer Architecture Interconnection Networks Interconnection Networks Massively parallel processor networks (MPP) Thousands of nodes Short distance (

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect 1 A Soft Tolerant Network-on-Chip Router Pipeline for Multi-core Systems Pavan Poluri and Ahmed Louri Department of Electrical and Computer Engineering, University of Arizona Email: pavanp@email.arizona.edu,

More information

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems.

Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. Cluster Networks Introduction Communication has significant impact on application performance. Interconnection networks therefore have a vital role in cluster systems. As usual, the driver is performance

More information

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication

More information

Abstract. Paper organization

Abstract. Paper organization Allocation Approaches for Virtual Channel Flow Control Neeraj Parik, Ozen Deniz, Paul Kim, Zheng Li Department of Electrical Engineering Stanford University, CA Abstract s are one of the major resources

More information

Flow Control can be viewed as a problem of

Flow Control can be viewed as a problem of NOC Flow Control 1 Flow Control Flow Control determines how the resources of a network, such as channel bandwidth and buffer capacity are allocated to packets traversing a network Goal is to use resources

More information

Prediction Router: Yet another low-latency on-chip router architecture

Prediction Router: Yet another low-latency on-chip router architecture Prediction Router: Yet another low-latency on-chip router architecture Hiroki Matsutani Michihiro Koibuchi Hideharu Amano Tsutomu Yoshinaga (Keio Univ., Japan) (NII, Japan) (Keio Univ., Japan) (UEC, Japan)

More information

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Interface with System Architecture. Prof. Natalie Enright Jerger

ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Interface with System Architecture. Prof. Natalie Enright Jerger ECE 1749H: Interconnec1on Networks for Parallel Computer Architectures: Interface with System Architecture Prof. Natalie Enright Jerger Systems and Interfaces Look at how systems interact and interface

More information

Network-on-Chip Architecture

Network-on-Chip Architecture Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)

More information

Adaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636

Adaptive Routing. Claudio Brunelli Adaptive Routing Institute of Digital and Computer Systems / TKT-9636 1 Adaptive Routing Adaptive Routing Basics Minimal Adaptive Routing Fully Adaptive Routing Load-Balanced Adaptive Routing Search-Based Routing Case Study: Adapted Routing in the Thinking Machines CM-5

More information

Network-on-chip (NOC) Topologies

Network-on-chip (NOC) Topologies Network-on-chip (NOC) Topologies 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and performance

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari

Noc Evolution and Performance Optimization by Addition of Long Range Links: A Survey. By Naveen Choudhary & Vaishali Maheshwari Global Journal of Computer Science and Technology: E Network, Web & Security Volume 15 Issue 6 Version 1.0 Year 2015 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global Journals

More information

A Novel Energy Efficient Source Routing for Mesh NoCs

A Novel Energy Efficient Source Routing for Mesh NoCs 2014 Fourth International Conference on Advances in Computing and Communications A ovel Energy Efficient Source Routing for Mesh ocs Meril Rani John, Reenu James, John Jose, Elizabeth Isaac, Jobin K. Antony

More information

BlueGene/L. Computer Science, University of Warwick. Source: IBM

BlueGene/L. Computer Science, University of Warwick. Source: IBM BlueGene/L Source: IBM 1 BlueGene/L networking BlueGene system employs various network types. Central is the torus interconnection network: 3D torus with wrap-around. Each node connects to six neighbours

More information

Interconnect Technology and Computational Speed

Interconnect Technology and Computational Speed Interconnect Technology and Computational Speed From Chapter 1 of B. Wilkinson et al., PARAL- LEL PROGRAMMING. Techniques and Applications Using Networked Workstations and Parallel Computers, augmented

More information

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock Reading Assignment T. M. Pinkston, Deadlock Characterization and Resolution in Interconnection Networks, Chapter 13 in Deadlock Resolution in Computer Integrated Systems, CRC Press 2004 Deadlock: Part

More information

Topologies. Maurizio Palesi. Maurizio Palesi 1

Topologies. Maurizio Palesi. Maurizio Palesi 1 Topologies Maurizio Palesi Maurizio Palesi 1 Network Topology Static arrangement of channels and nodes in an interconnection network The roads over which packets travel Topology chosen based on cost and

More information

ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS. A Dissertation MIN SEON AHN

ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS. A Dissertation MIN SEON AHN ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS A Dissertation by MIN SEON AHN Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Lecture 23: Router Design

Lecture 23: Router Design Lecture 23: Router Design Papers: A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks, ISCA 06, Penn-State ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how

More information

A Hybrid Interconnection Network for Integrated Communication Services

A Hybrid Interconnection Network for Integrated Communication Services A Hybrid Interconnection Network for Integrated Communication Services Yi-long Chen Northern Telecom, Inc. Richardson, TX 7583 kchen@nortel.com Jyh-Charn Liu Department of Computer Science, Texas A&M Univ.

More information

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms

NOW Handout Page 1. Outline. Networks: Routing and Design. Routing. Routing Mechanism. Routing Mechanism (cont) Properties of Routing Algorithms Outline Networks: Routing and Design Routing Switch Design Case Studies CS 5, Spring 99 David E. Culler Computer Science Division U.C. Berkeley 3/3/99 CS5 S99 Routing Recall: routing algorithm determines

More information

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem Reading W. Dally, C. Seitz, Deadlock-Free Message Routing on Multiprocessor Interconnection Networks,, IEEE TC, May 1987 Deadlock F. Silla, and J. Duato, Improving the Efficiency of Adaptive Routing in

More information

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013 NetSpeed ORION: A New Approach to Design On-chip Interconnects August 26 th, 2013 INTERCONNECTS BECOMING INCREASINGLY IMPORTANT Growing number of IP cores Average SoCs today have 100+ IPs Mixing and matching

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

Lecture 2 Parallel Programming Platforms

Lecture 2 Parallel Programming Platforms Lecture 2 Parallel Programming Platforms Flynn s Taxonomy In 1966, Michael Flynn classified systems according to numbers of instruction streams and the number of data stream. Data stream Single Multiple

More information

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University 18 447 Lecture 26: Interconnects James C. Hoe Department of ECE Carnegie Mellon University 18 447 S18 L26 S1, James C. Hoe, CMU/ECE/CALCM, 2018 Housekeeping Your goal today get an overview of parallel

More information

Demand Based Routing in Network-on-Chip(NoC)

Demand Based Routing in Network-on-Chip(NoC) Demand Based Routing in Network-on-Chip(NoC) Kullai Reddy Meka and Jatindra Kumar Deka Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, India Abstract

More information

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER A Thesis by SUNGHO PARK Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin

Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies. Admin Networks: Routing, Deadlock, Flow Control, Switch Design, Case Studies Alvin R. Lebeck CPS 220 Admin Homework #5 Due Dec 3 Projects Final (yes it will be cumulative) CPS 220 2 1 Review: Terms Network characterized

More information