Prediction Router: Yet another low-latency on-chip router architecture

1 Prediction Router: Yet another low-latency on-chip router architecture
  Hiroki Matsutani (Keio Univ., Japan)
  Michihiro Koibuchi (NII, Japan)
  Hideharu Amano (Keio Univ., Japan)
  Tsutomu Yoshinaga (UEC, Japan)

2 Why is a low-latency router needed?
  Tile architecture
    Many cores (e.g., processors & caches)
    On-chip interconnection network [Dally, DAC 01]
  [Figure: 16-core tile architecture; each core attaches to a router, and the routers form a packet-switched network]
  The on-chip router affects the performance and cost of the chip.

3 Why is a low-latency router needed?
  System               Topology            Routing                  Switching             Flow ctrl
  MIT RAW              2D mesh (32bit)     XY DOR                   WH, no VC             Credit
  UPMC SPIN            Fat Tree (32bit)    Up*/down*                WH, no VC             Credit
  QuickSilver ACM      H-Tree (32bit)      Up*/down*                1-flit, no VC         Credit
  UMass Amherst aSoC   2D mesh             Shortest-path            Pipelined CS, no VC   Timeslot
  Sun T1               Crossbar (128bit)   -                        -                     Handshake
  Cell BE EIB          Ring (128bit)       Shortest-path            Pipelined CS, no VC   Credit
  TRIPS (operand)      2D mesh (109bit)    YX DOR                   1-flit, no VC         On/off
  TRIPS (on-chip)      2D mesh (128bit)    YX DOR                   WH, 4 VCs             Credit
  Intel SCC            2D torus (32bit)    XY,YX DOR, odd-even TM   WH, no VC             Stall/go
  Number of cores increases (e.g., 64-core or more?)
  Number of hops increases
  Their communication latency is a crucial problem.
  Low-latency router architecture has been extensively studied.

4 Outline: Prediction router for low-latency NoC
  Existing low-latency routers
    Speculative router
    Look-ahead router
    Bypassing router
  Prediction router
    Architecture and the prediction algorithms
    Hit rate analysis
  Evaluations
    Hit rate, gate count, and energy consumption
    Case study 1: 2-D mesh (small core size)
    Case study 2: 2-D mesh (large core size)
    Case study 3: Fat tree network

5 Wormhole router: Hardware structure
  1) Selecting an output channel
  2) Arbitration for the selected output channel
  3) Sending the packet to the output channel
  [Figure: input ports X+, X-, Y+, Y-, and CORE, each with a FIFO; an ARBITER issuing GRANT; a 5x5 CROSSBAR; output ports X+, X-, Y+, Y-, and CORE]
  Routing, arbitration, & switch traversal are performed in a pipelined manner.

6 Pipeline structure: 3-cycle router
  Speculative router: VA/SA in parallel [Peh, HPCA 01]
  At least 3 cycles for traversing a router
    RC (Routing computation)
    VSA (Virtual channel & switch allocations)
    ST (Switch traversal)
  VA & SA are speculatively performed in parallel.
  [Figure: a packet transfer from router (a) to router (c); the HEAD flit passes through RC, VSA, and ST at each router, and DATA 1 to DATA 3 pass through SA and ST; axis: elapsed time [cycle]]
  At least 12 cycles for transferring a packet from router (a) to router (c)
  To perform RC and VSA in parallel, look-ahead routing is used.

7 Look-ahead router: RC/VA in parallel
  At least 3 cycles for traversing a router
    NRC (Next routing computation)
    VSA (Virtual channel & switch allocations)
    ST (Switch traversal)
  Routing computation for the next hop
    The output port of router (i+1) is selected by router (i).
  VSA can be performed without waiting for NRC.
  [Figure: HEAD, DATA 1, DATA 2, and DATA 3 flits crossing three routers with NRC, VSA, and ST stages; axis: elapsed time [cycle]]

8 Look-ahead router: RC/VA in parallel
  At least 2 cycles for traversing a router
    NRC + VSA (Next routing computation / arbitrations)
    ST (Switch traversal)
  No dependency between NRC & VSA -> NRC & VSA in parallel
  [Figure: HEAD, DATA 1, DATA 2, and DATA 3 flits crossing routers (a) to (c) with a combined NRC + VSA stage; axis: elapsed time [cycle]]
  At least 9 cycles for transferring a packet from router (a) to router (c)
  Typical example of a 2-cycle router [Dally's book, 2004]
  Packing NRC, VSA, and ST into a single stage -> frequency harmed
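
The 12-cycle and 9-cycle figures quoted for the 3-cycle and 2-cycle routers are consistent with a simple zero-load model: the head flit spends a fixed number of cycles in each router, and each following flit leaves one cycle behind the previous one. The sketch below assumes exactly that (no contention, link delay ignored); the function name and parameters are illustrative and do not come from the slides.

```python
def packet_latency(routers: int, cycles_per_router: int, flits: int) -> int:
    """Zero-load cycles until the last flit has left the final router.

    Assumes no contention and ignores link traversal time.
    """
    head_latency = routers * cycles_per_router  # head flit pays the full pipeline per router
    tail_lag = flits - 1                        # body flits follow one cycle apart
    return head_latency + tail_lag


# Routers (a), (b), (c) and a 4-flit packet (HEAD + DATA 1..3).
print(packet_latency(routers=3, cycles_per_router=3, flits=4))  # 3-cycle router
print(packet_latency(routers=3, cycles_per_router=2, flits=4))  # 2-cycle router
```

Under these assumptions the two calls give 12 and 9 cycles, matching the counts on the slides.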

9-10 Bypassing router: skip some stages
  Bypassing between intermediate nodes
    E.g., Express VCs [Kumar, ISCA 07]: virtual bypassing paths
  Pipeline bypassing utilizing the regularity of DOR
    E.g., Mad postman [Izu, PDP 94]
  Pipeline stages on frequently used paths are skipped
    E.g., Dynamic fast path [Park, HOTI 07]
  Pipeline stages on user-specific paths are skipped
    E.g., Preferred path [Michelogiannakis, NOCS 07]
    E.g., DBP [Koibuchi, NOCS 08]
  [Figure: a path from the source to the destination; ordinary routers take 3 cycles and bypassed routers take 1 cycle]
  We propose a low-latency router based on multiple predictors.

12-15 Prediction router for 1-cycle transfer [Yoshinaga, IWIA 06] [Yoshinaga, IWIA 07]
  Each input channel has predictors.
  When an input channel is idle:
    Predict an output port to be used (RC pre-execution)
    Arbitrate for the predicted port (SA pre-execution)
  RC & VSA are skipped if the prediction hits -> 1-cycle transfer
  [Figure, built up over four slides: HEAD, DATA 1, DATA 2, and DATA 3 flits crossing routers (a), (b), and (c); in the final step router (a) is a MISS and keeps its RC and VSA stages, while routers (b) and (c) are HITs; axis: elapsed time [cycle]]
  E.g., we can expect a 1.6-cycle transfer per router if 70% of predictions hit.
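
The 1.6-cycle figure is the expected per-router latency when a hit costs 1 cycle and a miss costs the original 3 cycles. A minimal sketch of that arithmetic follows; the function name and default arguments are illustrative, not taken from the talk.

```python
def expected_cycles_per_hop(hit_rate: float, hit_cycles: int = 1, miss_cycles: int = 3) -> float:
    """Average router traversal time for a given prediction hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be within [0, 1]")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


for rate in (0.0, 0.7, 1.0):
    print(f"hit rate {rate:.0%}: {expected_cycles_per_hop(rate):.2f} cycles per hop")
```

With a 70% hit rate this evaluates to 0.7 x 1 + 0.3 x 3 = 1.6 cycles.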

16 Prediction router: Prediction algorithms
  An efficient predictor is key.
  Prediction router [Yoshinaga, IWIA 06] [Yoshinaga, IWIA 07]
    A single predictor isn't enough for applications with different traffic patterns.
    Multiple predictors (A, B, C) for each input channel
    Select one of them in response to a given network environment.
  Prediction algorithms
    1. Random
    2. Static Straight (SS)
       An output channel on the same dimension is selected (exploiting the regularity of DOR).
    3. Custom
       The user can specify which output channel is accelerated.
    4. Latest Port (LP)
       The previously used output channel is selected.
    5. Finite Context Method (FCM) [Burtscher, TC 02]
       The most frequently appearing pattern of the n-context sequence (n = 0, 1, 2, ...)
    6. Sampled Pattern Match (SPM) [Jacquet, TIT 02]
       Pattern matching using a record table
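
To make three of the listed algorithms concrete, here is a rough software sketch of SS, LP, and FCM as per-input-channel predictors. It is an illustration under stated assumptions, not the authors' hardware design: the class names, the `predict`/`update` interface, the `STRAIGHT` port map (ports are assumed to be named by direction of travel, so input X+ continues to output X+), and the sample trace are all hypothetical.

```python
from collections import Counter, defaultdict, deque

# Assumption: a flit entering on X+ and going straight leaves on X+.
STRAIGHT = {"X+": "X+", "X-": "X-", "Y+": "Y+", "Y-": "Y-"}


class StaticStraight:
    """SS: always predict the output on the same dimension."""
    def __init__(self, input_port):
        self.input_port = input_port

    def predict(self):
        return STRAIGHT.get(self.input_port)  # None for the CORE input

    def update(self, actual):
        pass  # static, nothing to learn


class LatestPort:
    """LP: predict the output port used by the previous packet."""
    def __init__(self):
        self.last = None

    def predict(self):
        return self.last

    def update(self, actual):
        self.last = actual


class FiniteContext:
    """FCM: predict the port seen most often after the last n ports."""
    def __init__(self, n=1):
        self.history = deque(maxlen=n)      # the current n-context
        self.table = defaultdict(Counter)   # context -> port frequencies

    def predict(self):
        counts = self.table.get(tuple(self.history))
        return counts.most_common(1)[0][0] if counts else None

    def update(self, actual):
        self.table[tuple(self.history)][actual] += 1
        self.history.append(actual)


def hit_rate(predictor, trace):
    """Fraction of packets whose output port was predicted correctly."""
    hits = 0
    for actual in trace:
        if predictor.predict() == actual:
            hits += 1
        predictor.update(actual)
    return hits / len(trace)


# Hypothetical sequence of output ports requested at one X+ input channel.
trace = ["X+", "X+", "Y+", "X+", "X+", "CORE", "X+", "X+"]
for name, predictor in [("SS", StaticStraight("X+")),
                        ("LP", LatestPort()),
                        ("FCM(n=0)", FiniteContext(0))]:
    print(name, hit_rate(predictor, trace))
```

On this mostly straight trace SS scores highest, which fits the slide's point that the best predictor depends on the traffic pattern.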

17 Basic operation: Correct prediction
  Idle state: output port X+ is selected and reserved (the crossbar is reserved).
  1st cycle: the incoming flit is transferred to X+ without RC and VSA.
  1st cycle: RC is performed -> the prediction is correct!
  2nd cycle: the next flit is transferred to X+ without RC and VSA.
  [Figure: predictors A, B, C and the ARBITER; input FIFOs X+, X-, Y+, Y-, CORE; 5x5 XBAR; output ports X+, X-, Y+, Y-, CORE; the flit leaves on the correct port X+]
  1-cycle transfer using the reserved crossbar port when the prediction hits

18 Basic operation: Misprediction
  Idle state: output port X+ is selected and reserved.
  1st cycle: the incoming flit is transferred to X+ without RC and VSA.
  1st cycle: RC is performed -> the prediction is wrong! (X- is correct)
    The kill signal to X+ is asserted.
  2nd/3rd cycle: the dead flit is removed; retransmission to the correct port
    More energy for retransmission
  [Figure: predictors A, B, C and the ARBITER; a KILL signal removes the dead flit on output X+, and the flit is resent to the correct port X-]
  Even with a misprediction, a flit is transferred in 3 cycles, as in the original router.
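
The hit and miss cases can be summarized as a small per-router model: a correct prediction costs 1 cycle, and a wrong one costs the original 3 cycles plus a killed dead flit. The sketch below is a behavioral illustration only; `HopResult`, `traverse_router`, and the sample path are hypothetical names and data, and the timing constants are the ones quoted on the slides.

```python
from dataclasses import dataclass
from typing import Optional

HIT_CYCLES = 1   # flit uses the pre-reserved crossbar port
MISS_CYCLES = 3  # falls back to the original RC, VSA, ST pipeline


@dataclass(frozen=True)
class HopResult:
    cycles: int
    dead_flit_killed: bool


def traverse_router(predicted: Optional[str], actual: str) -> HopResult:
    """Model one head-flit traversal given the predicted and the routed port."""
    if predicted == actual:
        return HopResult(HIT_CYCLES, dead_flit_killed=False)
    # A wrong prediction already sent a speculative flit, which must be killed.
    # With no prediction at all (None) there is nothing to kill.
    return HopResult(MISS_CYCLES, dead_flit_killed=predicted is not None)


# (predicted, actual) pairs for three routers: MISS, HIT, HIT.
path = [("X+", "X-"), ("X-", "X-"), ("Y+", "Y+")]
results = [traverse_router(p, a) for p, a in path]
print("head latency:", sum(r.cycles for r in results))
print("dead flits:", sum(r.dead_flit_killed for r in results))
```

Counting killed flits separately is a reminder that a miss costs energy for the retransmission even though its latency is no worse than the original router's.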

20 Prediction hit rate analysis
  Formulas to calculate the prediction hit rates on
    2-D torus (Random, LP, SS, FCM, and SPM)
    2-D mesh (Random, LP, SS, FCM, and SPM)
    Fat tree (Random and LRU)
  Purpose: to forecast which prediction algorithm suits a given network environment without simulations
  The accuracy of the analytical model is confirmed through simulations.
  Derivation of the formulas is omitted in this talk (see Section 4 of our paper for more detail).

22 Evaluation items
  How many cycles? -> Flit-level network simulation -> Hit rate / Comm. latency
  Design Compiler (synthesis) with the Fujitsu 65nm library -> Area (gate count)
  Astro (place & route), NC-Verilog (simulation), SAIF, SDF, Power Compiler -> Energy cons. [pJ/bit]

  Table 1: Router & network parameters
    Packet length          4-flit (1-flit: 64 bit)
    Switching technique    wormhole
    Channel buffer size    4-flit / VC
    Number of VCs          1 or 2 VCs
    Cycle / hop (miss)     3 stage
    Cycle / hop (hit)      1 stage
    * Topology and traffic are mentioned later.

  Table 2: Process library
    CMOS process    65nm
    Core voltage    1.20V
    Temperature     25C

  Table 3: CAD tools used
    Design compiler
    Astro

23 3 case studies of prediction router
  2-D mesh network: case studies 1 & 2
    The most popular network topology
      MIT's RAW [Taylor, ISCA 04]
      Intel's 80-core [Vangal, ISSCC 07]
    Dimension-order routing (XY routing)
    Here, we show the results of case studies 1 and 2 together.
  Fat tree network: case study 3
  [Figure: evaluation flow; flit-level network simulation gives hit rate and comm. latency, synthesis with the Fujitsu 65nm library gives area (gate count), and place & route plus simulation give energy cons. [pJ/bit]]

24 Case study 1: Zero-load comm. latency
  Uniform random traffic on 4x4 to 16x16 meshes
  Compared: Original router, Pred router (SS), Pred router (100% hit)
  (*) 1-cycle transfer for a correct prediction, 3-cycle for a wrong prediction
  [Figure: comm. latency [cycles] versus network size (k-ary 2-mesh)]
  35.8% reduced for 8x8 cores
  48.2% reduced for 16x16 cores
  Simulation results (the analytical model also shows the same result)
  More latency is reduced (48% for k=16) as the network size increases.

25-27 Case study 2: Hit rate on an 8x8 mesh
  [Figure: prediction hit rate [%] for 7 NAS parallel benchmark programs and 4 synthesized traffic patterns]
  SS: go straight -> efficient for long straight communication
  LP: the last one -> efficient for short repeated communication
  FCM: frequently used pattern -> all-rounder
  Existing bypassing routers use
    Only a static or a single bypassing policy
  However, the effective bypassing policy depends on the traffic pattern.
  Prediction router supports
    Multiple predictors, which can be switched in a cycle
    To accelerate a wider range of applications

28-29 Case study 2: Area & Energy
  Area (gate count)
    Compared: Original router, Pred router (SS + LP), Pred router (SS + LP + FCM)
    Verilog-HDL designs, synthesized with a 65nm library
    SS + LP: light-weight (small overhead)
    FCM is an all-rounder, but requires counters.
    [Figure: router area [kilo gates]]
    The area increase depends on the type and number of predictors.
  Energy consumption
    Compared: Original router, Pred router (70% hit), Pred router (100% hit)
    [Figure: flit switching energy [pJ/bit]]
    Mispredictions consume power; energy increases by 9.5% if the hit rate is 70%.
    This estimation is pessimistic.
      1. More energy is consumed in links -> the effect of the router energy overhead is reduced.
      2. The application finishes earlier -> more energy is saved.
  Latency is reduced by 35.8% to 48.2% with reasonable area/energy overheads.

30 3 case studies of prediction router
  2-D mesh network: case studies 1 & 2
  Fat tree network: case study 3
  [Figure: evaluation flow; flit-level network simulation gives hit rate and comm. latency, synthesis with the Fujitsu 65nm library gives area (gate count), and place & route plus simulation give energy cons. [pJ/bit]]

31-32 Case study 3: Fat tree network
  1. LRU algorithm
     The LRU output port is selected for upward transfer.
  2. LRU + LP algorithm
     Plus, LP for downward transfer
  Compared: Original router, Pred router (LRU), Pred router (LRU + LP)
  [Figure: fat tree with upward and downward transfers; comm. latency [cycles] versus network size (# of cores)]
  Latency is reduced by 30.7% at 256 cores; small area overhead (7.8%)
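
As an illustration of the LRU idea for upward transfers, the sketch below keeps the upward output ports ordered by how recently they were used and predicts the least recently used one. It is only a software analogy under assumed names (`LRUPredictor`, ports `UP0` and `UP1`); the slides do not describe how the hardware tracks recency.

```python
from collections import OrderedDict


class LRUPredictor:
    """Predict the least recently used upward output port."""

    def __init__(self, up_ports):
        # Keys are kept oldest-first; the values are unused.
        self.order = OrderedDict((port, None) for port in up_ports)

    def predict(self):
        return next(iter(self.order))  # front of the dict is the LRU port

    def update(self, used_port):
        self.order.move_to_end(used_port)  # mark as most recently used


predictor = LRUPredictor(["UP0", "UP1"])
for _ in range(4):
    port = predictor.predict()
    print(port)
    predictor.update(port)  # assume the packet really took the predicted port
```

When every prediction is taken, the predicted port simply alternates between the two upward ports.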

33 Summary of the prediction router
  Prediction router for low-latency NoCs
    Multiple predictors, which can be switched in a cycle
    Architecture and six prediction algorithms
    Analytical model of prediction hit rates
  Evaluations of prediction router
    Case study 1: 2-D mesh (small core size)
    Case study 2: 2-D mesh (large core size)
    Case study 3: Fat tree network
  From three case studies (figures from case studies 1 & 2)
    Area overhead: 6.4% (SS + LP)
    Energy overhead: 9.5% (worst)
    Latency reduction: up to 48%
  Results
    1. Prediction router can be applied to various NoCs.
    2. Communication latency is reduced with small overheads.
    3. Prediction router with multiple predictors can accelerate a wider range of applications.

34 Thank you for your attention It would be very helpful if you would speak slowly. Thank you in advance.

35 Prediction router: New modifications
  Predictors for each input channel
  Kill mechanism to remove dead flits
  Two-level arbiter
    Reservation -> higher priority
    Tentative reservation by the pre-execution of VSA
  [Figure: predictors A, B, C; input FIFOs X+, X-, Y+, Y-, CORE; KILL signals; ARBITER; 5x5 XBAR; output ports X+, X-, Y+, Y-, CORE]
  Currently, the critical path is related to the arbiter.

36 Prediction router: Predictor selection
  Static scheme: a predictor is selected by the user per application.
    Configuration table
      Application 1    Predictor B
      Application 2    Predictor A
      Application 3    Predictor C
    Simple, but pre-analysis is needed
  Dynamic scheme: a predictor is adaptively selected.
    Count up each time a predictor hits
      Predictor A    100
      Predictor B     80
      Predictor C    120
    A predictor is selected every n cycles (e.g., n = 10,000).
    Flexible, but consumes more energy
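
The dynamic scheme can be pictured as one hit counter per predictor plus a periodic re-selection. The sketch below follows that description; the class and method names are hypothetical, and resetting the counters after each selection is an assumption that the slide does not state.

```python
class DynamicSelector:
    """Pick the predictor with the most hits every `period` cycles."""

    def __init__(self, predictor_names, period=10_000):
        self.hits = {name: 0 for name in predictor_names}
        self.period = period
        self.cycle = 0
        self.active = predictor_names[0]  # arbitrary initial choice

    def record(self, name, hit):
        """Count up when the named predictor would have hit."""
        if hit:
            self.hits[name] += 1

    def tick(self):
        """Advance one cycle and re-select at each period boundary."""
        self.cycle += 1
        if self.cycle % self.period == 0:
            self.active = max(self.hits, key=self.hits.get)
            # Assumption: counters restart so that old traffic is forgotten.
            self.hits = dict.fromkeys(self.hits, 0)
        return self.active


selector = DynamicSelector(["A", "B", "C"], period=1)
selector.hits.update({"A": 100, "B": 80, "C": 120})  # counter values from the slide
print(selector.tick())
```

With the counter values shown on the slide (A = 100, B = 80, C = 120), predictor C is chosen at the next selection point.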

37 Case study 1: Router critical path
  Compared: Original router, Pred router (SS)
  [Figure: stage delay [FO4s] for each pipeline stage]
    RC: Routing comp.
    VSA: Arbitration
    ST: Switch traversal
  ST can occur in these stages of the prediction router.
  The critical path delay is increased by 6.2% compared with the original router.

38 Case study 2: Hit rate on an 8x8 mesh
  [Figure: prediction hit rate [%] for 7 NAS parallel benchmark programs and 4 synthesized traffic patterns]
  SS: go straight -> efficient for long straight communication
  LP: the last one -> efficient for short repeated communication
  FCM: frequently used pattern -> all-rounder
  Custom: user-specific path -> efficient for simple communication

39-40 Case study 4: Spidergon network
  Spidergon topology [Coppola, ISSOC 04]
    Ring + across links
    Each router has 3 ports.
    Mesh-like 2-D layout
    Across-first routing
  Hit rate under uniform traffic
    SS: go straight
    LP: last used one
    FCM: frequently used one
    [Figure: prediction hit rate [%] versus network size (# of cores)]
    The hit rates of SS & FCM are almost the same.
    A high hit rate is achieved (80% for 64 cores; 94% for 256 cores).

41 4 case studies of prediction router
  2-D mesh network: case studies 1 & 2
  Fat tree network: case study 3
  Spidergon network: case study 4
  [Figure: evaluation flow; flit-level network simulation gives hit rate and comm. latency, synthesis with the Fujitsu 65nm library gives area (gate count), and place & route plus simulation give energy cons. [pJ/bit]]


More information

ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS. A Dissertation MIN SEON AHN

ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS. A Dissertation MIN SEON AHN ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS A Dissertation by MIN SEON AHN Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Tightly-Coupled Multi-Layer Topologies for 3-D NoCs

Tightly-Coupled Multi-Layer Topologies for 3-D NoCs Tightly-Coupled Multi-Layer Topologies for -D NoCs Hiroki Matsutani, Michihiro Koibuchi, and Hideharu Amano Keio University National Institute of Informatics -4-, Hiyoshi, Kohoku-ku, Yokohama, --, Hitotsubashi,

More information

Phastlane: A Rapid Transit Optical Routing Network

Phastlane: A Rapid Transit Optical Routing Network Phastlane: A Rapid Transit Optical Routing Network Mark Cianchetti, Joseph Kerekes, and David Albonesi Computer Systems Laboratory Cornell University The Interconnect Bottleneck Future processors: tens

More information

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3

Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek Raj.K 1 Prasad Kumar 2 Shashi Raj.K 3 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 02, 2014 ISSN (online): 2321-0613 Design and Implementation of a Packet Switched Dynamic Buffer Resize Router on FPGA Vivek

More information

Asynchronous Bypass Channel Routers

Asynchronous Bypass Channel Routers 1 Asynchronous Bypass Channel Routers Tushar N. K. Jain, Paul V. Gratz, Alex Sprintson, Gwan Choi Department of Electrical and Computer Engineering, Texas A&M University {tnj07,pgratz,spalex,gchoi}@tamu.edu

More information

ES1 An Introduction to On-chip Networks

ES1 An Introduction to On-chip Networks December 17th, 2015 ES1 An Introduction to On-chip Networks Davide Zoni PhD mail: davide.zoni@polimi.it webpage: home.dei.polimi.it/zoni Sources Main Reference Book (for the examination) Designing Network-on-Chip

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

EE 6900: Interconnection Networks for HPC Systems Fall 2016

EE 6900: Interconnection Networks for HPC Systems Fall 2016 EE 6900: Interconnection Networks for HPC Systems Fall 2016 Avinash Karanth Kodi School of Electrical Engineering and Computer Science Ohio University Athens, OH 45701 Email: kodi@ohio.edu 1 Acknowledgement:

More information

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges

More information

Lecture 12: SMART NoC

Lecture 12: SMART NoC ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 12: SMART NoC Tushar Krishna Assistant Professor School of Electrical and Computer

More information

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012.

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012. CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION by Stephen Chui Bachelor of Engineering Ryerson University, 2012 A thesis presented to Ryerson University in partial fulfillment of the

More information

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect Chris Fallin, Greg Nazario, Xiangyao Yu*, Kevin Chang, Rachata Ausavarungnirun, Onur Mutlu Carnegie Mellon University *CMU

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 2, Issue 9, September 2012 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Heterogeneous

More information

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC

Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC BWCCA 2010 Fukuoka, Japan November 4-6 2010 Architecture and Design of Efficient 3D Network-on-Chip for Custom Multi-Core SoC Akram Ben Ahmed, Abderazek Ben Abdallah, Kenichi Kuroda The University of Aizu

More information

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on on-chip Architecture Avinash Kodi, Ashwini Sarathy * and Ahmed Louri * Department of Electrical Engineering and

More information

A Lightweight Fault-Tolerant Mechanism for Network-on-Chip

A Lightweight Fault-Tolerant Mechanism for Network-on-Chip A Lightweight Fault-Tolerant Mechanism for Network-on-Chip Michihiro Koibuchi 1, Hiroki Matsutani 2, Hideharu Amano 2, and Timothy Mark Pinkston 3 1 National Institute of Informatics, 2-1-2, Hitotsubashi,

More information

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip

Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Routing Algorithms, Process Model for Quality of Services (QoS) and Architectures for Two-Dimensional 4 4 Mesh Topology Network-on-Chip Nauman Jalil, Adnan Qureshi, Furqan Khan, and Sohaib Ayyaz Qazi Abstract

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols

More information

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus)

Routing Algorithm. How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Routing Algorithm How do I know where a packet should go? Topology does NOT determine routing (e.g., many paths through torus) Many routing algorithms exist 1) Arithmetic 2) Source-based 3) Table lookup

More information

A Novel Approach to Reduce Packet Latency Increase Caused by Power Gating in Network-on-Chip

A Novel Approach to Reduce Packet Latency Increase Caused by Power Gating in Network-on-Chip A Novel Approach to Reduce Packet Latency Increase Caused by Power Gating in Network-on-Chip Peng Wang Sobhan Niknam Zhiying Wang Todor Stefanov Leiden Institute of Advanced Computer Science, Leiden University,

More information

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS

SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS SURVEY ON LOW-LATENCY AND LOW-POWER SCHEMES FOR ON-CHIP NETWORKS Chandrika D.N 1, Nirmala. L 2 1 M.Tech Scholar, 2 Sr. Asst. Prof, Department of electronics and communication engineering, REVA Institute

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree. Master of Science. Yixuan Zhang.

A thesis presented to. the faculty of. In partial fulfillment. of the requirements for the degree. Master of Science. Yixuan Zhang. High-Performance Crossbar Designs for Network-on-Chips (NoCs) A thesis presented to the faculty of the Russ College of Engineering and Technology of Ohio University In partial fulfillment of the requirements

More information

Performance Explorations of Multi-Core Network on Chip Router

Performance Explorations of Multi-Core Network on Chip Router Performance Explorations of Multi-Core Network on Chip Router U.Saravanakumar Department of Electronics and Communication Engineering PSG College of Technology Coimbatore, India saran.usk@gmail.com R.

More information

Interconnection Network

Interconnection Network Interconnection Network Recap: Generic Parallel Architecture A generic modern multiprocessor Network Mem Communication assist (CA) $ P Node: processor(s), memory system, plus communication assist Network

More information

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow Abstract: High-level synthesis (HLS) of data-parallel input languages, such as the Compute Unified Device Architecture

More information

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes

EE482, Spring 1999 Research Paper Report. Deadlock Recovery Schemes EE482, Spring 1999 Research Paper Report Deadlock Recovery Schemes Jinyung Namkoong Mohammed Haque Nuwan Jayasena Manman Ren May 18, 1999 Introduction The selected papers address the problems of deadlock,

More information

NOC: Networks on Chip SoC Interconnection Structures

NOC: Networks on Chip SoC Interconnection Structures NOC: Networks on Chip SoC Interconnection Structures COE838: Systems-on-Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering

More information

DESIGN AND PERFORMANCE EVALUATION OF ON CHIP NETWORK ROUTERS

DESIGN AND PERFORMANCE EVALUATION OF ON CHIP NETWORK ROUTERS DESIGN AND PERFORMANCE EVALUATION OF ON CHIP NETWORK ROUTERS 1 U.SARAVANAKUMAR, 2 R.RANGARAJAN 1 Asst Prof., Department of ECE, PSG College of Technology, Coimbatore, INDIA 2 Professor & Principal, Indus

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization

Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Basic Network-on-Chip (BANC) interconnection for Future Gigascale MCSoCs Applications: Computation & Communication Orthogonalization Abderazek Ben Abdallah, Masahiro Sowa Graduate School of Information

More information

Interconnection Networks

Interconnection Networks Lecture 18: Interconnection Networks Parallel Computer Architecture and Programming CMU 15-418/15-618, Spring 2015 Credit: many of these slides were created by Michael Papamichael This lecture is partially

More information

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks

Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Technical Report #2012-2-1, Department of Computer Science and Engineering, Texas A&M University Early Transition for Fully Adaptive Routing Algorithms in On-Chip Interconnection Networks Minseon Ahn,

More information

Lecture 3: Topology - II

Lecture 3: Topology - II ECE 8823 A / CS 8803 - ICN Interconnection Networks Spring 2017 http://tusharkrishna.ece.gatech.edu/teaching/icn_s17/ Lecture 3: Topology - II Tushar Krishna Assistant Professor School of Electrical and

More information

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology

STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication

More information

EECS 570 Final Exam - SOLUTIONS Winter 2015

EECS 570 Final Exam - SOLUTIONS Winter 2015 EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32

More information

Evaluation of NOC Using Tightly Coupled Router Architecture

Evaluation of NOC Using Tightly Coupled Router Architecture IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 01-05 www.iosrjournals.org Evaluation of NOC Using Tightly Coupled Router

More information

Interconnection Networks

Interconnection Networks Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact

More information

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control 1 Topology Examples Grid Torus Hypercube Criteria Bus Ring 2Dtorus 6-cube Fully connected Performance Bisection

More information

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers Stavros Volos, Ciprian Seiculescu, Boris Grot, Naser Khosro Pour, Babak Falsafi, and Giovanni De Micheli Toward

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS 1 JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS Shabnam Badri THESIS WORK 2011 ELECTRONICS JUNCTION BASED ROUTING: A NOVEL TECHNIQUE FOR LARGE NETWORK ON CHIP PLATFORMS

More information

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS A Thesis by SONALI MAHAPATRA Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

MULTIPATH ROUTER ARCHITECTURES TO REDUCE LATENCY IN NETWORK-ON-CHIPS. A Thesis HRISHIKESH NANDKISHOR DESHPANDE

MULTIPATH ROUTER ARCHITECTURES TO REDUCE LATENCY IN NETWORK-ON-CHIPS. A Thesis HRISHIKESH NANDKISHOR DESHPANDE MULTIPATH ROUTER ARCHITECTURES TO REDUCE LATENCY IN NETWORK-ON-CHIPS A Thesis by HRISHIKESH NANDKISHOR DESHPANDE Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment

More information

Express Virtual Channels: Towards the Ideal Interconnection Fabric

Express Virtual Channels: Towards the Ideal Interconnection Fabric Express Virtual Channels: Towards the Ideal Interconnection Fabric Amit Kumar, Li-Shiuan Peh, Partha Kundu and Niraj K. Jha Dept. of Electrical Engineering, Princeton University, Princeton, NJ 8544 Microprocessor

More information

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology 1 ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology Mikkel B. Stensgaard and Jens Sparsø Technical University of Denmark Technical University of Denmark Outline 2 Motivation ReNoC Basic

More information

Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity

Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity Journal of Low Power Electronics and Applications Article Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity Michael Opoku Agyeman 1, *, Wen Zong 2, Alex Yakovlev

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 21 Routing Outline Routing Switch Design Flow Control Case Studies Routing Routing algorithm determines which of the possible paths are used as routes how

More information

A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks

A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks A Gracefully Degrading and Energy-Efficient Modular Router Architecture for On-Chip Networks Jongman Kim Chrysostomos Nicopoulos Dongkook Park Vijaykrishnan Narayanan Mazin S. Yousif Chita R. Das Dept.

More information

EC 513 Computer Architecture

EC 513 Computer Architecture EC 513 Computer Architecture On-chip Networking Prof. Michel A. Kinsy Virtual Channel Router VC 0 Routing Computation Virtual Channel Allocator Switch Allocator Input Ports VC x VC 0 VC x It s a system

More information

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam

More information

Prevention Flow-Control for Low Latency Torus Networks-on-Chip

Prevention Flow-Control for Low Latency Torus Networks-on-Chip revention Flow-Control for Low Latency Torus Networks-on-Chip Arpit Joshi Computer Architecture and Systems Lab Department of Computer Science & Engineering Indian Institute of Technology, Madras arpitj@cse.iitm.ac.in

More information