Design of Adaptive Communication Channel Buffers for Low-Power Area-Efficient Network-on-Chip Architecture


1 Design of Adaptive Communication Channel Buffers for Low-Power Area-Efficient Network-on-Chip Architecture
Avinash Kodi, Ashwini Sarathy* and Ahmed Louri*
Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701
*Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ
kodi@ohio.edu, sarathya@ece.arizona.edu, louri@ece.arizona.edu
Sponsored: National Science Foundation (NSF) grant ECCS (at the High Performance Computing Architectures and Technologies Lab, University of Arizona, Tucson)
ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS '07), Dec 3-4, 2007

2 Talk Outline
- Motivation & Introduction
- iDEAL: Inter-router Dual-function Energy and Area-efficient Links for NoC architectures
  - Link and Router Architecture
- Performance Evaluation
  - Power & Area estimation for the Links & Routers
  - Simulation results for Throughput, Latency & Overall network power
- Conclusions

3 Motivation
System-on-Chip (SoC) paradigm
- Processing Elements (Processors, DSPs, Peripheral Controllers, Memory Subsystems)
- Increasing wire delay with decreasing feature size
- Scalable, modular interconnect: Network-on-Chip (NoC)
[Figure: 4 x 4 mesh of NoC routers joined by channels (links), connecting processor cores, UART/GPIO, USB/Ethernet controllers, and SRAM/Flash & memory controllers]

4 Motivation
Recent NSF-sponsored workshop on On-Chip Interconnection Networks (OCINs):
- The most important technology constraint for on-chip networks is power consumption.
- Power consumption of OCINs implemented with current techniques exceeds expected needs by a factor of 10.
Power break-up in the NoC router: Buffers, 46%; Clock Buffer, 6%; Arbiter, 3%; Crossbar, 35%.
[Figure: Generic NoC router with ports +x, -x, +y, -y and a Processing Element (PE); blocks: Route Computation (RC), Input Buffers, Virtual Channel (VC), Switch Allocator (SA), Crossbar Switch]
Reference: J. D. Owens, W. J. Dally, R. Ho, D. N. Jayasimha, S. W. Keckler and L. S. Peh, "Research Challenges for On-Chip Interconnection Networks," IEEE Micro, vol. 27, no. 5, pp. 96-108, September-October 2007.

5 iDEAL: Inter-router Dual-function Energy and Area-efficient Links for NoC architectures
iDEAL methodology (circuit and architectural techniques):
- Reduce the number of router buffers
- To prevent performance degradation, use adaptive channel buffers to store data along the links when required
- Dynamic buffer allocation within the router buffers
[Figure: Generic NoC architecture versus iDEAL architecture; iDEAL places adaptive channel buffers along the link and uses a reduced router buffer size]

6 Conventional Links
[Figure: link from the output port of Router A to the input port of Router B]

7 iDEAL Channel Buffer Design (1/2)
[Figure: link from the output port of Router A to the input port of Router B, with control blocks along the link driven by the congestion signal]

8 iDEAL Channel Buffer Design (2/2)
- No congestion: the control block is turned OFF and the stage functions as a conventional repeater.
- Congestion: the control block is turned ON; the repeater is tri-stated and holds the sampled value.

9 iDEAL Control Block
- Power efficient
- Stable at varying frequencies
[Figure: control block between router output (O/P) and input (I/P) ports; signals: congestion signal, CLK, CLK2]

10 iDEAL: Dual-function Link
[Figure: cycle-by-cycle example over cycles 1 to 3. Flits enter the link (Data-In); while the congestion signal is asserted they are held along the link; on congestion release they leave the link (Data-Out)]
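The dual-function behaviour described on the slides above (a plain repeater when the link is clear, storage while the congestion signal is asserted) can be sketched as a small behavioural model. This is an illustrative simplification rather than the authors' circuit: the class name `ChannelBufferLink`, the `stages` parameter, and the one-flit-per-cycle drain after release are all assumptions.

```python
from collections import deque


class ChannelBufferLink:
    """Behavioural sketch of a link whose repeater stages can hold flits.

    Hypothetical model: `stages` is the number of channel-buffer stages,
    and at most one flit enters or leaves the link per clock cycle.
    """

    def __init__(self, stages):
        self.stages = stages
        self.held = deque()  # flits currently stored along the link, oldest first

    def cycle(self, flit_in, congested):
        """Advance one clock cycle; return the flit delivered downstream, or None."""
        if congested:
            # Congestion: stages are tri-stated and hold what they sampled.
            if flit_in is not None:
                if len(self.held) == self.stages:
                    raise OverflowError("link full: upstream must stop sending")
                self.held.append(flit_in)
            return None
        # No congestion: the link behaves like a repeated wire, draining
        # any previously held flits in arrival order first.
        if flit_in is not None:
            self.held.append(flit_in)
        return self.held.popleft() if self.held else None
```

With three stages, a flit sent on a clear link arrives in the same cycle, while flits sent under congestion wait on the link and come out in order once congestion is released.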

11 Link - Power & Area Estimation
- P_segment(repeater): dynamic, leakage, short-circuit
- P_segment(chl-buffer): leakage, control block
- P_control-blk: inverters, clock, switched capacitance
[Figure: link from the output port of Router A to the input port of Router B with control blocks; signals: Congestion, CLK, CLK2]

12 iDEAL Router Buffer Design
Static buffer allocation
- Fixed number of buffers per VC
- Head-of-line (HoL) blocking
VC State Table fields: VC, RP, WP, OP, OVC, CR, C*, Status
RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion, Status = status of the VC (idle, waiting, RC, VA, SA, ST)
[Figure: input port P with a DEMUX steered by the VCID, r flit buffers for each of the v VCs, a MUX, credit return, the VC State Table and congestion control]

13 iDEAL Router Buffer Design
Dynamic buffer allocation
- Approximately (z + c)/v buffers per VC (z = router buffers, c = channel buffers, v = number of VCs)
Unified VC State Table fields: VC, RP, WP, OP, OVC, CR, Status
Buffer Slot Availability table fields: Buffer Slot, Free (Y/N)
RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion, Status = status of the VC (idle, waiting, RC, VA, SA, ST)
[Figure: input port P with a DEMUX, a shared pool of z flit slots, a MUX, credit return, input flit tracking (write pointer), output flit tracking (read pointer), the Unified VC State Table, buffer slot availability and congestion control]

14 iDEAL Router Buffer Design
Example illustrating dynamic buffer allocation in iDEAL
RP = read pointer, WP = write pointer, OP = output port, OVC = output VC, CR = credits, C* = congestion, Status = status of the VC (idle, waiting, RC, VA, SA, ST)
[Figure: an incoming flit tagged with its VCID passes through input flit tracking (write pointer); the Unified VC State Table (RP, WP, OP, OVC, CR, Status) and the Buffer Slot Availability table (Free: Y/N) are shown before and after the update, alongside output flit tracking (read pointer) and congestion control]
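The dynamic allocation idea on the slides above (any VC may take any free slot, tracked by a slot-availability table and per-VC pointers) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' hardware: `UnifiedBuffer`, its method names, and the first-free-slot policy are hypothetical, and the per-VC slot list stands in for the read/write pointers of the Unified VC State Table.

```python
class UnifiedBuffer:
    """Sketch of a router input port whose flit slots are shared by all VCs."""

    def __init__(self, num_slots, num_vcs):
        self.slots = [None] * num_slots                  # flit storage
        self.free = [True] * num_slots                   # buffer slot availability (Y/N)
        self.order = {vc: [] for vc in range(num_vcs)}   # per-VC slot indices, oldest first

    def write(self, vc, flit):
        """Store an incoming flit for `vc` in the first free slot; return the slot index."""
        for slot, is_free in enumerate(self.free):
            if is_free:
                self.slots[slot] = flit
                self.free[slot] = False
                self.order[vc].append(slot)
                return slot
        raise OverflowError("no free buffer slot")

    def read(self, vc):
        """Remove and return the oldest flit of `vc`, marking its slot free again."""
        slot = self.order[vc].pop(0)  # raises IndexError if the VC holds no flits
        flit, self.slots[slot] = self.slots[slot], None
        self.free[slot] = True
        return flit
```

With four slots and two VCs, one VC can hold three flits at once, which a static split of two slots per VC would not allow; a slot freed by one VC is immediately reusable by the other.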

15 Router - Power & Area Estimation
- Buffer power (P_write + P_read): 6T SRAM cell, bitlines, wordlines, sense amplifiers
- Crossbar power (switch + arbiter)
- Power reduces on decreasing the buffer size
[Figure: router blocks: Route Computation (RC), Virtual Channel (VC), Input Buffers, Switch Allocator (SA), Crossbar Switch, Processing Element (PE)]

16 Performance Evaluation
- Evaluated on a cycle-accurate on-chip network simulator
- Simulated 8 x 8 Mesh and 8 x 8 Folded Torus topologies
- Evaluated synthetic benchmarks: uniform and non-uniform workloads (Butterfly, Complement, Perfect Shuffle, Matrix Transpose, Bit Reversal)
- Parameters evaluated include throughput, latency and overall network power
- Considered 5 different configurations, written vnV-rnR-cnC (nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers)
- Configurations: 440 (baseline), 434, 428, 344, 531
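The configuration labels used in this talk (for example v4-r2-c8 in the chart legends) encode a buffer budget that is easy to compute. The sketch below is illustrative only: the function names `parse_config` and `buffer_budget` are hypothetical, and the per-VC figure uses the approximate (z + c)/v share quoted for dynamic allocation.

```python
import re


def parse_config(label):
    """Split a label such as 'v4-r2-c8' into
    (VCs per input port, router buffers per VC, channel buffers)."""
    match = re.fullmatch(r"v(\d+)-r(\d+)-c(\d+)", label)
    if match is None:
        raise ValueError(f"not a vN-rN-cN label: {label!r}")
    n_v, n_r, n_c = (int(group) for group in match.groups())
    return n_v, n_r, n_c


def buffer_budget(label):
    """Return router slots (z), channel slots (c), their total, and (z + c)/v."""
    n_v, n_r, n_c = parse_config(label)
    router_slots = n_v * n_r  # z: router buffers per input port
    return {
        "router": router_slots,
        "channel": n_c,
        "total": router_slots + n_c,
        "per_vc": (router_slots + n_c) / n_v,
    }
```

For instance, `buffer_budget("v4-r2-c8")` reports 8 router slots and 8 channel slots, the same 16-slot total as `buffer_budget("v4-r3-c4")`, which is what makes these configurations comparable.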

17 Power Estimation - Summary
Table columns: Configuration (vnV-rnR-cnC) | Buffer Power (mW) | % Change | Mesh Link + Control Power (mW) | % Change | Folded Torus Link + Control Power (mW) | % Change
Table rows: v4-r4-c0, v4-r3-c4, v4-r2-c8, v4-r2-c8, v3-r4-c4, v3-r3-c7, v5-r2-c6, v5-r3-c1
nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers

18 Buffer Power: 8x8 Mesh and Folded Torus
- Uniformly distributed traffic
- Nearly 40% power savings for a 50% buffer size reduction (428), using dynamic buffer allocation
(428 = 4 VCs per port, 2 router buffers per VC, 8 channel buffers)
[Figure: two bar charts, Buffer Power (8x8 Mesh) and Buffer Power (8x8 Folded Torus), uniform traffic (UN), dynamic allocation; x-axis: configuration (v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1); y-axis: power (watts)]

19 Throughput: 8x8 Mesh and Folded Torus
- Uniformly distributed traffic
- Only about 5% drop in throughput for the 428 case (dynamic buffer allocation)
(428 = 4 VCs per port, 2 router buffers per VC, 8 channel buffers)
[Figure: two line charts, Throughput (8x8 Mesh) and Throughput (8x8 Folded Torus), uniform traffic (UN), dynamic allocation; series: v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1; x-axis: offered load (as a fraction of network capacity); y-axis: throughput (GBps)]

20 Overall Network Power: 8x8 Mesh and Folded Torus
- Total power consumed for a network load of 0.5
- Nearly 20% savings for the 428 configuration, using dynamic buffer allocation
(428 = 4 VCs per port, 2 router buffers per VC, 8 channel buffers)
[Figure: two stacked bar charts, Total Power (8x8 Mesh) and Total Power (8x8 Folded Torus), uniform traffic (UN), dynamic allocation; components: Congestion Power, Link Power, Crossbar Power, Buffer Power; x-axis: configuration (v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1); y-axis: power (watts)]

21 Buffer Power: 8x8 Mesh, all Traffic Patterns
- Reduction in power for all configurations, under all traffic patterns, compared to the baseline (440)
- For example, under Complement traffic the configuration achieves 45% savings under static allocation and 37.5% savings under dynamic allocation
[Figure: Buffer Power (8x8 Mesh) at an offered load of 0.5; series: v4-r4-c0, v4-r3-c4, v4-r2-c8; x-axis: traffic pattern (UN, CO, TO, PS, BR, MT, NE, BU) under static (S) and dynamic (D) allocation; y-axis: power (watts)]

22 Throughput: 8x8 Mesh, all Traffic Patterns
- No significant decrease in throughput under any traffic pattern, using dynamic allocation
[Figure: Throughput (8x8 Mesh) at an offered load of 0.5; series: v4-r4-c0, v4-r3-c4, v4-r2-c8; x-axis: traffic pattern (UN, CO, TO, PS, BR, MT, NE, BU) under static (S) and dynamic (D) allocation; y-axis: throughput (GBps)]

23 Conclusion
- The iDEAL architecture provides a low-power, area-efficient solution for NoCs by reducing power consumption through circuit-level and architecture-level techniques.
- Simulation results show that halving the buffer size yields 40-52% savings in power, with a significant reduction in router area.
- There is only a marginal 1-5% drop in performance under dynamic buffer allocation.
- Future work will involve (a) simulation using real-application traces and (b) exploring architectural improvements such as aggressive speculation in the credit loop.

24 Backup Slides

25 Area Estimation Summary (values from Synopsys Design Compiler)
Table columns: Configuration (vnV-rnR-cnC) | Buffer Area (μm²) | Link Repeater Area (μm²) | Total Buffer + Link Area (μm²) | % Change
v4-r4-c0: 8, ,439 -
v4-r3-c4: 63, , -2.4
v4-r2-c8: 48, ,
v4-r2-c8: 48, ,
v3-r4-c4: 63, ,
v3-r3-c7: 5, ,
v5-r2-c6: 53, ,
v5-r3-c1: 73, ,
nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers

26 Latency: 8x8 Mesh and Folded Torus
- Uniformly distributed traffic
- For all cases (except 531), saturation occurs at a network load of about 0.3 for Mesh and about 0.4 for Folded Torus
[Figure: two line charts, Average Latency (8x8 Mesh) and Average Latency (8x8 Folded Torus), uniform traffic (UN), dynamic allocation; series: v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v5-r3-c1; x-axis: offered load (as a fraction of network capacity); y-axis: average latency (microsec)]

27 Comparison with FC-CB and DAMQ
- FC-CB shows similar performance to the dynamically allocated case and achieves a nearly 4% increase in saturation throughput
- FC-CB achieves nearly 2.5% improvement in saturation throughput compared to DAMQ
[Figure: Comparison of Saturation Throughput (8x8 Mesh), uniform traffic; x-axis: configuration (v4-r3-c4, v4-r2-c8, FC-CB, DAMQ); y-axis: throughput (GBps)]
[Figure: Comparison of Average Latency (8x8 Mesh), uniform traffic; series: v4-r3-c4, v4-r2-c8, FC-CB, DAMQ; x-axis: offered traffic (as a fraction of network capacity); y-axis: average latency (microsec)]

28 Buffer Power and Total Power (8x8 Mesh) - Uniform Traffic
- Power calculations using Synopsys Power Compiler
- Nearly 40% reduction in buffer power alone
- Nearly 30% decrease in overall network power
[Figure: Buffer Power (8x8 Mesh), uniform traffic; components: Leakage Power, Dynamic Power; x-axis: configuration (v4-r4-c0, v5-r3-c1, v3-r4-c4, v4-r3-c4, v4-r2-c8); y-axis: power (watts)]
[Figure: Total Power (8x8 Mesh), uniform traffic; components: Control, Link, Switch, Arbiter, Buffer; x-axis: configuration (v4-r4-c0, v5-r3-c1, v3-r4-c4, v4-r3-c4, v4-r2-c8); y-axis: power (watts)]

29 Data Flow Control
Simulated with Synopsys VCS (5 MHz)
[Figure: simulation waveforms against time (ns): Clock Signal, Data_in, Congestion input, Congestion at stages 1 to 4, and Data_out1 to Data_out4 from stages 1 to 4]

30 Router - Power Estimation
Component power / area calculation:
- $C_{buf} = (\tfrac{1}{2} \times W \times L \times C_{ox}) + (W \times L_{ov} \times C_{ox})$
- $P_{dynamic} = a \times [k(C_o + C_p + C_{buf}) + l\,C_w] \times V_{DD}^2 \times freq$
- $P_{leakage} = 2 \times [\tfrac{1}{2} \times V_{DD} \times (I_{off}(W_n + W_p)\,k)]$
- $P_{short\text{-}ckt} = a \times t_{rise} \times W_n \times k \times V_{DD} \times I_{sc} \times freq$
Explanation:
- $C_{buf}$ = additional capacitance due to the three-state repeater along the links
- $W$, $L$ = width and length of a minimum-sized inverter
- $C_{ox}$ = oxide capacitance
- $L_{ov}$ = gate-drain/source overlap length
- $a$ = activity factor, $k$ = repeater sizing, $l$ = repeater spacing
- $C_o$ = diffusion capacitance, $C_p$ = gate capacitance, $C_w$ = wire capacitance
- $V_{DD}$ = supply voltage, $freq$ = operating frequency
- $I_{off}$ = subthreshold leakage current
- $W_n$ ($W_p$) = width of the NMOS (PMOS) in the repeater
- $t_{rise}$ = rise time of the short-circuit current $I_{sc}$
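The expressions on this slide translate directly into code, which makes it easy to see how each term scales with repeater sizing or frequency. The functions below are a sketch that mirrors the formulas as transcribed; the function and argument names are hypothetical, and any numbers passed in are placeholders rather than process data from the talk.

```python
def c_buf(w, l_gate, c_ox, l_ov):
    """Extra capacitance of the three-state repeater:
    (1/2 * W * L * C_ox) + (W * L_ov * C_ox)."""
    return 0.5 * w * l_gate * c_ox + w * l_ov * c_ox


def p_dynamic(a, k, c_o, c_p, c_buffer, l, c_w, v_dd, freq):
    """Dynamic power: a * [k(C_o + C_p + C_buf) + l*C_w] * V_DD^2 * freq."""
    return a * (k * (c_o + c_p + c_buffer) + l * c_w) * v_dd ** 2 * freq


def p_leakage(v_dd, i_off, w_n, w_p, k):
    """Leakage power: 2 * [1/2 * V_DD * (I_off * (W_n + W_p) * k)]."""
    return 2 * (0.5 * v_dd * (i_off * (w_n + w_p) * k))


def p_short_circuit(a, t_rise, w_n, k, v_dd, i_sc, freq):
    """Short-circuit power: a * t_rise * W_n * k * V_DD * I_sc * freq."""
    return a * t_rise * w_n * k * v_dd * i_sc * freq
```

Because dynamic power depends on V_DD squared, halving the supply voltage in `p_dynamic` cuts the result to a quarter, while the leakage term only halves.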

31 iDEAL Control Block
Self-checking double-sampling technique for the control block
- Double-sampling the congestion input
- Slightly more power (.2 μW vs. .6 μW) and area, but more reliable
[Figure: control blocks between the output port of Router A and the input port of Router B, each with a MUX, D flip-flop and error XOR; signals: Congestion, Clock, Clock Delay Buffer]
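Double-sampling with an error XOR can be illustrated in a few lines. This is a sketch of the general technique, not the circuit on the slide: the function name `double_sample`, the callable `signal`, and the choice to trust the later sample on a mismatch are assumptions.

```python
def double_sample(signal, t, delay):
    """Sample a 0/1 signal at time t and again at t + delay.

    `signal` is any callable mapping a time to 0 or 1 (hypothetical
    stand-in for the congestion input). Returns (selected_value, error).
    """
    first = signal(t)             # sample on the clock edge
    second = signal(t + delay)    # sample on the delayed clock edge
    error = first ^ second        # XOR flags a disagreement between samples
    # Assumption: on a mismatch the later sample is taken as the settled value.
    selected = second if error else first
    return selected, bool(error)
```

If the congestion input changes between the two sampling instants, the XOR raises the error flag; when both samples agree, the value passes through unchanged.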

32 Aggressive Speculation
- Aggressive speculation by increasing the number of credits available to 8
- Additional credits are accounted for by the channel buffers
- Saturation throughput improves
[Figure: Saturation Throughput (8x8 Folded Torus), uniform traffic; x-axis: configuration (v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v3-r3-c7); y-axis: throughput (GBps)]
[Figure: Average Latency (8x8 Folded Torus), uniform traffic; series: v4-r4-c0, v4-r3-c4, v4-r2-c8, v3-r4-c4, v3-r3-c7; x-axis: offered traffic (as a fraction of network capacity); y-axis: average latency (microsec)]
[Figure: Total Power (8x8 Folded Torus), uniform traffic; components: Control, Link, Switch, Arbiter, Buffer; x-axis: configuration (v4-r4-c0, v3-r4-c4, v4-r3-c4, v3-r3-c7, v4-r2-c8); y-axis: power (watts)]
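The effect of granting extra credits that are backed by channel buffers can be sketched with a simple credit counter. This is an illustration of credit-based flow control in general, not the authors' implementation: `CreditCounter`, `flits_sent_before_stall`, and the buffer counts used with them are hypothetical.

```python
class CreditCounter:
    """Upstream view of a downstream VC under credit-based flow control."""

    def __init__(self, router_buffers, channel_buffers=0):
        # Assumption: speculative credits equal the channel buffers that
        # can absorb flits the downstream router cannot yet accept.
        self.credits = router_buffers + channel_buffers

    def can_send(self):
        return self.credits > 0

    def send(self):
        """Consume one credit for each flit sent downstream."""
        if not self.can_send():
            raise RuntimeError("no credits: sender must stall")
        self.credits -= 1

    def credit_return(self):
        """A downstream slot was freed, so one credit comes back."""
        self.credits += 1


def flits_sent_before_stall(counter):
    """Count how many back-to-back flits can leave before the sender stalls."""
    sent = 0
    while counter.can_send():
        counter.send()
        sent += 1
    return sent
```

With 2 router buffers and no speculation the sender stalls after 2 flits; counting 4 channel buffers as additional credits lets it send 6 before stalling, which is the kind of headroom the slide attributes to the channel buffers.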

33 Power Estimation Summary (values from Synopsys Power Compiler)
Table columns: Configuration (vnV-rnR-cnC) | Buffer Power (mW) | Mesh Link + Control Power (mW) | Total Power (Buffer + Link) (mW) | % Change
Table rows: v4-r4-c, v4-r3-c, v4-r2-c, v4-r2-c8, v4-r2-c, v3-r4-c, v3-r3-c, v5-r2-c, v5-r3-c
nV = number of VCs per input port, nR = number of router buffers per VC, nC = number of channel buffers


More information

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China

Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China CMOS Crossbar Ting Wu, Chi-Ying Tsui, Mounir Hamdi Hong Kong University of Science & Technology Hong Kong SAR, China OUTLINE Motivations Problems of Designing Large Crossbar Our Approach - Pipelined MUX

More information

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on on-chip Donghyun Kim, Kangmin Lee, Se-joong Lee and Hoi-Jun Yoo Semiconductor System Laboratory, Dept. of EECS, Korea Advanced

More information

3. Implementing Logic in CMOS

3. Implementing Logic in CMOS 3. Implementing Logic in CMOS 3. Implementing Logic in CMOS Jacob Abraham Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 27 September, 27 ECE Department,

More information

ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS. A Dissertation MIN SEON AHN

ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS. A Dissertation MIN SEON AHN ACCELERATING COMMUNICATION IN ON-CHIP INTERCONNECTION NETWORKS A Dissertation by MIN SEON AHN Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements

More information

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation Kshitij Bhardwaj Dept. of Computer Science Columbia University Steven M. Nowick 2016 ACM/IEEE Design Automation

More information

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background Lecture 15: PCM, Networks Today: PCM wrap-up, projects discussion, on-chip networks background 1 Hard Error Tolerance in PCM PCM cells will eventually fail; important to cause gradual capacity degradation

More information

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto

SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES. Natalie Enright Jerger University of Toronto SIGNET: NETWORK-ON-CHIP FILTERING FOR COARSE VECTOR DIRECTORIES University of Toronto Interaction of Coherence and Network 2 Cache coherence protocol drives network-on-chip traffic Scalable coherence protocols

More information

Chapter 2 Designing Crossbar Based Systems

Chapter 2 Designing Crossbar Based Systems Chapter 2 Designing Crossbar Based Systems Over the last decade, the communication architecture of SoCs has evolved from single shared bus systems to multi-bus systems. Today, state-of-the-art bus based

More information

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology 1 ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology Mikkel B. Stensgaard and Jens Sparsø Technical University of Denmark Technical University of Denmark Outline 2 Motivation ReNoC Basic

More information

Leaving One Slot Empty: Flit Bubble Flow Control for Torus Cache-coherent NoCs

Leaving One Slot Empty: Flit Bubble Flow Control for Torus Cache-coherent NoCs .9/TC.23.2295523, Leaving One Slot Empty: Flit Bubble Flow Control for Torus Cache-coherent NoCs Sheng Ma, Zhiying Wang, Member, IEEE, Zonglin Liu and Natalie Enright Jerger, Senior Member, IEEE Abstract

More information

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad

NoC Round Table / ESA Sep Asynchronous Three Dimensional Networks on. on Chip. Abbas Sheibanyrad NoC Round Table / ESA Sep. 2009 Asynchronous Three Dimensional Networks on on Chip Frédéric ric PétrotP Outline Three Dimensional Integration Clock Distribution and GALS Paradigm Contribution of the Third

More information

VLSI D E S. Siddhardha Pottepalem

VLSI D E S. Siddhardha Pottepalem HESIS UBMITTED IN ARTIAL ULFILLMENT OF THE EQUIREMENTS FOR THE EGREE OF M T IN VLSI D E S BY Siddhardha Pottepalem EPARTMENT OF LECTRONICS AND OMMUNICATION NGINEERING ATIONAL NSTITUTE OF ECHNOLOGY OURKELA

More information

Parallel Systems Prof. James L. Frankel Harvard University. Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved.

Parallel Systems Prof. James L. Frankel Harvard University. Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved. Parallel Systems Prof. James L. Frankel Harvard University Version of 6:50 PM 4-Dec-2018 Copyright 2018, 2017 James L. Frankel. All rights reserved. Architectures SISD (Single Instruction, Single Data)

More information

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012.

CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION. Stephen Chui Bachelor of Engineering Ryerson University, 2012. CONGESTION AWARE ADAPTIVE ROUTING FOR NETWORK-ON-CHIP COMMUNICATION by Stephen Chui Bachelor of Engineering Ryerson University, 2012 A thesis presented to Ryerson University in partial fulfillment of the

More information

4. Networks. in parallel computers. Advances in Computer Architecture

4. Networks. in parallel computers. Advances in Computer Architecture 4. Networks in parallel computers Advances in Computer Architecture System architectures for parallel computers Control organization Single Instruction stream Multiple Data stream (SIMD) All processors

More information

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy

Power Reduction Techniques in the Memory System. Typical Memory Hierarchy Power Reduction Techniques in the Memory System Low Power Design for SoCs ASIC Tutorial Memories.1 Typical Memory Hierarchy On-Chip Components Control edram Datapath RegFile ITLB DTLB Instr Data Cache

More information

Quality-of-Service for a High-Radix Switch

Quality-of-Service for a High-Radix Switch Quality-of-Service for a High-Radix Switch Nilmini Abeyratne, Supreet Jeloka, Yiping Kang, David Blaauw, Ronald G. Dreslinski, Reetuparna Das, and Trevor Mudge University of Michigan 51 st DAC 06/05/2014

More information

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010

Lecture 11 SRAM Zhuo Feng. Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 2010 EE4800 CMOS Digital IC Design & Analysis Lecture 11 SRAM Zhuo Feng 11.1 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitryit Multiple Ports Outline Serial Access Memories 11.2 Memory Arrays

More information

Fast Scalable FPGA-Based Network-on-Chip Simulation Models

Fast Scalable FPGA-Based Network-on-Chip Simulation Models We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations and support. Computer Architecture Lab at Carnegie Mellon Fast Scalable FPGA-Based Network-on-Chip Simulation

More information

548 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 4, APRIL 2011

548 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 4, APRIL 2011 548 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 4, APRIL 2011 Extending the Effective Throughput of NoCs with Distributed Shared-Buffer Routers Rohit Sunkam

More information

Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity

Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity Journal of Low Power Electronics and Applications Article Extending the Performance of Hybrid NoCs beyond the Limitations of Network Heterogeneity Michael Opoku Agyeman 1, *, Wen Zong 2, Alex Yakovlev

More information

Basic Low Level Concepts

Basic Low Level Concepts Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock

More information

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM

Memory. Outline. ECEN454 Digital Integrated Circuit Design. Memory Arrays. SRAM Architecture DRAM. Serial Access Memories ROM ECEN454 Digital Integrated Circuit Design Memory ECEN 454 Memory Arrays SRAM Architecture SRAM Cell Decoders Column Circuitry Multiple Ports DRAM Outline Serial Access Memories ROM ECEN 454 12.2 1 Memory

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC

CAD System Lab Graduate Institute of Electronics Engineering National Taiwan University Taipei, Taiwan, ROC QoS Aware BiNoC Architecture Shih-Hsin Lo, Ying-Cherng Lan, Hsin-Hsien Hsien Yeh, Wen-Chung Tsai, Yu-Hen Hu, and Sao-Jie Chen Ying-Cherng Lan CAD System Lab Graduate Institute of Electronics Engineering

More information

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures

Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Vdd Programmable and Variation Tolerant FPGA Circuits and Architectures Prof. Lei He EE Department, UCLA LHE@ee.ucla.edu Partially supported by NSF. Pathway to Power Efficiency and Variation Tolerance

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #4 1/24/2018 Xuehai Qian xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Announcements PA #1

More information

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University Hybrid On-chip Data Networks Gilbert Hendry Keren Bergman Lightwave Research Lab Columbia University Chip-Scale Interconnection Networks Chip multi-processors create need for high performance interconnects

More information

Packet Switch Architecture

Packet Switch Architecture Packet Switch Architecture 3. Output Queueing Architectures 4. Input Queueing Architectures 5. Switching Fabrics 6. Flow and Congestion Control in Sw. Fabrics 7. Output Scheduling for QoS Guarantees 8.

More information

The Design and Implementation of a Low-Latency On-Chip Network

The Design and Implementation of a Low-Latency On-Chip Network The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current

More information

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS A Thesis by SONALI MAHAPATRA Submitted to the Office of Graduate and Professional Studies of Texas A&M University

More information

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology Outline SoC Interconnect NoC Introduction NoC layers Typical NoC Router NoC Issues Switching

More information

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP

FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College

More information

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions

OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions OUTLINE Introduction Power Components Dynamic Power Optimization Conclusions 04/15/14 1 Introduction: Low Power Technology Process Hardware Architecture Software Multi VTH Low-power circuits Parallelism

More information

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs This work was funded by NSF. We thank Xilinx for their FPGA and tool donations. We thank Bluespec for their tool donations. Re-Examining Conventional Wisdom for Networks-on-Chip in the Context of FPGAs

More information

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California, Davis, USA Outline Introduction Timing issues

More information

Lecture 18: Communication Models and Architectures: Interconnection Networks

Lecture 18: Communication Models and Architectures: Interconnection Networks Design & Co-design of Embedded Systems Lecture 18: Communication Models and Architectures: Interconnection Networks Sharif University of Technology Computer Engineering g Dept. Winter-Spring 2008 Mehdi

More information

A Survey of Techniques for Power Aware On-Chip Networks.

A Survey of Techniques for Power Aware On-Chip Networks. A Survey of Techniques for Power Aware On-Chip Networks. Samir Chopra Ji Young Park May 2, 2005 1. Introduction On-chip networks have been proposed as a solution for challenges from process technology

More information

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP Presenter: Course: EEC 289Q: Reconfigurable Computing Course Instructor: Professor Soheil Ghiasi Outline Overview of M.I.T. Raw processor

More information

QNoC: QoS architecture and design process for network on chip

QNoC: QoS architecture and design process for network on chip Journal of Systems Architecture xxx (23) xxx xxx www.elsevier.com/locate/sysarc QNoC: QoS architecture and design process for network on chip Evgeny Bolotin *, Israel Cidon, Ran Ginosar, Avinoam Kolodny

More information

Deadlock-free XY-YX router for on-chip interconnection network

Deadlock-free XY-YX router for on-chip interconnection network LETTER IEICE Electronics Express, Vol.10, No.20, 1 5 Deadlock-free XY-YX router for on-chip interconnection network Yeong Seob Jeong and Seung Eun Lee a) Dept of Electronic Engineering Seoul National Univ

More information

TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks

TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks TCEP: Traffic Consolidation for Energy-Proportional High-Radix Networks Gwangsun Kim Arm Research Hayoung Choi, John Kim KAIST High-radix Networks Dragonfly network in Cray XC30 system 1D Flattened butterfly

More information

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip Anh T. Tran and Bevan M. Baas Department of Electrical and Computer Engineering University of California - Davis, USA {anhtr,

More information

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts

ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts ECE 4750 Computer Architecture, Fall 2017 T06 Fundamental Network Concepts School of Electrical and Computer Engineering Cornell University revision: 2017-10-17-12-26 1 Network/Roadway Analogy 3 1.1. Running

More information

A Low Latency Router Supporting Adaptivity for On-Chip Interconnects

A Low Latency Router Supporting Adaptivity for On-Chip Interconnects A Low Latency Supporting Adaptivity for On-Chip Interconnects 34.2 Jongman Kim, Dongkook Park, T. Theocharides, N. Vijaykrishnan and Chita R. Das Department of Computer Science and Engineering The Pennsylvania

More information

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor

PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor PicoServer : Using 3D Stacking Technology To Enable A Compact Energy Efficient Chip Multiprocessor Taeho Kgil, Shaun D Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Steve Reinhardt, Krisztian Flautner,

More information

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns

Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns March 12, 2018 Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns Wen Wen Lei Zhao, Youtao Zhang, Jun Yang Executive Summary Problems: performance and reliability of write operations

More information