udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults

Similar documents
IEEE TRANSACTIONS ON COMPUTERS, VOL. 65, NO. X, XXXXX Resource Conscious Diagnosis and Reconfiguration for NoC Permanent Faults

udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC faults

Lecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance

Lecture 12: Interconnection Networks. Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E)

Lecture: Interconnection Networks

NoCAlert: An On-Line and Real- Time Fault Detection Mechanism for Network-on-Chip Architectures

Lecture 16: On-Chip Networks. Topics: Cache networks, NoC basics

726 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 31, NO. 5, MAY 2012

Lecture 22: Router Design

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

Basic Low Level Concepts

Deadlock. Reading. Ensuring Packet Delivery. Overview: The Problem

NEtwork-on-Chip (NoC) [3], [6] is a scalable interconnect

Lecture 23: Router Design

POLYMORPHIC ON-CHIP NETWORKS

Joint consideration of performance, reliability and fault tolerance in regular Networks-on-Chip via multiple spatially-independent interface terminals

Deadlock: Part II. Reading Assignment. Deadlock: A Closer Look. Types of Deadlock

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

4. Networks. in parallel computers. Advances in Computer Architecture

Network on Chip Architecture: An Overview

Lecture 15: PCM, Networks. Today: PCM wrap-up, projects discussion, on-chip networks background

Introduction to the Bertacco Lab

NoCAlert: An On-Line and Real-Time Fault Detection Mechanism for Network-on-Chip Architectures

/12/$31.00 c 2012 IEEE

Lecture 25: Interconnection Networks, Disks. Topics: flow control, router microarchitecture, RAID

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect

ReliNoC: A Reliable Network for Priority-Based On-Chip Communication

Lecture 18: Communication Models and Architectures: Interconnection Networks

NetSpeed ORION: A New Approach to Design On-chip Interconnects. August 26 th, 2013

Lecture 26: Interconnects. James C. Hoe Department of ECE Carnegie Mellon University

TDT Appendix E Interconnection Networks

Lecture 24: Interconnection Networks. Topics: topologies, routing, deadlocks, flow control

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Formally Enhanced Runtime Verification to Ensure NoC Functional Correctness

Vicis: A Reliable Network for Unreliable Silicon

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

A Literature Review of on-chip Network Design using an Agent-based Management Method

Lecture: Transactional Memory, Networks. Topics: TM implementations, on-chip networks

Interconnection Networks

3D WiNoC Architectures

NoC Test-Chip Project: Working Document

Switching/Flow Control Overview. Interconnection Networks: Flow Control and Microarchitecture. Packets. Switching.

Synchronized Progress in Interconnection Networks (SPIN) : A new theory for deadlock freedom

Evaluation of NOC Using Tightly Coupled Router Architecture

DLABS: a Dual-Lane Buffer-Sharing Router Architecture for Networks on Chip

Lecture: Interconnection Networks. Topics: TM wrap-up, routing, deadlock, flow control, virtual channels

ECE/CS 757: Advanced Computer Architecture II Interconnects

Module 16: Distributed System Structures

Boosting the Performance of Myrinet Networks

Lecture 3: Flow-Control

EE 6900: Interconnection Networks for HPC Systems Fall 2016

Fault-tolerant & Adaptive Stochastic Routing Algorithm. for Network-on-Chip. Team CoheVer: Zixin Wang, Rong Xu, Yang Jiao, Tan Bie

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

Network-on-chip (NOC) Topologies

Interconnection Networks

PRIORITY BASED SWITCH ALLOCATOR IN ADAPTIVE PHYSICAL CHANNEL REGULATOR FOR ON CHIP INTERCONNECTS. A Thesis SONALI MAHAPATRA

Improving Fault Tolerance of Network-on-Chip Links via Minimal Redundancy and Reconfiguration

A Dynamic NOC Arbitration Technique using Combination of VCT and XY Routing

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 33, NO. 2, FEBRUARY

Self-Repair for Robust System Design. Yanjing Li Intel Labs Stanford University

Extended Junction Based Source Routing Technique for Large Mesh Topology Network on Chip Platforms

Highly Resilient Minimal Path Routing Algorithm for Fault Tolerant Network-on-Chips

Lecture 14: Large Cache Design III. Topics: Replacement policies, associativity, cache networks, networking basics

Distributed System Chapter 16 Issues in ch 17, ch 18

Implementation of a Novel Fault Tolerant Routing Technique for Mesh Network on Chip

Flow Control can be viewed as a problem of

Low-Power Interconnection Networks

A Lightweight Fault-Tolerant Mechanism for Network-on-Chip

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

FAULT-TOLERANT ROUTING SCHEME FOR NOCS

Packet Switch Architecture

Packet Switch Architecture

Recall: The Routing problem: Local decisions. Recall: Multidimensional Meshes and Tori. Properties of Routing Algorithms

Post-Silicon Platform for the Functional Diagnosis and Debug of Networks-on-Chip

Hybrid On-chip Data Networks. Gilbert Hendry Keren Bergman. Lightwave Research Lab. Columbia University

Design and Implementation of Buffer Loan Algorithm for BiNoC Router

A Survey of Techniques for Power Aware On-Chip Networks.

Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing?

Lecture 15: NoC Innovations. Today: power and performance innovations for NoCs

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Module 16: Distributed System Structures. Operating System Concepts 8 th Edition,

Dynamic Router Design For Reliable Communication In Noc

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

BARP-A Dynamic Routing Protocol for Balanced Distribution of Traffic in NoCs

Topologies. Maurizio Palesi. Maurizio Palesi 1

Formally Enhanced Verification at Runtime to Ensure NoC Functional Correctness

Lecture 12: Interconnection Networks. Topics: dimension/arity, routing, deadlock, flow control

CCNoC: Specializing On-Chip Interconnects for Energy Efficiency in Cache-Coherent Servers

Part IV: 3D WiNoC Architectures

Network Control and Signalling

Switching and Forwarding Reading: Chapter 3 1/30/14 1

Efficient And Advance Routing Logic For Network On Chip

Embedded Systems: Hardware Components (part II) Todor Stefanov

Progress Report No. 15. Shared Segments Protection

Power and Area Efficient NOC Router Through Utilization of Idle Buffers

Quest for High-Performance Bufferless NoCs with Single-Cycle Express Paths and Self-Learning Throttling

Multi-level Fault Tolerance in 2D and 3D Networks-on-Chip

Module 15: Network Structures

Deadlock-free XY-YX router for on-chip interconnection network

Transcription:

1/45 1/22 MICRO-46, 9 th December- 213 Davis, California udirec: Unified Diagnosis and Reconfiguration for Frugal Bypass of NoC Faults Ritesh Parikh and Valeria Bertacco Electrical Engineering & Computer Science Department The University of Michigan, Ann Arbor

Device Scaling and Processor Evolution 65nm 45nm Shrinking transistor size Transition to multicore 32nm 22nm Waning reliability NoC based interconnect Simple bus Point-to-point bus High-speed ring network 2D mesh network-on-chip Core2Duo - 27 Intel Nehalem - 28 Intel Sandy Bridge - 211 Intel SCC- 29 2/45 2/22

Device Scaling and Processor Evolution 65nm Growing adoption of Network-on-Chip: 45nm 32nm 22nm Sole medium of on-chip communication Waning reliability Potential single-point of failure NoC based interconnect Shrinking transistor size Transition to multicore Simple bus Intel Identifies Cougar Point Error, Point-to-point SATA bus Connection High-speed Single ring network Point of 2D Failure. mesh network-on-chip January, 211. Cost: $7 Million Core2Duo - 27 Intel Nehalem - 28 Intel Sandy Bridge - 211 Intel SCC- 29 3/45 3/22

On-Chip Interconnect Evolution core/ip core-1 network interface core-2 network interface bus core-3 network interface router #1 core-4 network interface router #2 link Bus-based multicore/soc T data B packet B B H NoC-based multicore/soc Routers connected via links Cores connected via network interface (NI) tail T B B B H body header i n p u t s input port FIFO routing router #3 arbiters /allocators router #4 output port o u t p u t s flits crossbar credit, FIFO and routing control 4/45 4/22

5/45 5/22 Permanent Faults in NoCs Permanent wear-out Device aging NoCs: significant silicon footprint and heavy activity Detection Diagnosis Reconfiguration Recovery Detect if fault has occurred proc NI rtr S Diagnose where fault has occurred D Reconfigure network to account for fault cannot send on faulty path need to re-route around fault Recover and resume normal operation udirec solves two problems: Fault diagnosis at a fine resolution Reconfiguration to find new routes

6/45 6/22 Contributions udirec (for unified DIagnosis and REConfiguration) incorporates: routing-aware scheme for diagnosis route-reconfiguration to circumvent faults Fine-grain fault model for NoCs derived from: end-to-end diagnosis scheme frugal reconfiguration algorithm A deadlock-free routing algorithm for irregular networks with unidirectional links 1 2 3 S D 6 7 8 Error!! Error!! Error!! Tightly integrated software implementation enables: low overhead no performance overhead during fault-free execution

7/45 7/22 Rest of This Talk Prior Work Fine-Resolution Diagnosis Routing Algorithm Reconfiguration Algorithm Experimental Evaluation 1 2 3 4 5 6 7 8

Prior Art Permanent Fault Diagnosis Entire regions/routers [Puente 4] One or more bidirectional links [Fick 9] 1 2 3 4 5 Dedicated testing and high overhead 6 7 8 Reconfiguration around Faults Based on routing on irregular networks [Aisopos 11] Constrained by number and location of faults [Flich 7, Fukushima 9] lost nodes in a 64-node CMP 4 3 2 1 ~2/3 rd loss of on-chip processor nodes 2 4 6 # of permanent faults 8/45 8/22

9/45 9/22 Detection and Diagnosis Mechanism Inspired by Ghofrani et al., VTS 12 Diagnose both datapath and control faults at low area overhead (< 3%) No runtime performance overhead when NoC is fault-free Datapath faults End-to-end software based Scoreboard to collect symptoms of corrupted data host core (software score-board) router core NI core ECC router core NI core NI NI router router data packet counting and timeout error Control faults Distributed hardware based Counting and timeout techniques

1/45 1/22 Fine-Resolution Fault Diagnosis Packets are augmented with ECC [Shamshiri et al., ITC 11] Erroneous transmission are reported to SW supervisor node 1 1 2 3 4 1 +1 1 5 6 7 1 1 8 2 1 1 2 5 4 2 9 3 4 5 2 2 2 1 1 6 7 8 A B Differentiable from network ends A B Undistinguishable from network ends Finest granularity of: Routing reconfiguration End-to-end diagnosis

11/45 11/22 Fine-Grain Fault Model upstream router crossbar A B Undistinguishable: O/P port buffer downstream router crossbar Unidirectional link I/P port Crossbar contacts datapath segment packets entering KEY OBSERVATION: disable only ONE unidirectional link on most faults same packets leaving 1 2 3 4 5 6 7 8 96% faults localized to a single datapath segment, or a unidirectional link Error!! Error!! Error!!

Benefits of the Fine-Grain Fault Model Path diversity fault! 2 1 2 1 X 1 2 1 2 1 2 Connectivity 1 2 1 2 connected! 1 2 to fault! fault! disconnected! from node node Fault manifestation Coarse-grain fault model Fine-grain fault model 12/45 12/22

13/45 13/22 Rest of This Talk Prior Work Fine-Resolution Diagnosis Routing Algorithm Reconfiguration Algorithm Experimental Evaluation 1 2 3 4 5 6 7 8

Routing in Irregular Networks Routing algorithm should disable paths that lead to deadlock Up*/down* routing disables turns to avoid deadlock 1. Construct spanning tree rooted at a node (assumes bidirectional links) 2. Mark links towards root: up (down otherwise) 3. Disable all down up turns 4. Follow up links towards root and down links to destination root darker closer to root dist dist 1 1 3 2 dist 1 dist 2 root down up down up 1 towards root node up down down up away from root node down up 3 Up*/down* does not work with unidirectional links disabled turns 14/45 14/22

Routing with Unidirectional Links Separate spanning trees for up (up-tree) and down (down-tree) links Up-tree: unidirectional links towards root Down-tree: unidirectional links away from root Consistent ordering/labeling: no link marked both up and down As links are marked according to up*/down* principle (and no conflicts) udirec routing is deadlock free (disable down up turns) Network is connected if both trees are complete up-tree 1 to 2 {2 nd } {1 st } { th } root to conflict! down-tree from from 1 2 {1 st } {2 nd } { th } root 15/45 15/22

Growing Trees Concurrently Up-tree and down-tree can be constructed: Independently: may lead to inconsistent marking Concurrently: consistent labeling ensured by construction Grow tree beyond a node only if reachable by both up-tree and down-tree Choice of root node affects connectivity Step#2 Step#1 root 1 2 halt down forward down up { th } {1 st } down up {2 nd } COMPLETE disable down up turn halt down Step#1 1 2 {-} {-} down { th } up root halt up INCOMPLETE 16/45 16/22

17/45 17/22 Reconfiguration Overview Integrated with the software implementation of diagnosis scheme Suspension of network operation on fault detection Diagnosis of fault site via software scoreboard Supervisor core NI core NI Determination of surviving topology at supervisor Calculation of new routes in software Selection of root that maximizes connectivity Distribution of new deadlock-free routes to routers Resumption of operation from uncorrupted state router core NI router router core NI router

18/45 18/22 Reconfiguration Algorithm Root selection process Exhaustive search of the optimal root node Optimality of the root node Based on number of connected nodes (in our experiments) Based on critical functionality within sub-network nodes=5 nodes=3 root root 1 2 3 4 5 68 root 7 8 nodes=1 root trial #1: node 1 is root root trial #2: node 2 is root root trial #6: node 6 is root root trial #8: node 6 is root local winner! local winner! local winner!

Reconfiguration Implementation Permanent faults are rare occurrences Routing table data = 4 (directions) * 64 (destinations) * 64 (RTs) < 2 KB Distribute using unidirectional ring of 1-control and 1-data wire unidirectional ring router RT proc proc Routing table 1-bit ctrl wire 1-bit data wire RT RT -1-1---1--1- proc from previous controller RT ctrl data Serial transfer of control and data proc Distributed controller Supervisor node from proc data ctrl decode/ forward ctrl data to next controller assemble /forward update RT 19/45 19/22

2/45 2/22 avg. no. of dropped nodes 3 25 2 15 1 5 Reliability Results 1/3 rd dropped nodes 2x less partitioning up*/down* udirec 1 2 3 4 5 injected faults packet delivery rate 1.8.6.4.2 up*/down* udirec 1 2 3 4 5 injected faults As faults accumulate networks become disconnected udirec looses fewer nodes and partitions into fewer networks

avg. network latency 4 3 2 1 Performance Results 7% lower latency path diversity dominates up*/down* udirec connectivity dominates 1 2 3 4 5 saturation throughput 9 6 3 up*/down* udirec 1 2 3 4 5 injected faults injected faults Initially latency degrades gracefully; at higher fault rates up*/down* drops much more nodes, hence lower latency udirec consistently delivers higher throughput >2% higher throughput 21/45 21/22

22/45 22/22 Conclusions Proposed udirec: a unified diagnosis and reconfiguration solution Fine-grained fault diagnosis, a novel fault model, a deadlock-free routing algorithm and a software-based reconfiguration algorithm udirec drops only 1/3 rd nodes compared to state-of-the-art and delivers >2% higher throughput beyond 15 faults udirec incurs less than 1% wiring overhead 1/3 rd loss of nodes >2% higher

Thank you! Questions? 23/45 23/22