Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies Mohsin Y Ahmed Conlan Wesson
Overview Network-on-chip (NoC): the future generation of many-core processors on a single chip. In current multicore processors, the cores communicate over a shared bus, so only one core can send a message at a time. This limits the number of cores.
Overview (Contd.) An NoC allows for more cores, i.e., it ensures scalability. Multiple cores can send messages simultaneously. Somewhat similar to a computer network. "Route Packets, Not Wires" - William J. Dally, Stanford University, NVIDIA
Interconnect Technology The shared-medium arbitrated bus is the most frequently used on-chip interconnect architecture. All communicating devices share the same transmission medium. Advantages:
o Simple topology
o Low area cost
o Extensibility
Interconnect Technology (Contd.) Disadvantages:
o High intrinsic parasitic resistance and capacitance
o Bit-transfer delay increases as processing elements are added and eventually exceeds the targeted clock period
o Limited system scalability
Novel NoC Architectures A network-on-chip (NoC) resembles the interconnect architecture of high-performance parallel computing systems. The functional IP blocks communicate with each other with the help of intelligent switches. NoC allows the decoupling of the processing elements (i.e., the IPs) from the communication fabric (i.e., the network). Employs explicit parallelism, exhibits modularity to minimize the use of global wires, and utilizes locality for power minimization.
SPIN SPIN - Scalable, Programmable, Integrated Network. Uses a fat-tree architecture. Every node has four children and the parent is replicated four times at any level.
BFT BFT - Butterfly Fat-Tree. The IPs are placed at the leaves and switches placed at the vertices. At each subsequent level, the number of required switches reduces by a factor of 2.
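As a rough illustration of how the switch count shrinks from level to level, the sketch below tabulates the switches per level for a given number of IPs. It assumes (this is not stated on the slide) that each first-level switch serves four IPs, so that level j holds N / 2^(j+1) switches and the tree has log4(N) levels. `bft_switches_per_level` is a hypothetical helper name.

```python
def bft_switches_per_level(num_ips):
    """Return the switch count at each level of a butterfly fat-tree.

    Assumption: num_ips is a power of 4 and level j (starting at 1)
    holds num_ips / 2**(j + 1) switches.
    """
    levels = 0
    remaining = num_ips
    while remaining > 1:
        if remaining % 4:
            raise ValueError("num_ips must be a power of 4")
        remaining //= 4
        levels += 1
    # Each level has half as many switches as the level below it.
    return [num_ips // 2 ** (j + 1) for j in range(1, levels + 1)]


per_level = bft_switches_per_level(64)
print(per_level, sum(per_level))  # [16, 8, 4] 28
```

Under these assumptions the total number of switches stays below N / 2 however many levels are added.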
CLICHE CLICHE (Chip-Level Integration of Communicating Heterogeneous Elements). Consists of an m x n mesh of switches interconnecting computational resources (IPs). Every switch, except those at the edges, is connected to four neighboring switches and one IP block.
2D Torus Basically the same as a regular mesh. The only difference is that the switches at the edges are connected to the switches at the opposite edge through wrap-around channels. The long end-around connections can yield excessive delays.
Folded Torus The long end-around delay can be avoided by folding the torus. This yields a layout better suited to VLSI implementation.
Octagon Communication between any pair of nodes takes at most two hops within the basic octagonal unit. Each functional IP has a dedicated switch.
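The difference between the mesh and the torus can be seen by listing the neighbors of a switch. The following is a minimal sketch; the function names and the (row, column) addressing are assumptions made for illustration.

```python
def mesh_neighbors(row, col, rows, cols):
    """Neighbors of a switch in an m x n mesh: edge switches have fewer."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def torus_neighbors(row, col, rows, cols):
    """Neighbors in a 2D torus: indices wrap around to the opposite edge."""
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


# A corner switch in a 4 x 4 network.
print(mesh_neighbors(0, 0, 4, 4))   # [(1, 0), (0, 1)]
print(torus_neighbors(0, 0, 4, 4))  # [(3, 0), (1, 0), (0, 3), (0, 1)]
```

The corner switch has two neighbors in the mesh and four in the torus; the two extra links are the wrap-around channels.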
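The two-hop bound can be checked with a breadth-first search over the eight nodes. The sketch below assumes a link pattern that the slide does not spell out: each node is linked to its two ring neighbors and to the node directly across the octagon. The helper names are hypothetical.

```python
from collections import deque


def octagon_links(n=8):
    """Assumed octagon wiring: ring neighbors plus the node directly across."""
    return {i: {(i + 1) % n, (i - 1) % n, (i + n // 2) % n} for i in range(n)}


def hop_counts(adj, src):
    """Breadth-first search returning the hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


adj = octagon_links()
max_hops = max(max(hop_counts(adj, s).values()) for s in adj)
print(max_hops)  # 2
```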
SWITCHING METHODOLOGIES Switching techniques determine:
o When and how internal switches connect their inputs to outputs
o The time at which message components may be transferred along these paths
Different types of switching techniques:
o Circuit switching
o Packet switching
o Wormhole switching
Circuit Switching A physical path from source to destination is reserved prior to the transmission of the data. The path is held until all the data has been transmitted. Network bandwidth is reserved for the entire duration of the transfer, so valuable resources are tied up for that time. Setting up an end-to-end path may cause unnecessary delays.
Packet Switching Data is divided into fixed-length blocks called packets. Whenever the source has a packet to be sent, it transmits the data. The need for storing entire packets in a switch in case of conventional packet switching makes the buffer requirement high. In an NoC environment, the requirement is that switches should not consume a large fraction of silicon area compared to the IP blocks.
Wormhole Switching Packets are divided into fixed length flow control units (flits). The input and output buffers are expected to store only a few flits. The buffer space requirement in the switches is small i.e. the switches are small and compact. The first flit, i.e., header flit, of a packet contains routing information. Header flit decoding enables the switches to establish the path and subsequent flits simply follow this path in a pipelined fashion.
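The flit mechanism can be sketched in a few lines. This is an illustration only: the tuple layout, the 4-byte flit size, the zero padding, and the names `packetize` and `forward` are assumptions, not part of any NoC specification.

```python
def packetize(payload, dest, flit_size=4):
    """Split a packet into a header flit followed by fixed-length data flits."""
    flits = [("HEAD", dest)]  # only the header carries routing information
    for start in range(0, len(payload), flit_size):
        chunk = payload[start:start + flit_size].ljust(flit_size, b"\0")
        flits.append(("BODY", chunk))
    if len(flits) > 1:
        flits[-1] = ("TAIL", flits[-1][1])  # mark the last data flit
    return flits


def forward(flits, route_table):
    """Yield (output_port, flit): the header picks the port, the rest follow."""
    out_port = None
    for kind, data in flits:
        if kind == "HEAD":
            out_port = route_table[data]  # decode the header once
        yield out_port, (kind, data)


flits = packetize(b"abcdefghij", dest=5)
for port, flit in forward(flits, {5: "east"}):
    print(port, flit)
```

Every flit leaves on the port chosen when the header was decoded, so the switch needs buffer space for only a few flits rather than a whole packet.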
Wormhole Switching (Contd.) Each incoming data flit of a message packet is simply forwarded along the same output channel as the preceding data flit. No packet reordering is required at destinations. Drawbacks:
o Transmission of distinct messages cannot be interleaved or multiplexed.
o Messages must cross the channel in their entirety before the channel can be used by another message.
o Channel utilization decreases if a flit from a given packet is blocked in a buffer.
Wormhole Switching (Contd.) By introducing virtual channels in the input and output ports, channel utilization can be increased considerably. If a flit belonging to a particular packet is blocked in one of the virtual channels, then flits of other packets can use the other virtual channel buffers.
NoC PERFORMANCE METRICS It is desirable that an NoC interconnect architecture exhibits high throughput, low latency, energy efficiency, and low area overhead. In today's power-constrained environments, it is increasingly critical to be able to identify the most energy-efficient architectures and to quantify the energy-performance trade-offs.
Message Throughput Message throughput is measured as the fraction of the maximum load that the network is capable of physically handling. A throughput of 1 corresponds to all end nodes receiving one flit every cycle. Measured in flits/cycle/IP.
Transport Latency Defined as the time (in clock cycles) that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node. Depending on the source/destination pair and the routing algorithm, each message may have a different latency.
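One common way to compute this quantity, assumed here rather than taken from the slides, is (messages completed x message length in flits) / (number of IPs x total cycles). The function and parameter names below are hypothetical.

```python
def throughput(messages_completed, message_length_flits, num_ips, total_cycles):
    """Delivered flits per cycle per IP, under the assumed definition."""
    return messages_completed * message_length_flits / (num_ips * total_cycles)


# 16 IPs observed for 1000 cycles, 16-flit messages.
print(throughput(1000, 16, 16, 1000))  # 1.0: every IP receives one flit per cycle
print(throughput(500, 16, 16, 1000))   # 0.5: half of the maximum load
```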
Experimental Results
Experimental Results (Contd.)
Experimental Results (Contd.)
Wireless NoC Replacement of some long wired lines by RF wireless links. On chip Carbon Nano Tube (CNT) antennas. Long range wireless links, short wire-line links.
Wireless NoC Architecture The WiNoC architecture is based on the Small World property. Networks with the small world property have a very small average path length. A small-world topology can be constructed from a locally connected network by rewiring connections randomly to any other node, which creates short-cuts in the network.
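The rewiring idea can be sketched as follows: start from a ring in which every node is linked only to its immediate neighbors, then move some links to randomly chosen endpoints. The ring starting point, the rewiring probability, and all function names are assumptions for illustration; the slides do not specify the procedure in this detail.

```python
import random
from collections import deque


def ring_lattice(n):
    """Locally connected start: each node links to its two ring neighbors."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def rewire(edges, n, probability, rng):
    """Move each link, with the given probability, to a random new endpoint."""
    result = set(edges)
    for edge in sorted(edges, key=sorted):
        if rng.random() >= probability:
            continue
        u = min(edge)
        choices = [w for w in range(n) if w != u and frozenset((u, w)) not in result]
        if choices:
            result.remove(edge)
            result.add(frozenset((u, rng.choice(choices))))
    return result


def average_path_length(edges, n):
    """Mean hop count over all ordered node pairs (inf if disconnected)."""
    adj = {i: set() for i in range(n)}
    for edge in edges:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return float("inf")
        total += sum(dist.values())
    return total / (n * (n - 1))


ring = ring_lattice(20)
shortcut_net = rewire(ring, 20, probability=0.2, rng=random.Random(1))
print(average_path_length(ring, 20), average_path_length(shortcut_net, 20))
```

Random rewiring keeps the number of links constant. It can occasionally disconnect the network, which this sketch reports as an infinite average path length.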
Scale Free Networks Most nodes have low degree. A few nodes have very high degree.
Wireless NoC Architecture (Contd.) The whole system is divided into multiple small clusters of neighboring cores called subnets. The cores in a subnet are connected to a centrally located hub through direct links. The hubs from all subnets are connected in a second-level network. Due to the limitations of wireless links, only a few wireless links are distributed between hubs separated by relatively long distances.
WiNoC Experimental Results
WiNoC Experimental Results (Contd.)
NoC Security A single chip is likely to embed cores and other devices from different manufacturers, which makes it vulnerable to hardware Trojans. Malicious Trojans try to bypass or disable the security fence of a system. A Trojan can continuously broadcast garbage data, leak confidential information by radio emission, route flits in wrong directions, or even tamper with the flits. As soon as a hardware Trojan is detected in a system, it may need to be removed immediately, with minimum effect on the system.
Fault Tolerant NoC Architecture We performed a study to find an NoC architecture that shows maximum fault tolerance when a node is deleted. The study was performed on both Mesh and Small World topologies. For the small world topology, we devised an algorithm for finding an attack-tolerant architecture by iteratively reorganizing the initial topology.
Routing Algorithm Dijkstra's shortest path routing is adopted for routing the SW NoC. This graph search algorithm solves the single-source shortest path problem for a graph with non-negative edge costs, producing a shortest path tree.
Optimal Fault Tolerant Architecture The attack-tolerant architecture is achieved by applying an algorithm based on Simulated Annealing. Specific cores in the small world topology are attacked, i.e., they are isolated from all their neighbors so that they can neither send nor receive flits. The topology is reorganized iteratively, by rewiring one of its existing links at a time, until the throughput converges.
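A minimal sketch of the algorithm using a binary heap is shown below. The four-node graph and its link costs are made up for illustration; they do not come from the study.

```python
import heapq


def dijkstra(adj, source):
    """Single-source shortest paths for non-negative edge costs.

    adj maps each node to a dict {neighbor: cost}. Returns the distance
    to every reachable node and the parent links of the shortest path tree.
    """
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, cost in adj[u].items():
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, parent


# Hypothetical four-core network with link costs.
adj = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, parent = dijkstra(adj, "A")
print(dist)    # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
print(parent)  # {'A': None, 'B': 'A', 'C': 'B', 'D': 'C'}
```

The `parent` map is the shortest path tree: following it from any core back to the source gives the route a flit would take.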
Simulated Annealing Metrics $M = \sum_{i \neq j} d(i, j) / (N(N-1))$, where $i, j$ are NoC cores, $d(i, j)$ is their shortest-path distance according to Dijkstra's algorithm, and $N$ is the total number of cores in the system. $\rho = dM/dL$, where $L$ is the number of levels of neighbors up to which a core is attacked. The objective is to minimize $\rho$ to find an optimal solution.
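The average-distance metric M can be computed directly from its definition. The sketch below uses breadth-first search instead of Dijkstra's algorithm, which gives the same distances when every link has unit cost (an assumption made here). The three-core line network is a made-up example.

```python
from collections import deque


def hop_distances(adj, src):
    """Shortest hop count from src to every reachable core."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def metric_m(adj):
    """Sum of d(i, j) over all ordered pairs, divided by N(N-1).

    Assumes the network is connected.
    """
    n = len(adj)
    total = sum(d for src in adj for d in hop_distances(adj, src).values())
    return total / (n * (n - 1))


# Three cores in a line: 0 - 1 - 2.
line = {0: [1], 1: [0, 2], 2: [1]}
print(metric_m(line))  # 8 / 6, about 1.33
```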
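Simulated Annealing Algorithm
1. Initial network setup. Set Current Network = Initial Network.
2. Compute the metric $\rho$ for the current network (Dijkstra routing algorithm).
3. Generate a new network configuration by randomly picking and rewiring one link, and compute the new metric $\rho'$.
4. If $\rho' < \rho$, set Current Network = New Network.
5. Otherwise, generate a uniform random number $r$ in [0, 1]. If $e^{\mathrm{itr} \cdot (\rho - \rho')} > r$, set Current Network = New Network.
6. If convergence has not been reached, return to step 2. Otherwise, the current network is the optimal network configuration.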
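The loop in the flowchart can be sketched as below. Several points are assumptions rather than the authors' implementation: the average hop count stands in for the metric ρ, the acceptance test is read as exp(itr · (ρ − ρ′)) > r, a fixed iteration count replaces the convergence check, and all function names are hypothetical.

```python
import math
import random
from collections import deque


def mean_distance(edges, n):
    """Stand-in metric: mean hop count over ordered pairs (inf if disconnected)."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            return math.inf
        total += sum(dist.values())
    return total / (n * (n - 1))


def rewire_one_link(edges, n, rng):
    """Return a copy of the network with one randomly chosen link moved."""
    edges = set(edges)
    old = rng.choice(sorted(edges))
    while True:
        a, b = rng.sample(range(n), 2)
        new = (min(a, b), max(a, b))
        if new not in edges:
            break
    edges.remove(old)
    edges.add(new)
    return edges


def anneal(edges, n, metric, iterations=200, seed=0):
    """Simulated annealing over link rewirings; returns the best network seen."""
    rng = random.Random(seed)
    current, rho = set(edges), metric(edges, n)
    best, best_rho = set(current), rho
    for itr in range(1, iterations + 1):
        candidate = rewire_one_link(current, n, rng)
        rho_new = metric(candidate, n)
        r = rng.random()
        # Always accept improvements; accept worse networks with a
        # probability that shrinks as itr grows.
        if rho_new < rho or math.exp(itr * (rho - rho_new)) > r:
            current, rho = candidate, rho_new
            if rho < best_rho:
                best, best_rho = set(current), rho
    return best, best_rho


n = 12
ring = {(min(i, (i + 1) % n), max(i, (i + 1) % n)) for i in range(n)}
best, best_rho = anneal(ring, n, mean_distance)
print(mean_distance(ring, n), best_rho)
```

Because the link count never changes, any improvement in the metric comes purely from where the links are placed.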
Simulation Results
Questions? THANK YOU