SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik


Chapter 5 On-Chip Communication

Outline
1. Introduction
2. Shared media
3. Switched media
4. Network on Chip
5. NoC General Implementation
6. NoC Problem Oriented Implementation

1. Introduction
Communication channels are needed between components on a chip
A communication channel must provide
o Communication media
o Communication protocol
Implementation is an optimization problem with trade-offs among
o Speed
o Resource consumption
o Reliability
Two implementation possibilities exist
o Shared media (Ex: bus)
o Switched media (Ex: crossbar switch)

2. Shared media
Use of a common communication link
Only one master at a time
o The master writes on the bus and all components listen
Advantages
o Resource efficient
o Broadcast is easy
Drawbacks
o Slow
o Need for arbitration
  Centralized arbitration (Ex: PCI, CoreConnect, AMBA)
  Decentralized arbitration (Ex: CAN, Ethernet)
o Not fault tolerant
  No communication possible on failure
[Figure: modules Mod 1 to Mod 4 and an arbiter on a shared bus]
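Centralized arbitration can be illustrated with a small software model. The sketch below assumes a round-robin policy, which the slide does not specify (a fixed-priority scheme would be an equally valid choice). The class name `RoundRobinArbiter` and its `grant` method are hypothetical names chosen for illustration.

```python
class RoundRobinArbiter:
    """Hypothetical model of a centralized bus arbiter (round-robin policy assumed)."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start so that master 0 is the first one examined.
        self.last_granted = num_masters - 1

    def grant(self, requests):
        """Return the index of the master that gets the bus, or None.

        requests: list of booleans, one per master (True = wants the bus).
        """
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_granted + offset) % self.num_masters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # nobody requested the bus in this cycle
```

Because the search always starts one position after the last winner, a master that requests continuously cannot starve the others, which is the usual reason for preferring round-robin over fixed priority.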

2. SoC-Buses
ARM AMBA
o Consists of two bus systems
  Advanced High-performance Bus (AHB): high-performance system interconnect for connecting the processor to high-performance modules
  Advanced Peripheral Bus (APB): used to connect the slower peripherals
o Cross communication via a bridge

2. SoC-Buses
IBM CoreConnect
o Processor Local Bus (PLB): high-performance bus, used to connect high-bandwidth devices such as processor cores, memory, and DMA controllers
o On-Chip Peripheral Bus (OPB): a secondary bus used to decouple the peripherals from the PLB in order to avoid a loss of system performance
o Device Control Register (DCR) bus: allows lower-performance status and configuration registers to be read and written

2. SoC-Buses
SoC buses are usually realized using large OR gates
o Easy to implement
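One plausible reading of the OR-gate remark: each master's output is ANDed with its grant signal, so ungranted masters contribute zeros, and the bus value is the OR of all contributions. The sketch below models that idea in software. The function name `or_bus` and the 32-bit default width are assumptions, not taken from the slides.

```python
def or_bus(data, grants, width=32):
    """Combine master outputs onto one bus using AND-OR logic.

    data:   list of integers, the word each master wants to drive.
    grants: list of 0/1 flags from the arbiter (at most one should be 1).
    width:  assumed bus width in bits.
    """
    mask = (1 << width) - 1
    bus = 0
    for word, grant in zip(data, grants):
        gated = (word & mask) if grant else 0  # AND stage: ungranted masters drive zeros
        bus |= gated                           # OR stage: merge all contributions
    return bus
```

With this structure no tri-state drivers are needed: as long as the arbiter asserts at most one grant, the OR of all gated outputs equals the granted master's word.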

3. Switched media
Supports point-to-point communication
Many masters at a time
Advantages
o Performance
o Tailored communication
o No need for arbitration
Drawbacks
o Not fault tolerant
o Resource hungry
[Figure: modules Mod 1 to Mod 4 connected point-to-point]

3. SoC-Switched media
Altera AVALON: designed to provide greater flexibility and performance than a shared system bus, while consuming minimal logic resources
Binds together components in a system based on the Avalon interface
Connects Avalon master and slave ports on components in a system
Some features
o Components of differing data widths
o Components operating in different clock domains
o Components using multiple Avalon ports

3. SoC-Switched media
Silicore Wishbone: designed to foster design reuse by alleviating system-on-a-chip integration problems
This is accomplished by creating a common, logical interface between IP cores
o Improved portability and reliability of the system
o Faster time-to-market for the end user
The Wishbone specification makes use of
o RULES,
o RECOMMENDATIONS,
o SUGGESTIONS,
o PERMISSIONS and
o OBSERVATIONS
Simple, open, highly configurable interface

3. SoC-Switched media
[Figure: masters and slaves joined by an interconnection. Labels: Master, Slave, Interconnection (point-to-point, data flow, shared bus, crossbar switch), "Created by the designer", "Concrete implementation by the system integrator"]

3. SoC-Switched media
1-D Switching: Reconfigurable Multiple Bus Network (RMB)
o Connections among components are dynamically realized at run-time
o Drawbacks:
  Time-consuming computation of the route
  Resource hungry
o Advantage:
  Flexibility
[Figure: a switch connecting components 1 to 5]

3. SoC-Switched media
Controller:
o Manages the switch at a local level
o Receives requests from the left, the right, and the local component
o Four kinds of command: REQUEST, REPLY, CANCEL, DESTROY
Data network:
o Transportation of data and commands
FIFOs:
o Buffer the commands coming from the different sides of the crosspoint

3. SoC-Switched media
2-D Switching: Reconfigurable Multiple Bus Network (RMB)
o Drawbacks:
  Time-consuming computation of the route
  Resource hungry
o Advantage:
  Flexibility
[Figure: 2-D arrangement of switches]

4. Network on Chip
Hemani, Kumar & Jantsch [00], Benini & DeMicheli [02], Dally [01]
A Network on Chip consists of
o A set of processing elements (PE)
  Processor
  Memory
  Custom hardware block
o A set of routers
  Route messages to their destination
Communication is done by sending packets

4. Network on Chip
Router
o Must be fast and efficient
o 5 inputs and outputs
o Input FIFOs
o Data lines
o Address lines
o Additional control signals to neighbors
Router structure
o Router control via messages
o Messages are sent in packets consisting of the address of the destination router, control bits and the payload (data)
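The packet layout described above (destination address, control bits, payload) can be sketched as a packed word. The slides give no field widths, so the 8/4/20-bit split below, the field order, and the function names are purely illustrative assumptions.

```python
ADDR_BITS = 8      # assumed width of the destination router address
CTRL_BITS = 4      # assumed number of control bits
PAYLOAD_BITS = 20  # assumed payload width


def pack_packet(dest, ctrl, payload):
    """Pack the three fields into one integer: [ dest | ctrl | payload ]."""
    for value, bits in ((dest, ADDR_BITS), (ctrl, CTRL_BITS), (payload, PAYLOAD_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field value does not fit in its assumed width")
    return (dest << (CTRL_BITS + PAYLOAD_BITS)) | (ctrl << PAYLOAD_BITS) | payload


def unpack_packet(word):
    """Split a packed word back into (dest, ctrl, payload)."""
    payload = word & ((1 << PAYLOAD_BITS) - 1)
    ctrl = (word >> PAYLOAD_BITS) & ((1 << CTRL_BITS) - 1)
    dest = (word >> (CTRL_BITS + PAYLOAD_BITS)) & ((1 << ADDR_BITS) - 1)
    return dest, ctrl, payload
```

A router only needs the `dest` field to pick an output port; the control bits and payload travel through unchanged.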

4. Network on Chip
FIFO design

4. Network on Chip
Output arbiter design

4. Network on Chip - Routing
Circuit switching steps:
1. A routing probe traverses the network from source to destination
2. Upon reaching the destination address, an acknowledgment is sent back to the source address
3. Data are transferred at the full bandwidth of the hardware
4. The lock on the links is released at the end of the transmission
Store-and-Forward (SAF)
1. The packets are temporarily stored in nodes
2. The routing information is examined to determine which output channel to direct the packet to
Virtual Cut-Through (VCT)
o Addresses the deficiency of SAF (buffering the message at each node)
o Does not store the packet in a node if an output channel is free
o Packets cut through the router of the node to an available output channel
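The difference between store-and-forward and cut-through switching is easiest to see in a simplified, contention-free latency model. The sketch below uses a common textbook approximation that is not given in the slides: with SAF the whole packet is received at every hop before it moves on, while with cut-through only the routing delay accumulates per hop. All names and the cycle-based units are assumptions.

```python
def saf_latency(hops, packet_cycles, router_delay):
    """Store-and-forward: the full packet is buffered at each hop.

    hops:          number of links traversed
    packet_cycles: cycles needed to push the whole packet over one link
    router_delay:  cycles for the routing decision in one router
    """
    return hops * (packet_cycles + router_delay)


def cut_through_latency(hops, packet_cycles, router_delay):
    """Cut-through with free output channels: the packet is pipelined
    across hops, so its transfer time is paid only once."""
    return packet_cycles + hops * router_delay
```

For example, under this model a 16-cycle packet crossing 4 hops with a 2-cycle routing delay takes 72 cycles with SAF and 24 cycles when it can cut through every router.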

4. Network on Chip - Routing
Virtual Cut-Through (VCT), continued
o Reduces the hop count (#packet stations)
o Alleviates the need for an excessive amount of memory along the path of a message
Wormhole routing
o Conceived to address the deficiency in VCT: if an output channel is not available, the packet must be stored in the current node's memory
o Wormhole routing divides a message into flow-control digits, called flits, which are smaller than packets
o Each message contains:
  one header flit, which carries the routing and control information
  data flits, which store the remaining data of the message
o The header flit always goes first to allocate a path for the data flits
  Less memory is required
o If an output channel is available, the header flit is routed and the remaining data flits follow in a pipelined fashion
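The flit structure of wormhole routing can be sketched as follows. The 4-byte flit size, the tuple representation, and the function name `make_flits` are assumptions made for illustration; a real router would encode the flit type in dedicated bits.

```python
def make_flits(dest, payload, flit_bytes=4):
    """Split a message into one header flit followed by data flits.

    dest:       destination address carried by the header flit
    payload:    bytes object with the message data
    flit_bytes: assumed number of payload bytes per data flit
    """
    header = ("HEAD", dest)
    data = [
        ("DATA", payload[i:i + flit_bytes])
        for i in range(0, len(payload), flit_bytes)
    ]
    return [header] + data
```

Only the first element carries routing information, so every router on the path makes one routing decision per message and then forwards the data flits along the path the header has reserved.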

4. Network on Chip - Routing
Deterministic routing
o XY-Routing
  if (Xrouter < Xdest) the packet is forwarded in the east direction
  if (Xrouter > Xdest) the packet is forwarded in the west direction
  if (Xrouter = Xdest and Yrouter > Ydest) the packet is sent to the south of the current router
  if (Xrouter = Xdest and Yrouter < Ydest) the packet is sent to the north of the current router
  if (Xrouter = Xdest and Yrouter = Ydest) the packet is sent to the local PE
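The XY-routing rules translate directly into a routing function. The sketch below follows the five cases in the order given; the port names and the helper `xy_path` are hypothetical, and `xy_path` additionally assumes that east increases X and north increases Y, which is the orientation implied by the comparisons.

```python
def xy_route(x_router, y_router, x_dest, y_dest):
    """Return the output port chosen by XY routing at the current router."""
    if x_router < x_dest:
        return "EAST"
    if x_router > x_dest:
        return "WEST"
    # X coordinates match from here on; resolve the Y dimension.
    if y_router > y_dest:
        return "SOUTH"
    if y_router < y_dest:
        return "NORTH"
    return "LOCAL"


# Assumed orientation: east = +X, west = -X, north = +Y, south = -Y.
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def xy_path(src, dst):
    """List the routers visited from src to dst, both given as (x, y)."""
    x, y = src
    path = [(x, y)]
    while True:
        port = xy_route(x, y, dst[0], dst[1])
        if port == "LOCAL":
            return path
        dx, dy = STEP[port]
        x, y = x + dx, y + dy
        path.append((x, y))
```

Because the X dimension is always resolved completely before the Y dimension, every packet between the same pair of routers takes the same path.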

4. Network on Chip - Routing
Adaptive routing
o The direction in which to send an incoming packet is not fixed a priori
o The routing algorithm may decide to use more complex schemes for routing
o Usually used to
  improve the performance in the presence of localized traffic
  provide fault tolerance in the network
o Packets are not always routed along the shortest path
Ex: Q-routing
o Adaptive routing algorithm based on Q-learning, a form of reinforcement learning
o Initially builds a routing table based on the delivery times (Q values) of the packets to every router in the network
o Delivery times are updated every time a router forwards a packet for a particular destination
o Over time, each router learns an efficient route to every destination
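The table update in Q-routing can be sketched with the usual Q-learning form: after forwarding a packet to a neighbor, the router moves its delivery-time estimate toward the time spent locally plus the neighbor's own best estimate. The dictionary layout, the learning rate of 0.5, and all function names below are assumptions for illustration, not details from the slides.

```python
def q_update(q_table, dest, neighbor, queue_delay, link_delay,
             neighbor_estimate, learning_rate=0.5):
    """Update the estimated delivery time to dest via neighbor.

    q_table:           dict mapping (dest, neighbor) -> estimated delivery time
    queue_delay:       time the packet waited in this router
    link_delay:        transmission time to the neighbor
    neighbor_estimate: the neighbor's best remaining-time estimate for dest
    """
    old = q_table[(dest, neighbor)]
    target = queue_delay + link_delay + neighbor_estimate
    q_table[(dest, neighbor)] = old + learning_rate * (target - old)
    return q_table[(dest, neighbor)]


def best_neighbor(q_table, dest, neighbors):
    """Pick the neighbor with the smallest estimated delivery time."""
    return min(neighbors, key=lambda n: q_table[(dest, n)])
```

If a neighbor becomes congested, its reported estimates grow, the corresponding table entry rises, and `best_neighbor` starts preferring a different output, which is how the router adapts to localized traffic.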

4. Network on Chip - Routing
Performance metrics
o Latency: time a message needs to travel from its source to its destination
  Difference between the time when the last packet of the message arrives at the destination and the time when the first packet of the message leaves the source
o Throughput: maximum traffic a network can accept per unit of time
  Typically measured in bytes or packets per node per cycle
Deadlock and Livelock
o Deadlock is a situation that occurs when a packet is waiting for an event that can never happen due to a circular dependence on resources
o Livelock, on the other hand, is a configuration of the network in which packets continue to move, but never reach their destination
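The "circular dependence on resources" behind a deadlock can be checked as a cycle in a directed waits-for graph. The sketch below uses a depth-first search; the graph representation (a dict from each resource to the resources it waits on) and the function name are assumptions for illustration.

```python
def has_cycle(waits_for):
    """Return True if the waits-for graph contains a cycle.

    waits_for: dict mapping each resource (e.g. a channel) to an iterable
               of the resources it is currently waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True  # reached a node already on the path: cycle
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in waits_for)
```

Four channels that each wait on the next one in a ring form a cycle and would be reported as a potential deadlock, whereas a simple chain of dependencies is not.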

5. 3x3 NoC Implementation
[Figure: 3x3 grid of routers (1,1) to (3,3) with attached TC, LV and VGA blocks, shown in two views: area constraints in PACE, and place and route (PAR) in FPGA Editor]

6. NoC Efficient Implementation - ClusteRing
[Figure: ClusteRing architecture. Labels: Transceiver 1 to 4, LB 1 to LB 4, LB Master, LB Slave, Ring Master, Ring Slave, Router, Ring 1, Ring 2]

6. NoC ClusteRing
Transceiver & Router
[Figure: transceiver and router structure. Labels: ProcA, ProcB, RAM, Reg, FSM 1, FSM 2, Ring]

6. NoC ClusteRing
Data transfer protocol
[Figure: data transfer among Client 0, Client 1 and Client 2. Each entry holds, for a client n, the number of bytes, a status code, and the received data]

6. NoC ClusteRing
Case study SVD: close-to-hardware implementation
o 8x8 matrix
  1 processor: 149 us
  2 processors: 151 us
  4 processors: 160 us
o 200x32 matrix
  1 processor: 59707 us
  2 processors: 36534 us
    31839 us computation (88 %)
    4694 us communication (12 %)
  4 processors: 18150 us
    12960 us computation (71 %)
    5190 us communication (29 %)
[Figure labels: MB Proc, DDR RAM, Perif., Block RAM, Ring]
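The measurements for the 200x32 matrix can be turned into speedup and parallel efficiency figures. The short calculation below uses only the run times listed on the slide; the function names and the definitions (speedup = single-processor time / parallel time, efficiency = speedup / processor count) are the standard ones and are not stated in the slides.

```python
# Run times in microseconds for the 200x32 matrix, taken from the slide.
RUN_TIME_US = {1: 59707, 2: 36534, 4: 18150}
# Communication time in microseconds, taken from the slide.
COMM_TIME_US = {2: 4694, 4: 5190}


def speedup(processors):
    """Single-processor run time divided by the parallel run time."""
    return RUN_TIME_US[1] / RUN_TIME_US[processors]


def efficiency(processors):
    """Speedup per processor (1.0 would be ideal scaling)."""
    return speedup(processors) / processors


def comm_share(processors):
    """Fraction of the run time spent on communication."""
    return COMM_TIME_US[processors] / RUN_TIME_US[processors]
```

These numbers give a speedup of roughly 1.63 on two processors and 3.29 on four, while the communication share grows as processors are added. For the 8x8 matrix the run time increases with the processor count, so the communication overhead outweighs the gain for small problems.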

6. NoC
The Singular Value Decomposition (SVD)
A = U Σ V^T
[Figure: processors P1, P2, ..., Pn]

Computation of the SVD

Parallel implementation
Because the post-multiplication of A^(k) by Q^(k) affects only the columns i and j, a parallel implementation is possible.
o Pairwise column orthogonalization (Brent & Luk)
o Mapping of virtual processors to physical ones
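A single pairwise column orthogonalization step can be sketched as a plane rotation applied to columns i and j. The formulas below are the standard one-sided (Hestenes-style) Jacobi rotation, which is an assumption here since the slides do not spell out the computation; the function name and the list-of-rows matrix layout are also illustrative.

```python
import math


def orthogonalize_pair(A, i, j):
    """Rotate columns i and j of A (a list of rows) in place so that
    they become orthogonal. No other column is touched."""
    alpha = sum(row[i] * row[i] for row in A)  # squared norm of column i
    beta = sum(row[j] * row[j] for row in A)   # squared norm of column j
    gamma = sum(row[i] * row[j] for row in A)  # inner product of the two columns
    if gamma == 0.0:
        return  # already orthogonal

    # Choose the rotation angle that zeroes the inner product.
    zeta = (beta - alpha) / (2.0 * gamma)
    sign = 1.0 if zeta >= 0.0 else -1.0
    t = sign / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
    c = 1.0 / math.sqrt(1.0 + t * t)
    s = c * t

    for row in A:
        a_i, a_j = row[i], row[j]
        row[i] = c * a_i - s * a_j
        row[j] = s * a_i + c * a_j
```

Since each call reads and writes only two columns, rotations on disjoint column pairs are independent and can be assigned to different processors. Repeating such sweeps until all pairs are orthogonal leaves the singular values as the column norms.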

Parallel Implementation
Block Orthogonalization: