ECE 551 System on Chip Design

Similar documents
Embedded Systems: Hardware Components (part II) Todor Stefanov

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

Buses. Maurizio Palesi. Maurizio Palesi 1

Embedded Systems 1: On Chip Bus

Lecture 25: Busses. A Typical Computer Organization

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

SAMBA-BUS: A HIGH PERFORMANCE BUS ARCHITECTURE FOR SYSTEM-ON-CHIPS Λ. Ruibing Lu and Cheng-Kok Koh

Chapter 2 The AMBA SOC Platform

The CoreConnect Bus Architecture

A High Performance Bus Communication Architecture through Bus Splitting

EE108B Lecture 17 I/O Buses and Interfacing to CPU. Christos Kozyrakis Stanford University

SoC Interconnect Bus Structures

Embedded Busses. Large semiconductor. Core vendors. Interconnect IP vendors. STBUS (STMicroelectronics) Many others!

Bus AMBA. Advanced Microcontroller Bus Architecture (AMBA)

SoC Design Lecture 11: SoC Bus Architectures. Shaahin Hessabi Department of Computer Engineering Sharif University of Technology

VLSI Design of Multichannel AMBA AHB

On-Chip Communications

Synchronous Bus. Bus Topics

2. System Interconnect Fabric for Memory-Mapped Interfaces

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Interconnection Structures. Patrick Happ Raul Queiroz Feitosa

Computer and Hardware Architecture II. Benny Thörnberg Associate Professor in Electronics

An introduction to SDRAM and memory controllers. 5kk73

Unit 3 and Unit 4: Chapter 4 INPUT/OUTPUT ORGANIZATION

Chapter 6. I/O issues

SONICS, INC. Sonics SOC Integration Architecture. Drew Wingard. (Systems-ON-ICS)

DESIGN A APPLICATION OF NETWORK-ON-CHIP USING 8-PORT ROUTER

Interfacing. Introduction. Introduction Addressing Interrupt DMA Arbitration Advanced communication architectures. Vahid, Givargis

Design of an Efficient FSM for an Implementation of AMBA AHB in SD Host Controller

Keywords- AMBA, AHB, APB, AHB Master, SOC, Split transaction.

Interconnecting Components

Page 1. ElapC5 05/11/2012 ELETTRONICA APPLICATA E MISURE 2012 DDC 1. C5 Bus protocols. Ingegneria dell Informazione

Multiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering

Design of an AMBA AHB Reconfigurable Arbiter for On-chip Bus Architecture

Replacement Policy: Which block to replace from the set?

ECE 485/585 Microprocessor System Design

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

A More Sophisticated Snooping-Based Multi-Processor

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

Lecture 2: Topology - I

INPUT/OUTPUT ORGANIZATION

ISSN:

Novel Intelligent I/O Architecture Eliminating the Bus Bottleneck

AMBA AHB Bus Protocol Checker

DEVELOPMENT AND VERIFICATION OF AHB2APB BRIDGE PROTOCOL USING UVM TECHNIQUE

Memory. Lecture 22 CS301

Page 1. SMP Review. Multiprocessors. Bus Based Coherence. Bus Based Coherence. Characteristics. Cache coherence. Cache coherence

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Bus System. Bus Lines. Bus Systems. Chapter 8. Common connection between the CPU, the memory, and the peripheral devices.

Input/Output Introduction

Introduction. Motivation Performance metrics Processor interface issues Buses

Introduction to Embedded System I/O Architectures

A Basic Snooping-Based Multi-processor

INPUT/OUTPUT ORGANIZATION

The Nios II Family of Configurable Soft-core Processors

The RM9150 and the Fast Device Bus High Speed Interconnect

The von Neuman architecture characteristics are: Data and Instruction in same memory, memory contents addressable by location, execution in sequence.

White Paper AHB to Avalon & Avalon to AHB Bridges

6 Direct Memory Access (DMA)

Chapter 5 Input/Output Organization. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

Ref: AMBA Specification Rev. 2.0

CS152 Computer Architecture and Engineering Lecture 20: Busses and OS s Responsibilities. Recap: IO Benchmarks and I/O Devices

I/O Handling. ECE 650 Systems Programming & Engineering Duke University, Spring Based on Operating Systems Concepts, Silberschatz Chapter 13

Lecture #9-10: Communication Methods

4. Networks. in parallel computers. Advances in Computer Architecture

MULTIPROCESSORS. Characteristics of Multiprocessors. Interconnection Structures. Interprocessor Arbitration

PCI and PCI Express Bus Architecture

Quantitative Analysis of Transaction Level Models for the AMBA Bus

Unit 1. Chapter 3 Top Level View of Computer Function and Interconnection

Chapter 9 Multiprocessors

Introduction to Multiprocessors (Part I) Prof. Cristina Silvano Politecnico di Milano

Chapter 2 Designing Crossbar Based Systems

Switching. An Engineering Approach to Computer Networking

Design & Implementation of AHB Interface for SOC Application

Sri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1.

A MULTIPROCESSOR SYSTEM. Mariam A. Salih

Shared Memory Multiprocessors

ECE 485/585 Microprocessor System Design

Network-on-Chip Architecture

Handout 3 Multiprocessor and thread level parallelism

NoC Generic Scoreboard VIP by François Cerisier and Mathieu Maisonneuve, Test and Verification Solutions

Introduction to the Qsys System Integration Tool

AHB-Lite Multilayer Interconnect IP. AHB-Lite Multilayer Interconnect IP User Guide Roa Logic, All rights reserved

PON Functional Requirements: Services and Performance

EE414 Embedded Systems Ch 5. Memory Part 2/2

Shared Memory Multiprocessors. Symmetric Shared Memory Architecture (SMP) Cache Coherence. Cache Coherence Mechanism. Interconnection Network

Introducing Multi-core Computing / Hyperthreading

Multiprocessors & Thread Level Parallelism

Lecture 9: Bridging. CSE 123: Computer Networks Alex C. Snoeren

Chapter 6 Storage and Other I/O Topics


Portland State University ECE 587/687. Superscalar Issue Logic

Analyzing and Debugging Performance Issues with Advanced ARM CoreLink System IP Components

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Lecture 9: Bridging & Switching"

ISSN Vol.03, Issue.08, October-2015, Pages:

COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence

Interconnect Routing

Transcription:

ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018

Emerging Applications Requirements

Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs Sub system I/O Bus Circa 2002 SoCs Circa 2008 Critical Decision Was up Choice Data flow replacing data processing major design challenge Exploding core counts requiring more advanced interconnects EDA cannot solve this architectural problem easily Complexity too high to hand craft (and verify!) Critical Decision Is Interconnect Choice Communication Architecture Design and Verification becoming Highest Priority in Contemporary SoC Design!

Example: IBM Cell Ring Bus

Tech Scaling Trends: On Chip Interconnect Length

Tech Scaling Trends: Interconnect Performance Increasing wire delay limits achievable performance

Bus Based Communication Architectures Buses are the simplest and most widely used SoC interconnection networks Bus collection of signals (wires) with one or more IP components connected Only one IP component can transfer data on the shared bus at any given time Microcontroller Digital Signal Processor Input/ Output Device Memory

Bus Terminology

Bus Terminology Master (or Initiator) IP component that initiates a read or write data transfer Slave (or Target) IP component that does not initiate transfers and only responds to incoming transfer requests Arbiter Controls access to the shared bus Uses arbitration scheme to select master to grant access to bus Decoder Determines which component a transfer is intended for Bridge Connects two busses Acts as slave on one side and master on the other

Bus Signal Lines address lines data lines control lines Address Carry address of destination for which transfer is initiated Can be shared or separate for read, write data Data Carry information between source and destination components Can be shared or separate for read, write data Choice of data width critical for application performance Control (Requests and acknowledgements) Specify more information about type of data transfer Byte enable, burst size, cacheable/bufferable, write-back/through

Bus Topologies: Shared

Bus Topologies: Hierarchical Improves system throughput Multiple ongoing transfers on different buses

Bus Topologies: Split Bus Reduces impact of capacitance across two segments Reduces contention and energy

Bus Topologies: Full Matrix/Crossbar

Bus Topologies: Full Matrix/Crossbar

Bus Topologies: Ring Bus ** The IBM CELL Processor uses a ring bus

Bus Physical Structure: Tri-State Buffer Based Commonly used in off-chip/backplane buses Pro: take up fewer wires, smaller area footprint Con: higher power consumption & delay, hard to debug

Bus Physical Structure: MUX Based Separate read, write channels

Bus Physical Structure: AND-OR Based

Bus Clocking: Synchronous Includes a clock in control lines Fixed protocol for communication that is relative to clock Involves very little logic and can run very fast Require frequency converters across frequency domains

Bus Clocking: Asynchronous Not clocked Requires a handshaking protocol Performance not as good as that of synchronous bus No need for frequency converters, but does need extra lines Does not suffer from clock skew like the synchronous bus

Decoding and Arbitration Decoding determines the target for any transfer initiated by a master Arbitration decides which master can use the shared bus if more than one master request bus access simultaneously Decoding and Arbitration can be: centralized distributed

Centralized Decoding and Arbitration Minimal change is required if new components are added to the system

Distributed Decoding and Arbitration Pro: requires fewer signals compared to the centralized approach Con: more hardware duplication, more logic/area, less scalable

Arbitration Schemes Random Randomly select master to grant bus access to Static priority Masters assigned static priorities Higher priority master request always serviced first Can be pre-emptive (AMBA2) or non-preemptive (AMBA3) May lead to starvation of low priority masters Round Robin (RR) Masters allowed to access bus in a round-robin manner No starvation every master guaranteed bus access Inefficient if masters have vastly different data injection rates High latency for critical data streams

Arbitration Schemes TDMA Time division multiple access Assign slots to masters based on BW requirements If a master does not have anything to read/write during its time slots, leads to low performance Choice of time slot length and number critical TDMA/RR Two-level scheme If master does not need to utilize its time slot, second level RR scheme grants access to another waiting master Better bus utilization Higher implementation cost for scheme (more logic, area)

Arbitration Schemes Dynamic priority Dynamically vary priority of master during application execution Gives masters with higher injection rates a higher priority Requires additional logic to analyze traffic at runtime Adapts to changing data traffic profiles High implementation cost (several registers to track priorities and traffic profiles) Programmable priority Simpler variant of dynamic priority scheme Programmable register in arbiter allows software to change priority

Bus Data Transfer Modes: Single Non-Pipelined Simplest transfer mode first request for access to bus from arbiter on being granted access, set address and control signals Send/receive data in subsequent cycles

Bus Data Transfer Modes: Pipelined Overlap address and data phases Only works if separate address and data buses are present

Bus Data Transfer Modes: Non-Pipelined Burst Send multiple data, arbitrate once for entire transaction Master indicates to arbiter intention to perform burst transfer Saves time spent requesting for arbitration

Bus Data Transfer Modes: Pipelined Burst Useful when separate address and data buses available Reduces data transfer latency

Bus Data Transfer Modes: Split Transfer If slaves take a long time to read/write data, it can prevent other masters from using the bus Split transfers improve performance by splitting a transaction Master sends read request to slave Slave relinquishes control of bus as it prepares data Arbiter can grant bus access to another waiting master Allows utilizing otherwise idle cycles on the bus When slave is ready, it requests bus access from arbiter On being granted access, it sends data to master Explicit support for split transfers required from slaves and arbiters (additional signals, logic)

Bus Data Transfer Modes: Out-of-Order Transfer Multiple transfers from different masters, or even from the same master, are SPLIT by slave and are in progress simultaneously on single bus Masters can initiate transfers without waiting for earlier transfers to complete Allows better parallelism, performance in buses Additional signals needed to transmit IDs for every data transfer in the system Master interfaces need to be extended to handle data transfer IDs and be able to reorder received data Slave interfaces have out-of-order buffers for reads, writes, to keep track of pending transactions, plus logic for processing IDs Any application typically has a limited buffer size beyond which performance doesn t increase

Bus Data Transfer Modes: Broadcast Every time a data item is transmitted over a bus, it is physically broadcast to every component on the bus Useful for snooping and cache coherence protocols Example: when several components on bus have a private cache fed from a single memory, a problem arises when the memory is updated when a cache line is written to memory by a component It is essential that private caches of the components on the bus invalidate (or update) their cache entries to prevent reading incorrect values Broadcasting allows address of the memory location (or cache line) being updated to be transmitted to all the components on the bus, so they can invalidate (or update) their local copies