Memory Architectures for NoC-Based Real-Time Mixed Criticality Systems

Similar documents
ARTIST-Relevant Research from Linköping

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CONTENTION IN MULTICORE HARDWARE SHARED RESOURCES: UNDERSTANDING OF THE STATE OF THE ART

BlueVisor: A Scalable Real-time Hardware Hypervisor for Many-core Embedded System

EE414 Embedded Systems Ch 5. Memory Part 2/2

Real-Time Mixed-Criticality Wormhole Networks

FCUDA-NoC: A Scalable and Efficient Network-on-Chip Implementation for the CUDA-to-FPGA Flow

A Statically Scheduled Time- Division-Multiplexed Networkon-Chip for Real-Time Systems

Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

STLAC: A Spatial and Temporal Locality-Aware Cache and Networkon-Chip

Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema

Lecture 18: Communication Models and Architectures: Interconnection Networks

[ ] In earlier lectures, we have seen that switches in an interconnection network connect inputs to outputs, usually with some kind buffering.

Preview. Memory Management

SpaceWire-RT. SpaceWire-RT Status SpaceWire-RT IP Core ASIC Feasibility SpaceWire-RT Copper Line Transceivers

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Ultra-Fast NoC Emulation on a Single FPGA

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

Thomas Moscibroda Microsoft Research. Onur Mutlu CMU

The RM9150 and the Fast Device Bus High Speed Interconnect

Computer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Hardware Design. University of Pannonia Dept. Of Electrical Engineering and Information Systems. MicroBlaze v.8.10 / v.8.20

CS250 VLSI Systems Design Lecture 9: Patterns for Processing Units and Communication Links

Chapter 8 Memory Basics

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) UNIT-I

Dell PowerEdge 11 th Generation Servers: R810, R910, and M910 Memory Guidance

A Cache Hierarchy in a Computer System

Embedded Systems Design: A Unified Hardware/Software Introduction. Outline. Chapter 5 Memory. Introduction. Memory: basic concepts

Embedded Systems Design: A Unified Hardware/Software Introduction. Chapter 5 Memory. Outline. Introduction

Copyright 2012, Elsevier Inc. All rights reserved.

FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)

Trade Offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip

CS 856 Latency in Communication Systems

COSC 6385 Computer Architecture - Memory Hierarchies (II)

Understanding and Using the Controller Area Network Communication Protocol

Multilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

Distributed Embedded Systems and realtime networks

A Cost Model for Data Stream Processing on Modern Hardware Constantin Pohl, Philipp Götze, Kai-Uwe Sattler

An introduction to SDRAM and memory controllers. 5kk73

NoC Test-Chip Project: Working Document

GPU Architecture. Alan Gray EPCC The University of Edinburgh

Embedded Systems: Hardware Components (part II) Todor Stefanov

High Performance Computing Lecture 26. Matthew Jacob Indian Institute of Science

Negotiating the Maze Getting the most out of memory systems today and tomorrow. Robert Kaye

LECTURE 5: MEMORY HIERARCHY DESIGN

Real-Time (Paradigms) (47)

A Dual-Criticality Memory Controller (DCmc): Proposal and Evaluation of a Space Case Study

AirTight: A Resilient Wireless Communication Protocol for Mixed- Criticality Systems

COSC 6385 Computer Architecture - Memory Hierarchies (III)

Copyright 2012, Elsevier Inc. All rights reserved.

Computer Architecture. A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.

Interconnection Networks

PC-based data acquisition II

ISSN Vol.03, Issue.08, October-2015, Pages:

Memory System Design. Outline

Network Calculus: A Comparison

+ Random-Access Memory (RAM)

Hardware Implementation of TRaX Architecture

CENG4480 Lecture 09: Memory 1


Contents. Memory System Overview Cache Memory. Internal Memory. Virtual Memory. Memory Hierarchy. Registers In CPU Internal or Main memory

EE 457 Unit 7b. Main Memory Organization

CENG3420 Lecture 08: Memory Organization

CMSC 611: Advanced. Interconnection Networks

Lecture 20: Memory Hierarchy Main Memory and Enhancing its Performance. Grinch-Like Stuff

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

Basic Organization Memory Cell Operation. CSCI 4717 Computer Architecture. ROM Uses. Random Access Memory. Semiconductor Memory Types

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

Scaling Without Sharding. Baron Schwartz Percona Inc Surge 2010

NoCo: ILP-based Worst-Case Contention Estimation for Mesh Real-Time Manycores

Design of Embedded DSP Processors Unit 5: Data access. 9/11/2017 Unit 5 of TSEA H1 1

The MC9S12 address, data and control buses The MC9S12 single-chip mode memory map Simplified write/read cycle. Address, Data and Control Buses

«Computer Science» Requirements for applicants by Innopolis University

The S6000 Family of Processors

Atacama: An Open Experimental Platform for Mixed-Criticality Networking on Top of Ethernet

ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology

Memory Systems IRAM. Principle of IRAM

FIST: A Fast, Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling in Full-System Simulations

Cross Clock-Domain TDM Virtual Circuits for Networks on Chips

The Nios II Family of Configurable Soft-core Processors

Interconnection Networks: Topology. Prof. Natalie Enright Jerger

machine cycle, the CPU: (a) Fetches an instruction, (b) Decodes the instruction, (c) Executes the instruction, and (d) Stores the result.

Module objectives. Integrated services. Support for real-time applications. Real-time flows and the current Internet protocols

Managing Memory for Timing Predictability. Rodolfo Pellizzoni

Lecture 3: Flow-Control

Memory Efficient Scheduling for Multicore Real-time Systems

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Overview of Potential Software solutions making multi-core processors predictable for Avionics real-time applications

Lecture 9: Bridging. CSE 123: Computer Networks Alex C. Snoeren

Caches. Hiding Memory Access Times

Memory. Lecture 22 CS301

Developing deterministic networking technology for railway applications using TTEthernet software-based end systems

TMS320C6678 Memory Access Performance

Fault Tolerant and Secure Architectures for On Chip Networks With Emerging Interconnect Technologies. Mohsin Y Ahmed Conlan Wesson

Worst Case Analysis of DRAM Latency in Hard Real Time Systems

Design & Implementation of AHB Interface for SOC Application

Hyperthreading Technology

Transcription:

Memory Architectures for NoC-Based Real-Time Mixed Criticality Systems Neil Audsley Real-Time Systems Group Computer Science Department University of York York United Kingdom 2011-12 1

Overview Motivation: Examine provision of memory hierarchy for MC Structure: 1. A Story 2. Memory Hierarchy 3. Colourful Digression 4. Issues / Requirements 5. Memory Tree for NoC 2

A Story. Data Good / Bad C P U C P U C P U C P U M E M O R Y Access to memory strictly TDMA Not allowed to compute application if dont have TDMA slot 3

Fundamental Trade-off Efficient Resource Usage v Physical Resource Separation Must maintain Safety Properties - stop bad things happening Hard for memory Issues of single fault (bit flip) causing system failure, correct MMUs Need performance Especially as we move to many / multi-core architectures 4

Memory Hierarchy PROCESSOR REGISTERS CACHE PHYSICAL MEMORY (RAM / DDR etc) Embedded Real-Time Systems SOLID STATE MEMORY (Non-volatile Flash-based memory) Increasing Latency Worst- Case Execu/on Time increasingly hard to calculate: - Caches - Shared memory - File system access 5

Memory and Mixed Criticality Sufficient physical partitioning to meet safety requirements Impact upon system architecture MC analysis - WCET up as criticality increases Exploit pessimism in modelling / analysis for WCET System architecture unchanged And / Or allow architecture to provide increased physical separation as criticality increases Increase conservatism - increase latencies / end-to-end memory access times 6

Intel Xeon Phi (Knights Landing) Regular structure Replicated unit Not suited to shared bus Many other examples 7

Network-on-Chip Architecture Network is mesh Arbiters / Routers Local connections between and arbiter Packet Switched Worm-hole routing prevalent local memory Often assume local memory big enough for all code / data no cache misses 8

Network-on-Chip Architecture - External Memory Memory transactions via contended network 9

Network-on-Chip Architecture - Memory Tree has connections to inter- mesh memory tree Memory requests are multiplexed through the tree End-to-end depends on arbitration policy at Full-duplex Supports cache / SPM operations reduces local memory sizes 10

Memory Tree Arbitration ALTERNATE policy Requests arriving one same clock cycle take turns Fixed arbitration pattern 11

Memory Tree Arbitration ALTERNATE policy Requests arriving one same clock cycle take turns Fixed arbitration pattern 12

Memory Tree Arbitration ALTERNATE policy Requests arriving one same clock cycle take turns Fixed arbitration pattern 13

Memory Tree Arbitration ALTERNATE policy Requests arriving one same clock cycle take turns Fixed arbitration pattern 14

Memory Tree Arbitration ALTERNATE policy Requests arriving one same clock cycle take turns Fixed arbitration pattern 15

Memory Tree Arbitration ALTERNATE policy Requests arriving one same clock cycle take turns Fixed arbitration pattern Latched in memory controller during: cycle 2 cycle 3 cycle 4 cycle 5 - exhibits WC 16

Memory Tree Arbitration PRIORITISE policy Set arbitration (dynamically) according to system requirement Eg. criticality High Medium Medium Low Secondary arbitration for equal criticality, eg. ALTERNATE Can be set dynamically eg. at context switch eg. per memory request 17

Memory Tree Arbitration PRIORITISE policy Set arbitration (dynamically) according to system requirement Eg. criticality High Medium Medium Low Danger of Priority Inversion 18

Memory Tree Arbitration PRIORITISE policy Set arbitration (dynamically) according to system requirement Eg. criticality High Medium Medium Low Danger of Priority Inversion 19

Memory Tree Arbitration PRIORITISE policy Set arbitration (dynamically) according to system requirement Eg. criticality High Medium Medium Low Danger of Priority Inversion 20

Memory Tree Arbitration PRIORITISE policy Set arbitration (dynamically) according to system requirement Eg. criticality High Medium Medium Low Danger of Priority Inversion 21

Memory Tree Arbitration PRIORITISE policy Set arbitration (dynamically) according to system requirement Eg. criticality High Medium Medium Low Danger of Priority Inversion Investigating priority inheritance across a 22

Memory Tree Timing - Worst-Case Worst-case 2 cycles to cross each from to Delay at external memory controller & physical 1 cycle to cross each from to End-to-end depends on arbitration policy at & scheduling defines blocking time at each ALTERNATE - max 1 cycle per on request 23

Memory Tree Timing - Worst-Case blocked on a cache miss typical characteristic of s limits contending memory transactions SPM loads must complete before next can be issued ie. blocking 24

Memory Tree Timing - Worst-Case Burst requests supported One memory request from converted into a number of sequential accesses at memory controller Worst-case time in for sequential requests much better than random Currently 1/2/4/8 supported 25

Memory Tree Timing - Worst-Case Burst requests supported One memory request from converted into a number of sequential accesses at memory controller Worst-case time in for sequential requests much better than random Currently 1/2/4/8 supported Bandwidth control within memory memory controller (TUE) Effectively multiple channels / bandwidths can be set up Currently max 4 Note still one channel between memory controller and 26

Memory Tree Timing - Worst-Case Dual port cache provided between and Allow and memory tree to access local memory at same time SPM Cache Control Being extended to limit bandwidth of request issued by Mechanism can also be used to limit effect of babbling idiot Data Cache Instru ction Cache Server0/1 Interface Bluetree Multiplexer Cache Data Path DLMB FSL1 Xilinx MicroBlaze ILMB External Memory Bus FSL0 Bluetiles Router Control Path Home Interface FIFO Buffer 27

16 Mesh, tree in middle 28

NoC & Overlayed Memory Mesh 29

NoC & Overlayed Memory Mesh 30

Summary Mixed criticality memory systems based on predictable memory systems (physical separation) Can support: Per-latency analysis Bandwidth analysis - (current work) more amenable to improved average case 31

Summary Mixed criticality memory systems based on predictable memory systems (physical separation) Can support: Per-latency analysis Bandwidth analysis - (current work) more amenable to improved average case Mixed Criticality Systems: a safety-critical system with a high performance average case system trying to get out? 32