Mastering The Behavior of Multi-Core Systems to Match Avionics Requirements

Hicham AGROU, Marc GATTI, Pascal SAINRAT, Patrice TOILLON
{hicham.agrou,marc-j.gatti, patrice.toillon}@fr.thalesgroup.com
{agrou,sainrat}@irit.fr
www.thalesgroup.com

Summary
Introduction
Multi-Core Architectures: State of the Art (Academic Processors, COTS Architectures)
Evaluation of the QorIQ Platform (P4080) from Freescale: Procedures, Results
THALES Avionics AMISIS: Concept, First Performance Results

INTRODUCTION

Evolution of Avionic Embedded Computers

Pre-IMA generation (1 unit = 1 function):
70s, Concorde: analog only
80s, A300/A320/B737: Intel, 68010
90s, A330/A340: Intel, DSP

IMA generation:
~1995, B777: 1st generation of IMA, 2 to 3 functions/unit, AMD29050
2000/2010, A380/B787: generalization of IMA, 3 to 5 functions/unit, PowerPC, ARINC 653 + RTOS, PowerPC ISA legacy

Next generation: IMA on multi-core? 10 or more functions/unit

Current Avionics Requirements
Safety: certification and determinism

Level | Failure Condition | Failure Rate
A | Catastrophic | < 1 in 10^9 hours of flight
B | Hazardous | < 1 in 10^7 hours of flight
C | Major | < 1 in 10^5 hours of flight
D | Minor | < 1 in 10^3 hours of flight
E | No Effect | -

Partitioning: spatial and temporal (time and space isolation)
Software architecture, from top to bottom:
Application #1, Application #2, ..., Application #N
Application Programming Interface (ARINC 653): communication and synchronisation services; time, fault, and task management
Operating System Layer (ARINC 653): partition scheduling
Package / Driver
Processor hardware
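Temporal partitioning under ARINC 653 is commonly realized by a static schedule: a major time frame split into fixed windows, each owned by one partition. The following C sketch illustrates that idea only; the 50 ms frame, the window table, and the names schedule, schedule_is_valid, and partition_at are hypothetical and not taken from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ARINC 653-style static schedule: the major time frame is split
   into fixed windows, each owned by exactly one partition. Values are illustrative. */
#define MAJOR_FRAME_US 50000u

typedef struct {
    uint32_t offset_us;   /* window start, relative to the major frame */
    uint32_t duration_us; /* window length */
    int partition;        /* owning partition (0 = spare time) */
} window_t;

static const window_t schedule[] = {
    {     0u, 20000u, 1 }, /* Application #1 */
    { 20000u, 10000u, 2 }, /* Application #2 */
    { 30000u, 15000u, 3 }, /* Application #N */
    { 45000u,  5000u, 0 }, /* spare */
};

#define N_WINDOWS (sizeof schedule / sizeof schedule[0])

/* Offline check: windows are contiguous and exactly fill the major frame. */
int schedule_is_valid(void)
{
    uint32_t expected = 0;
    for (size_t i = 0; i < N_WINDOWS; i++) {
        if (schedule[i].offset_us != expected)
            return 0;
        expected += schedule[i].duration_us;
    }
    return expected == MAJOR_FRAME_US;
}

/* Partition owning the processor at absolute time t_us; the schedule
   repeats every major frame. */
int partition_at(uint64_t t_us)
{
    uint32_t off = (uint32_t)(t_us % MAJOR_FRAME_US);
    for (size_t i = 0; i < N_WINDOWS; i++) {
        if (off >= schedule[i].offset_us &&
            off < schedule[i].offset_us + schedule[i].duration_us)
            return schedule[i].partition;
    }
    return -1; /* unreachable when schedule_is_valid() holds */
}
```

On a single core, such a table is enough to reason about temporal isolation. On a multi-core, partitions running at the same time on other cores still share the interconnect and memory controllers, which is precisely what the P4080 evaluation probes.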

Future Avionics Requirements
Increase performance: host more functions per unit, improve the performance/watt ratio
Reduce the environmental footprint: less energy consumption, fewer units, smaller modules, more embedded functions per chip
Multi-core seems to be the solution

MULTI-CORE ARCHITECTURES: STATE OF THE ART
Academic & COTS Architectures

Multi-Core Architectures: State of the Art
Academic predictable multi-core processors: MERASA and PRET processors
COTS architectures: IBM Cell, Freescale's MPC8641D and QorIQ platform
Local memories (scratchpads and caches): best cache policy (for analyzability), cache analysis (optimization to reduce cache pollution), shared-cache strategies to reduce interference
Interconnect element: shared bus (bounding access time), ring protocols, CoreNet, and the Data Path Acceleration Architecture
Observations: there is a lack of studies of multi-core architectures for avionics; existing work focuses on core-level and local-memory evolutions; at the interconnect level, no partitioning guarantee is given.
Our approach is to focus on a smart interconnect that manages all transactions in a multi-core system.
Hicham Agrou, Marc Gatti, Pascal Sainrat, Patrice Toillon. A Design Approach for Predictable and Efficient Multi-Core for Avionics. In: Digital Avionics Systems Conference (DASC 2011), Seattle, 16/10/2011-20/10/2011, Vol. 7D3, IEEE, p. 1-11, October 2011.

EVALUATION OF A QORIQ PLATFORM (P4080)
Procedure & Results

Evaluation of a QorIQ Platform (P4080) from Freescale
Objective: define usage profiles compatible with the temporal constraints of avionics.
Procedure: find the thresholds of transaction density beyond which CoreNet™ introduces low performance and/or abnormal behavior.
Figure: SMP, AMP or AMP+SMP, and bare-metal configurations (applications A(0,1) ... A(x,n) and A(0) ... A(n) on OS(0) ... OS(x), an OS or hypervisor(s), and a stress application on the hardware).
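The slides do not define how a density threshold is detected. One simple criterion, shown here as an assumption, is the first interference level whose worst observed latency exceeds the no-interference baseline by a chosen ratio. The function name density_threshold and the percentage parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical criterion: max_latency[i] is the worst duration observed with
   i flooding cores (index 0 = witness core alone). Returns the first level
   whose latency exceeds baseline * max_ratio_pct / 100, or n if none does. */
size_t density_threshold(const uint64_t *max_latency, size_t n,
                         uint64_t max_ratio_pct)
{
    for (size_t i = 1; i < n; i++) {
        /* integer form of: max_latency[i] > max_latency[0] * max_ratio_pct / 100 */
        if (max_latency[i] * 100u > max_latency[0] * max_ratio_pct)
            return i;
    }
    return n;
}
```

Using integer arithmetic keeps the check exact on targets without floating point; the ratio itself would have to come from the timing budget of each avionics function.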

P4080 Processor: Test Perimeter
Board: P4080 DS
Cores: 1.2 GHz
DDR3 #1: 1200 MHz
CoreNet: 600 MHz

Procedure
The witness core performs a transaction and measures its duration.

Procedure > The Implemented Transaction Initiators
Transaction initiators: DMA controllers and flooding cores. Each measure is stored in AREA 1 of DDR3.

Test Perimeter > The Implemented Memories

Procedure

Procedure > Step 1: Platform Initialisation

Procedure > Step 2: Transactions of the Flooding Cores

Procedure > Step 3: Direct Memory Accesses

Procedure > Step 4: Transaction of the Witness Core

Procedure > Step 5: Storage of the Transaction Duration
The storage step is repeated for each configuration, with 0 to 7 flooding cores active in step 2.
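The five steps can be sketched as a measurement campaign. This is a minimal host-side model, not the authors' harness: the platform_ops hooks, the fake platform, and the repetition count are hypothetical, and step 1 (platform initialisation) is assumed to happen before run_campaign is called.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOODING 7u /* P4080: 8 cores = 1 witness + up to 7 flooding cores */
#define REPS 4u         /* illustrative; the slides report ~10,000 repetitions */

/* Hypothetical platform hooks; on the target they would drive the flooding
   cores and DMA controllers and read a hardware cycle counter. */
typedef struct {
    void (*start_interference)(unsigned n_flooding); /* steps 2 and 3 */
    void (*stop_interference)(void);
    uint64_t (*read_counter)(void);
    void (*witness_transaction)(size_t size);        /* step 4 */
} platform_ops;

/* results[n][r]: duration of repetition r with n flooding cores (step 5). */
void run_campaign(const platform_ops *ops, size_t size,
                  uint64_t results[MAX_FLOODING + 1][REPS])
{
    for (unsigned n = 0; n <= MAX_FLOODING; n++) {
        ops->start_interference(n);
        for (unsigned r = 0; r < REPS; r++) {
            uint64_t t0 = ops->read_counter();
            ops->witness_transaction(size);
            results[n][r] = ops->read_counter() - t0;
        }
        ops->stop_interference();
    }
}

/* Worst duration observed for one interference level. */
uint64_t observed_max(const uint64_t row[REPS])
{
    uint64_t m = 0;
    for (unsigned r = 0; r < REPS; r++)
        if (row[r] > m)
            m = row[r];
    return m;
}

/* Hypothetical fake platform for host testing: each witness transaction
   costs 10 + 5 * n ticks when n flooding cores are active. */
static uint64_t fake_clock;
static unsigned fake_n;
static void fake_start(unsigned n) { fake_n = n; }
static void fake_stop(void) { fake_n = 0; }
static uint64_t fake_read(void) { return fake_clock; }
static void fake_txn(size_t size) { (void)size; fake_clock += 10u + 5u * fake_n; }

const platform_ops fake_platform = { fake_start, fake_stop, fake_read, fake_txn };
```

Keeping one row per flooding-core count makes the per-configuration worst case (observed_max) directly comparable across the 0 to 7 interference levels.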

Scenario 1
Similar results were obtained when testing with no, 1, or 2 DMA controller(s) and with 1 to 8 active cores.
Parameters: 512 transaction sizes (witness core) were tested; about 16 hours of testing per scenario; each test is repeated about 10,000 times.
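A rough budget follows from these parameters, assuming each of the 512 sizes is repeated about 10,000 times per scenario. If the repetitions are also multiplied across the 1 to 8 core configurations, the number of measurements is larger and the per-measurement budget proportionally smaller, hence the inequalities.

```latex
\begin{aligned}
N &\geq 512 \times 10\,000 = 5.12 \times 10^{6} \quad \text{witness transactions per scenario} \\
\bar{t} &\leq \frac{16 \times 3600\ \text{s}}{5.12 \times 10^{6}} = \frac{57\,600\ \text{s}}{5.12 \times 10^{6}} \approx 11.25\ \text{ms per measurement}
\end{aligned}
```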

Scenario 2

Scenario 1 & 2 > Results
Figure: Scenario 1 vs. Scenario 2.
These results show that DMA transfers targeting DDR3 (memory controller 1) can increase the transaction latency of the witness core in CPC 1.

Scenario 3: DMA load and store transactions

Scenario 3 > Results
Several transaction durations of the 7-core configuration are not saved.
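The slides do not say how missing measures were detected. One simple way, sketched here under that assumption, is to pre-fill the result area with a sentinel that no real duration can take and count the slots still holding it after the test. NOT_SAVED, prefill_area, and count_not_saved are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: a value no real measured duration can take. */
#define NOT_SAVED UINT32_MAX

/* Fill the result area before the test so that missing stores stay visible. */
void prefill_area(uint32_t *area, size_t n)
{
    for (size_t i = 0; i < n; i++)
        area[i] = NOT_SAVED;
}

/* After the test: number of slots never written by the witness core. */
size_t count_not_saved(const uint32_t *area, size_t n)
{
    size_t missing = 0;
    for (size_t i = 0; i < n; i++)
        if (area[i] == NOT_SAVED)
            missing++;
    return missing;
}
```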

Scenario 4: all transaction initiators perform their transactions in DDR3 (AREA 2).

Scenario 4 > Results
All transaction durations are saved in DDR3 (memory controller 1), and their values increase.

Scenario 5: all transaction initiators perform their transactions in DDR3's AREA 1.

Scenario 5 > Results
Several transaction durations are not saved in DDR3, and their values have globally decreased.

Scenario 6: Delaying the Measure Storage
Delays tested before storing each measure: 0 µs, 3.3 µs, 4.3 µs, and 5 µs.
Delaying the storage of each measure into AREA 1 (DDR3 of memory controller 1) makes the phenomenon disappear.
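A delayed store can be sketched as a busy-wait on a cycle counter followed by the write. This is a minimal illustration, not the authors' code: read_counter is a hypothetical hook (on the target it might be a timebase or cycle counter, whose frequency may differ from the core clock), and the fake counter only exists for host testing.

```c
#include <stdint.h>

/* Cycles for a delay in ns at a clock in MHz: cycles = ns * MHz / 1000. */
uint64_t ns_to_cycles(uint64_t ns, uint64_t f_mhz)
{
    return ns * f_mhz / 1000u;
}

/* Busy-wait delay_cycles on a caller-supplied counter, then store the measure. */
void delayed_store(volatile uint32_t *slot, uint32_t value,
                   uint64_t delay_cycles, uint64_t (*read_counter)(void))
{
    uint64_t start = read_counter();
    while (read_counter() - start < delay_cycles) {
        /* spin until the requested delay has elapsed */
    }
    *slot = value;
}

/* Hypothetical fake counter for host testing: advances by one tick per read. */
static uint64_t fake_ticks;
uint64_t fake_counter(void) { return fake_ticks++; }
```

Counted in cycles of the 1.2 GHz P4080 cores, the 3.3, 4.3, and 5 µs delays correspond to 3960, 5160, and 6000 cycles.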

THALES AVIONICS AMISIS
Concept, Features & First Performance Results

AMISIS Architecture: Avionics Multi-core Interconnect for Scalable Integrated System
Concept: master the temporal and spatial behavior of each transaction initiator, ensuring that each transaction initiator respects an insertion contract.
Implementation of hardware services: Maximal Transaction Delay Measure (Max-TDM) and Minimal Transaction Delay Measure (Min-TDM).
Figure (timelines): maximal and minimal technology transaction delays, Max-TDM and Min-TDM; COTS black-box approach vs. CUSTOM white-box approach.
Experimental implementation
Objective: determine the temporal impact of AMISIS on transaction durations.
Procedure: design the AMISIS units in an FPGA, then measure their temporal impact.
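The Max-TDM and Min-TDM services can be illustrated by a software model of a per-initiator delay monitor. This sketch is an assumption, not the AMISIS design: the slides do not define the insertion contract, so here it is reduced to checking each transaction delay against the minimal and maximal technology delays. All type, field, and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-initiator monitor inspired by Max-TDM / Min-TDM. */
typedef struct {
    uint32_t min_tech_delay; /* minimal technology transaction delay (bound) */
    uint32_t max_tech_delay; /* maximal technology transaction delay (bound) */
    uint32_t min_tdm;        /* minimal transaction delay measured so far */
    uint32_t max_tdm;        /* maximal transaction delay measured so far */
    uint32_t violations;     /* delays observed outside the bounds */
} tdm_monitor;

void tdm_init(tdm_monitor *m, uint32_t min_tech, uint32_t max_tech)
{
    m->min_tech_delay = min_tech;
    m->max_tech_delay = max_tech;
    m->min_tdm = UINT32_MAX; /* no measure yet */
    m->max_tdm = 0;
    m->violations = 0;
}

/* Record one transaction delay; returns false if it falls outside the bounds. */
bool tdm_record(tdm_monitor *m, uint32_t delay)
{
    if (delay < m->min_tdm)
        m->min_tdm = delay;
    if (delay > m->max_tdm)
        m->max_tdm = delay;
    bool ok = delay >= m->min_tech_delay && delay <= m->max_tech_delay;
    if (!ok)
        m->violations++;
    return ok;
}
```

In hardware, such a monitor would sit in the interconnect next to each initiator; the software model only makes the bookkeeping explicit.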

First Performance Results of AMISIS
Figures: AMISIS at 125 MHz, memory at 400 MHz; loads of 08, 16, and 32; annotated impacts of 1 cycle and 2 cycles.
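Converting the annotated impacts into time helps compare them with memory latency. This assumes the 1-cycle and 2-cycle annotations count AMISIS cycles at 125 MHz, which the figures do not state explicitly.

```latex
\begin{aligned}
T_{\text{AMISIS}} &= \frac{1}{125\ \text{MHz}} = 8\ \text{ns}, \qquad T_{\text{mem}} = \frac{1}{400\ \text{MHz}} = 2.5\ \text{ns} \\
1\ \text{cycle} &= 8\ \text{ns} = 3.2\, T_{\text{mem}}, \qquad 2\ \text{cycles} = 16\ \text{ns} = 6.4\, T_{\text{mem}}
\end{aligned}
```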

CONCLUSION

Conclusion
Processing elements use embedded hardware features that worsen Worst-Case Execution Time (WCET) estimation: dynamic branch prediction, out-of-order speculative execution, deeper pipelines, caches with more ways, and replacement policies such as PLRU, FIFO, and round robin. These core-level effects are considered mastered.
The multi-core context implies concurrent accesses that must be managed to ensure proper spatial and temporal partitioning.
Scenario 2 shows that increasing the number of shared resources sometimes increases transaction durations (figure: Scenario 1 vs. Scenario 2).
Scenarios 3 and 5 show that unexpected phenomena may appear.
The first performance results of THALES Avionics AMISIS show that the temporal impact of its controls is negligible.

Thank You for Your Attention