Predictable hardware: The AURIX Microcontroller Family

Similar documents
System Performance Optimization Methodology for Infineon's 32-Bit Automotive Microcontroller Architecture

Timing analysis and timing predictability

Timing Anomalies Reloaded

32-Bit TC1791. Data Sheet. Microcontrollers. Microcontroller. 32-Bit Single-Chip Microcontroller V

Quick Reference Card. Timing and Stack Verifiers Supported Platforms. SCADE Suite 6.3

Handling Challenges of Multi-Core Technology in Automotive Software Engineering

Enabling safer embedded systems

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

systems such as Linux (real time application interface Linux included). The unified 32-

Measurement Solution for new Radar Microcontroller V

Copyright 2016 Xilinx

Chapter 5. Introduction ARM Cortex series

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7

A Fast Powertrain Microcontroller. Erik Norden, Patrick Leteinturier, Jens Barrenscheen, Klaus Scheibert, Frank Hellwig. Steering.

Current and Prospective High-speed Measurement Systems

TRACE32. Product Overview

_ V ST STM8 Family On-Chip Emulation. Contents. Technical Notes

Precise Continuous Non-Intrusive Measurement-Based Execution Time Estimation. Boris Dreyer, Christian Hochberger, Simon Wegener, Alexander Weiss

STM32 MICROCONTROLLER

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

32-Bit TC1793. Data Sheet. Microcontrollers. Microcontroller. 32-Bit Single-Chip Microcontroller V

Modern Computer Architecture. Lecture 12 embedded Applications, classical DSP, automotive (Tricore)

The Challenges of System Design. Raising Performance and Reducing Power Consumption

32-Bit TC1798. Data Sheet. Microcontrollers. Microcontroller. 32-Bit Single-Chip Microcontroller V

Taking the Right Turn with Safe and Modular Solutions for the Automotive Industry

AVR XMEGA Product Line Introduction AVR XMEGA TM. Product Introduction.

ARM Processors for Embedded Applications

SPC58NE84E7, SPC58NE84C3

From Hardware Trace to. System Knowledge

A Seamless Tool Access Architecture from ESL to End Product

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 18 Multicore Computers

Migration Path. from TC1767 to TC1782. May 2010 V1.2

Predictability Considerations in the Design of Multi-Core Embedded Systems

32-Bit TC1767. Data Sheet. Microcontrollers. 32-Bit Single-Chip Microcontroller V

AstréeA From Research To Industry

SPC584C80C3, SPC58EC80C3

Achieving Predictable Multicore Execution of Automotive Applications Using the LET Paradigm

BlueBox IOM Accessories

ECE 551 System on Chip Design

NEWS 2017 CONTENTS HYPERVISOR. Seamless debugging through all software layers. English Edition

Hercules ARM Cortex -R4 System Architecture. Processor Overview

Interconnects, Memory, GPIO

Main Points of the Computer Organization and System Software Module

Functional Safety on Multicore Microcontrollers for Industrial Applications

ECE 3055: Final Exam

Operating System Review Part

SPC584Cx, SPC58ECx. 32-bit Power Architecture microcontroller for automotive ASIL-B applications. Features

Support for RISC-V. Lauterbach GmbH. Bob Kupyn Lauterbach Markus Goehrle - Lauterbach GmbH

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Nitro240/260 CPU Board Scalable 680x0 VME board for I/O intensive applications

Chapter 15 ARM Architecture, Programming and Development Tools

On the Use of Context Information for Precise Measurement-Based Execution Time Estimation

Next Generation Multi-Purpose Microprocessor

An introduction to MTV

Scope-based Method Cache Analysis

Embedded Systems: Hardware Components (part II) Todor Stefanov

Designing with ALTERA SoC Hardware

Kinetis Software Optimization

The RM9150 and the Fast Device Bus High Speed Interconnect

ait: WORST-CASE EXECUTION TIME PREDICTION BY STATIC PROGRAM ANALYSIS

CEC 450 Real-Time Systems

AURIX After-Lunch-Seminar Performance meets Safety. Safety & Security with professional Software-Components. Björn Assmann (Hitex GmbH)

The Nios II Family of Configurable Soft-core Processors

Isolation of Cores. Reduce costs of mixed-critical systems by using a divide-and-conquer startegy on core level

BlueBox On-Chip Analyzers

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

Infineon DAP Active Probe

Freescale i.mx6 Architecture

Memory Hierarchy. Slides contents from:

Embedded Systems. 7. System Components

Memory Hierarchy. Slides contents from:

FPQ6 - MPC8313E implementation

Cyber security mechanisms for connected vehicles

A Seamless Tool Access Architecture from ESL to End Product. Albrecht Mayer (Infineon Microcontrollers) S4D Conference Sophia Antipolis, Sept.

Design and Analysis of Time-Critical Systems Introduction

NXP Unveils Its First ARM Cortex -M4 Based Controller Family

Simplifying the Development and Debug of 8572-Based SMP Embedded Systems. Wind River Workbench Development Tools

AVR XMEGA TM. A New Reference for 8/16-bit Microcontrollers. Ingar Fredriksen AVR Product Marketing Director

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

The CoreConnect Bus Architecture

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

L2 - C language for Embedded MCUs

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

A 1-GHz Configurable Processor Core MeP-h1

Test and Verification Solutions. ARM Based SOC Design and Verification

STM32: Peripherals. Alberto Bosio November 29, Univeristé de Montpellier

RZ Embedded Microprocessors

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Precise and Efficient FIFO-Replacement Analysis Based on Static Phase Detection

Remote Keyless Entry In a Body Controller Unit Application

Virtual Hardware ECU How to Significantly Increase Your Testing Throughput!

AMDC 2017 Liviona Multi-Core in Automotive Powertrain and Next Steps Towards Parallelization

Course Introduction. Purpose: Objectives: Content: Learning Time:

Lecture 11: SMT and Caching Basics. Today: SMT, cache access basics (Sections 3.5, 5.1)

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

Department of Computer Science, Institute for System Architecture, Operating Systems Group. Real-Time Systems '08 / '09. Hardware.

The ARM Cortex-A9 Processors

Computer Science Window-Constrained Process Scheduling for Linux Systems

Computer Hardware Requirements for ERTSs: Microprocessors & Microcontrollers

Transcription:

Predictable hardware: The AURIX Microcontroller Family Worst-Case Execution Time Analysis WCET 2013, July 9, 2013, Paris, France Jens Harnisch (Jens.Harnisch@Infineon.com), Infineon Technologies AG, Automotive Microcontroller, Application and Concept Engineering Page 1

Outline Challenges for Automotive Microcontrollers Power train application characteristics AURIX: WCET related system features AURIX: WCET related core features Multi-Core Software Considerations Tracing Options Page 2

Automotive Microcontroller The Challenge Performance Safety & Security Microcontroller Concept Cost Power Consumption Keep Constraints for Hard Real Time Support: Predictability Keep latency constraints (e.g. for interrupts) Non intrusive trace support Page 3

Characteristics of typical applications running on AURIX (power train) Static assignment of tasks to cores, separated schedulers (AMP) Time and event driven components (event driven components handled via preemption) Roughly every 7 th instruction is a branch Floating point instructions: varying, depends on code generation Data locality: Data (except for LUTs) fits usually into DSPR, having 0 clocks latency Look up tables are frequently used Instruction level parallelism: 0.8 is often a good result; with high optimization more than 1 instruction per cycle is feasible ( TC has 3 pipelines) Page 4

Ethernet FlexRay MultiCAN+ MSC ASCLIN QSPI SENT PSI5(S) I²C FCE IOM CCU6x GPT12x STM SCU BCU HSSL Confidential AURIX Multi-Core Controller: WCET related system features Checker Core Highly predictable architecture with duplicated resources (local memories, crossbar) to avoid resource conflicts Program Flash Data Flash Crossbar Sys RAM FFT Bridge SDMA Priority driven and round robin arbitration for masters attached to crossbar GTM HSM DS-ADC Starvation protection in crossbar Peripheral Bus ADC SCR No cache coherence EVR 5V or 3.3V single supply Dedicated and scalable communication instructions 10/04/2013 Copyright Infineon Technologies 2013. 2011. All rights reserved. Page 5

1.6E and : WCET related features 1.6Efficiency erformance F D E1 E2 F1 F2 PD D E1 E2 PD D E1 E2 PD D E1 E2 Scalar Harvard Superscalar Harvard Common 1.6 Instruction Set One pipeline only for high power efficiency 4 pipeline stages for up to 200MHz Power 0.2mW/MHz 1.2 1.4 PS/MHz Superscalar: integer, load store and loop pipeline 6 pipeline stages for up to 300MHz Power 0.3mW/MHz 1.6 2.3 PS/MHz Caches: 2 way LRU Easy to describe RAW and structural dependencies No support of simultaneous multithreading 4 stage data store buffer Page 6

Support for hard real time systems Support for high average case performance usually contradicts high predictability Static timing analysis: precision of results and efficiency of analysis strongly depend on hardware architecture features Three classes of hardware architectures [1]: Fully timing compositional architectures: no timing anomalies Compositional architectures with constant-bounded effects: timing anomalies without domino effects -> is assumed to belong to this class Non-compositional architectures [1] Predictability Considerations in the Design of Multi-Core Embedded Systems. C. Cullmann, C. Ferdinand, G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener, and R. Wilhelm. Ingénieurs de l'automobile, 807, 2010. Page 7

Design Guidelines for predictable multicore architectures Established by C. Cullmann, C. Ferdinand, G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener and R. Wilhelm [1] AURIX 1. Fully timing compositional architecture (with bounds) 2. Disjoint instruction and data caches 3. Caches with LRU replacement policy 4. A shared bus protocol with bounded access delay 5. Private caches 6. Private memories, or, only share lonely resources [1] Predictability Considerations in the Design of Multi-Core Embedded Systems. C. Cullmann, C. Ferdinand, G. Gebhard, D. Grund, C. Maiza, J. Reineke, B. Triquet, S. Wegener, and R. Wilhelm. Ingénieurs de l'automobile, 807, 2010. Page 8

Some software mapping examples Interrupt driven tasks on one core, time driven tasks on other(s) Split all tasks across cores to balance performance With and without OS (computationally intensive functionality) Split by ASIL level Split by tier 1 / OEM Important: Differentiate Overlay LMU RAM Data Flash BROM Key Flash SRI Cross Bar PMU Progr. Flash Progr. Flash Progr. Flash PMU Progr. Flash between your(!) optimization objectives (metrics) and constraints Checker Core Overlay Checker Core 1.6E Overlay Standby Examples for both optimization objectives and constraints : Core load, memory balance, specific task latencies, end-to-end latencies Page 9

What does mapping of software mean? To map: functionality to cores, code and data to memories! To use: scheduling and communication (AUTOSAR mechanisms and/or direct mechanisms). Major mapping criteria: Core performance, support for lock step, memory access latencies and sizes, access conflicts Summary for the software developer: Mapping of functionality often constrained by performance of cores, lock step and hence is not so complex Mapping of code and data: Up to 9(!) different memory locations to consider: possibly complex RAM PMU PMU SRI Cross Bar Checker Core Checker Core 1.6E Page 10

Real life example: Race condition and deadlock after few micro seconds! RAM core 0 ready core 1 ready core 2 ready not used SRI Cross Bar 1.6E 3 bits of a 32 bit variable used to synchronize all three cores, located in RAM and cached in all 3 s Page 11

Real life example: Details 3 bits of a 32 bit variable used in RAM to synchronize all 3 cores, after synchronization cores should proceed independently. Atomic operation applied to 32 bit variable to modify ready bit for each core. Because the synchronization may often be necessary, the developer decided to cache the variable. However, atomicity refers only to physical memory location (in this case the cache, and not LMU RAM)! Atomicity was wanted, but not really guaranteed, hence a race condition was provoked! Incorrect status of variable (due to race condition) led to deadlock. Incorrect solution of the developer: change timing via cross bar priorities -> software was executing without deadlock, but was still incorrect! Correct solution: Do not cache globally used variables! RAM SRI Cross Bar 1.6E Page 12

Trace, Measure and Calibrate in Target System with Emulation Devices (EDs) TC1798ED Product Chip Part (SoC) Cerberus SoC Shared Bus I/F (DMA) DAP/ JTAG MLI0 SBCU PCP2 SRI XBAR TC1.6 LMU EEC EMLI BOB POB BOB POB Message Sequencer MCX DMC MCDS IOC32 Emulation Memory EMEM (768 KB SRAM) EEC Back Bone Bus BBB EBCU Tracing as non-intrusive technology (with regards to timing) gains importance for multi-core software development EDs: Same package, no additional pins needed Calibration memory with same access speed as flash ED_block_diagram.vsd Page 13

Flow for using MCDS GBits Target Trace Qualification Execution Control Establish test case Configure trigger, event, action Trace Message Generate only relevant information Trace Storage Save messages chronologically Trace Reconstruction Interpolate history mhz Debug Tool Present in human readable form Page 14

Summary Reaching higher performance with more cores rather than with core and memory hierarchy optimizations might simplify WCET, if there is little interference between the cores Powerful protection and tracing mechanisms will be needed to assure non-interference or at least assess the degree of interference AURIX has a focus on protection mechanisms to avoid interference (where not needed) and powerful tracing options to assess interference (where unavoidable) Page 15

Timing and Performance Analysis Partners, Acknowledgements Academia Partners Prof. Dr. Bernhard Bauer, Augsburg University Prof. Dr. Heiko Falk, Ulm University Industry Partners AbsInt Angewandte Informatik GmbH, Gliwa GmbH, Symtavison GmbH, Timing Architects GmbH Debugger Vendors with Trace support: isystem AG für Informatiksysteme, Lauterbach GmbH, PLS Programmierbare Logik & Systeme GmbH Acknowledgements Prof. Dr. Claire Maiza (Grenoble INP/ Ensimag) Reinhard Deml, Frank Hellwig, Pawel Jewstafjew, Dr. Albrecht Mayer (Infineon) Page 16

Thank you for the attention Page 17