Towards Transactional Memory for Safety-Critical Embedded Systems

Stefan Metzlaff, Sebastian Weis, and Theo Ungerer
Department of Computer Science, University of Augsburg, Germany
Euro-TM Workshop on Transactional Memory (WTM13), April 14, 2013

Motivation
- Safety-critical embedded systems
  - Avionics or automotive domain
  - Real-time constraints
  - Fault tolerance constraints
  - Different certification requirements (SIL 1-4, DAL A-E)
- Trend towards high performance and low power (e.g. autonomous driving)
- Multi-core processors and parallel applications
[Figures: A380 [1]; Google Driverless Car [2]]

Motivation
Transactional memory in safety-critical systems:
- Concurrency control
- Predictable execution in multi-cores
  - Real-time capable concurrency
  - Bounding communication interferences
- Fault tolerance
  - Fault containment
  - Fault detection
  - Fault recovery

Real-Time, Multi-Core & TM
- Interferences at shared resources
  - Access to bus, memory, and I/O
  - Predictable arbitration with bandwidth guarantees (e.g. TDMA)
- Concurrency control
  - Interferences at application level
- Requirements for hard real-time (HRT) TM
  - Commit guarantee for each transaction
  - Calculable number of transaction aborts
  - HRT contention management
- Related work: [Fahmy et al. 2009] and [Schoeberl et al. 2010]
[Figure: Core 1 and Core 2, cache, memory, bus, and I/O device]
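The requirement for predictable arbitration can be made concrete for TDMA. The following is a minimal sketch, not the authors' model: it assumes a hypothetical bus on which each of `n_cores` cores owns one slot of `slot_len` cycles per period, and an access of `access_len` cycles may only start if it completes inside the requesting core's own slot. All function names are illustrative. A brute-force search over one period reproduces the closed-form worst case P - S + 2A - 1, with P the period, S the slot length, and A the access length.

```python
def access_latency(t: int, core: int, n_cores: int, slot_len: int, access_len: int) -> int:
    """Cycles from a bus request at time t until the access completes (hypothetical TDMA model)."""
    if not 1 <= access_len <= slot_len:
        raise ValueError("an access must fit into a single slot")
    period = n_cores * slot_len
    first = core * slot_len                      # first cycle of this core's slot
    last_start = first + slot_len - access_len   # latest start that still fits the slot
    start = t
    while not first <= start % period <= last_start:
        start += 1                               # stall until a usable slot position
    return start - t + access_len


def worst_case_latency(n_cores: int, slot_len: int, access_len: int) -> int:
    """Closed form: the request arrives one cycle too late to fit into its own slot."""
    period = n_cores * slot_len
    return period - slot_len + 2 * access_len - 1


def brute_force_worst(n_cores: int, slot_len: int, access_len: int) -> int:
    """Maximum latency over every core and every arrival time within one period."""
    period = n_cores * slot_len
    return max(
        access_latency(t, core, n_cores, slot_len, access_len)
        for core in range(n_cores)
        for t in range(period)
    )


if __name__ == "__main__":
    print(worst_case_latency(4, 10, 3), brute_force_worst(4, 10, 3))
```

For example, with 4 cores, 10-cycle slots, and 3-cycle accesses the bound is 40 - 10 + 6 - 1 = 35 cycles regardless of what the other cores do; that independence from co-runners is what makes bus interference analysable.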

HRT-TM Design Overview
- Lazy versioning
  - No cascading roll-backs
- Predictable transactions by commit ordering
  - FIFO transaction commit queue
  - Registering transactions on transaction begin
  - Commit serialisation
  - Bounded number of aborts and transaction delay
- Allows estimation of WCET bounds (requires the set of concurrent transactions)
- Predictable concurrency control in shared memory systems
[Figure: transactions 1-4 in the states Running, Waiting, Committing, and Aborting]
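The commit-ordering idea can be illustrated with a small single-threaded model. This is a sketch under assumptions, not the HRT-TM implementation: `run_fifo` is a hypothetical name, the commit position is fixed when a TX registers and kept across retries, a committing TX aborts every waiting TX whose read set overlaps its write set, and a TX body only reads committed values and returns its write buffer. Because writes become visible only at commit (lazy versioning), an abort never propagates, and the i-th registered TX (counting from 0) can be aborted at most once per predecessor, i.e. at most i times.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# A transaction body reads committed memory through `read` and returns its write buffer.
Body = Callable[[Callable[[str], int]], Dict[str, int]]


def run_fifo(transactions: List[Tuple[str, Body]], memory: Dict[str, int]):
    """Hypothetical model of FIFO commit ordering with lazy versioning.

    Commit order equals registration (list) order. A victim re-executes but keeps
    its queue position, which bounds its number of aborts by its position.
    """
    queue = deque(transactions)                 # FIFO commit queue, filled at TX begin
    aborts = {name: 0 for name, _ in transactions}

    def execute(body: Body) -> Tuple[Set[str], Dict[str, int]]:
        reads: Set[str] = set()

        def read(addr: str) -> int:
            reads.add(addr)
            return memory.get(addr, 0)          # only committed values are visible

        return reads, body(read)                # writes stay in a private buffer

    speculative = {name: execute(body) for name, body in transactions}
    commit_log = []
    while queue:
        name, _ = queue.popleft()               # only the head may commit (serialisation)
        _, writes = speculative.pop(name)
        memory.update(writes)                   # publish the whole write buffer at once
        commit_log.append(name)
        for other, other_body in queue:         # waiting TXs that read now-stale data
            if not speculative[other][0].isdisjoint(writes):
                aborts[other] += 1
                speculative[other] = execute(other_body)  # retry, same queue position
    return commit_log, aborts
```

With three TXs that each increment `x`, the model commits them in registration order and reports 0, 1, and 2 aborts, which matches the per-position bound; TXs with disjoint data are never aborted.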

HRT-TM Enhancement for Non-Real-Time Tasks
- Applications with tasks of different RT requirements, e.g. an Advanced Driver Assistance System:
  - Hard real-time (HRT): collision avoidance
  - Soft real-time (SRT): night vision
  - Best-effort (BE): traffic sign recognition
- Data sharing among applications
- Limiting interference of non-HRT tasks
  - Prioritised TM contention manager
  - Interferences only during commit of a BE task
  - Analysis requires profiling BE working sets
- Preliminary results: minimal impact of BE tasks on WCET bounds of HRT tasks
[Figures: Collision Avoidance, HRT [3]; Night Vision, SRT [4]; Traffic Sign Recognition, BE [5]]
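The prioritised contention manager can be sketched as a single decision function. This is a hypothetical policy, not the one evaluated in the slides: it assumes a conflict is detected between a requesting TX and a holding TX, that a TX in its commit phase is never aborted, that equal criticalities are left to the FIFO commit order, and that BE write-set sizes come from profiling with commit cost linear in the number of written lines. Under these assumptions a BE task can delay an HRT task only while the BE task commits, which is why the analysis needs the BE working sets.

```python
from enum import IntEnum
from typing import Iterable, Set


class Crit(IntEnum):
    BE = 0    # best effort, e.g. traffic sign recognition
    SRT = 1   # soft real-time, e.g. night vision
    HRT = 2   # hard real-time, e.g. collision avoidance


def resolve(requester: Crit, holder: Crit, holder_committing: bool) -> str:
    """Hypothetical prioritised contention manager for one conflict.

    Returns "wait", "abort_holder", or "abort_requester".
    """
    if holder_committing:
        return "wait"             # a commit in progress is never interrupted
    if requester > holder:
        return "abort_holder"     # higher criticality wins immediately
    if requester < holder:
        return "abort_requester"  # lower criticality never delays higher criticality
    return "wait"                 # equal criticality: resolved by the FIFO commit order


def max_be_blocking(profiled_write_sets: Iterable[Set[int]], cycles_per_line: int) -> int:
    """Upper bound on how long an HRT TX waits for a single BE commit.

    Assumes commit cost is linear in written cache lines and that commits are
    serialised, so at most one BE commit can be in progress at a time.
    """
    return max((len(ws) for ws in profiled_write_sets), default=0) * cycles_per_line
```

The profiled write sets here are plain sets of (hypothetical) cache-line indices; only their sizes enter the bound.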

Fault Tolerance & TM
- Encapsulation of vulnerable code in transactions
- Redundant execution of transactions
- Fault model
  - Core: transient and permanent faults
  - Interconnect: transient faults only
  - LLC & memory: protected by ECC (not covered in this work)
- Related work: [Yalcin et al. 2010] and [Sanchez et al. 2010]
[Figure: cores with local memories, bus/interconnect, LLC, and memory, marked with permanent and transient faults]

FT-TM Fault Detection and Recovery
- Fault containment: lazy versioning TM
- Fault detection: redundant execution of TXs
  - Spatial, temporal, or both
  - Redundant TXs cannot change the global state
  - Comparison of write sets and register sets
- Fault recovery: checkpointing of the system state
  - State of memory already managed by TM
  - Register set needs to be saved on TX begin
  - Rollback to TX begin on fault via TX retry
[Figure: fault containment, fault detection, and fault recovery together with the contention manager]
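The detection-and-recovery loop above can be sketched for the single-core, temporal-redundancy case. This is a minimal illustration under stated assumptions, not the FT-TM implementation: registers and memory are modelled as Python dicts, a per-replica dict stands in for the lazy-versioning write buffer, and `inject` is a hypothetical fault-injection hook used only to exercise the retry path. The register checkpoint is taken at TX begin, both replicas restart from it, and global memory is updated only after the write sets and register sets match.

```python
from typing import Callable, Dict, Optional

Regs = Dict[str, int]
Mem = Dict[str, int]
Body = Callable[[Mem, Regs, Mem], None]            # body(memory, regs, write_buffer)
Injector = Callable[[int, int, Regs, Mem], None]   # hypothetical fault-injection hook


def ft_transaction(body: Body, memory: Mem, regs: Regs,
                   max_retries: int = 3, inject: Optional[Injector] = None) -> int:
    """Run a TX twice (temporal redundancy) and commit only if both replicas agree.

    Returns the number of retries needed; raises if the fault persists.
    """
    checkpoint = dict(regs)                     # register set saved at TX begin
    for attempt in range(max_retries + 1):
        replicas = []
        for replica in range(2):
            r, writes = dict(checkpoint), {}    # each replica restarts from the checkpoint
            body(memory, r, writes)             # writes go to a private buffer
            if inject is not None:
                inject(attempt, replica, r, writes)
            replicas.append((writes, r))
        (w0, r0), (w1, r1) = replicas
        if w0 == w1 and r0 == r1:               # fault detection: compare write and register sets
            memory.update(w0)                   # global state changes only after the check
            regs.clear()
            regs.update(r0)
            return attempt
        # Mismatch: nothing was published, so rolling back means discarding both buffers.
    raise RuntimeError("fault persisted across all retries")
```

Injecting a single bit flip into the second replica's write set on the first attempt makes the write sets differ; the TX is then retried once from the checkpoint and commits normally.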

FT-TM Levels of Fault Tolerance
- Tasks with different FT properties
  - Low or high error rate
  - HRT or BE requirements
- Fault detection and recovery schemes
  (1) 1 core, > 2× execution time overhead on fault (transient faults only)
  (2) 2 cores, > 1× execution time overhead on fault
  (3) 3 cores, < 1× execution time overhead on fault
- Towards an individual level of fault tolerance for each task
[Figure: schemes (1)-(3) with fault detection, roll-back recovery, send/commit, and forward error correction]
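The overhead figures can be read through a simple cost model (an assumption, not taken from the slides): one TX execution takes T, and comparison or voting is cheap but not free. Scheme (1) runs two replicas one after the other on one core, so a detected fault costs another 2T plus comparison, more than 2×; because both replicas share the core, it targets transient faults only. Scheme (2) runs the replicas in parallel on two cores, so a retry costs T plus comparison, more than 1×. Scheme (3) runs three replicas and outvotes a faulty one, so no re-execution is needed and only the vote adds time, less than 1×. A minimal sketch of the voting step for scheme (3), assuming write sets are dicts from address to value; `vote` and `tmr_commit` are illustrative names.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

WriteSet = Dict[str, int]


def vote(write_sets: List[WriteSet]) -> Optional[WriteSet]:
    """Return the write set produced by a strict majority of replicas, else None."""
    keyed = [tuple(sorted(ws.items())) for ws in write_sets]   # hashable, order-independent
    winner, count = Counter(keyed).most_common(1)[0]
    return dict(winner) if 2 * count > len(write_sets) else None


def tmr_commit(run_replica: Callable[[WriteSet, int], WriteSet],
               memory: WriteSet, max_retries: int = 3) -> int:
    """Scheme (3): three replicas, conceptually on three cores; returns retries used."""
    for attempt in range(max_retries + 1):
        write_sets = [run_replica(memory, i) for i in range(3)]
        agreed = vote(write_sets)
        if agreed is not None:
            memory.update(agreed)   # forward error correction: the faulty replica is outvoted
            return attempt
        # No majority (e.g. two replicas faulty): fall back to roll-back and retry.
    raise RuntimeError("no majority after all retries")
```

A single corrupted replica is outvoted on the first attempt, so `tmr_commit` returns 0; only when no two replicas agree does it fall back to roll-back recovery.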

Conclusions and Future Work
- Transactional memory for safety-critical embedded systems
  - Hard real-time: isolation and predictability
  - Fault tolerance: fault containment, detection, and recovery
  - Mixed-criticality systems: different requirements for tasks
- Future work
  - Enhance HRT-TM by soft real-time support
  - Fault recovery schemes for FT-TM
  - Integration of real-time and fault tolerance

Questions?

References:
[1] http://www.flickr.com/photos/8313254@n08/496320750/
[2] http://www.flickr.com/photos/jurvetson/5499949739/
[3] http://www.flickr.com/photos/13524418@n07/2921138655/
[4] http://www.flickr.com/photos/jurvetson/22226826/
[5] M. L. Eichner and T. P. Breckon. Integrated speed limit detection and recognition from real-time video. IEEE Intelligent Vehicles Symposium, pp. 626-631, 2008.

References:
[Fahmy et al. 2009] S. F. Fahmy, B. Ravindran, and E. D. Jensen. On bounding response times under software transactional memory in distributed multiprocessor real-time systems. DATE, 2009.
[Schoeberl et al. 2010] M. Schoeberl, F. Brandner, and J. Vitek. RTTM: Real-time transactional memory. SAC, 2010.
[Sanchez et al. 2010] D. Sanchez, J. L. Aragon, and J. M. Garcia. A log-based redundant architecture for reliable parallel computation. HiPC, 2010.
[Yalcin et al. 2010] G. Yalcin, O. Unsal, I. Hur, A. Cristal, and M. Valero. FaulTM: Fault-tolerance using hardware transactional memory. PESPMA, 2010.