Analysis of Probabilistic Cache Related Pre-emption Delays. Rob Davis, Luca Santinelli, Sebastian Altmeyer, Claire Maiza, and Liliana Cucu-Grosjean

Similar documents
On the Comparison of Deterministic and Probabilistic WCET Estimation Techniques

EPC: Extended Path Coverage for Measurement-based Probabilistic Timing Analysis

Probabilistic Worst-Case Response-Time Analysis for the Controller Area Network

Achieving Timing Composability with Measurement-Based Probabilistic Timing Analysis

1 Unweighted Set Cover

Static WCET Analysis: Methods and Tools

MC2: Multicore and Cache Analysis via Deterministic and Probabilistic Jitter Bounding

(Refer Slide Time: 1:27)

DTM: Degraded Test Mode for Fault-Aware Probabilistic Timing Analysis

Proceedings of the 4th International Real- Time Scheduling Open Problems Seminar

A Fast and Precise Worst Case Interference Placement for Shared Cache Analysis

Theory of Computations Spring 2016 Practice Final Exam Solutions

A Cache Design for Probabilistically Analysable Real-time Systems

Elementary maths for GMT. Algorithm analysis Part I

Proceedings of the 8 th Real-Time Scheduling Open Problems Seminar (RTSOPS)

1. Practice the use of the C ++ repetition constructs of for, while, and do-while. 2. Use computer-generated random numbers.

CSE 120. Translation Lookaside Buffer (TLB) Implemented in Hardware. July 18, Day 5 Memory. Instructor: Neil Rhodes. Software TLB Management

Outline Purpose How to analyze algorithms Examples. Algorithm Analysis. Seth Long. January 15, 2010

CPUs. Caching: The Basic Idea. Cache : MainMemory :: Window : Caches. Memory management. CPU performance. 1. Door 2. Bigger Door 3. The Great Outdoors

CS 188: Artificial Intelligence Fall 2008

Announcements. CS 188: Artificial Intelligence Fall Large Scale: Problems with A* What is Search For? Example: N-Queens

Architectural performance analysis of FPGA synthesized LEON processors

HW/SW Codesign. WCET Analysis

Measurement-Based Probabilistic Timing Analysis and Its Impact on Processor Architecture

APPROXIMATING A PARALLEL TASK SCHEDULE USING LONGEST PATH

Perspectives on Network Calculus No Free Lunch but Still Good Value

Precise and Efficient FIFO-Replacement Analysis Based on Static Phase Detection

Mathematical and Algorithmic Foundations Linear Programming and Matchings

An introduction to DSP s. Examples of DSP applications Why a DSP? Characteristics of a DSP Architectures

Data Structures and Algorithms

Adam Blank Lecture 2 Winter 2017 CSE 332. Data Structures and Parallelism

Advanced Operations Research Techniques IE316. Quiz 1 Review. Dr. Ted Ralphs

Copyright 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin Introduction to the Design & Analysis of Algorithms, 2 nd ed., Ch.

CSC 8301 Design & Analysis of Algorithms: Linear Programming

Today s class. Roots of equation Finish up incremental search Open methods. Numerical Methods, Fall 2011 Lecture 5. Prof. Jinbo Bi CSE, UConn

EPC Enacted: Integration in an Industrial Toolbox and Use Against a Railway Application

-- the Timing Problem & Possible Solutions

The Last Mile An Empirical Study of Timing Channels on sel4

History-based Schemes and Implicit Path Enumeration

2 What does it mean that a crypto system is secure?

Rec 8 Notes. 1 Divide-and-Conquer Convex Hull. 1.1 Algorithm Idea. 1.2 Algorithm. Here s another random Divide-And-Conquer algorithm.

Planning and Control: Markov Decision Processes

Algorithm efficiency can be measured in terms of: Time Space Other resources such as processors, network packets, etc.

Recitation 4: Elimination algorithm, reconstituted graph, triangulation

Mean Value Analysis and Related Techniques

Barcelona Supercomputing Center. Spanish National Research Council (IIIA-CSIC)

SMFF: System Models for Free

Statistical Timing Analysis Using Bounds and Selective Enumeration

Lossy Compression for Worst-Case Execution Time Analysis of PLRU Caches

Algorithms and Data Structures

ECE6095: CAD Algorithms. Optimization Techniques

Experiment 1 Yahtzee or Validating the t-table

Full Datapath. Chapter 4 The Processor 2

1 Probabilistic analysis and randomized algorithms

Lecture Overview. 2 Online Algorithms. 2.1 Ski rental problem (rent-or-buy) COMPSCI 532: Design and Analysis of Algorithms November 4, 2015

Time Complexity of an Algorithm

Sequences and Series. Copyright Cengage Learning. All rights reserved.

Optimising Task Layout to Increase Schedulability via Reduced Cache Related Pre-emption Delays

An Evaluation of WCET Analysis using Symbolic Loop Bounds

More on weighted servers

Overview. Sporadic tasks. Recall. Aperiodic tasks. Real-time Systems D0003E 2/26/2009. Loosening D = T. Aperiodic tasks. Response-time analysis

Contents. 1 Introduction. 2 Searching and Traversal Techniques. Preface... (vii) Acknowledgements... (ix)

Linear Programming Duality and Algorithms

Probabilistic Graphical Models

PROGRAM EFFICIENCY & COMPLEXITY ANALYSIS

The most important development from the work on public-key cryptography is the digital signature. Message authentication protects two parties who

ESO207A: Data Structures and Algorithms End-semester exam

This is a repository copy of Response-time analysis for fixed-priority systems with a write-back cache.

Algorithmen II. Peter Sanders, Christian Schulz, Simon Gog. Übungen: Michael Axtmann. Institut für Theoretische Informatik, Algorithmik II.

Adversarial Policy Switching with Application to RTS Games

Resource Sharing and Partitioning in Multicore

Course Outline. Processes CPU Scheduling Synchronization & Deadlock Memory Management File Systems & I/O Distributed Systems

CACHE-RELATED PREEMPTION DELAY COMPUTATION FOR SET-ASSOCIATIVE CACHES

More on Polynomial Time and Space

Task-Set Generator for Schedulability Analysis using the TACLeBench benchmark suite

Introduction to Linear Programming

Mean Value Analysis and Related Techniques

Mid-Year Report. Discontinuous Galerkin Euler Equation Solver. Friday, December 14, Andrey Andreyev. Advisor: Dr.

Interference Alignment Approaches

Will Monroe July 21, with materials by Mehran Sahami and Chris Piech. Joint Distributions

Goal of the course: The goal is to learn to design and analyze an algorithm. More specifically, you will learn:

SYSTEMS OF NONLINEAR EQUATIONS

CS 5523 Operating Systems: Memory Management (SGG-8)

Performance Evaluations for Parallel Image Filter on Multi - Core Computer using Java Threads

Efficient Microarchitecture Modeling and Path Analysis for Real-Time Software. Yau-Tsun Steven Li Sharad Malik Andrew Wolfe

Information Security CS526

Worst-Case Utilization Bound for EDF Scheduling on Real-Time Multiprocessor Systems

COSC 311: ALGORITHMS HW1: SORTING

Lecture notes on the simplex method September We will present an algorithm to solve linear programs of the form. maximize.

COMPSCI 650 Applied Information Theory Feb 2, Lecture 5. Recall the example of Huffman Coding on a binary string from last class:

CS1800 Discrete Structures Final Version A

Copyright 2000, Kevin Wayne 1

EXAM (Tentamen) TDDI11 Embedded Software. Good Luck! :00-12:00. On-call (jour): Admitted material: General instructions:

A Cache Utility Monitor for Multi-core Processor

Final Exam Solutions

Chapter 8 & Chapter 9 Main Memory & Virtual Memory

Managing Hybrid On-chip Scratchpad and Cache Memories for Multi-tasking Embedded Systems

Massively Parallel Seesaw Search for MAX-SAT

Discrete Optimization. Lecture Notes 2

Timing Anomalies Reloaded

Transcription:

Analysis of Probabilistic Cache Related Pre-emption Delays Rob Davis, Luca Santinelli, Sebastian Altmeyer, Claire Maiza, and Liliana Cucu-Grosjean

Outline Probabilistic real-time systems System model Random cache replacement policies (Evict-on-Access, Evict-on-Miss) Static Probabilistic Timing Analysis (SPTA) Single path programs Complexity Cache Related Pre-emption Delays At a specific point Upper bounding the effect at any point Multiple pre-emptions Extension to Multi-path programs Evaluation Case study and simulation Conclusions and future work 2

Probabilistic Real-Time Systems What do we mean by a probabilistic real-time system? One or more parameters are described by random variables Example: instead of a single WCET value, we have a probabilistic Worst-Case Execution Time (pwcet) Characterised by a probability distribution 1.E+00 1.E-01 1.E-02 1-CDF Probability 1.E-03 1.E-04 1.E-05 1.E-06 1.E-07 1.E-08 0 2 4 6 8 10 Execution time Common question: What does this mean? Isn t WCET defined as the single worst-case execution time value? 3

Analogy: dice and instructions Rolling 10 dice (only interested in how many sixes) WCET equates to 10 sixes pwcet upper bound probability distribution on number of sixes rolled Probability 1.E+00 1.E-01 1.E-02 1.E-03 1.E-04 1.E-05 1.E-06 1e-06 Exceedance function (1 CDF) 1.E-07 1.E-08 0 2 4 6 8 10 Number of Sixes 8 What should the budget for sixes be such that we get an expected failure rate no higher than 1 per 1 million rolls of the set of dice? i.e. runs of the program. (Failure = more sixes than budgeted) 4

Common misunderstanding: Difference between pet and pwcet Analogy: two options 10x ordinary dice 3x big dice that show pairs of values e.g. 2 sixes at once Like a program with two paths 1.E+00 1.E-01 1.E-02 Different pets for the two options (typically dependent) Probability 1.E-03 1.E-04 1.E-05 1.E-06 1.E-07 1.E-08 0 2 4 6 8 10 Number of Sixes pwcet is a tight upper bound on all possible pets (independent) pwcets can be composed to get pwcrts 5

Static Probabilistic Timing Analysis (SPTA) Aim is to show that the probability of timing failure falls below some threshold e.g. 10-9 failures per hour: pwcet v. budget Inputs Deterministic Random replacement policy CPU Instruction Cache Memory Deterministic Probabilistic WCET (pwcet) (single distribution value) Probability 1.E+00 1.E-01 1-CDF 1.E-02 1.E-03 1.E-04 1.E-05 1.E-06 1.E-07 1.E-08 0 2 4 6 8 10 Execution time 6

Cache model Fully associative instruction cache of N blocks Memory blocks can be loaded into any block in cache Each instruction resides in a memory block Memory blocks may contain multiple instructions Instruction modelling When an instruction is requested its memory block may be in cache (a hit) or not (a miss) If it is not in cache, then it has to be fetched from main memory and loaded into the cache. On a miss, a random location is chosen in the cache to accommodate the new memory block (Evict-on-Miss random replacement policy) Each cache block has the same probability of being evicted 1/N 7

Evict-on-miss random replacement Cache with memory blocks a,b,c,d,e loaded next instruction is in memory block f c e a f b d 8

Instruction modelling Instructions are either: Cache hit or cache miss (when executed) Program path Is a sequence of instructions Represented by the sequence of memory blocks for those instructions e.g. a, b, a, c, d, b, c, d, a, e, b, f, e, g, a, b, h Re-use distance k Defined as the maximum possible number of evictions since the last access to the memory block containing the required instruction a, b, a 1, c, d, b 3, c 2, d 2, a 5, e, b 4, f, e 2, g, a 5, b 4, h Can have re-use distance of zero (instructions in the same block & EoM) a, a 0, b, b 0, b 0, b 0, a 1, 9

Probability of cache hits and misses Each instructions has a probability of being a cache hit or a cache miss: Described by a discrete random variable (PMF) Note H and M are times for a cache Hit and cache Miss Example: Probability of a cache hit = 0.75 with an execution time of 1 Probability of a cache miss = 0.25 with an execution time of 10 1 0.75 10 0.25 For each instruction we aim to lower bound the probability of a cache hit independent of whether previous instructions were hits or misses 10

Probabilistic real-time analysis Requires independence: Two random variables X and Y are independent if they describe two events such that the outcome of one event does not have any impact on the outcome of the other In our context an instruction having a particular execution time is an event There is a dependency between these events via the cache Key idea is to conservatively model the execution times of instructions as independent random variables (which have no dependency on whether previous instructions were cache hits or cache misses) Actual probability of a cache hit P{hit} is dependent on the outcome of previous events (hits or misses) but we lower bound it with P hit which is independent then we can use convolution to get pwcet distribution for a sequence of instructions 11

Probabilistic real-time analysis Summation of independent random variables is via convolution 1 0.8 10 1 0.2 0.7 10 0.3 2 = 0.56 11 0.38 20 0.06 12

Static Probabilistic Timing Analysis (SPTA) Sequence of instructions represented by their memory blocks and re-use distances a, b, a 1, c, d, b 3, c 2, d 2, a 5, e, b 4, f, e 2, g, a 5, b 4, h Evict-on-miss random replacement policy (Recall: Fully associative cache, N cache blocks, on a cache miss we randomly choose a cache block to be evicted) Initial analysis by Zhou [17] 2010 P hit ( k) = N 1 N k Depends only on re-use distance k (not on actual cache hit / miss behaviour) Formulation is not strictly correct due to a dependency via the finite size of the cache 13

Counter example: Consider a cache of size N = 2 a, b, c, b 1, a 3, If the 2 nd access to b is a hit, then b and c must be in cache at that point and so the 2 nd access to a is certain to be a miss Probability that the 2 nd access to block a is a hit is not independent of whether previous instructions were hits or misses Joint probability that 2 nd accesses to both a and b are hits is zero, not 1/16 (as obtained from Zhou formula and convolution) Solution: Need to model instruction PMFs as independent (so can we can compose using convolution) Upper bound the maximum amount of known information (h blocks that could be known to be in cache) and consider how this may reduce the effective cache size and number of possible evictions HOW? Problem of Independence 14

Evict-on-Miss With h intervening hits assumed (if h N then P hit = 0) Lower bound (for all values of h) so crucially independent of previous hits / misses Similarly for Evict-on-Access (Cucu-Grosjean et al. [6]) Easy to see that Evict-on-Miss dominates Evict-on-Access 15 Static Probabilistic Timing Analysis h k hit h N h N h k P = 1 ), ( < = N k N k N N k P k hit EoM 0 1 ) ( < = N k N k k N k N k P k hit EoA 0 1) ( 1 1) ( ) ( Proof in the paper

Static Probabilistic Timing Analysis Upper bound pwcet for each instruction based on re-use distance k using formula modelling independent (lower bound) probability of a cache hit pwcet for a single path by convolution Convolution is commutative and associative Can represent a sequence of accesses a, b, a 1, c, d, b 3, c 2, d 2, a 5, e, b 4, f, e 2, g, a 5, b 4, h by their re-use distances: 16

pwcet distribution (1-CDF) pwcet without pre-emption 1e-09 143 17

Complexity of SPTA Convolving pwcets for n instructions Might seem to have exponential complexity O(2 n ) (The case if each distribution had two arbitrary values) Max value is a small constant M so after n convolutions, max value is nm and 2nM operations are required for the (n+1)th convolution Complexity is pseudo-polynomial O(Mn 2 ) where M is a small constant Problem is tractable in practice Can also use re-sampling to reduce the size of the distributions 18

Probabilistic Cache Related Pre-emption Delays (pcrpd) Effects of pre-emption at a single specific program point Pre-emption assumed to flush the cache making some re-use distances infinite Pre-emption after 1 st access a, b, a 1, c, d, b 3, c 2, d 2, a 5, e, b 4, f, e 2, g, a 5, b 4, h Pre-emption after 5 th access a, b, a 1, c, d, b 3, c 2, d 2, a 5, e, b 4, f, e 2, g, a 5, b 4, h Accounting for effects of pre-emption Remove values from representation of program (path) 19

pcrpd: Pre-emption at any point Effects of a single pre-emption at any program point Concept of a dominant virtual pre-emption point with an impact that upper bounds the impact of pre-emption at any actual program point Method to create virtual pre-emption point P* a, b, a 1, c, d, b 3, c 2, d 2, a 5, e, b 4, f, e 2, g, a 5, b 4, h Pad representations of pre-emption effects so they are all the same length e.g Apply So Do this for all possible pre-emption points: Remove values from representation of program (path) {2,2,4,4,5,,,,,,,,,,,, } 20

pwcet distribution (1-CDF) pwcet without pre-emption pwcet with 1 pre-emption 1e-09 143 170 21

pcrpd: Multiple pre-emptions Effects of multiple pre-emptions Remove values multiple times If a specific value is no longer present (this is due to pessimism in the analysis) remove next larger value (don t remove smaller ones) Example: a, b, c, d, a 3, b 3, c 3, d 3, d 0, d 0, d 0, d 0, d 0, d 0 4 pre-emptions are not enough to force this program to all misses 22

Multi-path Programs Static Probabilistic Timing Analysis (SPTA) and pcrpd extended to multi-path programs SPTA intuition Upper bound re-use distances using program analysis (fixed point iteration) Combine & collapse sub-paths to get a synthetic path representation that upper bounds the pwcet of any path through the program pcrpd intuition Upper bound the pre-emption effect at each program point using program analysis (on the re-use distances obtained by SPTA before collapsing) Combine effects for all program points into a single dominant virtual pre-emption point P* Apply P* to synthetic path as in the single path case Details in the paper 23

Evaluation Used Malardalen Bechmarks FAC, FIBCALL, FDCT, JFDCINT (single path with loops) BS, INSERTSORT, FIR (multi-path) Compared Evict-on-Miss and Evict-on-Access random replacement policies Varied: Number of pre-emptions Cache size (N = 256, 128, 64, 32) Memory block sizes (1, 2, 4, 8 instructions) Assumed H = 1, M = 10 Also compared SPTA and pcrpd analysis with simulation 24

FAC Benchmark Evict on Miss Memory block size = 1 Cache size N = 128 pwcet without pre-emption 4 pre-emption pwcet estimate from simulation 10 7 runs 3 pre-emption 2 pre-emption 1 pre-emption 25

FAC Benchmark Evict on Miss Memory block size = 1 Cache size N = 128 26

FAC Benchmark Evict on Access Memory block size = 1 Cache size N = 128 Simulation much closer due to evictions on every access Worse performance than Evict-on-Miss 27

FAC Benchmark Evict on Miss Memory block size = 4 Cache size N = 128 28

FAC Benchmark Evict on Access Memory block size = 4 Cache size N = 128 29

FAC Benchmark Evict on Miss Varying memory block size Cache size N =128 30

FAC Benchmark Evict on Miss Varying memory block size and cache size 31

Technical Report YCS-2012-477 Find it on Rob Davis publications page: http://www-users.cs.york.ac.uk/~robdavis/publications.html Results for other benchmarks FAC is very simple code. Others require many more pre-emptions to reduce them to all misses (e.g. > 500 pre-emptions for INSERTSORT) 32

Conclusions Main contributions Revised Static Probabilistic Timing Analysis for Evict-on-Miss random cache replacement policy Fixed a problem with dependency Extended SPTA to multipath programs Introduced analysis of pcrpd Including multiple pre-emptions of multi-path programs Evaluations Method is feasible and provides results that give a useful upper bound on the pwcet Future work Improvements to the pwcet analysis via loop un-rolling Comparisons with deterministic analysis for systems with traditional cache replacement policies Reduce the pessimism in SPTA 33

Questions? 34