Worst-Case Execution Time (WCET)


Introduction to Cache Analysis for Real-Time Systems
[C. Ferdinand and R. Wilhelm, "Efficient and Precise Cache Behavior Prediction for Real-Time Systems", Real-Time Systems, 17, 131-181 (1999)]
R. Bettati

Schedulability analysis requires the WCET of each task, which in turn requires the worst-case memory performance (WCMP). On simple platforms (e.g., without caches) this is easy, but ignoring the cache leads to significant resource under-utilization. Q: How do we appropriately account for the cache?

Worst-Case Execution Time (WCET)
WCET analysis must be safe (it never underestimates the true WCET) and tight (it does not grossly overestimate it). WCET depends on:
- the execution path through the program
- cache behavior (depends on the execution history)
- pipelining (depends on the very recent execution history)

Problems with Cache Memories
Two fundamental problems with cache analysis:
1. Large differences in cache behavior (and hence in execution time!) can result from minor changes in the program code or in the input data.
2. Inter-task cache interference.
WCET analysis for single tasks remains a prerequisite for multi-task analysis!

Cache Memories
Major parameters of caches:
1. Capacity: how many bytes does the cache hold?
2. Line size (block size): the number of contiguous bytes transferred from memory on a cache miss. The cache contains n = capacity / line_size blocks.
3. Associativity A: in how many cache locations can a particular block reside? A = 1 gives a direct-mapped cache; A = n gives a fully associative cache.

Cache Semantics
An A-way set-associative cache can be viewed as a sequence of n/A fully associative sets F = <f_1, ..., f_{n/A}>. Each set f_i is a sequence of lines L = <l_1, ..., l_A>. The store is a set of memory blocks M = {m_1, ..., m_s}. The function adr: M -> N0 gives the address of each block, and the function set: M -> F denotes the set in which a block is stored: set(m) = f_i, where i = adr(m) mod (n/A) + 1. To model a set line that holds no memory block, we add a special element I: M' = M ∪ {I}.

Cache Semantics (II)
Cache semantics separates two aspects:
1. The set in which a memory block is stored. This can be determined statically, since it depends only on the address of the block; the dynamic allocation of memory blocks to sets is modeled by cache states.
2. The replacement strategy within one set of the cache. Here the history of memory references is relevant; this is modeled by set states.
Def: a set state is a function s: L -> M' giving the memory block held by each line. Note: in a fully associative set, a memory block occurs at most once.
Def: S is the set of all set states.
Def: a cache state is a function c: F -> S giving the set state of each cache set.
Def: C is the set of all cache states.
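As a concrete illustration of the set-mapping function (a sketch with made-up cache geometry, not from the paper):

```python
# Hypothetical geometry: n = 128 blocks, 4-way associative => 32 sets.
# adr_m is the block address, as in the slides' adr: M -> N0.
def cache_set(adr_m, n=128, A=4):
    """1-indexed set number: set(m) = f_i with i = adr(m) mod (n/A) + 1."""
    return adr_m % (n // A) + 1

print(cache_set(0))    # 1
print(cache_set(70))   # 7  (70 mod 32 = 6)
print(cache_set(32))   # 1  (block addresses 0 and 32 contend for the same set)
```

Blocks whose addresses differ by a multiple of n/A map to the same set; only those blocks can interfere with each other under the replacement policy.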

LRU Replacement Policy
The side effect of a memory reference on the set/cache is represented by an update function. Note that the sets behave independently of one another, and that the order of blocks within a set indicates the relative age of the blocks. We number the cache lines according to the relative age of their memory block: s(l_x) = m, m != I, describes the relative age of block m under LRU, not its physical position in the set.

Definition 5 (set update). A set update function U_S: S x M -> S describes the new set state for a given set state and a referenced memory block.
Definition 6 (cache update). A cache update function U_C: C x M -> C describes the new cache state for a given cache state and a referenced memory block.

Updates of fully associative sets with the LRU replacement strategy are modeled as follows (see Figure 3):

U_S(s, m) =
  [l_1 ↦ m,
   l_i ↦ s(l_{i-1}) for i = 2...h,
   l_i ↦ s(l_i) for i = h+1...A]   if there is an l_h with s(l_h) = m;
  [l_1 ↦ m,
   l_i ↦ s(l_{i-1}) for i = 2...A]   otherwise.

Updates of A-way set-associative caches are modeled as:

U_C(c, m) = c[set(m) ↦ U_S(c(set(m)), m)]

Figure 3. Update of a concrete fully associative set. The most recently referenced memory block is put in the first position l_1 of the set. If the referenced memory block m is already in the set, then all memory blocks that have been used more recently than m are shifted by one position to the next set line, i.e., they increase their relative age by one. If m is not yet in the set, then all memory blocks in the set are shifted and, if the set is full, the oldest, i.e., least recently used, memory block is removed from the set.
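A minimal sketch (not the paper's code) of the concrete LRU set update U_S: the set state is a list of blocks ordered by age, index 0 = most recently used, and None plays the role of the empty line I.

```python
# Concrete LRU update of one fully associative set, youngest first.
def lru_set_update(state, m):
    """Return the new set state after referencing block m."""
    new = list(state)
    if m in new:
        # Hit: blocks younger than m age by one position; m becomes youngest.
        new.remove(m)
        new.insert(0, m)
    else:
        # Miss: every block ages; the oldest entry falls off a full set.
        new.insert(0, m)
        new.pop()  # keeps the set at A lines
    return new

# 4-way set; None is the empty line I.
s = ['a', 'b', 'c', None]
s = lru_set_update(s, 'b')   # hit:  ['b', 'a', 'c', None]
s = lru_set_update(s, 'd')   # miss: ['d', 'b', 'a', 'c']
```

The full cache update U_C would simply apply this function to the one set selected by set(m).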

Control Flow Representation
The program is represented by a control flow graph (CFG) whose nodes are basic blocks. A basic block is a sequence of instructions in which control enters at the beginning and leaves at the end, without halting or the possibility of branching except at the end. For each basic block, the sequence of memory references is known. We can therefore map control flow nodes to sequences of memory blocks (at least for instruction caches) and represent this as a function L: V -> M*.

We can extend U_C to sequences of memory references:
U_C(c, <m_1, ..., m_y>) = U_C(... U_C(c, m_1) ..., m_y)
and to a path <k_1, ..., k_p> in the control flow graph:
U_C(c, <k_1, ..., k_p>) = U_C(c, L(k_1) ... L(k_p)), i.e., the update with the concatenation of the blocks' reference sequences.

Must Analysis vs. May Analysis
Must analysis determines the set of memory blocks that are definitely in the cache whenever control reaches a given program point. May analysis determines all memory blocks that may be in the cache at a given program point; it is used to guarantee the absence of a memory block from the cache. Analysis along basic blocks and paths of basic blocks is simple. But what happens when paths merge?
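The extension of U_C to a reference sequence is just a left fold of the single-reference update. A sketch under assumed names (the concrete update here is plain LRU on one fully associative set, not the paper's tooling):

```python
from functools import reduce

def update_one(state, m, assoc=4):
    """Single-reference LRU update of one set (youngest block first)."""
    s = [b for b in state if b != m]      # drop m if present (a hit)
    return ([m] + s)[:assoc]              # m becomes youngest; evict if full

def update_seq(state, refs):
    """U_C(c, <m1,...,my>) = U_C(... U_C(c, m1) ..., my) as a left fold."""
    return reduce(update_one, refs, state)

# Reference sequence of one basic block, starting from an empty 4-way set:
print(update_seq([], ['a', 'b', 'a', 'c']))  # ['c', 'a', 'b']
```

Folding over the concatenated sequences of several blocks gives exactly the path extension U_C(c, <k_1, ..., k_p>).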

Abstract Cache States
Def: an abstract set state is a function ŝ: L -> 2^M that maps set lines to sets of memory blocks.
Def: Ŝ is the set of all abstract set states. An abstract set state ŝ describes a set of concrete set states s.
Def: an abstract cache state is a function ĉ: F -> Ŝ that maps cache sets to abstract set states.
Def: Ĉ is the set of all abstract cache states. An abstract cache state ĉ describes a set of concrete cache states c.

MUST Analysis
m_a ∈ ŝ(l_x) for some x means that the memory block m_a is definitely in the cache; x is an upper bound on its relative age.
Observation 1: The position (relative age) of a memory block m_a in a set can only be changed by memory references that go into the same set, i.e., by references to memory blocks m_b with set(m_a) = set(m_b).
Observation 2: The position is not changed by references to memory blocks m_b ∈ ŝ(l_y) with y ≤ x, i.e., memory blocks that are already in the cache and are younger than or of the same age as m_a.
Observation 3: m_a will stay in the cache at least for the next A − x references that go to the same set and address blocks that are not yet in the cache or are older than m_a.

MUST Analysis: Update Function
Updates of abstract fully associative sets for the must analysis are modeled as follows (see Figure 4):

Û_Ŝ(ŝ, m) =
  [l_1 ↦ {m},
   l_i ↦ ŝ(l_{i-1}) for i = 2...h−1,
   l_h ↦ ŝ(l_{h−1}) ∪ (ŝ(l_h) \ {m}),
   l_i ↦ ŝ(l_i) for i = h+1...A]   if there is an l_h with m ∈ ŝ(l_h);
  [l_1 ↦ {m},
   l_i ↦ ŝ(l_{i-1}) for i = 2...A]   otherwise.

Figure 4. Update of an abstract fully associative set for the must analysis.

MUST Analysis and Control Flow
Recall that control flow node k issues the sequence of memory blocks L(k) = <m_1, ..., m_y>.
Simple case: node k has only one direct predecessor, say k'. Then there is an equation:
ĉ_k = Û_Ĉ(ĉ_{k'}, <m_1, ..., m_y>) = Û_Ĉ(... Û_Ĉ(ĉ_{k'}, m_1) ..., m_y)
The resulting system of equations (one per node) is solved by fixpoint iteration. The abstract cache state ĉ_k describes all possible cache states when control reaches node k. Let ŝ = ĉ_k(set(m)) for some memory block m. If m ∈ ŝ(l_y) for some set line l_y, then m is definitely in the cache every time control reaches k. The reference to m is therefore classified as ALWAYS HIT.
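A sketch of the abstract must update under an assumed representation (a list of Python sets indexed by line, position = upper bound on age; names are illustrative, not from the paper):

```python
def must_update(state, m):
    """Abstract must-analysis set update; state[i] holds blocks of max age i+1."""
    A = len(state)
    h = next((i for i, line in enumerate(state) if m in line), None)
    if h is None:
        # m not guaranteed in the cache: all blocks age by one; the last
        # line's contents lose their guarantee.
        return [{m}] + [set(line) for line in state[:A - 1]]
    if h == 0:
        # m is already among the youngest blocks; per Observation 2 the
        # blocks sharing l_1 with m keep their age, so nothing changes.
        return [set(line) for line in state]
    new = [{m}]
    new += [set(state[i]) for i in range(h - 1)]           # lines 2..h-1 shift
    new.append(set(state[h - 1]) | (set(state[h]) - {m}))  # line h merges
    new += [set(state[i]) for i in range(h + 1, A)]        # older lines stay
    return new

# Hit on m at line 3 of a 4-way set: m becomes youngest, b and c merge at l_3.
print(must_update([{'a'}, {'b'}, {'c', 'm'}, {'d'}], 'm'))
```

Note the key difference from a concrete LRU update: blocks that may have been used more recently than m only get an *older* age bound, which keeps the analysis safe.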

Multiple Predecessors: Join Function
Definition 9 (join function). A join function Ĵ: Ĉ x Ĉ -> Ĉ combines two abstract cache states. On control flow nodes with at least two predecessors, join functions are used to combine the incoming abstract cache states.

For the must analysis (see Figure 5), a memory block survives the join only if it occurs in both operands, and it is assigned the maximum (oldest) of its two relative ages:

Ĵ_Ŝ(ŝ_1, ŝ_2) = ŝ, where:
ŝ(l_x) = {m | there exist l_a, l_b with m ∈ ŝ_1(l_a), m ∈ ŝ_2(l_b), and x = max(a, b)}

Figure 5. Join for the must analysis.

MUST Analysis with Multiple Predecessors
General case: node k has more than one predecessor, say k_1 to k_x. Then there is an equation:
ĉ_k = Û_Ĉ(ĉ'_k, <m_1, ..., m_y>) = Û_Ĉ(... Û_Ĉ(ĉ'_k, m_1) ..., m_y),
where ĉ'_k = Ĵ_Ĉ(ĉ_{k_1}, ... Ĵ_Ĉ(ĉ_{k_{x−1}}, ĉ_{k_x}) ...)
The classification of memory references stays the same as in the single-predecessor case.
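The must join can be sketched under the same assumed list-of-sets representation (illustrative, not the paper's code): intersect the blocks, take the pessimistic (maximum) age.

```python
def must_join(s1, s2):
    """Must join: keep blocks present in both states, at their oldest age."""
    age1 = {m: i for i, line in enumerate(s1) for m in line}
    age2 = {m: i for i, line in enumerate(s2) for m in line}
    joined = [set() for _ in range(len(s1))]
    for m in age1.keys() & age2.keys():        # intersection of the blocks
        joined[max(age1[m], age2[m])].add(m)   # pessimistic (older) age
    return joined

# a survives at its older age 2; b is only in one operand and is dropped;
# c survives at age 4.
print(must_join([{'a'}, {'b'}, set(), {'c'}],
                [{'c'}, {'a'}, set(), set()]))
```

Taking the maximum age is what makes the result safe: a block is only guaranteed in the cache if it is guaranteed on every incoming path.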

MAY Analysis
Q: How do we know that a memory reference is an ALWAYS MISS?
A: We determine the set of memory blocks that MAY be in the cache.

m_a ∈ ŝ(l_x) for some x means that the memory block m_a may be in the cache; x is a lower bound on its relative age.
Observation 1: The position (relative age) of a memory block m_a in a set can only be changed by memory references that go into the same set, i.e., by references to memory blocks m_b with set(m_a) = set(m_b).
Observation 2: The position is not changed by references to memory blocks m_b ∈ ŝ(l_y) with y < x, i.e., memory blocks that are already in the cache and are younger than m_a.
Observation 3: m_a can stay in the cache at most for the next A − x + 1 references that go to the same set and address blocks that are not yet in the cache or are older than or of the same age as m_a; after these, it is definitely no longer in the cache.

MAY Analysis: Update Function
Updates of abstract fully associative sets for the may analysis are modeled as follows (see Figure 6):

Û_Ŝ(ŝ, m) =
  [l_1 ↦ {m},
   l_i ↦ ŝ(l_{i-1}) for i = 2...h,
   l_{h+1} ↦ ŝ(l_{h+1}) ∪ (ŝ(l_h) \ {m}),
   l_i ↦ ŝ(l_i) for i = h+2...A]   if there is an l_h with m ∈ ŝ(l_h);
  [l_1 ↦ {m},
   l_i ↦ ŝ(l_{i-1}) for i = 2...A]   otherwise.

Figure 6. Update of an abstract fully associative set for the may analysis.

Join Function for MAY Analysis
For the may analysis (see Figure 7), a memory block survives the join if it occurs in either operand, and it is assigned the minimum (youngest) of its relative ages:

Ĵ_Ŝ(ŝ_1, ŝ_2) = ŝ, where:
ŝ(l_x) = {m | there exist l_a, l_b with m ∈ ŝ_1(l_a), m ∈ ŝ_2(l_b), and x = min(a, b)}
       ∪ {m | m ∈ ŝ_1(l_x) and there is no l_a with m ∈ ŝ_2(l_a)}
       ∪ {m | m ∈ ŝ_2(l_x) and there is no l_a with m ∈ ŝ_1(l_a)}

Figure 7. Join for the may analysis.
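The may join, dually, can be sketched as union of the blocks with the optimistic (minimum) age, again under the assumed list-of-sets representation:

```python
def may_join(s1, s2):
    """May join: keep blocks present in either state, at their youngest age."""
    age1 = {m: i for i, line in enumerate(s1) for m in line}
    age2 = {m: i for i, line in enumerate(s2) for m in line}
    joined = [set() for _ in range(len(s1))]
    for m in age1.keys() | age2.keys():        # union of the blocks
        ages = [d[m] for d in (age1, age2) if m in d]
        joined[min(ages)].add(m)               # optimistic (younger) age
    return joined

# a and c take their younger ages; b is kept even though only one
# operand contains it.
print(may_join([{'a'}, {'b'}, set(), {'c'}],
               [{'c'}, {'a'}, set(), set()]))
```

Taking the minimum age is what makes absence guarantees safe: a block is only definitely out of the cache if it is out on every incoming path.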

MAY Analysis and Control Flow
Let ŝ = ĉ_k(set(m)) for some memory block m. If m ∉ ŝ(l_y) for every set line l_y, then m is definitely NOT in the cache whenever control reaches k. The reference to m is therefore classified as ALWAYS MISS.
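Putting the two analyses together, the classification of a reference can be sketched as follows (same assumed list-of-sets representation; references that are neither always-hit nor always-miss remain unclassified and must be treated conservatively in the WCET bound):

```python
def classify(m, must_state, may_state):
    """Classify a reference to block m at a program point."""
    if any(m in line for line in must_state):
        return 'always hit'    # m is in every reachable concrete cache state
    if not any(m in line for line in may_state):
        return 'always miss'   # m is in no reachable concrete cache state
    return 'not classified'    # the analysis must assume a possible miss

print(classify('a', [{'a'}, set()], [{'a'}, {'b'}]))  # always hit
print(classify('z', [{'a'}, set()], [{'a'}, {'b'}]))  # always miss
print(classify('b', [{'a'}, set()], [{'a'}, {'b'}]))  # not classified
```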