Timing Predictability of Processors

Angelica Anderson
5 years ago
1 Timing Predictability of Processors

2 Introduction
- An airbag opens about 100 ms after the crash.
- The controller has to trigger the inflation within 70 ms.
- The airbag is useless if it opens too early or too late.
- The controller has to provide the correct result and deliver it within the correct time window.
- A correct value delivered too late is also wrong!
- How do we determine when to schedule the task?

3 Introduction
- To determine when to schedule the task, the Worst-Case Execution Time (WCET) has to be known.
- A WCET analysis requires knowledge of all timing effects.
- But modern processors are designed for average-case performance, not for timing predictability.
- Timing predictability requires timing guarantees in all situations, not just in the average case.
- In the following we look at dynamic scheduling, caches, and pipelines with branch prediction.
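A minimal sketch of why the average case is not enough for scheduling: it computes the latest safe start time of a task from its deadline. The 70 ms deadline is taken from the airbag example; the function name, the measured times, and the 12 ms WCET bound are hypothetical values chosen only for illustration.

```python
from statistics import mean

def latest_start_ms(deadline_ms, execution_time_ms):
    """Latest start time that still meets the deadline for the given execution time."""
    return deadline_ms - execution_time_ms

# Hypothetical measurements of one task in milliseconds (not from the slides).
observed_ms = [4.0, 4.1, 3.9, 4.0, 9.5]
# Hypothetical safe upper bound, as a WCET analysis would provide it.
wcet_bound_ms = 12.0

deadline_ms = 70.0  # trigger deadline from the airbag example

start_if_average = latest_start_ms(deadline_ms, mean(observed_ms))
start_if_wcet = latest_start_ms(deadline_ms, wcet_bound_ms)

print(f"latest start assuming average time: {start_if_average:.1f} ms")
print(f"latest start assuming WCET bound:   {start_if_wcet:.1f} ms")
```

Starting at the average-based time would miss the deadline whenever a run takes longer than average, as the 9.5 ms measurement does. Only a bound that holds in all situations gives a safe start time.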

4 Dynamic Scheduling
- Out-of-order execution:
  - Instructions are loaded and buffered.
  - Once all operands of a buffered instruction are available, the instruction can be scheduled.
  - Instructions can be executed out of program order.
  - If more than one instruction can be scheduled, a dynamic scheduling decision has to be made.
- In-order execution:
  - Instructions can only be executed in program order.

5 Timing Predictability of Out-of-Order Execution
- The execution time of A is increased by two cycles.
- Instruction A depends on instruction B.
- Instruction B depends on instruction D.
- The overall execution time is increased by four cycles.
=> The penalty for the different execution order is higher than the penalty in the execution of A.
Picture source: Principles of Timing Anomalies in Superscalar Processors, Wenzel et al.

6 Timing Predictability of In-Order Execution
- The execution time of A is increased by two cycles.
- Instruction D can only be executed on FU2.
- The overall execution time is increased by three cycles.
=> The overall execution time increases by more than the penalty in the execution of instruction A.
Picture source: Principles of Timing Anomalies in Superscalar Processors, Wenzel et al.

7 Criterion for Timing Unpredictabilities
- Decision in out-of-order execution: schedule one of the instructions that are ready to be executed.
- Decision in in-order execution: schedule the instruction on one of the functional units.
- If no decision is made, different execution orders are not possible.
=> A necessary condition for timing unpredictability is the existence of dynamic scheduling decisions.
=> The possible execution orders resulting from all possible timing effects are hard to analyze.
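The effect of a dynamic scheduling decision can be reproduced with a small greedy scheduler. The sketch below is a hypothetical example, not the one from the figures by Wenzel et al.: the instruction names, functional units, latencies, and dependencies are assumptions. Each cycle, every free functional unit takes the oldest ready instruction assigned to it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    unit: str          # functional unit the instruction must run on
    latency: int       # execution time in cycles, assumed >= 1
    deps: tuple = ()   # names of instructions whose results are needed

def simulate(program):
    """Greedy cycle-by-cycle scheduling; program order is the priority."""
    finish = {}        # instruction name -> cycle in which its result is available
    unit_free = {}     # unit -> first cycle in which the unit is free again
    pending = list(program)
    cycle = 0
    while pending:
        for instr in list(pending):
            operands_ready = all(finish.get(d, cycle + 1) <= cycle for d in instr.deps)
            if operands_ready and unit_free.get(instr.unit, 0) <= cycle:
                finish[instr.name] = cycle + instr.latency
                unit_free[instr.unit] = cycle + instr.latency
                pending.remove(instr)
        cycle += 1
    return max(finish.values()), finish

def program(latency_a):
    # Hypothetical program: B and D compete for FU2, E waits for D.
    return [
        Instr("X", "FU3", 2),
        Instr("A", "FU1", latency_a),
        Instr("B", "FU2", 2, ("A",)),
        Instr("D", "FU2", 2, ("X",)),
        Instr("E", "FU1", 3, ("D",)),
    ]

fast_total, _ = simulate(program(1))   # A takes 1 cycle
slow_total, _ = simulate(program(3))   # A takes 3 cycles
print("A fast:", fast_total, "cycles; A slow:", slow_total, "cycles")
```

In this sketch, a short A lets B occupy FU2 before D becomes ready, which delays D and the dependent E. With a longer A, D gets FU2 first and the program finishes earlier, so the local worst case for A is not the global worst case.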

8 Caches
- The cache is divided into cache sets.
- Each cache set consists of multiple cache lines.
- Each cache set can store a disjoint subset of all memory cells. For example, with four sets, every fourth memory block maps to the same set.
- A cache line stores one memory block.
- A memory block consists of multiple memory cells.
- The number of cache lines in a cache set is called the associativity of the cache.
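The mapping from memory blocks to cache sets can be sketched as a modulo operation on the block number. This is a common scheme and is assumed here; the block size of 16 bytes and the function names are hypothetical, and the four sets follow the example on the slide.

```python
BLOCK_SIZE = 16   # bytes per memory block (hypothetical)
NUM_SETS = 4      # number of cache sets

def block_number(address):
    """Memory block that contains the given byte address."""
    return address // BLOCK_SIZE

def set_index(address):
    """Cache set that the block containing the address maps to."""
    return block_number(address) % NUM_SETS

for block in range(8):
    address = block * BLOCK_SIZE
    print(f"block {block} -> set {set_index(address)}")
```

Blocks 0 and 4 land in the same set, as do blocks 1 and 5, and so on. Blocks that share a set compete for its cache lines.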

9 Timing Predictability of Caches
Sources of uncertainty:
- Control-flow join: the cache states of both paths need to be combined.
- Preemption of tasks: the content of the cache may be changed by another task.
- Dynamically determined addresses: a static WCET analysis can only guess these addresses.
- Unknown initial state: the content depends on the previously executed task.

10 Timing Predictability of Caches
Factors influencing the predictability of caches:
- Associativity
- Separation between instruction and data caches
- Replacement policy
- Write policy
Of these, the replacement policy has the strongest influence on timing predictability.*
*Source: Timing Predictability of Cache Replacement Policies, Reineke et al.

11 Timing Predictability of Caches
- To evaluate a replacement policy, one determines how many accesses are required to reach a fully known cache state, starting from an unknown state.
- A replacement policy can be evaluated with two metrics:
  - Evict(k): number of accesses needed to predict that some elements are no longer in the cache
  - Fill(k): number of accesses needed to completely fill the cache with known content
- Both metrics depend on the associativity k.

12 Timing Predictability of Caches

          Sequence with only misses     Sequence with hits and misses
Policy    Evict(k)       Fill(k)        Evict(k)          Fill(k)
LRU       k              k              k                 k
FIFO      k              k              2k-1              3k-1
MRU       2k-2, 2k-2 (only two values are given; their columns are not recoverable)
PLRU      2k-sqrt(2k)    2k-1           0.5*k*ld(k)+1     0.5*k*ld(k)+k-1

(ld denotes the logarithm to base 2.)

- At least k accesses are required to fill k cache lines.
- Before Evict(k) accesses, every memory block could potentially be in the cache.
- After Evict(k) accesses, the number of memory blocks that could potentially be in the cache shrinks.
- Fill(k) accesses are required to fill the cache with known content.
=> A lower number of accesses implies higher predictability.
Results from: Timing Predictability of Cache Replacement Policies, Reineke et al.
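The difference between LRU and FIFO can be checked by brute force on a single cache set. The sketch below assumes a set that is initially full with k = 2 blocks taken from a small hypothetical universe of four blocks; empty cache lines are not modeled. It applies the same k accesses to every possible initial state and collects the resulting states.

```python
from itertools import permutations

def lru_access(state, block):
    """state is a tuple ordered from most to least recently used."""
    rest = tuple(b for b in state if b != block)
    if block not in state:
        rest = rest[:-1]              # miss: evict the least recently used block
    return (block,) + rest

def fifo_access(state, block):
    """state is a tuple ordered from newest to oldest insertion."""
    if block in state:
        return state                  # hit: FIFO does not change its state
    return (block,) + state[:-1]      # miss: evict the oldest block

def final_states(access, k, universe, sequence):
    """Set of states reachable from any full initial state via the sequence."""
    finals = set()
    for initial in permutations(universe, k):
        state = initial
        for block in sequence:
            state = access(state, block)
        finals.add(state)
    return finals

K = 2
lru_finals = final_states(lru_access, K, "abxy", "ab")
fifo_finals = final_states(fifo_access, K, "abxy", "ab")
print("LRU final states: ", sorted(lru_finals))
print("FIFO final states:", sorted(fifo_finals))
```

Under these assumptions LRU ends in a single state after k accesses, which matches Fill(k) = k. FIFO still has several possible states, because hits do not update its state, so more accesses are needed before the content is known.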

13 Pipelining
- Execution is split into stages.
- All stages can operate in parallel. => Parallel execution of instructions
- Parallel execution causes conflicts:
  - Data conflict: a later instruction in the pipeline reads/writes a register that an earlier instruction writes/reads.
  - Control conflict: a branch may change the program counter, which may invalidate the already loaded instructions.

14 Timing Predictability of Pipelining
- For correct execution, the conflicts need to be resolved.
- The simplest conflict resolution technique is to stall the pipeline.
- If conflicts between non-adjacent instructions can occur, the penalties for conflicts can influence each other.
- The deeper the pipeline, the more conflicts need to be resolved and the higher the penalty.
- Other corrective actions are more complex, but complexity contributes to timing unpredictability.
Picture source: Processor Pipelines and Static Worst-Case Execution Time Analysis, Engblom
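Stalling on data conflicts can be sketched with a very simple in-order pipeline model. The model is an assumption made for illustration: one instruction issues per cycle, a result becomes readable a fixed number of cycles after issue, and the depth and delay constants are hypothetical.

```python
PIPELINE_DEPTH = 5   # number of stages (hypothetical)
RESULT_DELAY = 3     # cycles after issue until a dependent instruction may issue (hypothetical)

def total_cycles(program):
    """program: list of (dest_register, source_registers) in program order."""
    ready_at = {}    # register -> first cycle in which it may be read
    issue = -1       # issue cycle of the previous instruction
    for dest, sources in program:
        earliest = issue + 1                         # at most one issue per cycle
        operands = max((ready_at.get(s, 0) for s in sources), default=0)
        issue = max(earliest, operands)              # stall until operands are ready
        ready_at[dest] = issue + RESULT_DELAY
    return issue + PIPELINE_DEPTH

independent  = [("r1", ()), ("r2", ()), ("r3", ())]
adjacent     = [("r1", ()), ("r2", ("r1",)), ("r3", ())]
non_adjacent = [("r1", ()), ("r2", ()), ("r3", ("r1",))]

for name, prog in [("independent", independent),
                   ("adjacent", adjacent),
                   ("non-adjacent", non_adjacent)]:
    print(f"{name}: {total_cycles(prog)} cycles")
```

In this model the stall caused by a conflict depends on the distance between the two instructions, so the penalty of one conflict cannot be determined without looking at the instructions around it.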

15 Branch Prediction
- Static branch predictors:
  - Backward taken, forward not taken (BTFN)
  - Fetch instructions beyond the branch
- Dynamic branch predictors:
  - One-level branch predictor: the outcome of the branch is predicted based on the outcomes of its last n executions.
  - Two-level branch predictor: the branch history is used as an index into a pattern history table, which stores a saturating counter for every possible history. The prediction is based on this counter.
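A two-level predictor of the kind described above can be sketched in a few lines. The sketch assumes a single history register of two bits and 2-bit saturating counters that start in the weakly not-taken state; the class and function names are hypothetical and do not describe any particular processor.

```python
class TwoLevelPredictor:
    def __init__(self, history_bits=2):
        self.mask = (1 << history_bits) - 1
        self.history = 0                             # last outcomes as a bit pattern
        self.counters = [1] * (1 << history_bits)    # pattern history table of 2-bit counters

    def predict(self):
        """Predict taken if the counter for the current history is 2 or 3."""
        return self.counters[self.history] >= 2

    def update(self, taken):
        """Adjust the saturating counter, then shift the outcome into the history."""
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask

def count_mispredictions(predictor, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses

alternating = [True, False] * 10    # taken, not taken, taken, ...
mispredictions = count_mispredictions(TwoLevelPredictor(), alternating)
print("mispredictions on alternating pattern:", mispredictions)
```

The predictor mispredicts only while the table is warming up. For timing analysis this means the cost of a branch depends on the history and counter state left behind by earlier branches.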

16 Timing Predictability of Branch Prediction
- NEC V850E with a static branch predictor that keeps fetching instructions beyond the branch
- Test code is used to analyze the branch predictor.
- The time is measured for every execution of the outermost loop.
- The code fits completely into the cache.
- All variables fit into the registers.
=> The penalty for memory accesses is low.
All pictures for branch prediction from: Analysis of the Execution Time Unpredictability caused by Dynamic Branch Prediction, Engblom

17 Timing Predictability of Branch Prediction
- Two-level dynamic branch predictor
- Complex branch prediction scheme with two hierarchical branch predictors

18 Timing Dependencies between Branch Prediction and Caches
- A wrong prediction leads to the loading of wrong instructions.
- To load instructions into the pipeline, memory accesses are required.
- These memory accesses alter the content of the cache.
- The state of the replacement policy is manipulated.
- Wrong content is loaded into the cache to feed the pipeline.
=> The unpredictability of the branch predictor contributes to the unpredictability of the cache.
=> Components cannot be analyzed in isolation.
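The interaction can be illustrated with a single LRU cache set. The sketch assumes an associativity of 2 and hypothetical block names; "w" stands for a block that is fetched only because of a misprediction.

```python
def run_lru(accesses, associativity=2):
    """Simulate one LRU cache set and return (block, 'hit' or 'miss') per access."""
    state = []        # most recently used block first
    results = []
    for block in accesses:
        hit = block in state
        if hit:
            state.remove(block)
        elif len(state) == associativity:
            state.pop()               # evict the least recently used block
        state.insert(0, block)
        results.append((block, "hit" if hit else "miss"))
    return results

correct_path = ["a", "b", "a"]
with_wrong_path = ["a", "b", "w", "a"]    # "w" is fetched after a misprediction

print(run_lru(correct_path))
print(run_lru(with_wrong_path))
```

Without the wrong-path fetch the last access to "a" hits. With it, "w" evicts "a" and the same access misses, so the cache behavior cannot be predicted without knowing what the branch predictor did.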

19 Conclusion
- Components of modern processors are not timing predictable.
- The WCET is hard to predict precisely.
- Performance improvement techniques are harder to predict, but they are relevant for the performance of a processor.
=> Modern processors are not fully suitable for the time-critical domain.
=> Design processors for timing predictability, with lower performance but also lower WCETs.

20 Timing Predictability of Caches
- Intuitively, one would expect a cache miss to delay execution by exactly the cache miss penalty.
- Here execution is delayed by 10 cycles, although the penalty for the cache miss is only 8 cycles.
- The deviation is caused by the out-of-order execution in the MCIU.
Picture source: Timing Anomalies in Dynamically Scheduled Microprocessors, Lundqvist and Stenström

21 Timing Predictability of Out-of-Order Execution
- Execution time of A increased by two cycles due to unintentional behavior; overall execution time decreased by two cycles.
- Execution time of A increased by two cycles; overall execution time increased by four cycles.
Picture source: Principles of Timing Anomalies in Superscalar Processors, Wenzel et al.

22 Timing Predictability of In-Order Execution
- Execution time of B increased by two cycles due to unintentional behavior; instruction D can only be executed on FU2; overall execution time decreased by one clock cycle.
- Execution time of A increased by two cycles; instruction D can only be executed on FU2; overall execution time increased by three cycles.
Picture source: Principles of Timing Anomalies in Superscalar Processors, Wenzel et al.
