SIC: Provably Timing-Predictable Strictly In-Order Pipelined Processor Core

1 SIC: Provably Timing-Predictable Strictly In-Order Pipelined Processor Core. Sebastian Hahn and Jan Reineke, Saarland University, Computer Science. RTSS, Nashville, December 2018.

10 State-of-the-Art Timing Analysis

Input: a taskset running on a hardware platform. Task-level analysis derives characteristics per task, e.g. WCET $C_i$ and, on a multi-core system, the number of bus accesses (#BusAcc.), ... Schedulability analysis then reaches one of two verdicts: all tasks meet their deadline, or at least one task misses its deadline.

Inside schedulability analysis (single core):

$$R_i := C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j$$

Inside schedulability analysis (multi-core system):

$$R_i := C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j + I_{bus}(i, R_i) \cdot \mathit{penalty} + \dots$$

Here $I_{bus}(i, R_i)$ is the worst-case interference on the shared bus.

This requires compositionality for soundness, which does not hold even for simple hardware platforms [Hahn et al., RTNS 16].

Sebastian Hahn Strictly In-Order Core RTSS / 17
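The recurrence inside schedulability analysis is usually solved by fixed-point iteration. The following is a minimal sketch of that iteration, not code from the talk. The task parameters, the `bus_interference` callback, and the reading of the bus term as "number of interfering accesses times a per-access penalty" are all assumptions made for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def response_time(i, C, T, hp, D, bus_interference=None, penalty=0):
    """Iterate R_i := C_i + sum_{j in hp(i)} ceil(R_i / T_j) * C_j
    (+ I_bus(i, R_i) * penalty if a bus model is given) to a fixed point.

    Returns the response time, or None once the bound exceeds the deadline D[i].
    bus_interference is a hypothetical callback (task index, window length) ->
    number of interfering bus accesses; it must be non-decreasing in the window.
    """
    R = C[i]
    while True:
        nxt = C[i] + sum(ceil_div(R, T[j]) * C[j] for j in hp[i])
        if bus_interference is not None:
            nxt += bus_interference(i, R) * penalty
        if nxt > D[i]:
            return None  # deadline miss: the task is not schedulable
        if nxt == R:
            return R     # fixed point reached
        R = nxt


# Illustrative taskset (assumed values): WCETs, periods, implicit deadlines.
C = [1, 2, 3]
T = [4, 6, 12]
D = T
hp = {0: [], 1: [0], 2: [0, 1]}  # higher-priority tasks per task

single_core = [response_time(i, C, T, hp, D) for i in range(3)]
# Assumed bus model: one interfering access per 5 cycles, 1 cycle penalty each.
with_bus = response_time(2, C, T, hp, D,
                         bus_interference=lambda i, R: ceil_div(R, 5),
                         penalty=1)
```

Adding the bus term as a separate summand is only sound if the core is timing compositional, which is exactly the property the slide says fails for conventional platforms.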

14 Why Task-Level Analysis is Expensive

Microarchitectural state space exploration + longest path search.

[Figure: set of system states per processor cycle; the explored states branch on cache hit versus cache miss.]

State space explosion due to timing anomalies.

15 Compositionality + Anomaly Freedom = Timing Predictability

21 Timing Anomalies and Execution Progress

[Figure: two executions that differ in one access, a cache miss versus a cache hit. Directly after the access, the miss case has less progress than the hit case.]

Timing anomaly: the case with less progress later shows more progress than the other case, i.e. cycle behaviour is non-monotonic.

No timing anomalies: cycle behaviour is monotonic, i.e. the case with less progress never overtakes.
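The notion of monotonic cycle behaviour can be illustrated on a toy model. In the sketch below, progress is a single integer and `step` advances a state by one cycle; both are simplifying assumptions for illustration. The talk's actual definition uses a partial order on microarchitectural states.

```python
from itertools import product


def is_monotone(step, states):
    """True if s <= t implies step(s) <= step(t) for all pairs of states."""
    return all(step(s) <= step(t)
               for s, t in product(states, repeat=2) if s <= t)


STATES = range(6)  # toy progress levels 0..5, totally ordered


def monotone_step(s):
    """Every state gains one unit of progress per cycle (capped at 5)."""
    return min(s + 1, 5)


# Hypothetical anomalous behaviour: the state with progress 2 jumps ahead
# of the state with progress 3 within a single cycle.
ANOMALOUS = {0: 1, 1: 2, 2: 5, 3: 4, 4: 5, 5: 5}


def anomalous_step(s):
    return ANOMALOUS[s]
```

With these definitions, `is_monotone(monotone_step, STATES)` holds, while `is_monotone(anomalous_step, STATES)` fails because of the pair (2, 3).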

24 Monotonicity w.r.t. Progress is Key to... Anomaly Freedom

[Figure: states $c_{miss}$ (after a cache miss) and $c_{hit}$ (after a cache hit). $c_{miss}$ has less progress than $c_{hit}$, and the "less progress" relation is preserved in every subsequent cycle.]

$$\mathit{finish}(c_{miss}) \ge \mathit{finish}(c_{hit})$$
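The step from monotonicity to anomaly freedom can be checked on a toy model. This is a sketch under the assumption that progress is a single integer and that an execution finishes once it reaches a fixed level `DONE`. The `finish` helper and both step functions are hypothetical.

```python
DONE = 10  # assumed progress level at which the program has finished


def finish(state, step):
    """Number of cycles until `state` reaches DONE under `step`."""
    cycles = 0
    while state < DONE:
        state = step(state)
        cycles += 1
    return cycles


def monotone_step(s):
    """One unit of progress per cycle: order between states is preserved."""
    return min(s + 1, DONE)


def anomalous_step(s):
    """Hypothetical non-monotonic core: progress level 2 jumps ahead to 9."""
    return 9 if s == 2 else min(s + 1, DONE)


c_miss, c_hit = 2, 5  # the miss state starts with less progress

# Monotone core: less progress never finishes earlier (no timing anomaly).
no_anomaly = finish(c_miss, monotone_step) >= finish(c_hit, monotone_step)
# Non-monotone core: the miss case finishes first, which is a timing anomaly.
anomaly = finish(c_miss, anomalous_step) < finish(c_hit, anomalous_step)
```

On a monotone core, an analysis may therefore follow only the local worst case (the miss) without missing the global worst case.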

29 Monotonicity w.r.t. Progress is Key to... Timing Compositionality with Penalty p

[Figure: states $c_{miss}$ (after a cache miss) and $c_{hit}$ (after a cache hit). After $p$ cycles, $c_{miss}$ reaches $c_{rectified}$, which has more progress than $c_{hit}$.]

$$\mathit{finish}(c_{miss}) = \mathit{finish}(c_{rectified}) + p$$

$$\mathit{finish}(c_{rectified}) \le \mathit{finish}(c_{hit})$$

$$\mathit{finish}(c_{rectified}) + p \le \mathit{finish}(c_{hit}) + p$$
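Written as a single chain, the argument on this slide reads as follows. The relation symbols are reconstructed from context: $c_{rectified}$ is assumed to be the state reached from $c_{miss}$ after $p$ cycles, and to have at least as much progress as $c_{hit}$.

```latex
\begin{align*}
\mathit{finish}(c_{miss})
  &= \mathit{finish}(c_{rectified}) + p
     && \text{($c_{rectified}$ is $c_{miss}$ after $p$ cycles)} \\
  &\le \mathit{finish}(c_{hit}) + p
     && \text{(monotonicity: more progress finishes no later)}
\end{align*}
```

Together with anomaly freedom this gives $\mathit{finish}(c_{hit}) \le \mathit{finish}(c_{miss}) \le \mathit{finish}(c_{hit}) + p$, so a miss costs at most the penalty $p$ on top of the hit case.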

40 Even Simple Microarchitectures Behave Non-Monotonically

Conventional in-order pipeline: Fetch, Decode, Execute, Memory, Write-Back, with an instruction cache and a data cache in front of memory.

[Figure: cycle-by-cycle evolution of two pipeline states holding an add and a load; labels include "add, 5", "add, 4", "load ready", "load, ml", "add fetch miss", and "load data miss".]

Relative progress of instructions influences order of bus accesses.

Idea: Make order of bus accesses independent of relative progress.

Mechanism: Prioritise data accesses over instruction fetches.

47 SIC: Strictly In-Order Pipelined Core

Enforce monotonic behaviour.

Implementation: Prevent bus accesses of instruction fetches while data-accessing instructions are in the pipeline.

[Figure: pipeline Fetch, Decode, Execute, Memory, Write-Back with instruction cache, data cache, and memory. Fetch requests an instruction (fetch ins) and the request misses the instruction cache (ins misses). The pipeline is checked for a pending load/store (load/store pending?); if there is one, the instruction cache's access to memory is blocked. Result: strict bus access order.]
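The blocking rule can be sketched as a small bus-grant function. This is an illustrative model, not the SIC hardware description: `PipelineState`, its fields, and the `strict` flag are hypothetical names, and granting data accesses first in the non-strict case is a simplifying assumption about the conventional design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineState:
    fetch_miss_pending: bool  # Fetch missed the instruction cache and wants the bus
    data_miss_pending: bool   # Memory stage missed the data cache and wants the bus
    mem_ops_in_flight: int    # loads/stores in the pipeline that have not completed


def grant_bus(p: PipelineState, strict: bool) -> Optional[str]:
    """Decide which request gets the bus this cycle: "data", "fetch", or None."""
    if p.data_miss_pending:
        return "data"
    if p.fetch_miss_pending:
        if strict and p.mem_ops_in_flight > 0:
            # SIC rule: a load/store is still pending, so the fetch is blocked
            # and cannot overtake a data access that may yet reach the bus.
            return None
        return "fetch"
    return None


# A load sits in an earlier stage (no data miss yet) while Fetch has missed.
state = PipelineState(fetch_miss_pending=True,
                      data_miss_pending=False,
                      mem_ops_in_flight=1)
conventional = grant_bus(state, strict=False)  # fetch goes first
sic = grant_bus(state, strict=True)            # fetch is held back
```

In the conventional case the order of bus accesses depends on how far the load has progressed when the fetch misses. In the strict case the load's access always precedes the fetch's, regardless of relative progress.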

48 In Our Paper

Design and Analysis of SIC: A Provably Timing-Predictable Pipelined Processor Core

Sebastian Hahn and Jan Reineke, Saarland University, Saarland Informatics Campus, Saarbrücken, Germany, {sebastian.hahn, reineke}@cs.uni-saarland.de

Abstract: We introduce the strictly in-order core (SIC), a timing-predictable pipelined processor core. SIC is provably timing compositional and free of timing anomalies. This enables precise and efficient worst-case execution time (WCET) and multi-core timing analysis. SIC's key underlying property is the monotonicity of its transition relation w.r.t. a natural partial order on its microarchitectural states. This monotonicity is achieved by carefully eliminating some of the dependencies between consecutive instructions from a standard in-order pipeline design. SIC preserves most of the benefits of pipelining: it is only about 6-7% slower than a conventional pipelined processor. Its timing predictability enables orders-of-magnitude faster WCET and multi-core timing analysis than conventional designs.

I. INTRODUCTION

One of the main challenges for timing analysis is the dependence of the execution time on the state of the underlying hardware platform. Even simple single-core processors feature stateful performance-enhancing mechanisms such as pipelining and caches. In the presence of such stateful mechanisms an individual instruction's execution time may vary widely depending on the state of the hardware when the instruction is executed. For example, a cache miss usually takes significantly longer than a cache hit.

Simply assuming the worst-case latency of instructions throughout a program's execution would result in a dramatic overestimation of its worst-case execution time (WCET). To achieve accurate results, WCET analysis thus needs to precisely take into account in which hardware states a program's instructions are executed. State-of-the-art static WCET analysis tools [1] explore a program's possible executions on a given hardware platform by a combination of explicit and implicit techniques. It is desirable to employ implicit techniques, such as abstractions [2], to efficiently explore large sets of states. While precise and efficient abstractions are known for caches [3], the efficient implicit analysis of pipelining is impeded by the presence of timing anomalies [4], [5]. Due to timing anomalies it is not safe for WCET analysis to only explore local worst-case successors of pipeline states; nor is it possible to devise efficient abstractions in which sets of concrete pipeline states are represented by individual abstract pipeline states.

Timing analysis for applications deployed on multi-core processors is even more challenging. Multi-core processors share resources such as buses, caches, or memory channels among multiple cores. As a consequence, the execution time of a task depends on the interference on shared resources that it experiences due to co-running tasks on other cores.

As in single-core WCET analysis, assuming the worst-case latency upon every shared-resource access is not a viable option as it would result in highly pessimistic execution time bounds. The greatest analysis precision would be achieved by fully-integrated timing analyses [6]-[8]: such analyses simultaneously analyze the tasks running on different cores of a multi-core, precisely capturing all possible interleavings of resource accesses from different cores. Unfortunately, this approach appears to be practically infeasible for realistic systems due to the astronomical number of system states to explore.

The most promising approach to multi-core timing analysis to date is compositional timing analysis [9]-[18], which can be seen as a natural extension of the classical two-step approach to timing analysis: low-level analysis, corresponding to classical WCET analysis, computes the resource demand of each task for each shared resource. Given such task characterizations, schedulability analysis then determines whether each task can be guaranteed to meet its deadlines, accounting for the interference it may experience on each of the shared resources. Compositional timing analysis relies on the assumption that the response time of a task may be decomposed into contributions from different resources, which can each be efficiently analyzed separately.

We have shown in previous work that, unfortunately, even simple in-order pipelined cores feature timing anomalies and do not admit compositional timing analysis [19].

Based on preliminary ideas presented in [20], in this paper, we introduce the strictly in-order core (SIC), a pipelined processor core that is provably free of timing anomalies and that admits compositional timing analysis.

The starting point of this work has been the observation that the presence of timing anomalies in conventional in-order pipelines can be traced back to the non-monotonicity of their timing behavior. This non-monotonicity is due to dependencies between consecutive instructions, where progress of one instruction may be detrimental to the progress of another instruction. The key property of SIC is the monotonicity of its timing behavior, which is enforced by carefully eliminating some of the dependencies between instructions in the pipeline.

Contents of the paper, as listed on the slide:
Formal definition of progress
Definition of strictly in-order behaviour
Proof of monotonicity
Corollary: anomaly freedom
Corollary: timing compositionality

49 Evaluation Questions 1.) Performance Overhead of Enforcing Access Order 2.) Gain in Analysis Efficiency

52 Performance: SIC versus Conventional In-Order

FPGA implementation overhead: 0.2%.

[Figure: performance degradation of SIC versus in-order, measured by FPGA simulation and by static analysis, for cache sizes of 64, 256, and 1024 sets and memory latencies of 4, 12, and 100 cycles.]

53 Performance: Single PTARM-Thread versus SIC

A Precision-Timed (PRET) Microarchitecture Implementation [Liu et al., ICCD 12]

[Figure: performance degradation of PTARM versus SIC, measured by FPGA simulation and by static analysis, for cache sizes of 64, 256, and 1024 sets and memory latencies of 4, 12, and 100 cycles.]

56 Task-Level Analysis Cost

Compositional Base Bound [Hahn et al., RTNS 16]

[Figure: slowdown in analysis runtime relative to SIC for PTARM, in-order (anomalies), and in-order (using compositional base bound), for cache sizes of 64, 256, and 1024 sets and memory latencies of 4, 12, and 100 cycles.]

57 Conclusions

Strictly in-order pipeline designed to behave monotonically
provably timing-predictable: anomaly freedom + compositionality
yet performant
practical: low implementation overhead
