Preemption Delay Analysis for the Stack Cache


1 Preemption Delay Analysis for the Stack Cache. Amine Naji, U2IS, ENSTA ParisTech. Florian Brandner, LTCI, CNRS, Telecom ParisTech. This work is supported by the Digiteo project PM-TOP. 1/18

2 Real-Time Systems. Strict timing guarantees: critical tasks have to be completed in time. 2/18

3 Real-Time Systems. Strict timing guarantees: critical tasks have to be completed in time. Bound the Worst-Case Execution Time (WCET). [Figure: distribution of execution times (# executions vs. execution time) showing the best-case, average, and worst-case execution times, the WCET bound, and the resulting overestimation.] 2/18

4 Cache Related Preemption Delay - On Standard Caches. Cache Related Preemption Delay (CRPD): time penalty introduced by cache misses due to task preemption. [Figure: timeline where one task preempts another and modifies the cache state; when the preempted task resumes, a cache miss and its penalty occur.] Some cache blocks of a preempting task may evict cache blocks of a preempted task. Cache misses may occur when the preempted task is resumed. 3/18
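For standard caches this penalty is commonly bounded from the preempted task's useful cache blocks (UCBs) and the preempting task's evicting cache blocks (ECBs). The sketch below is not part of the slides; it assumes a direct-mapped cache, a fixed miss penalty, and illustrative set indices.

```python
# Sketch (not from the slides): the classic UCB/ECB-based CRPD bound for a
# direct-mapped cache. All names and numbers are illustrative assumptions.

MISS_PENALTY = 10  # assumed cost of one cache miss (cycles)

def crpd_bound(ucb_sets: set[int], ecb_sets: set[int],
               miss_penalty: int = MISS_PENALTY) -> int:
    """Upper bound on the preemption delay: every useful block whose cache
    set is touched by the preempting task may have to be reloaded."""
    return len(ucb_sets & ecb_sets) * miss_penalty

# Example: the preempted task reuses blocks mapped to sets {0, 2, 3, 4},
# the preempting task touches sets {2, 4, 5}.
print(crpd_bound({0, 2, 3, 4}, {2, 4, 5}))  # -> 2 * 10 = 20 cycles
```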

5 What is a Stack Cache? Dedicated cache for stack data. Simple ring buffer (FIFO replacement). All stack accesses are guaranteed hits (no need to analyze them). Dedicated stack control instructions (need to be analyzed): sres x: reserve x blocks on the stack; sfree x: free x blocks on the stack; sens x: ensure that at least x blocks are cached. Intuitively: a cache window following the stack top. 4/18
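As an illustration of these control instructions, here is a minimal occupancy-only model of the stack cache (not from the slides): it tracks how many blocks are currently cached and how many blocks each sres/sens transfers to or from main memory, ignoring lazy/dirty state. Class and method names are assumptions.

```python
# Sketch (not from the slides): occupancy model of the stack cache control
# instructions. Everything is counted in blocks.

class StackCache:
    def __init__(self, size: int):
        self.size = size       # cache capacity in blocks
        self.occupancy = 0     # blocks currently held in the cache

    def sres(self, k: int) -> int:
        """Reserve k blocks; returns the number of blocks spilled to memory."""
        self.occupancy += k
        spill = max(0, self.occupancy - self.size)
        self.occupancy -= spill
        return spill

    def sfree(self, k: int) -> int:
        """Free k blocks; never causes a memory transfer."""
        self.occupancy = max(0, self.occupancy - k)
        return 0

    def sens(self, k: int) -> int:
        """Ensure k blocks are cached; returns blocks filled from memory."""
        fill = max(0, k - self.occupancy)
        self.occupancy += fill
        return fill

# Example mirroring the slides' 4-block configuration:
sc = StackCache(size=4)
print(sc.sres(2), sc.sres(2), sc.sres(3))   # 0 0 3  (C's reserve spills 3)
print(sc.sfree(3), sc.sens(2))              # 0 1    (B's ensure fills 1)
```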

6 Example: Stack Cache. Cache configuration: 4 blocks. func A(): sres 2 <0>; call B(); sens 2 <2>; sfree 2. func B(): sres 2 <0>; call C(); sens 2 <1>; sfree 2. func C(): sres 3 <3>; sfree 3. The annotations <n> give the number of blocks spilled at an sres or filled at an sens. [Figure: evolution of the logical stack and of the stack cache contents as the calls proceed.] 5/18


17 Cache Related Preemption Delay - On the Stack Cache. The original design of the stack cache did not consider multitasking aspects: the stack cache cannot be shared among tasks, so the context of the preempted task has to be saved and restored. Two analysis problems: Context Saving and Context Restoring. We seek to compute the CRPD relative to the stack cache. 6/18

18 Preemption Cost Examples. func A(): sres 2; call B(); sens 2; sfree 2. func B(): sres 2; nop; call C(); sens 2; sfree 2. func C(): sres 3; sfree 3. [Figure: the preemption cost at each program point, first under the simple approach, then under the improved approach.] 7/18
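Using the occupancy view sketched earlier, the simple approach bounds the saving cost at a preemption point by the occupancy reached there. The short walk below is not from the slides and uses an assumed flattening of the example's call sequence.

```python
# Sketch (not from the slides): occupancy at each program point of the
# flattened example trace; under the simple approach this is the amount of
# data to save if the task is preempted at that point.

CACHE_SIZE = 4  # blocks, as in the slides' example configuration

def occupancies(trace):
    """Yield the occupancy after each stack control instruction."""
    occ = 0
    for op, k in trace:
        if op == "sres":
            occ = min(occ + k, CACHE_SIZE)   # excess blocks are spilled
        elif op == "sfree":
            occ = max(occ - k, 0)
        elif op == "sens":
            occ = max(occ, min(k, CACHE_SIZE))
        yield op, k, occ

trace = [("sres", 2), ("sres", 2), ("sres", 3), ("sfree", 3),
         ("sens", 2), ("sfree", 2), ("sens", 2), ("sfree", 2)]
for op, k, occ in occupancies(trace):
    print(f"{op} {k}: occupancy = {occ} block(s) to save if preempted here")
```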

21 Drawbacks of the Naïve Approach. Naïve approach: a simple analysis, based only on the cache occupancy (as provided by the standard stack cache analysis). However, some inaccuracies may be introduced: Not all saved/restored data may be accessed afterwards. Coherent data is saved to main memory unnecessarily. The analysis does not take advantage of the placement of ensures. 8/18

22 Context Saving Analysis - Overview. Split the stack cache into three regions by introducing two pointers: the Dead Pointer (DP) keeps track of dead data, and the Lazy Pointer (LP) keeps track of coherent data. The two pointers define two areas: the Dead Area lies below DP, while the Coherent Area lies above LP; only the data in between has to be saved. The region sizes are computed using function-local data-flow analyses. [Figure: stack cache between ST and MT, split into dead data (below DP), data to save (between DP and LP), and coherent data (above LP).] 9/18
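A minimal sketch (not from the slides) of how the two pointers translate into a saving cost, assuming DP and LP are given as byte offsets from the stack top and that the naive approach would save the full occupancy:

```python
# Sketch (not from the slides): data the context-saving code actually has to
# write back, given the analysis results. DP/LP are byte offsets from the
# stack top ST; names and the byte granularity are assumptions.

def save_cost(occupancy: int, dp: int, lp: int) -> int:
    """Bytes to save = cached data minus the dead area (below DP) and the
    coherent area (above LP); the naive approach would save `occupancy`."""
    dp = min(dp, occupancy)           # dead area cannot exceed the occupancy
    lp = min(max(lp, dp), occupancy)  # LP lies between DP and MT
    return lp - dp                    # only the region between DP and LP

# Example: 96 bytes cached, 16 of them dead, everything above offset 64 coherent.
print(save_cost(96, dp=16, lp=64))   # -> 48 bytes instead of 96
```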

23 Context Restoring Analysis - Overview. Split the stack cache into three different regions by introducing two pointers: dead data is not restored, and the Restore Pointer (RP) keeps track of data that has to be restored explicitly; an sens instruction will load the rest. The region sizes are computed using function-local data-flow analyses. Only the stack frame of the current function is restored; an inter-procedural analysis accounts for the additional overheads. [Figure: stack cache between ST and MT, split into dead data (below DP), data to restore (between DP and RP), and ensured data (above RP).] 10/18
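The restoring side can be sketched analogously (again not from the slides), assuming DP and RP are byte offsets from the stack top and that everything above RP is covered by a later sens:

```python
# Sketch (not from the slides): explicit restore transfer after a preemption,
# given the context-restoring analysis. DP/RP are byte offsets from the stack
# top ST; names and the byte granularity are assumptions.

def restore_cost(occupancy: int, dp: int, rp: int) -> int:
    """Bytes restored explicitly: dead data (below DP) is not restored and
    the ensured data (above RP) is reloaded by the next sens instruction."""
    dp = min(dp, occupancy)
    rp = min(max(rp, dp), occupancy)
    return rp - dp

# Example: 96 bytes were cached before preemption, 16 are dead, and the data
# above offset 80 is covered by a following sens.
print(restore_cost(96, dp=16, rp=80))   # -> 64 bytes restored eagerly
```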

24 Global Cost of the Ensure Instruction. Intuitively, sens instructions partially restore stack frames for free. We need to account for the additional cost paid by these sens instructions. func A(): sres 2; call B(); sens 2; sfree 2. func B(): sres 2; call C(); sens 2; sfree 2. func C(): sres 3; nop; sfree 3. 11/18

25 Global Analysis of the Ensure Cost. Costs can be derived from the longest path in a weighted call graph from the program's entry node to the current function. The weights represent the difference between the corresponding ensure bounds and the reserved size of the calling function. The path length represents the additional cost. 12/18
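A minimal sketch of such a longest-path computation (not from the slides), assuming an acyclic call graph with the weights attached to the call edges; the graph and its weights are illustrative assumptions.

```python
# Sketch (not from the slides): longest weighted path from the program's
# entry node to a given function over an acyclic call graph.

from functools import lru_cache

# callee -> list of (caller, weight); the weight models the difference
# between the ensure bound and the caller's reserved size.
CALLERS = {
    "C": [("B", 0)],
    "B": [("A", 1)],
    "A": [],            # program entry
}

def longest_path_to_entry(func: str) -> int:
    """Length of the longest weighted path from the entry node to `func`,
    walked backwards over the (acyclic) call graph."""
    @lru_cache(maxsize=None)
    def walk(f: str) -> int:
        preds = CALLERS[f]
        if not preds:
            return 0
        return max(w + walk(caller) for caller, w in preds)
    return walk(func)

print(longest_path_to_entry("C"))   # bound on the additional ensure cost for C
```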

26 Global Analysis of the Ensure Cost - Example. Global ensure and reserve analyses on the same example: func A(): sres 2; call B(); sens 2; sfree 2. func B(): sres 2; call C(); sens 2; sfree 2. func C(): sres 3; nop; sfree 3. Observation: the length of the path is bounded by the stack cache size and the minimum amount of stack data remaining in the stack cache after returning from the function. 13/18

29 Experimental Setup. MiBench benchmark suite. LLVM compiler 3.5 for the Patmos processor. Compiled with optimizations enabled (-O2). Stack cache configuration: 256 bytes. Compile the benchmarks and perform the preemption delay analyses. 14/18

30 Experiments: Context Saving Analysis. [Figure: histogram of the number of basic blocks over their transfer costs in bytes (up to > 250), comparing the full and the optimized analysis.] Shift from right to left. Improvement in around 16% of basic blocks. 15/18

31 Experiments: Context Restoring Analysis. [Figure: histogram of the number of basic blocks over their transfer costs in bytes (up to > 250), comparing the full and the optimized analysis.] Drastic shift from right to left. Improvement in around 99% of basic blocks. In some cases the program even runs faster (by 1.7%). 16/18

32 Conclusion. We proposed a static analysis to determine the preemption delay associated with the stack cache. It combines several function-local data-flow analyses; inter-procedural effects are handled through variants of the longest path problem. 17/18

33 Thanks for your attention. Any questions? 18/18
