Shared Cache Aware Task Mapping for WCRT Minimization

Size: px

Start display at page:

Download "Shared Cache Aware Task Mapping for WCRT Minimization"

Lynette Johnston
5 years ago
Views:

University of Singapore Yun Liang Center for

1 Shared Cache Aware Task Mapping for WCRT Minimization Huping Ding & Tulika Mitra School of Computing, National University of Singapore Yun Liang Center for Energy-efficient Computing and Applications, School of EECS, Peking University

2 Multi-core Processor in Embedded Systems Core 1 Core 2. Core n Shared Resources Embedded systems adopt multi-core processors due to performance, thermal and power constraints Embedded systems have real-time constraints, e.g., time deadline of tasks Shared Resources, like shared cache, make the timing difficult to estimate, e.g. Worst-Case Response Time (WCRT) analysis

Worst-Case Response Time (WCRT) of the application An important metric for schedulability

3 Worst-Case Response Time t1 A multitasking application t2 t3 t4 t5 Tasks can be mapped to any core in a multi-core systems The maximal latest finish time among all these tasks is the Worst-Case Response Time (WCRT) of the application An important metric for schedulability analysis Our purpose is to minimize WCRT in multi-core systems with shared cache via task mapping

4 Previous Task Mapping Approaches They usually focus on balancing the workload They are agnostic to the shared cache behavior State-of-the-art works on shared cache usually fix task mapping before shared L2 cache analysis

L2 access in the worst (best) case w/o L2 cache modeling with L2 cache modeling 12 10 8 It is imperative to

5 WCRT (million cycles) Motivating Example Task graph ndes crc adpcm minver compress Configuration: 2-core, private L1 cache: 256 bytes, shared L2 cache: 2KB w/o L2 cache modeling: Assume cache miss (hit) for each L2 access in the worst (best) case w/o L2 cache modeling with L2 cache modeling It is imperative to design a shared cache aware task mapping solution! Task mappings

6 Our Architecture Core1 CPU Core 2 CPU Core n CPU L1 cache L1 cache L1 cache Shared L2 cache Main memory A private L1 cache for each core All cores share a L2 cache Tasks execute in a non-preemptive fashion

7 Private Cache Systems CPU Cache Off-chip Main Memory Usually only the intra-task analysis is required to estimate the WCET (Worst-Case Execution Time) No inter-core cache conflicts

Shared Cache Multi-core Systems Inter-core cache conflicts Lead to extra L2 cache misses This makes the timing analysis complex Hit Core 1 Task T Core 2 Miss Core

8 Shared Cache Multi-core Systems Inter-core cache conflicts Lead to extra L2 cache misses This makes the timing analysis complex Hit Core 1 Task T Core 2 Miss Core 1 Task T Core 2 Task T Access m2 Access m2 Access m1 m1 m2 m1 m2 m1 Shared L2 cache young old LRU replacement policy Shared L2 cache young old LRU replacement policy

9 Task Interference Task lifetime : [Earliest Ready, Latest Finish] Interfering tasks : Two tasks mapped to different cores with lifetime overlapped Two tasks with dependence between them can never interfere with each other t1 Core 1 Core 2 t1 t2 t3 Time t2 t3 t4 t4 Interfering tasks: t2 and t3, t2 and t4

10 Current Efforts on Shared Cache They focus on modeling cache and inter-core cache conflicts (Yan and Zhang RTAS 2008, Hardy and Puaut RTSS 2008 and Liang et al. RTS 2012) Task mapping is fixed (known) before analysis, and agnostic to the shared cache conflicts

Impact of Task Mapping Task interference Two tasks can interfere with each other only when they are mapped to different cores Task interference Inter-core cache

11 Impact of Task Mapping Task interference Two tasks can interfere with each other only when they are mapped to different cores Task interference Inter-core cache conflict Execution time of task WCRT Workload balance among cores Workload balance influences the total execution time of a multitasking application, thus impacts its WCRT

12 Our Aims We target the task mapping problem in multi-core systems with shared cache, in order to optimize the WCRT We not only consider workload balance, but also minimize inter-core cache conflicts

13 Challenges Exhaustive enumeration of all mappings is impossible as # of cores and # of tasks increase Significant complexity due to inter-dependency between task mapping and task execution time Task mapping Workload balancing Task execution time Task interference Cache conflict

14 Our Approach An approach based on ILP (integer linear programming) formulation Intra-task cache analysis Task mapping modeling Shared cache modeling WCET computation modeling WCRT computation modeling ILP problem Objective: minimize WCRT ILP solver Result: a task mapping Accurate WCRT analysis framework Final WCRT

15 Intra-task Analysis Must analysis Captures the memory blocks that are guaranteed in the cache May analysis Captures the memory blocks that are never in the cache Memory block access classification Always hit Always miss Non-classified

Task Mapping Modeling Suppose there are N tasks and M cores in the multi-core system. MAP ik is a 0-1 decision variable to indicate if task i is mapped to core k. Then for any task i (0<i N).

16 Task Mapping Modeling Suppose there are N tasks and M cores in the multi-core system. MAP ik is a 0-1 decision variable to indicate if task i is mapped to core k. Then for any task i (0<i N). 0 < k M MAP ik = 1 For two task i and j, SC ij is a 0-1 decision variable to indicate if tasks i and j are mapped to the same core; DC ij is a 0-1 decision variable to indicate if tasks i and j are mapped to different cores SC ij = < k M, MAP ik = 1 and MAP jk = 1 otherwise DC ij = 1 SC ij

17 Shared Cache Modeling {m, a, b, c, d, e} are mapped to the same set s in shared L2 cache. m, a, b and c are from task i, while d and e are tasks u and v respectively young old Cache state before considering cache conflicts a b m c LRU replacement policy (age m + j intf(i) conflict j s ) A Possible cache states after considering cache conflicts d a b m d e a b Hit Miss

18 WCET and WCRT Computation WCET Intra-task analysis and shared cache modeling WCRT WCRT is the summation of WCET value on the critical path

19 Approximations in ILP Interfering tasks Two tasks i and j have no dependence and are mapped to different cores interfere ij = DC ij WCET = Original WCET + Cache conflict penalty Intra-task cache analysis Original WCET Shared cache modeling Cache conflict penalty Why approximations? Make the problem easy to solve Estimate accurate WCRT with the mapping released by ILP formulation by a WCRT analysis framework

Iterative Analysis Framework Yun Liang, Huping Ding,

Timing Analysis of Concurrent Programs Running on Shared

In Real-Time System Journal 2012 L1 cache analysis Core

Filter L2 cache analysis Initial task interference

20 Iterative Analysis Framework Yun Liang, Huping Ding, Tulika Mitra, Abhik Roychoudhury, Yan Li, Vivy Suhendra Timing Analysis of Concurrent Programs Running on Shared Cache Multi-Cores. In Real-Time System Journal 2012 L1 cache analysis Core 1 Core 2 Filter L2 cache analysis L1 cache analysis Filter L2 cache analysis Initial task interference Estimated WCRT L2 cache Conflict analysis WCRT analysis Modified task interference no Interference changes? yes

Experimental Evaluation Real-world benchmark: DEBIE benchmark Synthetic benchmarks: MRTC benchmark suite Comparison among different approaches Optimal mapping Our mapping Mapping w/o L2 cache

21 Experimental Evaluation Real-world benchmark: DEBIE benchmark Synthetic benchmarks: MRTC benchmark suite Comparison among different approaches Optimal mapping Our mapping Mapping w/o L2 cache modeling Enumerate all Task mappings ILP formulation Enumerate all Task mappings Evaluate WCRT with L2 cache modeling Solve ILP problem Evaluate WCRT w/o L2 cache modeling Select the best mapping Task mapping released Select the best mapping Iterative analysis framework

22 WCRT (million cycles) DEBIE Benchmark Results Optimal Our approach w/o L2 cache modeling L1:1KB L2:16KB Configuration: 4-core

23 Relative WCRT Synthetic Benchmarks Results Task graph 1 Optimal Our approach w/o L2 cache modeling Task graph 2 Task Task Task Task Task graph 3 graph 4 graph 5 graph 6 graph 7 Configurations: 4-core, L1:512B, L2:4KB Task graph 8 Task graph 9

24 Runtime for Synthetic Benchmarks Task graph # of tasks Our approach (seconds) Optimal (seconds) , ,794.00

25 Conclusions Tasking mapping has a great impact on WCRT An ILP formulation approach that is aware of the shared cache conflict Our approach not only considers the workload balance but also minimizes inter-core cache conflict Significant reduction in WCRT compared to traditional approach agnostic to shared cache conflicts

26 Q & A

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern

Sireesha R Basavaraju Embedded Systems Group, Technical University of Kaiserslautern Introduction WCET of program ILP Formulation Requirement SPM allocation for code SPM allocation for data Conclusion