Improved Analysis and Evaluation of Real-Time Semaphore Protocols for P-FP Scheduling
1 Improved Analysis and Evaluation of Real-Time Semaphore Protocols for P-FP Scheduling. RTAS'13, April 10, 2013. Björn B. Brandenburg.
2 Semaphores + P-FP Scheduling. Used in practice: VxWorks, QNX, ThreadX, real-time Linux variants. Binary semaphores in POSIX: pthread_mutex_lock() / pthread_mutex_unlock() around the critical section. A blocked task suspends and yields the processor. Partitioned fixed-priority (P-FP) scheduling: tasks are statically assigned to cores. [Figure: jobs J1–J4 with semaphore queues Q1–Q4 on cores 1–4, sharing L2 caches and main memory.]
3 How to Implement Semaphores? How to order conflicting critical sections? FIFO vs. Priority Queues
4 How to Implement Semaphores? How to order conflicting critical sections? FIFO vs. priority queues. Where to execute critical sections? Shared-memory vs. distributed locking protocols.
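The two wait-queue disciplines can be contrasted with a toy Python example (task names, priorities, and arrival order are invented for illustration; lower number = higher priority):

```python
import heapq
from collections import deque

# Hypothetical pending lock requests: (priority, task); arrival order T3, T1, T2.
arrivals = [(3, "T3"), (1, "T1"), (2, "T2")]

# FIFO queue: requests are served strictly in arrival order.
fifo = deque(arrivals)
fifo_order = [name for _, name in fifo]

# Priority queue: the highest-priority pending request is served first.
prio = list(arrivals)
heapq.heapify(prio)
prio_order = [heapq.heappop(prio)[1] for _ in range(3)]

print(fifo_order)  # ['T3', 'T1', 'T2']: arrival order preserved
print(prio_order)  # ['T1', 'T2', 'T3']: T1 jumps ahead of earlier arrivals
```

The choice matters for blocking analysis: in a FIFO queue a request waits for at most one earlier request per conflicting task, whereas in a priority queue a low-priority request can be overtaken repeatedly.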
5 Example: Shared-Memory Protocol. [Timeline: Task A executes a critical section on CPU 2; CPUs 1 and 3 shown for comparison.]
6 Example: Shared-Memory Protocol. CPU 1 is not affected by critical sections on other CPUs. [Timeline: Task A holds the lock on CPU 2; Task B on CPU 3 requests the lock, suspends while blocked, and executes its critical section locally after the lock is acquired and later released.]
7 Example: Distributed Protocol. [Timeline: Task A on CPU 2 issues a request to a remote resource agent.]
8 Example: Distributed Protocol. [Timeline: Task A on CPU 2 issues a request and suspends; the resource agent on CPU 1, otherwise inactive, executes critical section A; Task A resumes when the response is received.]
9 Example: Distributed Protocol. [Timeline: Tasks A and B issue requests and suspend; the resource agent on CPU 1 executes critical sections A and B back to back, and each task resumes when its response is received.]
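The agent-based execution sketched on slides 7–9 can be simulated in a few lines. This is a minimal sketch of the idea, not the DPCP/DFLP themselves; the names (resource_agent, remote_lock) and the use of Python threads are invented for illustration:

```python
import queue
import threading

requests = queue.Queue()   # FIFO request channel to the resource agent
responses = {}             # per-task response events

def resource_agent():
    # The agent serially executes critical sections in request order,
    # on its own core, not on the requesting task's core.
    while True:
        task, critical_section = requests.get()
        if task is None:   # shutdown sentinel
            break
        critical_section()            # runs "remotely" on the agent
        responses[task].set()         # "response received": unblock the task

def remote_lock(task, critical_section):
    # Issue a request; the task suspends until the agent responds.
    responses[task] = threading.Event()
    requests.put((task, critical_section))
    responses[task].wait()            # task is suspended here

log = []
agent = threading.Thread(target=resource_agent)
agent.start()
remote_lock("A", lambda: log.append("critical section A"))
remote_lock("B", lambda: log.append("critical section B"))
requests.put((None, None))
agent.join()
print(log)  # sections ran serially at the agent, in request order
```

Note the structural difference from a shared-memory protocol: the requesting task never executes the critical section itself, so all cross-core interaction is funneled through the agent.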
10 Semaphore Protocol Choices

    Protocol                                          Wait queue       Type             Asymptotically optimal (w.r.t. maximum blocking)?
    MPCP (Rajkumar, 1990; Lakshmanan et al., 2009)    priority queue   shared-memory    no
    FMLP+ (Brandenburg, 2011)                         FIFO queue       shared-memory    yes
    DPCP (Rajkumar et al., 1988)                      priority queue   distributed      no
    DFLP (Brandenburg, 2012)                          FIFO queue       distributed      yes

Two axes: how to order conflicting critical sections (wait-queue discipline), and where to execute critical sections (shared-memory vs. distributed).
11 MPCP: Multiprocessor Priority Ceiling Protocol. (Table from slide 10 repeated.)
12 FMLP+: FIFO Multiprocessor Locking Protocol. (Table from slide 10 repeated.)
13 (Table from slide 10 repeated.)
14 DPCP: Distributed Priority Ceiling Protocol. (Table from slide 10 repeated.)
15 DFLP: Distributed FIFO Locking Protocol. (Table from slide 10 repeated.)
16 Asymptotically, FIFO queues offer lower maximum blocking. But: constants matter! (Table from slide 10 repeated.)
17 Part 1: Improved Analysis
18 Non-Asymptotic, Fine-Grained Analysis. Goal: derive the tightest possible bound, reflecting all constant factors. [Timeline: the job under analysis is pending, and hence vulnerable to contention, while conflicting critical sections of tasks A and B execute.]
19 Exploit activation frequency: don't overestimate worst-case contention.
20 Exploit per-task maximum critical-section lengths: don't overestimate the worst-case duration of lock unavailability.
21 Exploit protocol-specific properties, e.g., strong progress guarantees in FIFO queues.
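The first point, exploiting activation frequency, can be sketched numerically: a task can only cause contention as often as its jobs are activated. Below is a minimal illustration using a standard bound on how many jobs of a sporadic task can overlap a given interval; the function and parameter names are illustrative, not the paper's:

```python
from math import ceil

def max_requests(interval, period, response_time, reqs_per_job):
    # Upper bound on the number of resource requests a sporadic task can
    # issue while overlapping a pending interval of the given length:
    # at most ceil((interval + response_time) / period) of its jobs overlap.
    max_jobs = ceil((interval + response_time) / period)
    return max_jobs * reqs_per_job

# Example: pending interval of 10 time units, conflicting task with
# period 5 and response time 2 -> at most ceil(12/5) = 3 overlapping jobs.
print(max_requests(10, 5, 2, 1))  # 3
```

Without such a bound, an analysis would have to pessimistically assume that every critical section of every conflicting task can interfere.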
22 Non-Asymptotic, Fine-Grained Analysis. The key problem: ease of exposition vs. pessimism! [Timeline from slide 18 repeated.]
23 Declarative Fine-Grained Analysis. Desiderata: concise (easy to write, easy to read, easy to check); not inherently pessimistic; compositional (the analyst should not have to reason about the protocol as a whole); easy to implement; sound by construction (not based on an ad-hoc formalism).
24 Declarative Fine-Grained Analysis. Approach: use linear programming (not integer linear programming!) to derive task-set-specific blocking bounds.
25 Blocking Fractions (with regard to a fixed schedule, i.e., one particular execution). For the v-th concurrent critical section of a conflicting task T_x w.r.t. resource q: X_{x,q,v} = (actual amount of blocking caused) / (maximum critical section length w.r.t. q), with 0 ≤ X_{x,q,v} ≤ 1.
26 Blocking Fraction Example. Suppose the maximum critical-section length of task T_x is 3 time units. [Timeline: T_x executes a critical section on CPU 2 while T_i is blocked on CPU 1, then T_i executes its own critical section.]
27 Blocking Fraction Example. In this particular schedule, the actual blocking is 1 time unit, so the blocking fraction = actual / max = 1/3. [Timeline from slide 26 repeated.]
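The blocking fraction of slides 26–27 is directly computable; a minimal sketch (the function name is invented for illustration):

```python
def blocking_fraction(actual_blocking, max_cs_length):
    # X_{x,q,v} = actual blocking caused / maximum critical-section length.
    x = actual_blocking / max_cs_length
    assert 0.0 <= x <= 1.0   # by definition, a fraction of one critical section
    return x

# The example schedule: T_x's maximum critical section is 3 time units,
# but only 1 time unit of it actually blocks T_i.
print(blocking_fraction(1, 3))  # 1/3
```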
28 Total Blocking in a Fixed Schedule. Total blocking incurred by one job = Σ_{each resource q} Σ_{each task T_x} Σ_{each CS v} X_{x,q,v} · (maximum critical section length w.r.t. q), where X_{x,q,v} = (actual amount of blocking caused) / (maximum critical section length w.r.t. q).
29 Total Blocking in a Fixed Schedule. Each term reflects the actual amount of blocking incurred (which may be zero), and the sum ranges over all potentially concurrent critical sections, so no cleverness, and hence no errors, is involved.
30 Total Blocking in a Fixed Schedule. The total blocking is a linear combination of all blocking fractions! Use it as the objective function of a linear program (LP).
31 From a Fixed to All Possible Schedules.
    maximize   Σ_{each resource q} Σ_{each task T_x} Σ_{each CS v}  X_{x,q,v} · (maximum critical section length w.r.t. q)
    subject to $WORKLOAD-CONSTRAINTS and $PROTOCOL-CONSTRAINTS
32 From a Fixed to All Possible Schedules. Maximizing the objective finds the worst-case blocking across all possible schedules; the constraints rule out impossible schedules.
33 Example: FIFO Constraint. Goal: rule out schedules that do not comply with FIFO ordering. Constraint 12. In any schedule of τ under the FMLP+: ∀ℓ_q : ∀T_x ∈ τ^i : Σ_{v=1}^{N_{x,q}} X^D_{x,q,v} ≤ N_{i,q}.
34 Reading Constraint 12: the quantifiers range over each resource ℓ_q and each conflicting task T_x.
35 The sum ranges over all possibly concurrent critical sections v.
36 The bound: the sum of all blocking fractions cannot exceed N_{i,q}, the number of requests for the resource issued by task T_i (the task under analysis).
37 Why Constraint 12 is sound: suppose not; then there exists a schedule in which the sum of the blocking fractions of some task T_x exceeds the number of requests issued by T_i. Thus one request of T_i must have been blocked by at least two requests of T_x, which is impossible in a FIFO queue.
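The effect of a FIFO constraint like Constraint 12 can be illustrated with a toy computation. For an LP with only box constraints (0 ≤ X ≤ 1) and a per-(task, resource) sum constraint, the optimum can be computed greedily, so this sketch avoids an LP solver; all names and the example numbers are invented for illustration and do not reproduce the paper's full constraint system:

```python
def max_blocking(cs_lengths, n_iq):
    # cs_lengths: {(task, resource): [max CS length of each concurrent CS v]}
    # n_iq: {resource: number of requests issued by the task under analysis}
    # Objective: maximize sum of X[x,q,v] * length, with 0 <= X <= 1 and,
    # per FIFO ordering, sum_v X[x,q,v] <= n_iq[q] for each (task, resource).
    total = 0.0
    for (task, resource), lengths in cs_lengths.items():
        budget = n_iq[resource]
        # LP optimum here: saturate (X = 1) the longest critical sections
        # up to the per-task budget; all other fractions stay at 0.
        for length in sorted(lengths, reverse=True)[:budget]:
            total += length
    return total

# Two tasks conflict with T_i on resource q; T_i issues 1 request (N_iq = 1).
# FIFO: each conflicting task can block that request at most once.
cs = {("T2", "q"): [3, 3], ("T3", "q"): [5]}
print(max_blocking(cs, {"q": 1}))  # 8, not the naive 3 + 3 + 5 = 11
```

Without Constraint 12 the LP would charge both of T2's critical sections (total 11); with it, only one blocking per conflicting task per request is counted (total 8).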
38 Advantages. A powerful analysis technique: compositional (the LP solver combines the constraints), flexible (can handle many protocols), and declarative (much easier to read and check). Accuracy: a request is never counted twice, so the bounds are much less pessimistic.
39 Improved Analysis Accuracy. [Fig. 3: comparison of schedulability under the FMLP+ and the MPCP; fraction of schedulable task sets (out of 1000) vs. task count; higher is better. A higher task count means less idle time and more contention.]
40 Improved Analysis Accuracy. Setup: 16 cores, 16 shared resources, at most 5 critical sections per task and resource, 10µs–50µs critical-section lengths, and each task accesses a given resource with probability 0.1. [Fig. 3 repeated.]
41 Improved Analysis Accuracy. [Fig. 3 repeated; curves: MPCP (with prior analysis), MPCP (with new analysis), FMLP+ (with new analysis).]
42 Improved Analysis Accuracy. The new analysis roughly doubled the number of supported tasks! It also offers new observations on the relative performance of the MPCP and the FMLP+. [Fig. 3 and setup repeated.]
43 Part 2: Improved Evaluation
44 Evaluated Protocols: MPCP, FMLP+, DPCP, and DFLP. (Table from slide 10 repeated.)
45 Evaluated Protocols. Distributed locking protocols require cross-core interaction: overheads matter! (Table from slide 10 repeated.)
46 Taking Overheads Into Account. Platform: LITMUS^RT (Linux Testbed for Multiprocessor Scheduling in Real-Time Systems), 8-core / 16-core 2.0 GHz Intel Xeon X7550. Overhead tracing: more than 20 hours of real-time execution, more than 400 GB of trace data, more than 15 billion valid samples. Statistics and details in the online appendix.
47 Example: Context-Switch Overhead. [Histogram: P-FP measured context-switch overhead (host=nanping-16), bin size 0.50µs, log-scale sample counts. min = 0.26µs, max = 40.99µs, avg = 2.72µs, median = 2.15µs.]
48 Example: Context-Switch Overhead. In the overhead experiments, no statistical outlier filtering (IQR filter) was applied. [Histogram from slide 47 repeated.]
49 Overhead-Aware Schedulability. [Plot: fraction of schedulable task sets (out of 1000) vs. task count; higher is better. A higher task count means less idle time and more contention.]
50 Overhead-Aware Schedulability. Setup: 8 cores, 8 shared resources, at most 5 critical sections per task and resource, 10µs–50µs critical-section lengths, and each task accesses a given resource with probability 0.3.
51 Distributed Protocols Perform Well. [Same plot and setup as slide 50, with the distributed protocols highlighted.]
52 FIFO Protocols Perform Well. [Same plot and setup as slide 50, with the FIFO-ordered protocols highlighted.]
53 Choice of Protocol Matters. Blocking is a significant bottleneck w.r.t. schedulability. In this example (same setup as slide 50), the DFLP can host more than 20% more tasks than the MPCP.
54 What is the best protocol? All results available online (>6,000 plots).
55 What is the best protocol? There is no single best protocol (w.r.t. schedulability)! Results are highly workload-dependent.
56 What is the best protocol? How to order conflicting critical sections: FIFO works well, but priority queues are needed for highly heterogeneous timing parameters. Where to execute critical sections: distributed protocols are very competitive for many resources with high contention; shared-memory protocols are better for few resources.
57 Summary & Outlook. Contributions: there is no single best protocol (yet); distributed protocols perform surprisingly well; linear programs can be used to analyze blocking. Future work: a generalized protocol that always works well; support for clustered scheduling; analyzing spin locks with LPs; extending the LPs to handle nesting.
58 Thanks! LITMUS^RT: Linux Testbed for Multiprocessor Scheduling in Real-Time Systems. SchedCAT: Schedulability test Collection And Toolkit.
More informationCommercial Real-time Operating Systems An Introduction. Swaminathan Sivasubramanian Dependable Computing & Networking Laboratory
Commercial Real-time Operating Systems An Introduction Swaminathan Sivasubramanian Dependable Computing & Networking Laboratory swamis@iastate.edu Outline Introduction RTOS Issues and functionalities LynxOS
More informationCPU Scheduling: Objectives
CPU Scheduling: Objectives CPU scheduling, the basis for multiprogrammed operating systems CPU-scheduling algorithms Evaluation criteria for selecting a CPU-scheduling algorithm for a particular system
More informationExploring System Challenges of Ultra-Low Latency Solid State Drives
Exploring System Challenges of Ultra-Low Latency Solid State Drives Sungjoon Koh Changrim Lee, Miryeong Kwon, and Myoungsoo Jung Computer Architecture and Memory systems Lab Executive Summary Motivation.
More informationSYNCHRONIZATION M O D E R N O P E R A T I N G S Y S T E M S R E A D 2. 3 E X C E P T A N D S P R I N G 2018
SYNCHRONIZATION M O D E R N O P E R A T I N G S Y S T E M S R E A D 2. 3 E X C E P T 2. 3. 8 A N D 2. 3. 1 0 S P R I N G 2018 INTER-PROCESS COMMUNICATION 1. How a process pass information to another process
More informationOperating Systems. Memory Management. Lecture 9 Michael O Boyle
Operating Systems Memory Management Lecture 9 Michael O Boyle 1 Memory Management Background Logical/Virtual Address Space vs Physical Address Space Swapping Contiguous Memory Allocation Segmentation Goals
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationMultiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.
Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts
More informationTechnical Brief: Specifying a PC for Mascot
Technical Brief: Specifying a PC for Mascot Matrix Science 8 Wyndham Place London W1H 1PP United Kingdom Tel: +44 (0)20 7723 2142 Fax: +44 (0)20 7725 9360 info@matrixscience.com http://www.matrixscience.com
More informationAn Implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP Real-Time Synchronization Protocols in LITMUS RT
An Implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP Real-Time Synchronization Protocols in LITMUS RT Björn B. Brandenburg and James H. Anderson The University of North Carolina at Chapel Hill Abstract
More informationIssues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Issues in Parallel Processing Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction Goal: connecting multiple computers to get higher performance
More informationSystem Architecture Directions for Networked Sensors. Jason Hill et. al. A Presentation by Dhyanesh Narayanan MS, CS (Systems)
System Architecture Directions for Networked Sensors Jason Hill et. al. A Presentation by Dhyanesh Narayanan MS, CS (Systems) Sensor Networks Key Enablers Moore s s Law: More CPU Less Size Less Cost Systems
More informationPerformance of Multicore LUP Decomposition
Performance of Multicore LUP Decomposition Nathan Beckmann Silas Boyd-Wickizer May 3, 00 ABSTRACT This paper evaluates the performance of four parallel LUP decomposition implementations. The implementations
More informationQUESTION BANK UNIT I
QUESTION BANK Subject Name: Operating Systems UNIT I 1) Differentiate between tightly coupled systems and loosely coupled systems. 2) Define OS 3) What are the differences between Batch OS and Multiprogramming?
More informationMultiprocessor Systems. COMP s1
Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve
More informationIntel Thread Building Blocks, Part IV
Intel Thread Building Blocks, Part IV SPD course 2017-18 Massimo Coppola 13/04/2018 1 Mutexes TBB Classes to build mutex lock objects The lock object will Lock the associated data object (the mutex) for
More informationTasks. Task Implementation and management
Tasks Task Implementation and management Tasks Vocab Absolute time - real world time Relative time - time referenced to some event Interval - any slice of time characterized by start & end times Duration
More informationReal-Time Internet of Things
Real-Time Internet of Things Chenyang Lu Cyber-Physical Systems Laboratory h7p://www.cse.wustl.edu/~lu/ Internet of Things Ø Convergence of q Miniaturized devices: integrate processor, sensors and radios.
More informationEmbedded Systems: OS
Embedded Systems: OS Jinkyu Jeong (Jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ICE3028: Embedded Systems Design, Fall 2018, Jinkyu Jeong (jinkyu@skku.edu) Standalone
More information1 Multiprocessors. 1.1 Kinds of Processes. COMP 242 Class Notes Section 9: Multiprocessor Operating Systems
COMP 242 Class Notes Section 9: Multiprocessor Operating Systems 1 Multiprocessors As we saw earlier, a multiprocessor consists of several processors sharing a common memory. The memory is typically divided
More informationReal-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo
Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract
More informationDynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle
Dynamic Fine Grain Scheduling of Pipeline Parallelism Presented by: Ram Manohar Oruganti and Michael TeWinkle Overview Introduction Motivation Scheduling Approaches GRAMPS scheduling method Evaluation
More informationThreads SPL/2010 SPL/20 1
Threads 1 Today Processes and Scheduling Threads Abstract Object Models Computation Models Java Support for Threads 2 Process vs. Program processes as the basic unit of execution managed by OS OS as any
More informationCS140 Operating Systems and Systems Programming Midterm Exam
CS140 Operating Systems and Systems Programming Midterm Exam October 31 st, 2003 (Total time = 50 minutes, Total Points = 50) Name: (please print) In recognition of and in the spirit of the Stanford University
More informationCPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.
CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics. Name: Write one of the words or terms from the following list into the blank appearing to the left of the appropriate definition. Note that there are more
More informationOVERVIEW. Last Week: But if frequency of high priority task increases temporarily, system may encounter overload: Today: Slide 1. Slide 3.
OVERVIEW Last Week: Scheduling Algorithms Real-time systems Today: But if frequency of high priority task increases temporarily, system may encounter overload: Yet another real-time scheduling algorithm
More informationPresented by: Nafiseh Mahmoudi Spring 2017
Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2019 Lecture 8 Scheduling Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ POSIX: Portable Operating
More informationThreads. CS3026 Operating Systems Lecture 06
Threads CS3026 Operating Systems Lecture 06 Multithreading Multithreading is the ability of an operating system to support multiple threads of execution within a single process Processes have at least
More informationAnalysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation
Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Meng Xu Linh Thi Xuan Phan Hyon-Young Choi Insup Lee Department of Computer and Information Science
More informationJob sample: SCOPE (VLDBJ, 2012)
Apollo High level SQL-Like language The job query plan is represented as a DAG Tasks are the basic unit of computation Tasks are grouped in Stages Execution is driven by a scheduler Job sample: SCOPE (VLDBJ,
More informationReview: Creating a Parallel Program. Programming for Performance
Review: Creating a Parallel Program Can be done by programmer, compiler, run-time system or OS Steps for creating parallel program Decomposition Assignment of tasks to processes Orchestration Mapping (C)
More informationTHREADS & CONCURRENCY
27/04/2018 Sorry for the delay in getting slides for today 2 Another reason for the delay: Yesterday: 63 posts on the course Piazza yesterday. A7: If you received 100 for correctness (perhaps minus a late
More informationDealing with Issues for Interprocess Communication
Dealing with Issues for Interprocess Communication Ref Section 2.3 Tanenbaum 7.1 Overview Processes frequently need to communicate with other processes. In a shell pipe the o/p of one process is passed
More informationImplementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430
Implementing Mutual Exclusion Sarah Diesburg Operating Systems CS 3430 From the Previous Lecture The too much milk example shows that writing concurrent programs directly with load and store instructions
More informationManaging Performance Variance of Applications Using Storage I/O Control
Performance Study Managing Performance Variance of Applications Using Storage I/O Control VMware vsphere 4.1 Application performance can be impacted when servers contend for I/O resources in a shared storage
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationUNIX Input/Output Buffering
UNIX Input/Output Buffering When a C/C++ program begins execution, the operating system environment is responsible for opening three files and providing file pointers to them: stdout standard output stderr
More information