Improved Analysis and Evaluation of Real-Time Semaphore Protocols for P-FP Scheduling
1 Improved Analysis and Evaluation of Real-Time Semaphore Protocols for P-FP Scheduling. RTAS'13, April 10, 2013. Björn B. Brandenburg.
2 Semaphores + P-FP Scheduling. Used in practice: VxWorks, QNX, ThreadX, real-time Linux variants. Binary semaphores in POSIX: pthread_mutex_lock() / pthread_mutex_unlock() around the critical section. A blocked task suspends and yields the processor. Partitioned fixed-priority (P-FP) scheduling: tasks are statically assigned to cores. [Figure: jobs J1–J4 with semaphore queues Q1–Q4 on cores 1–4, sharing L2 caches and main memory.]
3 How to Implement Semaphores? How to order conflicting critical sections? FIFO vs. Priority Queues
4 How to Implement Semaphores? How to order conflicting critical sections? FIFO vs. priority queues. Where to execute critical sections? Shared-memory vs. distributed locking protocols.
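The two wait-queue disciplines can be contrasted with a toy Python example (task names, priorities, and arrival order are invented for illustration; lower number = higher priority):

```python
import heapq
from collections import deque

# Hypothetical pending lock requests: (priority, task); arrival order T3, T1, T2.
arrivals = [(3, "T3"), (1, "T1"), (2, "T2")]

# FIFO queue: requests are served strictly in arrival order.
fifo = deque(arrivals)
fifo_order = [name for _, name in fifo]

# Priority queue: the highest-priority pending request is served first.
prio = list(arrivals)
heapq.heapify(prio)
prio_order = [heapq.heappop(prio)[1] for _ in range(3)]

print(fifo_order)  # ['T3', 'T1', 'T2']: arrival order preserved
print(prio_order)  # ['T1', 'T2', 'T3']: T1 jumps ahead of earlier arrivals
```

The choice matters for blocking analysis: in a FIFO queue a request waits for at most one earlier request per conflicting task, whereas in a priority queue a low-priority request can be overtaken repeatedly.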
5 Example: Shared-Memory Protocol. [Timeline: Task A executes a critical section on CPU 2; CPUs 1 and 3 shown for comparison.]
6 Example: Shared-Memory Protocol. CPU 1 is not affected by critical sections on other CPUs. [Timeline: Task A holds the lock on CPU 2; Task B on CPU 3 requests the lock, suspends while blocked, and executes its critical section locally after the lock is acquired and later released.]
7 Example: Distributed Protocol. [Timeline: Task A on CPU 2 issues a request to a remote resource agent.]
8 Example: Distributed Protocol. [Timeline: Task A on CPU 2 issues a request and suspends; the resource agent on CPU 1, otherwise inactive, executes critical section A; Task A resumes when the response is received.]
9 Example: Distributed Protocol. [Timeline: Tasks A and B issue requests and suspend; the resource agent on CPU 1 executes critical sections A and B back to back, and each task resumes when its response is received.]
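The agent-based execution sketched on slides 7–9 can be simulated in a few lines. This is a minimal sketch of the idea, not the DPCP/DFLP themselves; the names (resource_agent, remote_lock) and the use of Python threads are invented for illustration:

```python
import queue
import threading

requests = queue.Queue()   # FIFO request channel to the resource agent
responses = {}             # per-task response events

def resource_agent():
    # The agent serially executes critical sections in request order,
    # on its own core, not on the requesting task's core.
    while True:
        task, critical_section = requests.get()
        if task is None:   # shutdown sentinel
            break
        critical_section()            # runs "remotely" on the agent
        responses[task].set()         # "response received": unblock the task

def remote_lock(task, critical_section):
    # Issue a request; the task suspends until the agent responds.
    responses[task] = threading.Event()
    requests.put((task, critical_section))
    responses[task].wait()            # task is suspended here

log = []
agent = threading.Thread(target=resource_agent)
agent.start()
remote_lock("A", lambda: log.append("critical section A"))
remote_lock("B", lambda: log.append("critical section B"))
requests.put((None, None))
agent.join()
print(log)  # sections ran serially at the agent, in request order
```

Note the structural difference from a shared-memory protocol: the requesting task never executes the critical section itself, so all cross-core interaction is funneled through the agent.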
10 Semaphore Protocol Choices

    Protocol                                          Wait queue       Type             Asymptotically optimal (w.r.t. maximum blocking)?
    MPCP (Rajkumar, 1990; Lakshmanan et al., 2009)    priority queue   shared-memory    no
    FMLP+ (Brandenburg, 2011)                         FIFO queue       shared-memory    yes
    DPCP (Rajkumar et al., 1988)                      priority queue   distributed      no
    DFLP (Brandenburg, 2012)                          FIFO queue       distributed      yes

Two axes: how to order conflicting critical sections (wait-queue discipline), and where to execute critical sections (shared-memory vs. distributed).
11 MPCP: Multiprocessor Priority Ceiling Protocol. (Table from slide 10 repeated.)
12 FMLP+: FIFO Multiprocessor Locking Protocol. (Table from slide 10 repeated.)
13 (Table from slide 10 repeated.)
14 DPCP: Distributed Priority Ceiling Protocol. (Table from slide 10 repeated.)
15 DFLP: Distributed FIFO Locking Protocol. (Table from slide 10 repeated.)
16 Asymptotically, FIFO queues offer lower maximum blocking. But: constants matter! (Table from slide 10 repeated.)
17 Part 1: Improved Analysis
18 Non-Asymptotic, Fine-Grained Analysis. Goal: derive the tightest possible bound, reflecting all constant factors. [Timeline: the job under analysis is pending, and hence vulnerable to contention, while conflicting critical sections of tasks A and B execute.]
19 Exploit activation frequency: don't overestimate worst-case contention.
20 Exploit per-task maximum critical-section lengths: don't overestimate the worst-case duration of lock unavailability.
21 Exploit protocol-specific properties, e.g., strong progress guarantees in FIFO queues.
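The first point, exploiting activation frequency, can be sketched numerically: a task can only cause contention as often as its jobs are activated. Below is a minimal illustration using a standard bound on how many jobs of a sporadic task can overlap a given interval; the function and parameter names are illustrative, not the paper's:

```python
from math import ceil

def max_requests(interval, period, response_time, reqs_per_job):
    # Upper bound on the number of resource requests a sporadic task can
    # issue while overlapping a pending interval of the given length:
    # at most ceil((interval + response_time) / period) of its jobs overlap.
    max_jobs = ceil((interval + response_time) / period)
    return max_jobs * reqs_per_job

# Example: pending interval of 10 time units, conflicting task with
# period 5 and response time 2 -> at most ceil(12/5) = 3 overlapping jobs.
print(max_requests(10, 5, 2, 1))  # 3
```

Without such a bound, an analysis would have to pessimistically assume that every critical section of every conflicting task can interfere.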
22 Non-Asymptotic, Fine-Grained Analysis. The key problem: ease of exposition vs. pessimism! [Timeline from slide 18 repeated.]
23 Declarative Fine-Grained Analysis. Desiderata: concise (easy to write, easy to read, easy to check); not inherently pessimistic; compositional (the analyst should not have to reason about the protocol as a whole); easy to implement; sound by construction (not based on an ad-hoc formalism).
24 Declarative Fine-Grained Analysis. Approach: use linear programming (not integer linear programming!) to derive task-set-specific blocking bounds.
25 Blocking Fractions (with regard to a fixed schedule, i.e., one particular execution). For the v-th concurrent critical section of a conflicting task T_x w.r.t. resource q: X_{x,q,v} = (actual amount of blocking caused) / (maximum critical section length w.r.t. q), with 0 ≤ X_{x,q,v} ≤ 1.
26 Blocking Fraction Example. Suppose the maximum critical-section length of task T_x is 3 time units. [Timeline: T_x executes a critical section on CPU 2 while T_i is blocked on CPU 1, then T_i executes its own critical section.]
27 Blocking Fraction Example. In this particular schedule, the actual blocking is 1 time unit, so the blocking fraction = actual / max = 1/3. [Timeline from slide 26 repeated.]
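The blocking fraction of slides 26–27 is directly computable; a minimal sketch (the function name is invented for illustration):

```python
def blocking_fraction(actual_blocking, max_cs_length):
    # X_{x,q,v} = actual blocking caused / maximum critical-section length.
    x = actual_blocking / max_cs_length
    assert 0.0 <= x <= 1.0   # by definition, a fraction of one critical section
    return x

# The example schedule: T_x's maximum critical section is 3 time units,
# but only 1 time unit of it actually blocks T_i.
print(blocking_fraction(1, 3))  # 1/3
```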
28 Total Blocking in a Fixed Schedule. Total blocking incurred by one job = Σ_{each resource q} Σ_{each task T_x} Σ_{each CS v} X_{x,q,v} · (maximum critical section length w.r.t. q), where X_{x,q,v} = (actual amount of blocking caused) / (maximum critical section length w.r.t. q).
29 Total Blocking in a Fixed Schedule. Each term reflects the actual amount of blocking incurred (which may be zero), and the sum ranges over all potentially concurrent critical sections, so no cleverness, and hence no errors, is involved.
30 Total Blocking in a Fixed Schedule. The total blocking is a linear combination of all blocking fractions! Use it as the objective function of a linear program (LP).
31 From a Fixed to All Possible Schedules.
    maximize   Σ_{each resource q} Σ_{each task T_x} Σ_{each CS v}  X_{x,q,v} · (maximum critical section length w.r.t. q)
    subject to $WORKLOAD-CONSTRAINTS and $PROTOCOL-CONSTRAINTS
32 From a Fixed to All Possible Schedules. Maximizing the objective finds the worst-case blocking across all possible schedules; the constraints rule out impossible schedules.
33 Example: FIFO Constraint. Goal: rule out schedules that do not comply with FIFO ordering. Constraint 12. In any schedule of τ under the FMLP+: ∀ℓ_q : ∀T_x ∈ τ^i : Σ_{v=1}^{N_{x,q}} X^D_{x,q,v} ≤ N_{i,q}.
34 Reading Constraint 12: the quantifiers range over each resource ℓ_q and each conflicting task T_x.
35 The sum ranges over all possibly concurrent critical sections v.
36 The bound: the sum of all blocking fractions cannot exceed N_{i,q}, the number of requests for the resource issued by task T_i (the task under analysis).
37 Why Constraint 12 is sound: suppose not; then there exists a schedule in which the sum of the blocking fractions of some task T_x exceeds the number of requests issued by T_i. Thus one request of T_i must have been blocked by at least two requests of T_x, which is impossible in a FIFO queue.
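The effect of a FIFO constraint like Constraint 12 can be illustrated with a toy computation. For an LP with only box constraints (0 ≤ X ≤ 1) and a per-(task, resource) sum constraint, the optimum can be computed greedily, so this sketch avoids an LP solver; all names and the example numbers are invented for illustration and do not reproduce the paper's full constraint system:

```python
def max_blocking(cs_lengths, n_iq):
    # cs_lengths: {(task, resource): [max CS length of each concurrent CS v]}
    # n_iq: {resource: number of requests issued by the task under analysis}
    # Objective: maximize sum of X[x,q,v] * length, with 0 <= X <= 1 and,
    # per FIFO ordering, sum_v X[x,q,v] <= n_iq[q] for each (task, resource).
    total = 0.0
    for (task, resource), lengths in cs_lengths.items():
        budget = n_iq[resource]
        # LP optimum here: saturate (X = 1) the longest critical sections
        # up to the per-task budget; all other fractions stay at 0.
        for length in sorted(lengths, reverse=True)[:budget]:
            total += length
    return total

# Two tasks conflict with T_i on resource q; T_i issues 1 request (N_iq = 1).
# FIFO: each conflicting task can block that request at most once.
cs = {("T2", "q"): [3, 3], ("T3", "q"): [5]}
print(max_blocking(cs, {"q": 1}))  # 8, not the naive 3 + 3 + 5 = 11
```

Without Constraint 12 the LP would charge both of T2's critical sections (total 11); with it, only one blocking per conflicting task per request is counted (total 8).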
38 Advantages. A powerful analysis technique: compositional (the LP solver combines the constraints), flexible (can handle many protocols), and declarative (much easier to read and check). Accuracy: a request is never counted twice, so the bounds are much less pessimistic.
39 Improved Analysis Accuracy. [Fig. 3: comparison of schedulability under the FMLP+ and the MPCP; fraction of schedulable task sets (out of 1000) vs. task count; higher is better. A higher task count means less idle time and more contention.]
40 Improved Analysis Accuracy. Setup: 16 cores, 16 shared resources, at most 5 critical sections per task and resource, 10µs–50µs critical-section lengths, and each task accesses a given resource with probability 0.1. [Fig. 3 repeated.]
41 Improved Analysis Accuracy. [Fig. 3 repeated; curves: MPCP (with prior analysis), MPCP (with new analysis), FMLP+ (with new analysis).]
42 Improved Analysis Accuracy. The new analysis roughly doubled the number of supported tasks! It also offers new observations on the relative performance of the MPCP and the FMLP+. [Fig. 3 and setup repeated.]
43 Part 2: Improved Evaluation
44 Evaluated Protocols: MPCP, FMLP+, DPCP, and DFLP. (Table from slide 10 repeated.)
45 Evaluated Protocols. Distributed locking protocols require cross-core interaction: overheads matter! (Table from slide 10 repeated.)
46 Taking Overheads Into Account. Platform: LITMUS^RT (Linux Testbed for Multiprocessor Scheduling in Real-Time Systems), 8-core / 16-core 2.0 GHz Intel Xeon X7550. Overhead tracing: more than 20 hours of real-time execution, more than 400 GB of trace data, more than 15 billion valid samples. Statistics and details in the online appendix.
47 Example: Context-Switch Overhead. [Histogram: P-FP measured context-switch overhead (host=nanping-16), bin size 0.50µs, log-scale sample counts. min = 0.26µs, max = 40.99µs, avg = 2.72µs, median = 2.15µs.]
48 Example: Context-Switch Overhead. In the overhead experiments, no statistical outlier filtering (IQR filter) was applied. [Histogram from slide 47 repeated.]
49 Overhead-Aware Schedulability. [Plot: fraction of schedulable task sets (out of 1000) vs. task count; higher is better. A higher task count means less idle time and more contention.]
50 Overhead-Aware Schedulability. Setup: 8 cores, 8 shared resources, at most 5 critical sections per task and resource, 10µs–50µs critical-section lengths, and each task accesses a given resource with probability 0.3.
51 Distributed Protocols Perform Well. [Same plot and setup as slide 50, with the distributed protocols highlighted.]
52 FIFO Protocols Perform Well. [Same plot and setup as slide 50, with the FIFO-ordered protocols highlighted.]
53 Choice of Protocol Matters. Blocking is a significant bottleneck w.r.t. schedulability. In this example (same setup as slide 50), the DFLP can host more than 20% more tasks than the MPCP.
54 What is the best protocol? All results available online (>6,000 plots).
55 What is the best protocol? There is no single best protocol (w.r.t. schedulability)! Results are highly workload-dependent.
56 What is the best protocol? How to order conflicting critical sections: FIFO works well, but priority queues are needed for highly heterogeneous timing parameters. Where to execute critical sections: distributed protocols are very competitive for many resources with high contention; shared-memory protocols are better for few resources.
57 Summary & Outlook. Contributions: there is no single best protocol (yet); distributed protocols perform surprisingly well; linear programs can be used to analyze blocking. Future work: a generalized protocol that always works well; support for clustered scheduling; analyzing spin locks with LPs; extending the LPs to handle nesting.
58 Thanks! LITMUS^RT: Linux Testbed for Multiprocessor Scheduling in Real-Time Systems. SchedCAT: Schedulability test Collection And Toolkit.
More informationCommercial Real-time Operating Systems An Introduction. Swaminathan Sivasubramanian Dependable Computing & Networking Laboratory
Commercial Real-time Operating Systems An Introduction Swaminathan Sivasubramanian Dependable Computing & Networking Laboratory swamis@iastate.edu Outline Introduction RTOS Issues and functionalities LynxOS
More informationCPU Scheduling: Objectives
CPU Scheduling: Objectives CPU scheduling, the basis for multiprogrammed operating systems CPU-scheduling algorithms Evaluation criteria for selecting a CPU-scheduling algorithm for a particular system
More informationExploring System Challenges of Ultra-Low Latency Solid State Drives
Exploring System Challenges of Ultra-Low Latency Solid State Drives Sungjoon Koh Changrim Lee, Miryeong Kwon, and Myoungsoo Jung Computer Architecture and Memory systems Lab Executive Summary Motivation.
More informationSYNCHRONIZATION M O D E R N O P E R A T I N G S Y S T E M S R E A D 2. 3 E X C E P T A N D S P R I N G 2018
SYNCHRONIZATION M O D E R N O P E R A T I N G S Y S T E M S R E A D 2. 3 E X C E P T 2. 3. 8 A N D 2. 3. 1 0 S P R I N G 2018 INTER-PROCESS COMMUNICATION 1. How a process pass information to another process
More informationOperating Systems. Memory Management. Lecture 9 Michael O Boyle
Operating Systems Memory Management Lecture 9 Michael O Boyle 1 Memory Management Background Logical/Virtual Address Space vs Physical Address Space Swapping Contiguous Memory Allocation Segmentation Goals
More informationCS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it
Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1
More informationMultiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.
Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts
More informationTechnical Brief: Specifying a PC for Mascot
Technical Brief: Specifying a PC for Mascot Matrix Science 8 Wyndham Place London W1H 1PP United Kingdom Tel: +44 (0)20 7723 2142 Fax: +44 (0)20 7725 9360 info@matrixscience.com http://www.matrixscience.com
More informationAn Implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP Real-Time Synchronization Protocols in LITMUS RT
An Implementation of the PCP, SRP, D-PCP, M-PCP, and FMLP Real-Time Synchronization Protocols in LITMUS RT Björn B. Brandenburg and James H. Anderson The University of North Carolina at Chapel Hill Abstract
More informationIssues in Parallel Processing. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Issues in Parallel Processing Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction Goal: connecting multiple computers to get higher performance
More informationSystem Architecture Directions for Networked Sensors. Jason Hill et. al. A Presentation by Dhyanesh Narayanan MS, CS (Systems)
System Architecture Directions for Networked Sensors Jason Hill et. al. A Presentation by Dhyanesh Narayanan MS, CS (Systems) Sensor Networks Key Enablers Moore s s Law: More CPU Less Size Less Cost Systems
More informationPerformance of Multicore LUP Decomposition
Performance of Multicore LUP Decomposition Nathan Beckmann Silas Boyd-Wickizer May 3, 00 ABSTRACT This paper evaluates the performance of four parallel LUP decomposition implementations. The implementations
More informationQUESTION BANK UNIT I
QUESTION BANK Subject Name: Operating Systems UNIT I 1) Differentiate between tightly coupled systems and loosely coupled systems. 2) Define OS 3) What are the differences between Batch OS and Multiprogramming?
More informationMultiprocessor Systems. COMP s1
Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve
More informationIntel Thread Building Blocks, Part IV
Intel Thread Building Blocks, Part IV SPD course 2017-18 Massimo Coppola 13/04/2018 1 Mutexes TBB Classes to build mutex lock objects The lock object will Lock the associated data object (the mutex) for
More informationTasks. Task Implementation and management
Tasks Task Implementation and management Tasks Vocab Absolute time - real world time Relative time - time referenced to some event Interval - any slice of time characterized by start & end times Duration
More informationReal-Time Internet of Things
Real-Time Internet of Things Chenyang Lu Cyber-Physical Systems Laboratory h7p://www.cse.wustl.edu/~lu/ Internet of Things Ø Convergence of q Miniaturized devices: integrate processor, sensors and radios.
More informationEmbedded Systems: OS
Embedded Systems: OS Jinkyu Jeong (Jinkyu@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ICE3028: Embedded Systems Design, Fall 2018, Jinkyu Jeong (jinkyu@skku.edu) Standalone
More information1 Multiprocessors. 1.1 Kinds of Processes. COMP 242 Class Notes Section 9: Multiprocessor Operating Systems
COMP 242 Class Notes Section 9: Multiprocessor Operating Systems 1 Multiprocessors As we saw earlier, a multiprocessor consists of several processors sharing a common memory. The memory is typically divided
More informationReal-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo
Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract
More informationDynamic Fine Grain Scheduling of Pipeline Parallelism. Presented by: Ram Manohar Oruganti and Michael TeWinkle
Dynamic Fine Grain Scheduling of Pipeline Parallelism Presented by: Ram Manohar Oruganti and Michael TeWinkle Overview Introduction Motivation Scheduling Approaches GRAMPS scheduling method Evaluation
More informationThreads SPL/2010 SPL/20 1
Threads 1 Today Processes and Scheduling Threads Abstract Object Models Computation Models Java Support for Threads 2 Process vs. Program processes as the basic unit of execution managed by OS OS as any
More informationCS140 Operating Systems and Systems Programming Midterm Exam
CS140 Operating Systems and Systems Programming Midterm Exam October 31 st, 2003 (Total time = 50 minutes, Total Points = 50) Name: (please print) In recognition of and in the spirit of the Stanford University
More informationCPSC/ECE 3220 Summer 2018 Exam 2 No Electronics.
CPSC/ECE 3220 Summer 2018 Exam 2 No Electronics. Name: Write one of the words or terms from the following list into the blank appearing to the left of the appropriate definition. Note that there are more
More informationOVERVIEW. Last Week: But if frequency of high priority task increases temporarily, system may encounter overload: Today: Slide 1. Slide 3.
OVERVIEW Last Week: Scheduling Algorithms Real-time systems Today: But if frequency of high priority task increases temporarily, system may encounter overload: Yet another real-time scheduling algorithm
More informationPresented by: Nafiseh Mahmoudi Spring 2017
Presented by: Nafiseh Mahmoudi Spring 2017 Authors: Publication: Type: ACM Transactions on Storage (TOS), 2016 Research Paper 2 High speed data processing demands high storage I/O performance. Flash memory
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2019 Lecture 8 Scheduling Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 FAQ POSIX: Portable Operating
More informationThreads. CS3026 Operating Systems Lecture 06
Threads CS3026 Operating Systems Lecture 06 Multithreading Multithreading is the ability of an operating system to support multiple threads of execution within a single process Processes have at least
More informationAnalysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation
Analysis and Implementation of Global Preemptive Fixed-Priority Scheduling with Dynamic Cache Allocation Meng Xu Linh Thi Xuan Phan Hyon-Young Choi Insup Lee Department of Computer and Information Science
More informationJob sample: SCOPE (VLDBJ, 2012)
Apollo High level SQL-Like language The job query plan is represented as a DAG Tasks are the basic unit of computation Tasks are grouped in Stages Execution is driven by a scheduler Job sample: SCOPE (VLDBJ,
More informationReview: Creating a Parallel Program. Programming for Performance
Review: Creating a Parallel Program Can be done by programmer, compiler, run-time system or OS Steps for creating parallel program Decomposition Assignment of tasks to processes Orchestration Mapping (C)
More informationTHREADS & CONCURRENCY
27/04/2018 Sorry for the delay in getting slides for today 2 Another reason for the delay: Yesterday: 63 posts on the course Piazza yesterday. A7: If you received 100 for correctness (perhaps minus a late
More informationDealing with Issues for Interprocess Communication
Dealing with Issues for Interprocess Communication Ref Section 2.3 Tanenbaum 7.1 Overview Processes frequently need to communicate with other processes. In a shell pipe the o/p of one process is passed
More informationImplementing Mutual Exclusion. Sarah Diesburg Operating Systems CS 3430
Implementing Mutual Exclusion Sarah Diesburg Operating Systems CS 3430 From the Previous Lecture The too much milk example shows that writing concurrent programs directly with load and store instructions
More informationManaging Performance Variance of Applications Using Storage I/O Control
Performance Study Managing Performance Variance of Applications Using Storage I/O Control VMware vsphere 4.1 Application performance can be impacted when servers contend for I/O resources in a shared storage
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationUNIX Input/Output Buffering
UNIX Input/Output Buffering When a C/C++ program begins execution, the operating system environment is responsible for opening three files and providing file pointers to them: stdout standard output stderr
More information