Administration
- Matteo Corti, RZ H11, 27944, corti@inf.ethz.ch
- Today: IFW A36, 16:15, D-INFK Colloquium: "Automated, compositional and iterative deadlock detection", Natasha Sharygina, Carnegie Mellon University
- Exam (tentative): 2005-03-04, 3 hours, 2 pages handwritten
P. Reali / M. Corti

Activities
- Program (static concept)
- Process (dynamic)
- Processes, jobs, tasks, threads (differences later)
  - program code
  - context: program counter (PC) and registers, stack pointer
  - state: [new], running, waiting, ready, [terminated]
  - stack
  - data section (heap)

Processes vs. Threads
- Process or job (heavyweight)
  - code
  - address space
  - processor state
  - private data (stack + registers)
- Thread (lightweight)
  - shared code
  - shared address space
  - processor state
  - private data (stack + registers)
- A process can have multiple threads
[Figure labels: Kernel, CPU]

Processes vs. Threads: Example
[Figure: two processes (PROC 1, PROC 2), each with its own heap (HEAP 1, HEAP 2) and stack (STACK 1, STACK 2), next to one process (PROC) with a single HEAP and two thread stacks (STACK 1, STACK 2)]

Multitasking
Task switches can be synchronous or asynchronous.
- Programmed events that can cause a task switch
  - protection (locks): acquire, release
  - synchronization: wait on a condition, send a signal (send-and-pass)
- System events that can cause a task switch
  - voluntary switch (yield, task termination)
  - process with higher priority becomes available
  - consumption of the allowed time quantum
  - task preemption

Preemption
- Assign each process a time quantum (normally on the order of tens of ms)
- Asynchronous task switches can happen at any time!
  - the task can be in the middle of a computation
  - save the whole CPU state (registers, flags, ...)
- Perform a switch
  - on resource conflict
  - on synchronization request
  - on timer interrupt (time quantum is over)

Context switch
- Scheduler invocation:
  - preemption: interrupt
  - cooperation: explicit call
- Operations:
  - store the process state (PC, regs, ...)
  - choose the next process (strategy)
  - [accounting]
  - restore the state of the next process (regs, SP, PC, ...)
  - jump to the restored PC
- A context switch is usually expensive:
  - 1 to 1000 µs, depending on the system and the number of processes
  - hardware optimizations (e.g., multiple sets of registers: SPARC, DECSYSTEM-20)

Scheduling algorithms
Three categories of environments:
- batch systems (e.g., VPP, DOS)
  - usually non-preemptive (i.e., a task is not stopped by the scheduler; only synchronous switches)
- interactive systems (UNIX, Windows, Mac OS)
  - cooperative or preemptive
  - no task is allowed to have the CPU forever
- real-time systems (PathWorks, RT Linux)
  - timing constraints (deadlines, periodicity)

Scheduling Performance
- CPU utilization
- Throughput: number of jobs per time unit
  - minimize context switch penalty
- Turnaround time = exit time - arrival time
  - execution, wait, I/O
- Response time = start time - request time
- Waiting time (I/O, waiting, ...)
- Fairness

Scheduling algorithm goals
- All systems
  - Fairness: give every task a chance
  - Policy enforcement
  - Balance: keep all subsystems busy
- Interactive systems
  - Response time: respond quickly
  - Proportionality: meet the user's expectations
- Batch systems
  - Throughput: maximize number of jobs
  - Turnaround time: minimize time in system
  - CPU utilization: keep CPU busy
- Real-time systems
  - Meet deadlines: avoid losing data
  - Predictability: avoid degradation
  - Hard vs. soft real-time systems

Batch Scheduling Algorithms
Choose the task to run (a task is usually not preempted)
- First Come First Serve (FCFS)
  - fair, may cause long waiting times
- Shortest Job First (SJF)
  - requires knowledge about job length
- Longest Response Ratio
  - response ratio = (time in the system / CPU time)
  - depends on the waiting time
- Highest Priority First
  - with or without preemption
- Mixed
  - the priority is adjusted dynamically (time in queue, length, priority, ...)
ETH-VPP is a batch system! Which algorithm does it use?

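A minimal sketch of the Longest Response Ratio choice, assuming the ratio is evaluated for waiting jobs as (now - arrival) / CPU time. The class `ResponseRatioDemo`, the `Job` fields, and the sample jobs are hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.List;

class ResponseRatioDemo {
    // Hypothetical job description; the field names are assumptions.
    static final class Job {
        final String name;
        final double arrival;  // time the job entered the system
        final double cpuTime;  // CPU time the job needs (assumed known)
        Job(String name, double arrival, double cpuTime) {
            this.name = name;
            this.arrival = arrival;
            this.cpuTime = cpuTime;
        }
    }

    // response ratio = time in the system / CPU time
    static double responseRatio(Job j, double now) {
        return (now - j.arrival) / j.cpuTime;
    }

    // Returns the ready job with the largest response ratio (null if none).
    static Job pickNext(List<Job> ready, double now) {
        Job best = null;
        for (Job j : ready) {
            if (best == null || responseRatio(j, now) > responseRatio(best, now)) {
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Job> ready = Arrays.asList(new Job("long", 0, 10), new Job("short", 6, 2));
        System.out.println(pickNext(ready, 7).name); // ratios 0.7 vs 0.5
        System.out.println(pickNext(ready, 8).name); // ratios 0.8 vs 1.0
    }
}
```

The short job overtakes the long one once it has waited long enough, which is how the ratio depends on the waiting time.
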
Preemptive Scheduling Algorithms
- Time sharing
  - Each task has a predefined time quantum
- Round-Robin
  - Schedule the next task on the ready list
- Quantum choice:
  - small: may cause frequent switches
  - big: may cause slow response
- Implicit assumption: all tasks have the same importance
[Figure: ready list as a ring of P1, P2, P3, P4 linked by "next" pointers]

Preemptive Scheduling Algorithms
- Priority scheduling
  - the process with the highest priority is scheduled first
- Variants
  - multilevel queue scheduling: one list per priority, use round-robin on each list
  - dynamic priorities
    - proportional to time in system
    - inversely proportional to the part of the quantum used
  - make the time quantum proportional to the priority

Real-Time Scheduling Algorithms
- The task needs to meet its deadline!
- The task cost is (or should be) known
- Two kinds of tasks: aperiodic, periodic
- Reservation: the scheduler decides whether the system has enough resources for the task
- Algorithms:
  - Rate Monotonic Scheduling: assign static priorities (priority proportional to frequency)
  - Earliest Deadline First: the task with the closest deadline is chosen

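One way the reservation step could be sketched for periodic tasks is a utilization test. The sketch assumes a single CPU, independent tasks, and deadlines equal to periods; the class `AdmissionTest` and its method names are hypothetical. The EDF test (U <= 1) and the Rate Monotonic bound n(2^(1/n) - 1) are the classical Liu and Layland results, not something stated on the slide.

```java
class AdmissionTest {
    // Total utilization U = sum of cost_i / period_i over all periodic tasks.
    static double utilization(double[] cost, double[] period) {
        double u = 0;
        for (int i = 0; i < cost.length; i++) {
            u += cost[i] / period[i];
        }
        return u;
    }

    // EDF on one CPU, deadlines equal to periods: feasible iff U <= 1.
    static boolean admitEdf(double[] cost, double[] period) {
        return utilization(cost, period) <= 1.0;
    }

    // RMS: U <= n * (2^(1/n) - 1) is a sufficient (not necessary) condition.
    static boolean admitRms(double[] cost, double[] period) {
        int n = cost.length;
        return utilization(cost, period) <= n * (Math.pow(2.0, 1.0 / n) - 1.0);
    }

    public static void main(String[] args) {
        double[] cost = {1, 2, 3}, period = {4, 8, 12};   // U = 0.75
        System.out.println(admitRms(cost, period));       // bound for n = 3 is about 0.78
        double[] cost2 = {2, 4}, period2 = {4, 8};        // U = 1.0
        System.out.println(admitEdf(cost2, period2) + " " + admitRms(cost2, period2));
    }
}
```

A task set rejected by the RMS bound may still be schedulable; the bound only guarantees the accepted ones.
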
Summary & Admin
- Processes & Threads
- Multitasking: context switch, preemption, cooperation
- Scheduling: performance criteria, algorithms
- Exercises: will take place tomorrow as usual :-)

Scheduling Algorithm Example
Situation:
- Tasks P1, P2, P3, P4
- Arrive at time t = 0
- Priority: P1 highest, P4 lowest
- Time to process: 10, 2, 5, 3

Scheduling Algorithm Example: Highest Priority First
Schedule: P1 [0, 10], P2 [10, 12], P3 [12, 17], P4 [17, 20]
turnaround = (10 + 12 + 17 + 20) / 4 = 14.75
response time = (0 + 10 + 12 + 17) / 4 = 9.75

Scheduling Algorithm Example: Shortest Job First
Schedule: P2 [0, 2], P4 [2, 5], P3 [5, 10], P1 [10, 20]
turnaround = (2 + 5 + 10 + 20) / 4 = 9.25
response time = (0 + 2 + 5 + 10) / 4 = 4.25

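The averages of the two non-preemptive schedules (Highest Priority First and Shortest Job First) can be checked with a few lines of code. This is a sketch that assumes all tasks arrive at t = 0 and run back to back without switching cost; the class `NonPreemptiveMetrics` and its method are hypothetical names.

```java
import java.util.Arrays;

class NonPreemptiveMetrics {
    // Runs the tasks back to back in the given order.
    // Returns { average turnaround, average response time }.
    static double[] averages(int[] burstInRunOrder) {
        double t = 0, sumTurnaround = 0, sumResponse = 0;
        for (int burst : burstInRunOrder) {
            sumResponse += t;    // the task starts at t (it arrived at 0)
            t += burst;
            sumTurnaround += t;  // the task exits at t
        }
        int n = burstInRunOrder.length;
        return new double[] { sumTurnaround / n, sumResponse / n };
    }

    public static void main(String[] args) {
        int[] byPriority = {10, 2, 5, 3};   // P1, P2, P3, P4
        int[] byLength = byPriority.clone();
        Arrays.sort(byLength);              // P2, P4, P3, P1
        System.out.println(Arrays.toString(averages(byPriority))); // [14.75, 9.75]
        System.out.println(Arrays.toString(averages(byLength)));   // [9.25, 4.25]
    }
}
```
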
Scheduling Algorithm Example: Timesharing with quantum = 2
Time axis of the schedule: 0, 2, 4, 6, 8, 10, 12, 13, 14, 16, 18, 20
Completion times: P1 at 20, P2 at 4, P3 at 14, P4 at 13
turnaround = (20 + 4 + 14 + 13) / 4 = 12.75
response time = (0 + 2 + 4 + 6) / 4 = 3

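Scheduling Algorithm Example: Timesharing with quantum → 0
- [0, 8]: P1, P2, P3, P4 each running at 1/4; P2 finishes at 8
- [8, 11]: P1, P3, P4 each running at 1/3; P4 finishes at 11
- [11, 15]: P1, P3 each running at 1/2; P3 finishes at 15
- [15, 20]: P1 runs alone and finishes at 20
turnaround = (20 + 8 + 15 + 11) / 4 = 13.5
response time = 0
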
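The limit case of a vanishing quantum can be computed directly by treating the CPU as equally shared among all unfinished tasks. The following sketch assumes all tasks arrive at t = 0 and ignores switching cost; `ProcessorSharing` and its methods are hypothetical names.

```java
import java.util.Arrays;

class ProcessorSharing {
    // Completion time of each task when all unfinished tasks share the CPU
    // equally (timesharing with quantum -> 0); all tasks arrive at t = 0.
    static double[] completionTimes(double[] work) {
        int n = work.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(work[a], work[b])); // shortest first

        double[] done = new double[n];
        double now = 0;     // wall-clock time
        double served = 0;  // CPU time every unfinished task has received so far
        for (int k = 0; k < n; k++) {
            int i = order[k];
            int active = n - k;                  // tasks still sharing the CPU
            now += (work[i] - served) * active;  // each runs at rate 1/active
            served = work[i];
            done[i] = now;
        }
        return done;
    }

    static double average(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    public static void main(String[] args) {
        double[] done = completionTimes(new double[] {10, 2, 5, 3});
        System.out.println(Arrays.toString(done)); // [20.0, 8.0, 15.0, 11.0]
        System.out.println(average(done));         // 13.5
    }
}
```
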
Scheduling Algorithm Example: Results
Situation: tasks P1, P2, P3, P4 arrive at time t = 0; priority: P1 highest, P4 lowest; time to process: 10, 2, 5, 3

Algorithm                      turnaround   response time
Highest Priority First         14.75        9.75
Shortest Job First             9.25         4.25
Timesharing with quantum = 2   12.75        3.0
Timesharing with quantum → 0   13.5         0

Scheduling Examples
- UNIX
  - preemption
  - 32 priority levels (round robin)
  - each second the priorities are recomputed (CPU usage, nice level, last run)
- BSD
  - similar
  - every 4th tick the priorities are recomputed (usage estimation)
- Windows NT
  - real-time priorities: fixed, may run forever
  - variable: dynamic priorities, preemption
  - idle: last choice (swap manager)

Scheduling Examples: Quantum & Priorities
- Win2K:
  - quantum = 20 ms (professional), 120 ms (user), configurable
  - depending on type (I/O bound)
- BSD:
  - quantum = 100 ms
  - priority = f(load, nice, time last)
- Linux:
  - quantum = quantum / 2 + priority
  - f(quantum, nice)

Scheduling Problems
- Starvation
  - A task is never scheduled (although ready)
  - → fairness
- Deadlock
  - No task is ready (nor will any ever become ready)
  - → detection + recovery, or avoidance

Deadlock Conditions
[Figure: resource graph with threads (T) and resources (R): A holds R, B wants R, A wants S, B holds S]
Coffman conditions for a deadlock (1971):
- Mutual exclusion
- Hold and wait
- No resource preemption
- Circular wait (cycle)

Deadlock Remedies
- Coarser lock granularity: use a single lock for all resources (e.g., Linux 2.0-2.4 "Big Kernel Lock")
- Locking order: resources are ordered; resources are locked according to the resource order (ticketing)
- Two-phase locking: try to acquire all the resources; if successful, lock them; otherwise free them and try again

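The locking-order remedy can be sketched with `java.util.concurrent.locks.ReentrantLock`. The class `OrderedLocks`, the `Resource` type, and its `rank` field are hypothetical; the sketch assumes every resource has a unique rank that defines the global order.

```java
import java.util.concurrent.locks.ReentrantLock;

class OrderedLocks {
    // Hypothetical resource; the unique rank defines the global locking order.
    static final class Resource {
        final int rank;
        final ReentrantLock lock = new ReentrantLock();
        Resource(int rank) { this.rank = rank; }
    }

    // Locks both resources in ascending rank order, whatever the argument
    // order, so two threads can never wait for each other in a cycle.
    static void withBoth(Resource a, Resource b, Runnable action) {
        Resource first = (a.rank <= b.rank) ? a : b;
        Resource second = (first == a) ? b : a;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                action.run();
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    static int counter;

    // Two threads name the resources in opposite order; returns the final count.
    static int demo() {
        Resource r = new Resource(1), s = new Resource(2);
        counter = 0;
        Runnable rThenS = () -> { for (int i = 0; i < 1000; i++) withBoth(r, s, () -> counter++); };
        Runnable sThenR = () -> { for (int i = 0; i < 1000; i++) withBoth(s, r, () -> counter++); };
        Thread t1 = new Thread(rThenS), t2 = new Thread(sThenR);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2000
    }
}
```

Without the reordering in `withBoth`, the two threads would request the locks in opposite order and could block each other forever.
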
Deadlock Detection, Prevention & Recovery
- Deadlock detection: the system keeps a graph of locks and tries to detect cycles
  - time consuming
  - the graph has to be kept consistent with the actual state
- Deadlock prevention (avoidance): remove one of the four Coffman conditions (e.g., cycles)
- Recovery:
  - kill processes and reclaim the resources
  - rollback: requires saving the states of the processes regularly

Simple Deadlock Scenario
Example:
- Resources R, S, T
- Tasks A, B, C require { R, S }, { S, T }, { T, R } respectively
Case 1: Sequential execution, no deadlock
- A: +R +S -R -S
- B: +S +T -S -T
- C: +T +R -T -R

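Simple Deadlock Scenario
Case 2: Interleaving, deadlock
- Order of requests: A +R, B +S, C +T, A +S, B +T, C +R
- The last three requests can never be granted: each task waits for a resource held by another
[Figure: resource graph with the cycle A, S, B, T, C, R]
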
Complex Deadlock Scenario
Case with 6 resources and 7 tasks

Thread   holds   requests
A        R       S
B        -       T
C        -       S
D        U       S, T
E        T       V
F        W       S
G        V       U

[Figure: graphical representation of the table as a resource graph]
Is this a case of deadlock?

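The question can be explored mechanically with a wait-for graph: thread X waits for thread Y if X requests a resource that Y holds. The sketch below assumes each resource has at most one holder and that a thread needs all the resources it requests; `WaitForGraph` and its methods are hypothetical names. It reports the threads that lie on a cycle.

```java
import java.util.*;

class WaitForGraph {
    // Returns the threads that lie on a cycle of the wait-for graph.
    static Set<String> deadlocked(Map<String, String> holderOf,
                                  Map<String, List<String>> requests) {
        // Edge X -> Y: X requests a resource currently held by Y.
        Map<String, Set<String>> waitsFor = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : requests.entrySet()) {
            Set<String> targets = new TreeSet<>();
            for (String resource : e.getValue()) {
                String holder = holderOf.get(resource);
                if (holder != null) targets.add(holder);
            }
            waitsFor.put(e.getKey(), targets);
        }
        Set<String> result = new TreeSet<>();
        for (String t : waitsFor.keySet()) {
            if (reaches(waitsFor, t, t)) result.add(t);
        }
        return result;
    }

    // Depth-first search: is target reachable from start via at least one edge?
    static boolean reaches(Map<String, Set<String>> g, String start, String target) {
        Deque<String> stack = new ArrayDeque<>(g.getOrDefault(start, Collections.<String>emptySet()));
        Set<String> seen = new HashSet<>();
        while (!stack.isEmpty()) {
            String cur = stack.pop();
            if (cur.equals(target)) return true;
            if (seen.add(cur)) stack.addAll(g.getOrDefault(cur, Collections.<String>emptySet()));
        }
        return false;
    }

    // The table from the slide: who holds what, who requests what.
    static Set<String> slideExample() {
        Map<String, String> holderOf = new HashMap<>();
        holderOf.put("R", "A"); holderOf.put("U", "D"); holderOf.put("T", "E");
        holderOf.put("W", "F"); holderOf.put("V", "G");
        Map<String, List<String>> requests = new HashMap<>();
        requests.put("A", Arrays.asList("S"));
        requests.put("B", Arrays.asList("T"));
        requests.put("C", Arrays.asList("S"));
        requests.put("D", Arrays.asList("S", "T"));
        requests.put("E", Arrays.asList("V"));
        requests.put("F", Arrays.asList("S"));
        requests.put("G", Arrays.asList("U"));
        return deadlocked(holderOf, requests);
    }

    public static void main(String[] args) {
        System.out.println(slideExample()); // [D, E, G]
    }
}
```

D waits for E (resource T), E waits for G (resource V), and G waits for D (resource U), so these three threads form a cycle. B is not on the cycle but waits for T, which E never releases.
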
Deadlock Avoidance Strategy in Bluebottle
- Each kernel module has a lock to protect its data
- When multiple locks are needed, acquire them according to the module hierarchy
[Figure: module hierarchy with one lock per module; modules shown: Timers, Threads, Processors, Traps, Interrupts, Blocks, Locks, Modules, Memory, Configuration]

Priority Inversion
A high-priority task can be blocked by a lower-priority one.
[Figure: example timeline of a High, a Medium, and a Low priority task in the states waiting, running, ready]

Priority Inversion
- Big problem for RTOS
- Solutions
  - priority inheritance
    - a low-priority task holding a resource inherits the priority of the high-priority task wanting the resource
  - priority ceilings
    - each resource has a priority corresponding to the highest priority of its users + 1
    - the priority of the resource is transferred to the locking process
    - can be used instead of semaphores

Example: Mars Pathfinder (1996-1998)
- VxWorks real-time system: preemptive, priorities
- Communication bus: shared resource (mutexes)
- Low-priority task (short): meteorological data gathering
- Medium-priority task (long): communication
- High-priority task: bus manager
- Detection: watchdog on bus activity → system reset
- Fix: activate priority inheritance via a patch uploaded on the fly (no memory protection)

Locking on Multiprocessor Machines
- Real parallelism!
- Cannot disable interrupts as on single-processor machines (could stop every task, but that is not efficient)
- Software solutions: Peterson, Dekker, ...
- Hardware support
  - bus locking
  - atomic instructions (Test And Set, Compare And Swap)

Locking on multiprocessor machines
Test And Set
    TAS s:
        IF s = 0 THEN s := 1
        ELSE CC := TRUE
        END
Compare and Swap (Intel)
    CAS R1, R2, A:      (R1: expected value, R2: new value, A: address)
        IF R1 = M[A] THEN M[A] := R2; CC := TRUE
        ELSE R1 := M[A]; CC := FALSE
        END
These instructions are atomic even on multiprocessors! They usually achieve this by locking the data bus.

Example: Semaphores on SMP
Counter s: available resources
Binary semaphores with TAS
- Spinning (busy wait):
    Try:  TAS s
          JMP Try
          CS
- Blocking:
          TAS s
          JMP Queuing
          CS

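In Java, the spinning variant can be approximated with `AtomicBoolean.getAndSet`, which atomically writes the new value and returns the old one, much like TAS. The class `TasSpinLock` and its demo are hypothetical; `Thread.yield()` in the loop is an addition to be friendlier to the scheduler and is not part of the slide's loop.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class TasSpinLock {
    private final AtomicBoolean s = new AtomicBoolean(false);

    // Spin until the old value was false, i.e. until we were the one to set it.
    void acquire() {
        while (s.getAndSet(true)) {   // atomic "test and set"
            Thread.yield();           // busy wait
        }
    }

    void release() {
        s.set(false);
    }

    static int counter;

    // Two threads increment a shared counter inside the critical section.
    static int demo() {
        TasSpinLock lock = new TasSpinLock();
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < 10000; i++) {
                lock.acquire();
                try {
                    counter++;        // critical section
                } finally {
                    lock.release();
                }
            }
        };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 20000
    }
}
```
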
Example: Semaphores on SMP
Counter s: available resources
Generic semaphores with CAS
Usage: P(s), enter CS, exit CS, V(s)

P(S): { S := S - 1 } IF S < 0 THEN jump queuing END
          Load R1, s
    TryP: MOVE R1, R2
          DEC R2
          CAS R1, R2, s
          BNE TryP
          CMP R2, 0
          BN Queuing
          [CS]

V(S): { S := S + 1 } IF S <= 0 THEN jump dequeuing END
          [CS]
          Load R1, s
    TryV: MOVE R1, R2
          INC R2
          CAS R1, R2, s
          BNE TryV
          CMP R2, 0
          BNP Dequeuing

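The same retry loops can be sketched with `AtomicInteger.compareAndSet`. This covers only the atomic counter update; the queuing and dequeuing of threads is left out, and the return values merely tell the caller which branch of the slide's code would be taken. `CasSemaphoreCounter` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasSemaphoreCounter {
    private final AtomicInteger s;

    CasSemaphoreCounter(int available) {
        s = new AtomicInteger(available);
    }

    // P: atomically S := S - 1.
    // Returns true if the caller may enter, false if it would have to queue (S < 0).
    boolean p() {
        int old, next;
        do {
            old = s.get();                      // Load R1, s
            next = old - 1;                     // MOVE R1, R2; DEC R2
        } while (!s.compareAndSet(old, next));  // CAS R1, R2, s; BNE TryP
        return next >= 0;
    }

    // V: atomically S := S + 1.
    // Returns true if a queued thread would have to be dequeued (S <= 0).
    boolean v() {
        int old, next;
        do {
            old = s.get();
            next = old + 1;
        } while (!s.compareAndSet(old, next));
        return next <= 0;
    }

    public static void main(String[] args) {
        CasSemaphoreCounter sem = new CasSemaphoreCounter(1);
        System.out.println(sem.p()); // true:  S = 0, enter
        System.out.println(sem.p()); // false: S = -1, would queue
        System.out.println(sem.v()); // true:  S = 0, would dequeue
        System.out.println(sem.v()); // false: S = 1, nobody waiting
    }
}
```
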
Spin-Locks: the Bluebottle/i386 way (simplified version)
PROCEDURE AcquireSpinTimeout(VAR locked: BOOLEAN);
CODE {SYSTEM.i386}
    MOV EBX, locked[ebp]   ; EBX := ADR(locked)
    MOV AL, 1              ; AL := 1
    CLI                    ; switch interrupts off before acquiring lock
test:
    XCHG [EBX], AL         ; set and read the lock atomically. LOCK prefix implicit.
    CMP AL, 1              ; was locked?
    JE test                ; retry..
END AcquireSpinTimeout;

Active Objects in Active Oberon
Z = OBJECT
    VAR myt: T; i: INTEGER;            (* State *)

    PROCEDURE & NEW (t: T);            (* Initializer *)
    BEGIN myt := t
    END NEW;

    PROCEDURE P (u: U; VAR v: V);      (* Method *)
    BEGIN { EXCLUSIVE }                (* Mutual Exclusion *)
        i := 1
    END P;

BEGIN { ACTIVE }                       (* Object Activity *)
    BEGIN { EXCLUSIVE }
        AWAIT (i > 0);                 (* Condition *)
    END
END Z;

Active Oberon Runtime Structures
[Figure: runtime structures: CPUs with their Running processes, the Ready Queue, and per-object Lock Queue (Awaiting Object) and Wait Queue (Awaiting Assertion)]

Active Oberon Implementation
- NEW: Create object; Create process; Set to ready
- Preempt: Set to ready; Run next ready
- END: Run next ready
[Figure: process state diagram with the states Running, Awaiting Object, Awaiting Assertion, Ready]

Active Oberon Implementation
Enter Monitor:
    IF monitor lock set THEN
        Put me in monitor obj wait list; Run next ready
    ELSE
        set monitor lock
    END
Exit Monitor:
    Find first asserted x in wait list;
    IF x found THEN set x to ready
    ELSE
        Find first x in obj wait list;
        IF x found THEN set x to ready
        ELSE clear monitor lock
        END
    END
    Run next ready
[Figure: process state diagram with the states Running, Awaiting Object, Awaiting Assertion, Ready]

Active Oberon Implementation
- AWAIT: Put me in monitor assn wait list; Call Exit monitor
[Figure: process state diagram with the states Running, Awaiting Object, Awaiting Assertion, Ready]

Summary
- Scheduling: algorithms, examples
- Deadlocks
- Priority inversion
- Test & set instructions
- Case study: Active Oberon scheduler
Next:
- Q&A
- Threads and locks: case studies
- File systems

[Q&A] Virtual Address Translations
- There is no general scheme (or formula) valid for all architectures!
- General idea: virtual address → TLB (on a miss: page table) → real address
- Time (example):
    T := T_TLB + (1 - P_TLBhit) * (T_PT + (1 - P_PThit) * T_disk)

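As a worked instance of the example formula, the sketch below plugs in made-up numbers: 1 ns TLB lookup, 99% TLB hit rate, 100 ns page-table walk, 99.99% of walks finding the page in memory, and 10 ms disk access. All values and the names `TranslationTime` and `expected` are assumptions for illustration, not measurements.

```java
class TranslationTime {
    // T := T_TLB + (1 - P_TLBhit) * (T_PT + (1 - P_PThit) * T_disk)
    static double expected(double tTlb, double pTlbHit,
                           double tPt, double pPtHit, double tDisk) {
        return tTlb + (1 - pTlbHit) * (tPt + (1 - pPtHit) * tDisk);
    }

    public static void main(String[] args) {
        // All times in nanoseconds; the values are hypothetical.
        double t = expected(1, 0.99, 100, 0.9999, 1e7);
        System.out.printf("expected translation time: %.2f ns%n", t); // about 12 ns
    }
}
```

With these numbers the rare disk accesses contribute about 10 ns of the total, which shows how strongly the slowest level dominates even at very high hit rates.
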
Case Study: Windows CE 3.0
- Real-time constraints
  - Reaction time on events
  - Execution time
- Threads with priorities and time quanta
  - Priorities: 0 (high), ..., 255 (low)
  - Time quanta in ms
    - Default: 100 ms
    - 0: no quantum
- Single processor
[Figure: switch between threads of priorities p and q < p; end of quantum]

Case Study: Windows CE 3.0
Interrupt Handling
- ISR (Interrupt Service Routine)
  - 1st level handling
  - Kernel mode, uses kernel stack
  - Installed at boot time
  - Creates event on demand
  - Preempted by ISR with higher priority
- IST (Interrupt Service Thread)
  - 2nd level handling
  - User mode
  - Awaits events
[Figure: an IRQ reaches the ISR in kernel mode (NK.EXE), which signals an event to the IST in user mode]

Case Study: Windows CE 3.0
- Synchronization on common resources:
  - Critical sections: enter, leave operations
  - Semaphores and mutexes (binary semaphores)
- Synchronization is performed with system/library calls (they are not part of a language).
- Priority inversion avoidance: priority inheritance (a thread inherits the priority of the task wanting the resource)

Case Study: Java
- Activities are mapped to threads (no processes)
- Synchronization in the language
  - locks
  - signals
- Threads provided by the library
- Scheduling depends on the JVM

Case Study: Java
public class MyThread extends Thread {
    @Override
    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        MyThread t = new MyThread();
        t.start(); // start() returns void, so its result cannot be assigned to t
    }
}

Case Study: Java
public class MyThread implements Runnable {
    @Override
    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        // "this" does not exist in a static method, so pass a new instance
        Thread t = new Thread(new MyThread());
        t.start();
    }
}

Case Study: Java
- Protection with monitor-like objects
  - with method granularity: public synchronized void someMethod()
  - with statement granularity: synchronized (anObject) { ... }
- Synchronization with signals
  - wait() (with optional time-out)
  - notify() / notifyAll() ("send and continue" pattern)

Case Study: Java
private Object o;

public synchronized void consume() {
    while (o == null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    use(o);
    o = null;
    notifyAll();
}

public synchronized void produce(Object p) {
    while (o != null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    o = p;
    notifyAll();
}

Case Study: POSIX Threads
- Standard interface for threads in C
- Mostly UNIX, possible on Windows
- Provided by a library (libpthread), not part of the language
- IEEE POSIX 1003.1c standard (1995)
- Various implementations (both user and kernel level)

Case Study: POSIX Threads
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *run(void *arg) {
    pthread_mutex_lock(&m);
    // critical section
    pthread_mutex_unlock(&m);
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    pthread_t t;
    pthread_create(&t, NULL, run, NULL);
    pthread_exit(NULL); // the main thread exits; t keeps running until it finishes
}