In-Kernel Synchronization
CSC-256/456 Operating Systems
Tuesday, November 27, 2007
Konstantinos Menychtas

Outline
Part A
- Interrupts and Exceptions
- Kernel preemption
- Common in-kernel synchronization primitives
- Applicability of synchronization mechanisms
- Examples in the kernel
Part B
- Transactional Memory, a new synchronization model
- TxLinux
Our case study is Linux.

Reasons for synchronization
- The kernel is reentrant: kernel control paths interleave
- Asynchronous events
- Distribution of limited resources
- The kernel can be a non-monolithic application
- SMP / multicore / manycore systems
- Distributed / multithreaded systems
Interrupt signals
- Interrupts are asynchronous (raised by other hardware devices)
  - Maskable (IRQs)
  - Non-maskable (ex. hardware failure)
- Exceptions are synchronous (raised by the CPU)
  - Programmed exceptions, aka software interrupts (ex. int, or into/bound when an overflow or bounds check fails)
  - Processor-detected: faults, traps, aborts

Implied interrupt signal priorities: handling nested execution
- Most exceptions happen in User Mode
- Exceptions can interrupt exceptions: a page fault may occur in Kernel Mode, so the nesting depth is at most two (1. system call, 2. page fault)
- Interrupts can happen in any mode
  - They can interrupt both interrupts and exceptions
  - They cannot be interrupted by exceptions: interrupt handlers avoid causing page faults
- Interrupt signals are therefore implicitly prioritized

Deferrable functions
- Enable asynchronous processing and hide latency (ex. network interrupt handling)
- Remember TinyOS tasks (vs. event handlers)
- Softirqs: statically allocated; may run concurrently on several CPUs, so they must be reentrant and protect their data with spin-locks
- Tasklets: built on top of softirqs (HI_SOFTIRQ, TASKLET_SOFTIRQ); dynamically allocated; tasklets of the same type are always serialized

Policy and kernel preemption
- The clever waiter policy: customer = user, boss = interrupt signal
  - Customer service is my priority (User Mode)
  - ...unless some boss has a request (Kernel Mode)
  - Every boss wants instant service
- Kernel preemption = a process switch while in Kernel Mode
  - Forced, in addition to the planned switches
  - The replacement happens without first returning to User Mode
  - Lower dispatch latency, higher interactivity
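Kernel preemption is forced only at safe points. The following is a minimal single-CPU model of that bookkeeping, not kernel code: it assumes a nesting counter in the spirit of the kernel's preempt_count, and struct cpu_state and the *_sketch functions are hypothetical names chosen for illustration. Preemption is allowed only when the counter is zero and local interrupts are enabled, and re-enabling preemption is one of the points where a pending switch is taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU model; the real kernel keeps this state in
 * preempt_count and thread flags, not in a struct like this one. */
struct cpu_state {
    int preempt_count;   /* > 0: preemption disabled (nesting depth) */
    bool irqs_enabled;   /* local interrupts enabled? */
    bool need_resched;   /* a higher-priority process is waiting */
    int switches;        /* forced process switches performed so far */
};

static bool preemption_allowed(const struct cpu_state *c)
{
    return c->preempt_count == 0 && c->irqs_enabled;
}

/* Called at a potential preemption point (ex. end of an interrupt handler). */
static void maybe_preempt(struct cpu_state *c)
{
    if (c->need_resched && preemption_allowed(c)) {
        c->need_resched = false;
        c->switches++;   /* stand-in for a forced process switch */
    }
}

static void preempt_disable_sketch(struct cpu_state *c)
{
    c->preempt_count++;
}

static void preempt_enable_sketch(struct cpu_state *c)
{
    assert(c->preempt_count > 0);
    c->preempt_count--;
    maybe_preempt(c);    /* re-enabling preemption is itself a preemption point */
}
```

Because the counter nests, a path that disables preemption twice stays non-preemptible until both calls are undone.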
Preemption conditions
- Only while executing an exception handler (in particular, a system call)
- Only when kernel preemption is enabled
- Preemption may happen:
  - when a kernel control path (usually an interrupt handler) terminates
  - when an exception handler re-enables kernel preemption
  - when deferrable functions are enabled
- Kernel preemption is optional (a compile-time kernel option)

Need for synchronization
- Critical region: a section of a control path that must run to completion before another control path enters it
- Race condition: the outcome depends on the order in which control paths are nested

Synchronization primitives
Synchronization is not necessary when:
- IRQ line disabling applies: the IRQ line is disabled while its handler runs, so the same handler is never re-entered
- Implied priority applies (ex. interrupt handling is never interrupted by a deferrable function or a system call)
- Execution is never simultaneous (ex. tasklets of the same type)
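Since tasklets of the same type never run simultaneously, data touched by only one tasklet needs no lock. A minimal userspace sketch of how such serialization can be enforced, assuming a per-tasklet "running" flag updated with an atomic test-and-set (similar in spirit to the kernel's tasklet state bits; struct tasklet_sketch and its functions are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tasklet model, not the kernel's struct tasklet_struct. */
struct tasklet_sketch {
    atomic_flag running;          /* set while some CPU executes func */
    void (*func)(void *data);     /* the deferred work */
    void *data;
};

static void tasklet_init_sketch(struct tasklet_sketch *t,
                                void (*func)(void *), void *data)
{
    atomic_flag_clear(&t->running);
    t->func = func;
    t->data = data;
}

/* Runs the tasklet unless it is already running elsewhere.
 * Returns false if the caller must re-queue it and try later. */
static bool tasklet_try_run(struct tasklet_sketch *t)
{
    if (atomic_flag_test_and_set_explicit(&t->running, memory_order_acquire))
        return false;             /* same tasklet busy on another CPU */
    t->func(t->data);             /* at most one CPU gets here at a time */
    atomic_flag_clear_explicit(&t->running, memory_order_release);
    return true;
}

/* Example work function for the sketch (hypothetical). */
static void bump_counter(void *data)
{
    ++*(int *)data;
}
```

When tasklet_try_run() returns false, the caller is expected to re-queue the tasklet instead of spinning; the kernel likewise defers a busy tasklet rather than waiting for it.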
Per-CPU variables
- An array of data structures with one element per CPU
- Useful when the data can be split among CPUs
- Each element is aligned to a cache line (memory alignment)
- No protection against asynchronous functions (interrupts, deferrable functions)
- Prone to race conditions under kernel preemption, so kernel preemption must be disabled while accessing them

Atomic operations
- Avoid read-modify-write race conditions
- Executed as a single, chip-level instruction (hardware support); the hardware memory arbiter serializes concurrent accesses, in an undetermined order
- atomic_t
  - Atomic operations, ex. atomic_sub_and_test()
  - Atomic bit handling functions, ex. test_and_set_bit()

Optimization and memory barriers
- The problems
  - Hardware optimizations (ex. dual issue)
  - Compiler optimizations (ex. loop unrolling)
- Solution: use barriers to avoid synchronization issues
  - Optimization barrier: asm volatile("" ::: "memory")
  - Memory barrier: certain instructions act as one anyway (ex. I/O port instructions); otherwise special hardware instructions, wrapped as [smp_]mb(), rmb(), wmb()

Spin-locks (+ RW spin-locks)
- Poll until the lock is free; can be faster than rescheduling
- spinlock_t
  - slock: 1 = unlocked, <= 0 = locked
  - break_lock (SMP): a flag signaling that a process is busy-waiting for the lock
- spin_lock_irq() also disables local interrupts while the lock is held
- Read/write: rwlock_t packs a 24-bit reader counter and a 1-bit unlock flag into one word
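The spin-lock and atomic-operation ideas above can be tried in userspace with C11 atomics. This is a sketch under that assumption, not the kernel's spinlock_t or atomic_t: spin_sketch_t and sub_and_test_sketch() are hypothetical names. The lock busy-waits on a test-and-set flag, with acquire/release ordering standing in for the memory barriers a real lock needs, and sub_and_test_sketch() mirrors what atomic_sub_and_test() reports: whether the counter reached zero after an indivisible subtraction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spin lock (userspace, not spinlock_t). */
typedef struct { atomic_flag locked; } spin_sketch_t;

static void spin_init_sketch(spin_sketch_t *l)
{
    atomic_flag_clear(&l->locked);
}

static void spin_lock_sketch(spin_sketch_t *l)
{
    /* Poll until the previous value was "clear"; acquire ordering keeps
     * critical-section accesses from moving above the lock. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* a real lock would add a pause / cpu_relax() hint here */
}

static bool spin_trylock_sketch(spin_sketch_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void spin_unlock_sketch(spin_sketch_t *l)
{
    /* Release ordering publishes the critical section's writes. */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Analog of atomic_sub_and_test(i, v): subtract i as one indivisible
 * read-modify-write and report whether the result is zero. */
static bool sub_and_test_sketch(int i, atomic_int *v)
{
    return atomic_fetch_sub(v, i) - i == 0;
}
```

A typical use of the second helper is a reference count: the caller that brings the count to zero is the one that frees the object.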
Seqlocks
- Introduced in the 2.6 Linux kernel
- seqlock_t = spinlock_t + int sequence
- Many writers, many readers
- Biased in favor of the writer (unlike R/W spin-locks): a writer never waits for readers
- seqlock.sequence is incremented both at lock and at unlock, so it is odd only while a write is in progress; readers retry if it was odd or has changed
- Suitable when writers do not change pointers that readers dereference, and when readers can afford to re-read

Read-Copy Update (RCU)
- For data structures that are mostly read
- Lock-free: a writer creates a copy of the protected data structure, updates the copy, then switches the pointer
- Many writers, many readers
- Only for dynamically allocated data structures referenced through pointers
- No kernel control path can sleep inside an RCU read-side critical region
- Old copies are freed later: on every tick the kernel checks whether all CPUs have passed a quiescent state, and then a tasklet executes the registered callbacks (memory-manager-like)

Semaphores
- In-kernel semaphores != System V IPC semaphores
- struct semaphore: atomic_t count; wait (address of the wait queue list); sleepers
- Suspends processes whose acquire attempt (down) fails
- Usable only by functions allowed to sleep: not interrupt handlers, not deferrable functions
- Read/write semaphores (like RW spin-locks): on release, wake up one writer or many readers
- Mutexes: binary semaphores, as of 2.6.16

Completions
- Similar to semaphores, but designed for a subtle SMP race:
  - process A on CPU 1 initializes a semaphore and downs it
  - process B on CPU 2 ups the semaphore
  - at the same time, A tries to destroy the semaphore while up() may still be using it
- This keeps the classic semaphore optimized for the common case
- A spinlock ensures that complete() and wait_for_completion() cannot execute concurrently
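A seqlock reader never blocks the writer; it simply retries. A minimal C11 sketch, assuming a single protected pair of integers (struct seqlock_sketch and its functions are hypothetical; the protected fields are atomics accessed with relaxed ordering so that the retry loop is free of C11 data races, following a common C11 formulation of the seqlock):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical seqlock protecting a (sec, nsec) pair. */
struct seqlock_sketch {
    atomic_flag writer_lock;  /* serializes writers, like the spinlock_t in seqlock_t */
    atomic_uint sequence;     /* odd while a write is in progress */
    atomic_int sec, nsec;     /* protected data (atomic to avoid C11 data races) */
};

static void seq_init_sketch(struct seqlock_sketch *s)
{
    atomic_flag_clear(&s->writer_lock);
    atomic_init(&s->sequence, 0);
    atomic_init(&s->sec, 0);
    atomic_init(&s->nsec, 0);
}

static void seq_write_sketch(struct seqlock_sketch *s, int sec, int nsec)
{
    /* Writers spin among themselves, but never wait for readers. */
    while (atomic_flag_test_and_set_explicit(&s->writer_lock, memory_order_acquire))
        ;
    unsigned seq = atomic_load_explicit(&s->sequence, memory_order_relaxed);
    atomic_store_explicit(&s->sequence, seq + 1, memory_order_relaxed); /* now odd */
    atomic_thread_fence(memory_order_release);   /* odd value visible before data */
    atomic_store_explicit(&s->sec, sec, memory_order_relaxed);
    atomic_store_explicit(&s->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&s->sequence, seq + 2, memory_order_release); /* even again */
    atomic_flag_clear_explicit(&s->writer_lock, memory_order_release);
}

static void seq_read_sketch(struct seqlock_sketch *s, int *sec, int *nsec)
{
    unsigned before, after;
    do {
        before = atomic_load_explicit(&s->sequence, memory_order_acquire);
        *sec  = atomic_load_explicit(&s->sec,  memory_order_relaxed);
        *nsec = atomic_load_explicit(&s->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire); /* data reads before re-check */
        after = atomic_load_explicit(&s->sequence, memory_order_relaxed);
    } while ((before & 1u) || before != after);  /* writer active or intervened */
}
```

Readers may see a torn pair inside the loop, but the sequence check discards it. This is also why seqlocks suit data without pointers: a reader could dereference a pointer before discovering that its snapshot was stale.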
Local interrupt disabling
- Allows easy synchronization with concurrent kernel control paths on the same CPU
- Does not protect data structures accessed by other CPUs; this is addressed by coupling it with spin-locks
- local_irq_disable() uses the cli assembly instruction
- Re-enabling needs extra care: to handle nesting, save eflags first (local_irq_save()) and restore them afterwards (local_irq_restore())

Disabling and enabling deferrable functions
- Deferrable functions execute unpredictably (mostly when hardware interrupt handlers terminate)
- The data structures they access must be protected against race conditions
- Softirqs can be disabled without disabling interrupts
- A softirq counter is used: do_softirq() never executes softirqs while the counter is > 0

Synchronizing access
- Always keep the concurrency level high; it depends on the number of I/O devices and CPUs
- Spin-locks, RW locks, seqlocks, RCU, and local interrupt/softirq disabling all stop kernel preemption
- Spin-locks can have an even more negative effect: busy waiting, cache tainting
- How to be more clever (examples): atomic_t counters instead of locks; memory barriers (ex. inserting into a list without a lock; lists are the kernel's favorite data structure)
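Why saving and restoring the interrupt state beats a plain disable/enable pair, and how the softirq counter keeps do_softirq() away, can be shown with a single-CPU model. Everything below is hypothetical: irq_on stands in for the interrupt flag in eflags, bh_count for the softirq counter, and irq_save()/bh_enable() only mimic the roles of local_irq_save() and local_bh_enable(); the kernel's real bookkeeping lives in eflags and preempt_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model, not kernel data structures. */
struct cpu_model {
    bool irq_on;    /* stand-in for the IF bit in eflags */
    int bh_count;   /* softirq-disable nesting counter */
    int pending;    /* softirqs raised but not yet run */
    int done;       /* softirqs executed */
};

/* Like local_irq_save(): remember the old state, then disable (cli). */
static bool irq_save(struct cpu_model *c)
{
    bool old = c->irq_on;
    c->irq_on = false;
    return old;
}

/* Like local_irq_restore(): restore the saved state instead of blindly enabling. */
static void irq_restore(struct cpu_model *c, bool old)
{
    c->irq_on = old;
}

static void do_softirq_model(struct cpu_model *c)
{
    if (c->bh_count > 0)      /* never runs while softirqs are disabled */
        return;
    c->done += c->pending;
    c->pending = 0;
}

static void bh_disable(struct cpu_model *c)
{
    c->bh_count++;
}

static void bh_enable(struct cpu_model *c)
{
    assert(c->bh_count > 0);
    if (--c->bh_count == 0)
        do_softirq_model(c);  /* run whatever was raised in the meantime */
}
```

Restoring the saved state, rather than unconditionally re-enabling, is what keeps the inner region from turning interrupts back on inside the outer one.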
Choosing the best

Protect against exceptions (ex. system call service routines)
- Semaphores suffice to synchronize on unavailable resources
- Works well in both UP and SMP environments
- Kernel preemption causes no problems, except for per-CPU variables

Protect against interrupts
- An interrupt handler is serialized with respect to itself
- Synchronization is needed for data structures accessed by different handlers
- Can we lock the structure?
  - Spin-lock: if the holder is interrupted and the handler spins on the same lock, the holder can never unlock it
  - Semaphore: would block the process, and interrupt handlers must not block
- In UP, disable interrupts in critical regions
- In SMP, disable them and acquire a spin-lock; if it is locked, the interrupt handler on the other CPU will release it
- Lots of macros couple spin-locks with interrupt disabling

Protect against deferrable functions
- No synchronization is required in UP: deferrable functions are always serialized on a CPU
- Synchronization is required in SMP:
  - softirqs (several may run on different CPUs): spin-lock
  - one kind of tasklet: no more than one runs at a time, so none is needed
  - different kinds of tasklets: spin-lock
Protect against exceptions and interrupts
- Consider a data structure accessed by both exceptions (ex. system call service routines) and interrupt handlers
- Interrupt handlers are not reentrant, and exceptions cannot interrupt them
- UP: disable local interrupts, then access the data structure
- SMP: disable local interrupts + spin-lock to synchronize with other CPUs, or disable + semaphores; interrupt handlers must not block, so they use down_trylock()

Protect against exceptions and deferrable functions
- Can be treated like exceptions and interrupts
- Deferrable functions are activated by interrupts, and no exception can be raised while they run
- local_bh_disable() disables local deferrable functions; it is preferable to disabling interrupts
- For SMP, couple the local disabling with spin-locks

Protect against interrupts and deferrable functions
- Like exceptions and interrupts
- An interrupt can be raised while a deferrable function runs, but no deferrable function can stop an interrupt handler
- Disable local interrupts during the deferrable function; add a spin-lock for SMP

Protect against everyone
- One solution covers all cases: disable local interrupts + take a spin-lock
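The "protect against everyone" recipe pairs the two mechanisms in a specific order, as the kernel's spin_lock_irqsave()/spin_unlock_irqrestore() do. A userspace sketch of that ordering, with a hypothetical struct cpu_flags standing in for the local interrupt flag and hypothetical lock_irqsave_sketch()/unlock_irqrestore_sketch() helpers:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct cpu_flags { bool irq_on; };               /* hypothetical local IF bit */
typedef struct { atomic_flag held; } lock_sketch_t;

/* Disable local interrupts FIRST, so no local handler can spin on a lock
 * this CPU already holds; then spin to exclude the other CPUs.
 * Returns the saved interrupt state. */
static bool lock_irqsave_sketch(lock_sketch_t *l, struct cpu_flags *cpu)
{
    bool saved = cpu->irq_on;
    cpu->irq_on = false;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;   /* the holder runs on another CPU and will release it */
    return saved;
}

static void unlock_irqrestore_sketch(lock_sketch_t *l, struct cpu_flags *cpu,
                                     bool saved)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    cpu->irq_on = saved;    /* release the lock, then restore interrupts */
}
```

Disabling local interrupts before spinning matters: if an interrupt handler on the same CPU tried to take a lock that the interrupted code holds, it would spin forever.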
Examples
- Reference counters: atomic_t
- BKL (Big Kernel Lock) [redundant]: a semaphore (used to be a spin-lock)
  - Reentrant: a lock_depth counter in each process descriptor records how many times the process acquired it
  - Automatically released on schedule() (and reacquired afterwards)
  - lock_depth is temporarily set to -1 when preempting, to avoid corruption

More examples
- Slab cache list semaphore: protects the kernel's pool for dynamic memory
  - Guards access in kmem_cache_create/shrink/reap, with non-blocking acquires where needed
  - These functions are never invoked inside interrupt handlers
- Inode semaphore: i_sem, one per inode
  - Concurrent access to a file is a definite source of synchronization issues
  - Acquire semaphores in a predetermined order to avoid deadlocks (ex. rename, which involves several inodes)
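The BKL's reentrancy comes down to a per-process counter. A simplified single-threaded sketch, assuming lock_depth starts at -1 as in the process descriptor; struct bkl_sketch, struct task_sketch and the *_sketch functions are hypothetical, and the underlying semaphore is reduced to a flag (a real down() would sleep instead of asserting):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: the real BKL is a kernel semaphore and
 * lock_depth lives in the process descriptor. */
struct bkl_sketch { bool held; int real_acquires; };
struct task_sketch { int lock_depth; };   /* -1 means "BKL not held" */

static void lock_kernel_sketch(struct bkl_sketch *bkl, struct task_sketch *cur)
{
    int depth = cur->lock_depth + 1;
    if (depth == 0) {             /* first acquisition by this process */
        assert(!bkl->held);       /* a real version would sleep here */
        bkl->held = true;
        bkl->real_acquires++;
    }
    cur->lock_depth = depth;      /* nested acquisitions only bump the counter */
}

static void unlock_kernel_sketch(struct bkl_sketch *bkl, struct task_sketch *cur)
{
    assert(cur->lock_depth >= 0);
    if (--cur->lock_depth < 0)    /* outermost release frees the lock */
        bkl->held = false;
}
```

Only the transition from -1 to 0 touches the real lock, so nested lock/unlock pairs inside one process are cheap and cannot self-deadlock.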
Recap
We saw:
- 9 different concurrency control mechanisms
- 7 in-kernel situations that demand extra care

A relative comparison: blocking vs. non-blocking
- Blocking synchronization, ex. spinlock_t (interrupt disabling)
  - Programmability
  - Large granularity, large synchronization cost, unscalable performance
- Non-blocking synchronization: hardware compare-and-swap, load-linked/store-conditional, ex. atomic_t
  - Small granularity, scalability
  - Hard to program

Is there any alternative model? Can the kernel take advantage of another model? Transactional Memory.

Example: atomic transaction
- Atomic: the operation cannot be broken down into smaller pieces (ex. CAS; a non-example: a plain ADD to memory); the object operated on is a shared object
- Transaction: either commits or aborts. Think of a money withdrawal: you wouldn't want the bank to charge you for money you tried to withdraw but never got because its ATM broke.

Atomic memory transactions
- Herlihy (Brown), Shavit and Touitou (Tel-Aviv)
- Cambridge, Texas (TxLinux), Rochester :) ...
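The ATM example can be made concrete with the non-blocking style from the comparison: a withdrawal that either commits completely or leaves the balance untouched. A minimal C11 sketch (withdraw() is a hypothetical helper):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Read, compute, and publish with compare-and-swap: either the whole
 * update commits, or nothing changes and we retry or give up. */
static bool withdraw(atomic_long *balance, long amount)
{
    long old = atomic_load(balance);
    do {
        if (old < amount)
            return false;   /* "abort": nothing was charged */
        /* On failure, the CAS reloads old with the current balance. */
    } while (!atomic_compare_exchange_weak(balance, &old, old - amount));
    return true;            /* "commit" */
}
```

A single-word compare-and-swap only covers one location; a transfer between two accounts needs two updates to commit together, which is exactly what a memory transaction would provide.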
Questions for TM
- Blocking or non-blocking?
- When should we lock (acquire) the transactional object?
- When should the updates appear in memory?
- Whom to favor on conflicts?
- How should we handle doomed transactions?
- How should we handle nested transactions?
- How do we manage memory, since we don't know when it is safe to free it?

Atomic memory transactions
- Atomic transactions on memory, as if it were a database
- Software, hardware, or hybrid
- Granularity: from cache line to object; arbitrary granularity
- Scalability, performance, programmability (working on it)

Transactional Memory in the kernel: MetaTM/TxLinux
- Could an OS kernel benefit from TM?
- Which assumptions of TM must we revisit to use it in the kernel? (ex. how do TM systems handle interrupts?)
- Take Linux: a large, mature, well-tuned concurrent program
- TM can be an extra concurrency abstraction for user-level programmers; TM in the kernel can benefit from staying close to the hardware (hybrid)
- Approach:
  - Use the MetaTM HTM
  - Study the workload; propose or incorporate related ideas from other TM systems (ex. contention management)
  - Rewrite Linux using the new TM model: TxLinux
  - Evaluate
- We'll stick to the problems faced and the ideas suggested
Bringing TM to Linux
- Communicate hints to the hardware (HTM) to help conflict management
- Could the front end of a TM library benefit from analogous hints? Conflicts in STMs are exceptions anyway!
- But the kernel is more complex: per-CPU data structures, interrupt disabling, blocking operations...

Interrupts
- Non-deterministic, very frequent, of undefined origin / request / status
- How to pair interrupts with TM?
  - Don't use TM inside interrupt handlers
  - Abort the current transaction and re-execute it after the handler
  - Nest transactions
  - Treat the interrupt as a context switch
- Single thread of control, multiple active transactions: transaction stacking
- Stacked transactions (xpush, xpop): no nesting relationship, though; like a context switch

Stacked transactions issues
- If, while T1 is executing, an interrupt starts T2 and they conflict, there is a livelock. Solution: abort T1
- Stack memory might change during the sequence: start T, call callee, callee returns, interrupt, handler returns. The handler's stack frame overwrites the callee's dead frame, so T conflicts with the interrupt and restarts. Solution: ignore the appropriate (dead) stack addresses in the transaction's read/write set

Contention management
- RW spin-locks favor readers, seqlocks favor writers, RCU favors readers even more: how do we handle conflicts in light of such conflicting observations?
- SizeMatters policy: restart the smallest transaction; revert to timestamps after a while to avoid livelock
- Backoff is essential
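A minimal sketch of the SizeMatters decision, assuming "size" means the number of unique bytes a transaction has read or written, and that a transaction's restart count is what triggers the fallback to timestamps. The restart threshold, the backoff constants, struct tx_sketch, and both functions are assumptions for illustration, not values or code from MetaTM/TxLinux.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transaction descriptor; field names are illustrative. */
struct tx_sketch {
    uint64_t start_ts;       /* logical start time; smaller = older */
    unsigned bytes_touched;  /* unique bytes read + written so far */
    unsigned restarts;       /* times this transaction has restarted */
};

#define SIZE_MATTERS_MAX_RESTARTS 8u  /* assumption: livelock threshold */
#define BACKOFF_BASE_CYCLES      64u  /* assumption: backoff granularity */

/* Returns the transaction that must restart (the loser). */
static struct tx_sketch *size_matters_loser(struct tx_sketch *a,
                                            struct tx_sketch *b)
{
    if (a->restarts < SIZE_MATTERS_MAX_RESTARTS &&
        b->restarts < SIZE_MATTERS_MAX_RESTARTS &&
        a->bytes_touched != b->bytes_touched)
        return a->bytes_touched < b->bytes_touched ? a : b; /* smaller loses */
    /* Livelock suspected (or equal sizes): the older transaction wins. */
    return a->start_ts > b->start_ts ? a : b;
}

/* Capped exponential backoff before the loser restarts. */
static unsigned backoff_cycles(const struct tx_sketch *loser)
{
    unsigned shift = loser->restarts < 10u ? loser->restarts : 10u;
    return BACKOFF_BASE_CYCLES << shift;
}
```

Restarting the smaller transaction discards less work; the timestamp fallback lets the older transaction win once livelock is suspected, and the backoff spreads out the retries.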
Special cases
- Not all in-kernel synchronization should be replaced
- Per-CPU: no synchronization issues; substitution might cause performance loss
- Blocking (semaphores, completions, mutexes): different behavior depending on whether the lock is held or not
- I/O

Some (of the) results
- Is the dominant cost waiting or queueing?
- Will we get a smaller code footprint with TM?
- Much I/O happens while spin-locks are held (1/3!)
- Potential performance gains

References
- Understanding the Linux Kernel, by Daniel P. Bovet and Marco Cesati
- TxLinux: Using and Managing Hardware Transactional Memory in an Operating System, and MetaTM/TxLinux: Transactional Memory for an Operating System, by Rossbach, Ramadan et al. (University of Texas at Austin)
- Unlocking Concurrency: Multicore Programming with Transactional Memory, by Adl-Tabatabai, Kozyrakis, Saha. ACM Queue, vol. 4, no. 10, December 2006
- Lowering the Overhead of Nonblocking Software Transactional Memory, by Marathe, Spear, Scott et al. (University of Rochester, RSTM!)

THANK YOU

Disclaimer: All work presented is the work of the respective authors (see References). All copyrighted material (ex. images) also belongs to the respective authors.