CS510 Concurrent Systems
Class 2
A Lock-Free Multiprocessor OS Kernel

The Synthesis kernel
- A research project at Columbia University
- Synthesis V.0
  - Uniprocessor (Motorola 68020)
  - No virtual memory
- 1991 - Synthesis V.1
  - Dual 68030s
  - Virtual memory, threads, etc.
  - Lock-free kernel
CS510 - Concurrent Systems 2

Locking
- Why do kernels normally use locks?
- Locks support a concurrent programming style based on mutual exclusion
  - Acquire lock on entry to critical sections
  - Release lock on exit
  - Block or spin if lock is held
  - Only one thread at a time executes the critical section
- Locks prevent concurrent access and enable sequential reasoning about critical section code

So why not use locking?
- Granularity decisions
  - Simplicity vs. performance
  - Increasingly poor performance (superscalar CPUs)
- Complicates composition
  - Need to know the locks I'm holding before calling a function
  - Need to know if it's safe to call while holding those locks
- Risk of deadlock
- Propagates thread failures to other threads
  - What if I crash while holding a lock?

Is there an alternative?
- Use lock-free, optimistic synchronization
  - Execute the critical section unconstrained, and check at the end to see if you were the only one
  - If so, continue. If not, roll back and retry
- Synthesis uses no locks at all!
- Goal: show that lock-free synchronization is...
  - Sufficient for all OS synchronization needs
  - Practical
  - High performance

Locking is pessimistic
- Murphy's law: If it can go wrong, it will...
- In concurrent programming:
  - If we can have a race condition, we will...
  - If another thread could mess us up, it will...
- Solution:
  - Hide the resources behind locked doors
  - Make everyone wait until we're done
  - That is... if there was anyone at all
  - We pay the same cost either way

Optimistic synchronization
- The common case is often little or no contention
  - Or at least it should be!
  - Do we really need to shut out the whole world?
  - Why not proceed optimistically and only incur cost if we encounter contention?
- If there's little contention, there's no starvation
  - So we don't need to be wait-free, which guarantees no starvation
  - Lock-free is easier and cheaper than wait-free
- Small critical sections really help performance

How does it work?
- Copy
  - Write down any state we need in order to retry
- Do the work
  - Perform the computation
- Atomically test and commit, or retry
  - Compare saved assumptions with the actual state of the world
  - If different, undo work, and start over with new state
  - If preconditions still hold, commit the results and continue
  - This is where the work becomes visible to the world (ideally)
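
The copy / do work / test-and-commit cycle can be sketched with C11 atomics. This is a minimal illustration, not code from Synthesis: the shared word `shared_max` and the function `update_max` are hypothetical names chosen for the example, and the commit step uses the standard `atomic_compare_exchange_strong`.

```c
#include <stdatomic.h>

/* Hypothetical shared state: the largest value any thread has reported. */
static _Atomic int shared_max = 0;

/* Optimistically raise shared_max to at least `candidate`.
   Returns how many times the commit had to be retried. */
int update_max(int candidate) {
    int retries = 0;
    /* Copy: snapshot the state our computation depends on. */
    int seen = atomic_load(&shared_max);
    for (;;) {
        /* Do the work: compute the new value from the snapshot only. */
        int wanted = candidate > seen ? candidate : seen;
        /* Test and commit: succeeds only if shared_max still equals `seen`.
           On failure C11 stores the current value into `seen`, which gives
           us the fresh copy needed for the retry. */
        if (atomic_compare_exchange_strong(&shared_max, &seen, wanted))
            return retries;
        retries++;
    }
}
```

With no other thread touching `shared_max`, the first commit succeeds and the function returns 0 retries; cost is only incurred when contention actually occurs.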

Example: stack pop

    Pop() {
    retry:
        old_sp = SP;
        new_sp = old_sp + 1;
        elem = *old_sp;
        if (CAS(old_sp, new_sp, &SP) == FAIL)
            goto retry;
        return elem;
    }

- The retry label and goto form a loop
- Locals (old_sp, new_sp, elem): won't change!
- Global (SP): may change any time!
- CAS: atomic read-modify-write instruction

CAS
- CAS: single-word Compare and Swap
- An atomic read-modify-write instruction
- Semantics of the single atomic instruction are:

    CAS(copy, update, mem_addr)
    {
        if (*mem_addr == copy) {
            *mem_addr = update;
            return SUCCESS;
        } else
            return FAIL;
    }
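
For readers working in C11, the slide's CAS(copy, update, mem_addr) maps onto `atomic_compare_exchange_strong` from `<stdatomic.h>`. The wrapper below is a sketch that keeps the slide's argument order; the `SUCCESS`/`FAIL` constants and the choice of `uintptr_t` as the word type are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { FAIL = 0, SUCCESS = 1 };

/* Same argument order as the slide: saved copy, new value, address. */
int CAS(uintptr_t copy, uintptr_t update, _Atomic uintptr_t *mem_addr) {
    /* C11 overwrites its "expected" argument when the compare fails,
       so pass a local rather than the caller's saved copy. */
    uintptr_t expected = copy;
    return atomic_compare_exchange_strong(mem_addr, &expected, update)
               ? SUCCESS
               : FAIL;
}
```

Calling `CAS(7, 8, &w)` on a word holding 7 stores 8 and reports SUCCESS; repeating the same call afterwards reports FAIL and leaves the word untouched.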

Example: stack pop

    Pop() {
    retry:
        old_sp = SP;                            /* Copy */
        new_sp = old_sp + 1;                    /* Do work */
        elem = *old_sp;                         /* Do work */
        if (CAS(old_sp, new_sp, &SP) == FAIL)   /* Test and commit, or retry */
            goto retry;
        return elem;
    }
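
A compilable version of the pop might look like the sketch below. It assumes an array-based stack that grows toward lower addresses (so pop adds one to the stack pointer, as on the slide). The names `stack_mem`, `STACK_SIZE`, and the setup helper `seed_push` are hypothetical, and like the slide it does not check for an empty stack.

```c
#include <stdatomic.h>

#define STACK_SIZE 8
static int stack_mem[STACK_SIZE];
/* Empty stack: SP points one past the end of the array. */
static _Atomic(int *) SP = stack_mem + STACK_SIZE;

/* Setup helper for single-threaded initialization only (not lock-free). */
void seed_push(int v) {
    int *p = atomic_load(&SP) - 1;
    *p = v;
    atomic_store(&SP, p);
}

int pop(void) {
    int *old_sp, *new_sp;
    int elem;
    do {
        old_sp = atomic_load(&SP);  /* Copy */
        new_sp = old_sp + 1;        /* Do work */
        elem = *old_sp;             /* Do work */
        /* Test and commit: SP must still hold the address we copied. */
    } while (!atomic_compare_exchange_strong(&SP, &old_sp, new_sp));
    return elem;
}
```

The plain read of `*old_sp` mirrors the slide; under the C11 memory model a version used with concurrent pushes would also need the slots themselves to be atomic objects.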

What made it work?
- It works because we can atomically commit the new stack pointer value and compare the old stack pointer with the one at commit time
- This allows us to verify no other thread has accessed the stack concurrently with our operation
  - i.e. since we took the copy
  - Well, at least we know the address in the stack pointer is the same as it was when we started
    - Does this guarantee there was no concurrent activity?
    - Does it matter?
    - We have to be careful!
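
To see why care is needed, the sketch below replays one interleaving in a single thread: the popping thread copies SP and reads the top element, another thread then pops that element and pushes a different one, and the first thread's CAS still succeeds because SP holds the same address again. This situation is commonly called the ABA problem. All names here (`slots`, `sp`, `other_push`, `other_pop`, `aba_demo`) are hypothetical, and the "other thread" is simulated by ordinary function calls.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int slots[4];
static _Atomic(int *) sp = slots + 4;   /* empty: one past the end */

/* Stand-ins for the other thread's operations (simulated, single-threaded). */
static void other_push(int v) {
    int *p = atomic_load(&sp) - 1;
    *p = v;
    atomic_store(&sp, p);
}

static void other_pop(void) {
    atomic_store(&sp, atomic_load(&sp) + 1);
}

/* Returns true if the pop's CAS committed despite the intervening activity.
   *stale_elem receives what the pop would return; *actual_top receives what
   was really on top of the stack at commit time. */
bool aba_demo(int *stale_elem, int *actual_top) {
    other_push(10);

    /* Thread A: copy and do work. */
    int *old_sp = atomic_load(&sp);
    int *new_sp = old_sp + 1;
    int elem = *old_sp;

    /* Thread B runs here: pops 10, pushes 99. SP ends at the same address. */
    other_pop();
    other_push(99);
    int *top = atomic_load(&sp);
    *actual_top = *top;

    /* Thread A: test and commit. Only the address is compared. */
    bool committed = atomic_compare_exchange_strong(&sp, &old_sp, new_sp);
    *stale_elem = elem;
    return committed;
}
```

Here the commit succeeds and the pop would return 10 even though 99 was on top, so an unchanged stack pointer does not by itself prove there was no concurrent activity.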

Stack push

    Push(elem) {
    retry:
        old_sp = SP;                 /* Copy */
        new_sp = old_sp - 1;         /* Do work */
        old_val = *new_sp;           /* Unnecessary compare */
        if (CAS2(old_sp, old_val, new_sp, elem, &SP, new_sp) == FAIL)
            goto retry;
    }

- Note: this is a double compare and swap!
  - It's needed to atomically update both the new item and the new stack pointer

CAS2
- CAS2 = double compare and swap
- Sometimes referred to as DCAS

    CAS2(copy1, copy2, update1, update2, addr1, addr2)
    {
        if (*addr1 == copy1 && *addr2 == copy2) {
            *addr1 = update1;
            *addr2 = update2;
            return SUCCESS;
        } else
            return FAIL;
    }
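
C11 has no portable compare-and-swap over two independent addresses, so the sketch below only emulates the CAS2 semantics, guarding the two-word update with an `atomic_flag` spin guard. That makes the emulation itself not lock-free; it is here purely to show what the instruction does and how the push uses it. The index-based stack (`slots`, `SP` as an index, `STACK_WORDS`) and the guard `cas2_guard` are assumptions for the example, and there is no overflow check.

```c
#include <stdatomic.h>
#include <stdint.h>

enum { FAIL = 0, SUCCESS = 1 };

/* Emulation only: a real CAS2 is a single hardware instruction. */
static atomic_flag cas2_guard = ATOMIC_FLAG_INIT;

int CAS2(intptr_t copy1, intptr_t copy2, intptr_t update1, intptr_t update2,
         _Atomic intptr_t *addr1, _Atomic intptr_t *addr2) {
    int result = FAIL;
    while (atomic_flag_test_and_set(&cas2_guard)) { /* spin */ }
    if (atomic_load(addr1) == copy1 && atomic_load(addr2) == copy2) {
        atomic_store(addr1, update1);
        atomic_store(addr2, update2);
        result = SUCCESS;
    }
    atomic_flag_clear(&cas2_guard);
    return result;
}

#define STACK_WORDS 8
static _Atomic intptr_t slots[STACK_WORDS];
static _Atomic intptr_t SP = STACK_WORDS;   /* index of top; empty = STACK_WORDS */

void push(intptr_t elem) {
    intptr_t old_sp, new_sp, old_val;
    do {
        old_sp = atomic_load(&SP);              /* Copy */
        new_sp = old_sp - 1;                    /* Do work */
        old_val = atomic_load(&slots[new_sp]);
        /* Commit the new item and the new stack pointer together. */
    } while (CAS2(old_sp, old_val, new_sp, elem, &SP, &slots[new_sp]) == FAIL);
}
```

After `push(5)` and `push(6)`, `SP` has moved down by two slots and the two values sit in the highest two slots of the array.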

Optimistic synchronization in Synthesis
- Saved state is only one or two words
- Commit is done via
  - Compare-and-Swap (CAS), or
  - Double-Compare-and-Swap (CAS2 or DCAS)
- Can we really do everything in only two words?
  - Every synchronization problem in the Synthesis kernel is reduced to only needing to atomically touch two words at a time!
  - Requires some very clever kernel architecture

Approach
- Build data structures that work concurrently
  - Stacks
  - Queues (array-based to avoid allocations)
  - Linked lists
- Then build the OS around these data structures
- Concurrency is a first-class concern

Why is this trickier than it seems?
- List operations show insert and delete at the head
  - This is the easy case
- What about insert and delete of interior nodes?
  - Next pointers of deletable nodes are not safe to traverse, even the first time!
  - Need reference counts and DCAS to atomically compare and update the count and pointer values
  - This is expensive, so we may choose to defer deletes instead (more on this later in the course)
- Specialized list and queue implementations can reduce the overheads
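
The easy case, insert at the head, needs only a single-word CAS. The sketch below uses C11 atomics with hypothetical names (`struct node`, `list_head`, `insert_head`); it deliberately leaves out deletion and interior nodes, which are the hard cases the slide describes.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) list_head = NULL;

/* Link `n` in front of the current head. The caller owns the node's memory. */
void insert_head(struct node *n) {
    /* Copy: the head we expect to be replacing. */
    struct node *old_head = atomic_load(&list_head);
    do {
        /* Do work: point the new node at the head we saw. */
        n->next = old_head;
        /* Commit: on failure old_head is refreshed and we relink. */
    } while (!atomic_compare_exchange_weak(&list_head, &old_head, n));
}
```

Nothing is visible to other threads until the CAS succeeds, so a failed attempt needs no undo beyond overwriting `n->next` on the next pass.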

The fall-back position
- If you can't reduce the work such that it requires atomic updates to two or fewer words:
  - Create a single server thread and do the work sequentially on a single CPU
  - Why is this faster than letting multiple CPUs try to do it concurrently?
- Callers pack the requested operation into a message
  - Send it to the server (using lock-free queues!)
  - Wait for a response/callback/...
  - The queue effectively serializes the operations
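
One way to picture the server approach is a mailbox that callers post to with CAS and that the server drains in one atomic exchange. This is a swapped-in illustration, not the Synthesis queue (which the slides describe as array-based): the request type, `mailbox`, `post`, `serve_pending`, and the example operation (adding to a server-owned counter) are all hypothetical, and the response/callback path is omitted.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical request message: add `amount` to the server's counter. */
struct request {
    int amount;
    struct request *next;
};

static _Atomic(struct request *) mailbox = NULL;
static long server_counter = 0;   /* touched only by the server */

/* Caller side: lock-free post of one message. */
void post(struct request *r) {
    struct request *old = atomic_load(&mailbox);
    do {
        r->next = old;
    } while (!atomic_compare_exchange_weak(&mailbox, &old, r));
}

/* Server side: take every pending message at once, restore arrival order,
   then apply the operations sequentially. Returns how many were handled. */
int serve_pending(void) {
    struct request *batch = atomic_exchange(&mailbox, (struct request *)NULL);
    struct request *in_order = NULL;
    while (batch != NULL) {           /* reverse the newest-first list */
        struct request *next = batch->next;
        batch->next = in_order;
        in_order = batch;
        batch = next;
    }
    int handled = 0;
    for (struct request *r = in_order; r != NULL; r = r->next) {
        server_counter += r->amount;  /* no synchronization needed here */
        handled++;
    }
    return handled;
}
```

Because only the server touches `server_counter`, the operation itself can be arbitrarily large; the only shared word is the mailbox head.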

Lock vs. lock-free critical sections

Conclusions
- This is really intriguing!
- It's possible to build an entire OS without locks!
- But do you really want to?
  - Does it add or remove complexity?
  - What if hardware only gives you CAS and no DCAS?
  - What if critical sections are large or long lived?
  - What if contention is high?
  - What if we can't undo the work?