Verification of Tree-Based Hierarchical Read-Copy Update in the Linux Kernel Paul E. McKenney, IBM Linux Technology Center Joint work with Lihao Liang*, Daniel Kroening, and Tom Melham, University of Oxford
What Is RCU? 3/5/8 Verification of Linux-Kernel RCU 2
What Is RCU? Synchronization primitive used in Linux kernel Heavily used, and gaining use elsewhere as well Some implementations do read-only traversal of linked data structures using exactly the same sequence of machine instructions used in the absence of updates Readers get excellent performance, scalability, Complex and highly concurrent implementation http://www.rdrop.com/users/paulmck/rcu/ 3/5/8 Verification of Linux-Kernel RCU 3
RCU Is Specialized: Area of Applicability Stale and inconsistent data OK 00% Updates Update-Mostly, Need Fresh Consistent Data (RCU Not So Good),2 Read-Write, Need Consistent Data (RCU Might Be OK) Read-Mostly, Need Consistent Data (RCU Is OK) Read-Mostly, Stale/Inconsistent OK (RCU Is Great!!!) 00% Reads Need fresh and consistent data. RCU provides ABA protection for update-friendly mechanisms 3/5/8 2. RCU provides bounded Verification wait-free of Linux-Kernel read-side primitives RCU for real-time use 4
What Exactly Does Heavily Used Mean? 3/5/8 Verification of Linux-Kernel RCU 5
Isn't Making Software Work A Solved Problem? Million-Year Bug: Once Per Million Years 975 NHS 3/5/8 Verification of Linux-Kernel RCU 6
Isn't Making Software Work A Solved Problem? Million-Year Bug: Once In Ten Millennia 00 0 975 NHS 985 Various 3/5/8 Verification of Linux-Kernel RCU 7
Isn't Making Software Work A Solved Problem? Million-Year Bug: Once Per Century 00 0 0K K 00 0 975 NHS 985 Various 995 SQNT 3/5/8 Verification of Linux-Kernel RCU 8
Isn't Making Software Work A Solved Problem? Million-Year Bug: Once a Month 0M 00 0 0K K 00 0 00K 0K K 00 0 975 NHS 985 Various 995 SQNT 2005 Linux 3/5/8 Verification of Linux-Kernel RCU 9
Isn't Making Software Work A Solved Problem? Million-Year Bug: Several Times per Day 0G 0M 0M 00 0 0K K 00 0 00K 0K K 00 0 00K 0K K 00 0 975 NHS 985 Various 995 SQNT 2005 Linux 205 Linux 3/5/8 Verification of Linux-Kernel RCU 0
Isn't Making Software Work A Solved Problem? Million-Year Bug: Several Times per Hour 0G 00G 0G 00 0 0K K 00 0 0M 00K 0K K 00 0 0M 00K 0K K 00 0 0M 00K 0K K 00 0 975 NHS 985 Various 995 SQNT 2005 Linux 205 Linux 207 Linux 3/5/8 Verification of Linux-Kernel RCU
Isn't Making Software Work A Solved Problem? Million-Year Bug? You don't want to know... 975 NHS 00 0 985 Various 0K K 00 0 995 SQNT 0M 00K 0K K 00 0 2005 Linux 0G 0M 00K 0K K 00 0 205 Linux 00G 0G 0M 00K 0K K 00 0 207 Linux T 00G 0G 0M 00K T 0K K 00 0 3/5/8 Verification of Linux-Kernel RCU 2 IoT
Isn't Making Software Work A Solved Problem? Million-Year Bug? You don't want to know... But Murphy has transitioned from nice guy to homicidal maniac!!! 975 NHS 00 0 985 Various 0K K 00 0 995 SQNT 0M 00K 0K K 00 0 2005 Linux 0G 0M 00K 0K K 00 0 205 Linux 00G 0G 0M 00K 0K K 00 0 207 Linux T 00G 0G 0M 00K T 0K K 00 0 3/5/8 Verification of Linux-Kernel RCU 3 IoT
Why Stress About Potential Low-Probability Bugs? Almost any bug can become a security exploit Internet: Physical presence no longer required Not restricted to software: Meltdown and Spectre RCU is not the only thing with empirical spec! RCU is low level does not imply low risk After all, Row Hammer hit DRAM! Might be a trillion IoT devices in the World Translates to huge numbers of failures Some of which might put the general public at risk RCU is well-contained test case for PoC 3/5/8 Verification of Linux-Kernel RCU 4
Why Not Try Formal Verification? 3/5/8 Verification of Linux-Kernel RCU 5
Why Not Try Formal Verification? In Linux-Kernel RCU's Regression Tests... 3/5/8 Verification of Linux-Kernel RCU 6
Formal Verification & Regression Tests: Requirements Either automatic or no translation Correctly handle environment: memory model! Reasonable memory and CPU overhead Map back to lines of code containing bugs Main input: source code under test Find relevant bugs 3/5/8 Verification of Linux-Kernel RCU 7
Scorecard for Linux-Kernel C Code Promela PPCMEM Herd CBMC Test () Automated (2) Handle environment (MM) (MM) (MM) (3) Low overhead SAT? (4) Map to source code (5) Modest input (6) Relevant bugs???????????? Paul McKenney's first use 993 20 204 205 973 3/5/8 Verification of Linux-Kernel RCU 8
CBMC (Very) Rough Schematic C Code Encode into logic expression CBMC SAT solver Kroening, Clarke, and Lerda, A tool for checking ANSI-C Programs, Tools and Algorithms for the Construction and Analysis of Systems, 2004, pp. 68-76. https://github.com/diffblue/cbmc Trace generation (if counterexample located) Verification Result 3/5/8 Verification of Linux-Kernel RCU 9
Applying CBMC to Linux-Kernel RCU Linux kernel (v4.3.6) Test Scaffolding Linux-kernel RCU Port & trim Linux-kernel RCU Run CBMC Reflects server-class RCU up to 6 CPUs with default config (32 or 64 CPUs w/non-default configs) Approximated interrupts & grace-period kthread w/handplaced function calls, specified per-loop unrolling limits Modeled per-cpu variables with arrays Modeled locks with CPROVER_atomic*() Memory consistency models: SC, TSO, and PSO Tested safety and a weak form of liveness 3/5/8 Source code: https://github.com/lihaol/verify-treercu 20
Healthy Skepticism 3/5/8 Verification of Linux-Kernel RCU 2
Healthy Skepticism Formal verification of Linux-kernel RCU? 3/5/8 Verification of Linux-Kernel RCU 22
Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); 3/5/8 Verification of Linux-Kernel RCU 23
Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html 3/5/8 Verification of Linux-Kernel RCU 24
Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html How did CBMC do? 3/5/8 Verification of Linux-Kernel RCU 25
Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html How did CBMC do? Only 2 failures out of 30. Interrupt over-approximation, memory exhaustion Up to 90.4M SAT variables, 75GB, ~70 CPU hours Ran on 64-bit 2.4GHz Xeon, 2 cores & 96GB memory 3/5/8 Verification of Linux-Kernel RCU 26
Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html How did CBMC do? Only 2 failures out of 30. Interrupt over-approximation, memory exhaustion Up to 90.4M SAT variables, 75GB, ~70 CPU hours Ran on 64-bit 2.4GHz Xeon, 2 cores & 96GB memory But did not find new-to-me bugs! 3/5/8 Verification of Linux-Kernel RCU 27
Summary and Challenges 3/5/8 Verification of Linux-Kernel RCU 28
Summary Linux-kernel RCU robustness is important Large installed base poses severe challenge First automated LK RCU formal verification Two other teams have since done similar work Formal verification in regression tests: Almost Future work: Find bugs I don't already know about! Nevertheless, this work demonstrates the nascent ability and potential of SAT-based formal-verification tools to handle real-world production-quality synchronization primitives 3/5/8 Verification of Linux-Kernel RCU 29
Challenges/Limitations/Future Work Better modeling of interrupts & kernel threads Model concurrent linked lists: call_rcu() Incorporate Linux-kernel memory model And/or ARM, PowerPC, RISC-V,... Forward progress: Detect hangs & deadlocks Can already detect unconditional hangs/deadlocks Fully analyze unbounded looping Or at least automatically derive unrolling bounds Larger programs: Automatic decomposition? 3/5/8 Verification of Linux-Kernel RCU 30
Additional Challenges Find bug in rcu_preempt_offline_tasks() http://paulmck.livejournal.com/37782.html Find bug in RCU_NO_HZ_FULL_SYSIDLE http://paulmck.livejournal.com/3806.html Find bug in RCU linked-list use cases http://paulmck.livejournal.com/39793.html Verification Challenge 6 http://paulmck.livejournal.com/46993.html Find bugs in other popular open-source SW 3/5/8 Verification of Linux-Kernel RCU 3
Legal Statement This work represents the view of the authors and does not necessarily represent the view of IBM or University of Oxford. IBM and IBM (logo) are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries. Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others. 3/5/8 Verification of Linux-Kernel RCU 32
Questions? 3/5/8 Verification of Linux-Kernel RCU 33