Verification of Tree-Based Hierarchical Read-Copy Update in the Linux Kernel

Similar documents
Does RCU Really Work?

Practical Experience With Formal Verification Tools

Formal Verification and Linux-Kernel Concurrency

Verification of Tree-Based Hierarchical Read-Copy Update in the Linux Kernel

Netconf RCU and Breakage. Paul E. McKenney IBM Distinguished Engineer & CTO Linux Linux Technology Center IBM Corporation

Introduction to RCU Concepts

The RCU-Reader Preemption Problem in VMs

arxiv: v2 [cs.lo] 18 Jan 2018

Abstraction, Reality Checks, and RCU

Read-Copy Update (RCU) for C++

Getting RCU Further Out Of The Way

Tracing and Linux-Kernel RCU

Testing real-time Linux: What to test and how.

Using a Malicious User-Level RCU to Torture RCU-Based Algorithms

Real-Time Response on Multicore Systems: It is Bigger Than You Think

Practical Experience With Formal Verification Tools

F-Soft: Software Verification Platform

Decoding Those Inscrutable RCU CPU Stall Warnings

RCU and C++ Paul E. McKenney, IBM Distinguished Engineer, Linux Technology Center Member, IBM Academy of Technology CPPCON, September 23, 2016

What Happened to the Linux-Kernel Memory Model?

When Do Real Time Systems Need Multiple CPUs?

Read-Copy Update in a Garbage Collected Environment. By Harshal Sheth, Aashish Welling, and Nihar Sheth

Formal Verification and Linux-Kernel Concurrency

Validating Core Parallel Software

Real-Time Response on Multicore Systems: It is Bigger Than I Thought

Introduction to RCU Concepts

Decoding Those Inscrutable RCU CPU Stall Warnings

Hazard Pointers. Safe Resource Reclamation for Optimistic Concurrency

Real Time vs. Real Fast

Frightening small children and disconcerting grown-ups: Concurrency in the Linux kernel

Multi-Core Memory Models and Concurrency Theory

Model Checking Embedded C Software using k-induction and Invariants

Tom Hart, University of Toronto Paul E. McKenney, IBM Beaverton Angela Demke Brown, University of Toronto

Jackson Marusarz Software Technical Consulting Engineer

Advances in Validation of Concurrent Software

Bounded Model Checking Of C Programs: CBMC Tool Overview

NVMe Over Fabrics: Scaling Up With The Storage Performance Development Kit

Outline. Low-Level Optimizations in the PowerPC/Linux Kernels. PowerPC Architecture. PowerPC Architecture

Typed Assembly Language for Implementing OS Kernels in SMP/Multi-Core Environments with Interrupts

C Code Verification based on the Extended Labeled Transition System Model

Bounded Model Checking of Concurrent Programs

Sleepable Read-Copy Update

Effective Data-Race Detection for the Kernel

Paul E. McKenney, Distinguished Engineer IBM Linux Technology Center

Advances in Validation of Concurrent Software

BITCOIN MINING IN A SAT FRAMEWORK

Paul E. McKenney, Distinguished Engineer IBM Linux Technology Center

Software Model Checking. Xiangyu Zhang

Parallel Methods for Verifying the Consistency of Weakly-Ordered Architectures. Adam McLaughlin, Duane Merrill, Michael Garland, and David A.

Proving Dekker with SPIN and PROMELA

Exploring mtcp based Single-Threaded and Multi-Threaded Web Server Design

The benefits and costs of writing a POSIX kernel in a high-level language

Verifying Concurrent Programs

Formal Verification for UML/SysML models

Stateless Model Checking of the Linux Kernel s Hierarchical Read-Copy-Update (Tree RCU)

NVIDIA COLLECTIVE COMMUNICATION LIBRARY (NCCL)

Is Parallel Programming Hard, And If So, What Can You Do About It?

Precision Time Protocol, and Sub-Microsecond Synchronization

Lawson M3 7.1 Large User Scaling on System i

Read-Copy Update (RCU) Validation and Verification for Linux

UNDERSTANDING PROGRAMMING BUGS IN ANSI-C SOFTWARE USING BOUNDED MODEL CHECKING COUNTER-EXAMPLES

Load-reserve / Store-conditional on POWER and ARM

What Is RCU? Guest Lecture for University of Cambridge

Extreme-Scale Operating Systems

Program Analysis and Code Verification

On partial order semantics for SAT/SMT-based symbolic encodings of weak memory concurrency

Intel Threading Building Blocks (Intel TBB) 2.1. In-Depth

Read-Copy Update (RCU) Don Porter CSE 506

David DeFlyer Class notes CS162 January 26 th, 2009

Hazard Pointers. Safe Resource Reclamation for Optimistic Concurrency

NightStar. NightView Source Level Debugger. Real-Time Linux Debugging and Analysis Tools BROCHURE

Multiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.

The ComFoRT Reasoning Framework

Simplicity Through Optimization

Lecture 1: Model Checking. Edmund Clarke School of Computer Science Carnegie Mellon University

Multiprocessor Systems. COMP s1

More on Verification and Model Checking

Software verification for ubiquitous computing

Abstraction techniques for Floating-Point Arithmetic

What Is RCU? Indian Institute of Science, Bangalore. Paul E. McKenney, IBM Distinguished Engineer, Linux Technology Center 3 June 2013

Lecture 13: Memory Consistency. + a Course-So-Far Review. Parallel Computer Architecture and Programming CMU , Spring 2013

Bug Finding with Under-approximating Static Analyses. Daniel Kroening, Matt Lewis, Georg Weissenbacher

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

Read-Copy Update (RCU) Don Porter CSE 506

Intel Threading Tools

NVIDIA COLLECTIVE COMMUNICATION LIBRARY (NCCL)

Market Data Publisher In a High Frequency Trading Set up

<Insert Picture Here> Boost Linux Performance with Enhancements from Oracle

Supra-linear Packet Processing Performance with Intel Multi-core Processors

Advanced Operating Systems (CS 202)

System Wide Tracing User Need

Proposed Wording for Concurrent Data Structures: Hazard Pointer and Read Copy Update (RCU)

OpenMPDK and unvme User Space Device Driver for Server and Data Center

A Comparison of Relativistic and Reader-Writer Locking Approaches to Shared Data Access

Concurrency: State Models & Design Patterns

Scalable Program Analysis Using Boolean Satisfiability: The Saturn Project

Synchronization for Concurrent Tasks

Scalable Correct Memory Ordering via Relativistic Programming

Validating Core Parallel Software

Advanced Topic: Efficient Synchronization

Transcription:

Verification of Tree-Based Hierarchical Read-Copy Update in the Linux Kernel Paul E. McKenney, IBM Linux Technology Center Joint work with Lihao Liang*, Daniel Kroening, and Tom Melham, University of Oxford

What Is RCU? 3/5/8 Verification of Linux-Kernel RCU 2

What Is RCU? Synchronization primitive used in Linux kernel Heavily used, and gaining use elsewhere as well Some implementations do read-only traversal of linked data structures using exactly the same sequence of machine instructions used in the absence of updates Readers get excellent performance, scalability, Complex and highly concurrent implementation http://www.rdrop.com/users/paulmck/rcu/ 3/5/8 Verification of Linux-Kernel RCU 3

RCU Is Specialized: Area of Applicability Stale and inconsistent data OK 00% Updates Update-Mostly, Need Fresh Consistent Data (RCU Not So Good),2 Read-Write, Need Consistent Data (RCU Might Be OK) Read-Mostly, Need Consistent Data (RCU Is OK) Read-Mostly, Stale/Inconsistent OK (RCU Is Great!!!) 00% Reads Need fresh and consistent data. RCU provides ABA protection for update-friendly mechanisms 3/5/8 2. RCU provides bounded Verification wait-free of Linux-Kernel read-side primitives RCU for real-time use 4

What Exactly Does Heavily Used Mean? 3/5/8 Verification of Linux-Kernel RCU 5

Isn't Making Software Work A Solved Problem? Million-Year Bug: Once Per Million Years 975 NHS 3/5/8 Verification of Linux-Kernel RCU 6

Isn't Making Software Work A Solved Problem? Million-Year Bug: Once In Ten Millennia 00 0 975 NHS 985 Various 3/5/8 Verification of Linux-Kernel RCU 7

Isn't Making Software Work A Solved Problem? Million-Year Bug: Once Per Century 00 0 0K K 00 0 975 NHS 985 Various 995 SQNT 3/5/8 Verification of Linux-Kernel RCU 8

Isn't Making Software Work A Solved Problem? Million-Year Bug: Once a Month 0M 00 0 0K K 00 0 00K 0K K 00 0 975 NHS 985 Various 995 SQNT 2005 Linux 3/5/8 Verification of Linux-Kernel RCU 9

Isn't Making Software Work A Solved Problem? Million-Year Bug: Several Times per Day 0G 0M 0M 00 0 0K K 00 0 00K 0K K 00 0 00K 0K K 00 0 975 NHS 985 Various 995 SQNT 2005 Linux 205 Linux 3/5/8 Verification of Linux-Kernel RCU 0

Isn't Making Software Work A Solved Problem? Million-Year Bug: Several Times per Hour 0G 00G 0G 00 0 0K K 00 0 0M 00K 0K K 00 0 0M 00K 0K K 00 0 0M 00K 0K K 00 0 975 NHS 985 Various 995 SQNT 2005 Linux 205 Linux 207 Linux 3/5/8 Verification of Linux-Kernel RCU

Isn't Making Software Work A Solved Problem? Million-Year Bug? You don't want to know... 975 NHS 00 0 985 Various 0K K 00 0 995 SQNT 0M 00K 0K K 00 0 2005 Linux 0G 0M 00K 0K K 00 0 205 Linux 00G 0G 0M 00K 0K K 00 0 207 Linux T 00G 0G 0M 00K T 0K K 00 0 3/5/8 Verification of Linux-Kernel RCU 2 IoT

Isn't Making Software Work A Solved Problem? Million-Year Bug? You don't want to know... But Murphy has transitioned from nice guy to homicidal maniac!!! 975 NHS 00 0 985 Various 0K K 00 0 995 SQNT 0M 00K 0K K 00 0 2005 Linux 0G 0M 00K 0K K 00 0 205 Linux 00G 0G 0M 00K 0K K 00 0 207 Linux T 00G 0G 0M 00K T 0K K 00 0 3/5/8 Verification of Linux-Kernel RCU 3 IoT

Why Stress About Potential Low-Probability Bugs? Almost any bug can become a security exploit Internet: Physical presence no longer required Not restricted to software: Meltdown and Spectre RCU is not the only thing with empirical spec! RCU is low level does not imply low risk After all, Row Hammer hit DRAM! Might be a trillion IoT devices in the World Translates to huge numbers of failures Some of which might put the general public at risk RCU is well-contained test case for PoC 3/5/8 Verification of Linux-Kernel RCU 4

Why Not Try Formal Verification? 3/5/8 Verification of Linux-Kernel RCU 5

Why Not Try Formal Verification? In Linux-Kernel RCU's Regression Tests... 3/5/8 Verification of Linux-Kernel RCU 6

Formal Verification & Regression Tests: Requirements Either automatic or no translation Correctly handle environment: memory model! Reasonable memory and CPU overhead Map back to lines of code containing bugs Main input: source code under test Find relevant bugs 3/5/8 Verification of Linux-Kernel RCU 7

Scorecard for Linux-Kernel C Code Promela PPCMEM Herd CBMC Test () Automated (2) Handle environment (MM) (MM) (MM) (3) Low overhead SAT? (4) Map to source code (5) Modest input (6) Relevant bugs???????????? Paul McKenney's first use 993 20 204 205 973 3/5/8 Verification of Linux-Kernel RCU 8

CBMC (Very) Rough Schematic C Code Encode into logic expression CBMC SAT solver Kroening, Clarke, and Lerda, A tool for checking ANSI-C Programs, Tools and Algorithms for the Construction and Analysis of Systems, 2004, pp. 68-76. https://github.com/diffblue/cbmc Trace generation (if counterexample located) Verification Result 3/5/8 Verification of Linux-Kernel RCU 9

Applying CBMC to Linux-Kernel RCU Linux kernel (v4.3.6) Test Scaffolding Linux-kernel RCU Port & trim Linux-kernel RCU Run CBMC Reflects server-class RCU up to 6 CPUs with default config (32 or 64 CPUs w/non-default configs) Approximated interrupts & grace-period kthread w/handplaced function calls, specified per-loop unrolling limits Modeled per-cpu variables with arrays Modeled locks with CPROVER_atomic*() Memory consistency models: SC, TSO, and PSO Tested safety and a weak form of liveness 3/5/8 Source code: https://github.com/lihaol/verify-treercu 20

Healthy Skepticism 3/5/8 Verification of Linux-Kernel RCU 2

Healthy Skepticism Formal verification of Linux-kernel RCU? 3/5/8 Verification of Linux-Kernel RCU 22

Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); 3/5/8 Verification of Linux-Kernel RCU 23

Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html 3/5/8 Verification of Linux-Kernel RCU 24

Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html How did CBMC do? 3/5/8 Verification of Linux-Kernel RCU 25

Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html How did CBMC do? Only 2 failures out of 30. Interrupt over-approximation, memory exhaustion Up to 90.4M SAT variables, 75GB, ~70 CPU hours Ran on 64-bit 2.4GHz Xeon, 2 cores & 96GB memory 3/5/8 Verification of Linux-Kernel RCU 26

Healthy Skepticism Formal verification of Linux-kernel RCU? Sure, I can also write printf( VERIFIED\n ); I therefore maintain bug-injected RCU versions https://paulmck.livejournal.com/46993.html How did CBMC do? Only 2 failures out of 30. Interrupt over-approximation, memory exhaustion Up to 90.4M SAT variables, 75GB, ~70 CPU hours Ran on 64-bit 2.4GHz Xeon, 2 cores & 96GB memory But did not find new-to-me bugs! 3/5/8 Verification of Linux-Kernel RCU 27

Summary and Challenges 3/5/8 Verification of Linux-Kernel RCU 28

Summary Linux-kernel RCU robustness is important Large installed base poses severe challenge First automated LK RCU formal verification Two other teams have since done similar work Formal verification in regression tests: Almost Future work: Find bugs I don't already know about! Nevertheless, this work demonstrates the nascent ability and potential of SAT-based formal-verification tools to handle real-world production-quality synchronization primitives 3/5/8 Verification of Linux-Kernel RCU 29

Challenges/Limitations/Future Work Better modeling of interrupts & kernel threads Model concurrent linked lists: call_rcu() Incorporate Linux-kernel memory model And/or ARM, PowerPC, RISC-V,... Forward progress: Detect hangs & deadlocks Can already detect unconditional hangs/deadlocks Fully analyze unbounded looping Or at least automatically derive unrolling bounds Larger programs: Automatic decomposition? 3/5/8 Verification of Linux-Kernel RCU 30

Additional Challenges Find bug in rcu_preempt_offline_tasks() http://paulmck.livejournal.com/37782.html Find bug in RCU_NO_HZ_FULL_SYSIDLE http://paulmck.livejournal.com/3806.html Find bug in RCU linked-list use cases http://paulmck.livejournal.com/39793.html Verification Challenge 6 http://paulmck.livejournal.com/46993.html Find bugs in other popular open-source SW 3/5/8 Verification of Linux-Kernel RCU 3

Legal Statement This work represents the view of the authors and does not necessarily represent the view of IBM or University of Oxford. IBM and IBM (logo) are trademarks or registered trademarks of International Business Machines Corporation in the United States and/or other countries. Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others. 3/5/8 Verification of Linux-Kernel RCU 32

Questions? 3/5/8 Verification of Linux-Kernel RCU 33