Coccinelle: Practical Program Transformation for the Linux Kernel. Julia Lawall (Inria/LIP6) June 25, 2018


Motivation

Large, critical infrastructure-software code bases
* Linux kernel, OpenSSL, Qemu, Firefox, etc.
* Frequent changes to add features, improve performance, etc.
* Large, diverse developer bases.

Issues:
* How to impose API improvements on the entire code base? (Collateral evolutions)
* How to ensure that a bug found in one place is fixed everywhere?
* How to explore the code structure?


Example

Evolution: A new function: kzalloc
=> Collateral evolution: Merge kmalloc and memset into kzalloc

    - fh = kmalloc(sizeof(struct zoran_fh), GFP_KERNEL);
    + fh = kzalloc(sizeof(struct zoran_fh), GFP_KERNEL);
      if (!fh) {
          dprintk(1, KERN_ERR "%s: zoran_open(): allocation of zoran_fh failed\n", ZR_DEVNAME(zr));
          return -ENOMEM;
      }
    - memset(fh, 0, sizeof(struct zoran_fh));

Originally, hundreds of kmalloc and memset calls

Example

Bug: Reference count mismanagement
* A for_each iterator increments the reference count of the current element and decrements the reference count of the previous one.
* A return escapes the loop, skipping the decrement.
=> A memory leak.

Example

      for_each_child_of_node(np, child) {
          ...
          ret = of_property_read_u32(child, "reg", &reg);
          if (ret) {
              dev_err(dev, "Failed to get reg property\n");
    +         of_node_put(child);
              return ret;
          }
          if (reg >= MX25_NUM_CFGS) {
              dev_err(dev, "reg value is greater than the number of available...\n");
    +         of_node_put(child);
              return -EINVAL;
          }
          ...
      }

56 instances in linux-next (June 22, 2018)
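The leak can be reproduced outside the kernel with a small user-space analogue. This is only a sketch: struct node, node_get, node_put, get_next_child and for_each_child are hypothetical stand-ins for the kernel's device-tree API, chosen to mirror the behaviour described above (take a reference on the current element, drop the one on the previous element).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device-tree node. */
struct node {
    int refcount;
    struct node *first_child;
    struct node *next_sibling;
    int reg;
};

static struct node *node_get(struct node *n)
{
    if (n)
        n->refcount++;
    return n;
}

static void node_put(struct node *n)
{
    if (n) {
        assert(n->refcount > 0);
        n->refcount--;
    }
}

/* Take a reference on the next child and drop the one held on prev. */
static struct node *get_next_child(struct node *parent, struct node *prev)
{
    struct node *next = prev ? prev->next_sibling : parent->first_child;

    node_get(next);
    node_put(prev);
    return next;
}

#define for_each_child(parent, child) \
    for (child = get_next_child(parent, NULL); child != NULL; \
         child = get_next_child(parent, child))

/* Buggy: the early return skips the decrement the iterator would have done. */
int check_leaky(struct node *parent, int limit)
{
    struct node *child;

    for_each_child(parent, child) {
        if (child->reg >= limit)
            return -1;
    }
    return 0;
}

/* Fixed: drop the reference by hand before escaping the loop. */
int check_fixed(struct node *parent, int limit)
{
    struct node *child;

    for_each_child(parent, child) {
        if (child->reg >= limit) {
            node_put(child);
            return -1;
        }
    }
    return 0;
}
```

After check_leaky returns early, the child it stopped on still holds one reference; after check_fixed, every count is back to zero. The added node_put plays the role of the + of_node_put(child) lines in the patch above.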

Example

Program structure: If a function is defined in a header file and used in multiple .c files, then its implementation is duplicated, wasting code space.

Is this a problem in practice?

codel_impl.h: 202 lines:
    codel_vars_init:    sz: 4    callers: 3
    codel_stats_init:   sz: 4    callers: 3
    codel_params_init:  sz: 7    callers: 3
    codel_dequeue:      sz: 112  callers: 3

Existing tools

Collateral evolutions
* Refactoring tools in various IDEs
* Fixed set of semantics-preserving transformations

Bug finding
* Metal/Coverity [OSDI 2001], SLAM/SMV [SPIN 2001], Flawfinder, etc.
* Often black box, no support for bug fixing.

Visitors
* CIL, LLVM/Clang
* Configuration-specific, internal representation.

Coccinelle to the rescue!

What is Coccinelle?

* Pattern-based language (SmPL) for matching and transforming C code
* Under development since 2005. Open source since 2008.
* Allows code changes to be expressed using patch-like code patterns (semantic patches).

Semantic patches

* Like patches, but independent of irrelevant details (line numbers, spacing, variable names, etc.)
* Derived from code, with abstraction.
* Goal: fit with the existing habits of the Linux programmer.

Semantic patch example

    @@
    expression x,E1,E2;
    @@
    - x = kmalloc(E1,E2);
    + x = kzalloc(E1,E2);
      ...
    - memset(x, 0, E1);

Creating a semantic patch: kmalloc -> kzalloc

Start with an example

    fh = kmalloc(sizeof(struct zoran_fh), GFP_KERNEL);
    if (!fh) {
        dprintk(1, KERN_ERR "%s: zoran_open(): allocation of zoran_fh failed\n", ZR_DEVNAME(zr));
        return -ENOMEM;
    }
    memset(fh, 0, sizeof(struct zoran_fh));

Creating a semantic patch: kmalloc -> kzalloc

Eliminate irrelevant code

    fh = kmalloc(sizeof(struct zoran_fh), GFP_KERNEL);
    ...
    memset(fh, 0, sizeof(struct zoran_fh));

Creating a semantic patch: kmalloc -> kzalloc

Describe transformations

    - fh = kmalloc(sizeof(struct zoran_fh), GFP_KERNEL);
    + fh = kzalloc(sizeof(struct zoran_fh), GFP_KERNEL);
      ...
    - memset(fh, 0, sizeof(struct zoran_fh));

Creating a semantic patch: kmalloc -> kzalloc

Abstract over subterms

    @@
    expression x;
    expression E1,E2;
    @@
    - x = kmalloc(E1,E2);
    + x = kzalloc(E1,E2);
      ...
    - memset(x, 0, E1);

One result

      int __init snd_seq_oss_create_client(void)
      {
          int rc;
          struct snd_seq_port_info *port;
          struct snd_seq_port_callback port_callback;

    -     port = kmalloc(sizeof(*port), GFP_KERNEL);
    +     port = kzalloc(sizeof(*port), GFP_KERNEL);
          if (!port) {
              rc = -ENOMEM;
              goto error;
          }
          rc = snd_seq_create_kernel_client(NULL, SNDRV_SEQ_CLIENT_OSS, ...);
          if (rc < 0)
              goto error;
          system_client = rc;
    -     memset(port, 0, sizeof(*port));
          ...
      }

Correct change


Another result

    - inblockdata = kmalloc(blocksize, gfp_mask);
    + inblockdata = kzalloc(blocksize, gfp_mask);
      if (inblockdata == NULL)
          goto err_free_cipher;
      ...
      inblock.data = (char *) inblockdata;
      inblock.len = blocksize;
      ...
      if (in_constant->len == inblock.len) {
          memcpy(inblock.data, in_constant->data, inblock.len);
      } else {
          krb5_nfold(in_constant->len * 8, in_constant->data,
                     inblock.len * 8, inblock.data);
      }
      ...
    - memset(inblockdata, 0, blocksize);
      kfree(inblockdata);

False positive


Creating a semantic patch: kmalloc -> kzalloc

Refinement

    @@
    expression x;
    expression E1,E2,E3;
    identifier f;
    type T;
    @@
    - x = kmalloc(E1,E2);
    + x = kzalloc(E1,E2);
      ... when != E3 = (T)x
          when != (<+...x...+>) = E3
          when != f(...,x,...)
    - memset(x, 0, E1);
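Why the when constraints are needed can be shown with a user-space analogue, using malloc/calloc in place of kmalloc/kzalloc. This is a sketch under assumptions: all four function names are hypothetical, and the second pair returns the buffer instead of freeing it so that the effect of the removed memset can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Case 1: memset only initializes, so calloc is an equivalent replacement. */
unsigned char *init_original(size_t n)
{
    unsigned char *p = malloc(n);

    if (!p)
        return NULL;
    memset(p, 0, n);
    return p;
}

unsigned char *init_transformed(size_t n)
{
    return calloc(1, n);
}

/* Case 2: the buffer is written and read before memset, which wipes it. */
unsigned char *wipe_original(size_t n, unsigned char fill, unsigned *sum)
{
    unsigned char *p = malloc(n);

    assert(sum != NULL);
    if (!p)
        return NULL;
    memset(p, fill, n);              /* buffer is used before the zeroing */
    *sum = 0;
    for (size_t i = 0; i < n; i++)
        *sum += p[i];
    memset(p, 0, n);                 /* wipe, not initialization */
    return p;
}

/* The unrefined rule applied to case 2: the wipe is silently lost. */
unsigned char *wipe_mistransformed(size_t n, unsigned char fill, unsigned *sum)
{
    unsigned char *p = calloc(1, n);

    assert(sum != NULL);
    if (!p)
        return NULL;
    memset(p, fill, n);
    *sum = 0;
    for (size_t i = 0; i < n; i++)
        *sum += p[i];
    /* memset(p, 0, n) removed by the transformation */
    return p;
}
```

In case 1 the two versions return identical buffers. In case 2 the computed sum is the same, but the mistransformed version leaves the data in memory. The when != lines rule out case 2 by refusing to match when x is assigned, written through, or passed to a function between the allocation and the memset.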

Results

* Correctly updates 9 occurrences
* No false positives

Other opportunities:
    acpi_os_allocate      -> acpi_os_allocate_zeroed
    dma_pool_alloc        -> dma_pool_zalloc
    dma_alloc_coherent    -> dma_zalloc_coherent
    kmem_cache_alloc      -> kmem_cache_zalloc
    pci_alloc_consistent  -> pci_zalloc_consistent
    vmalloc               -> vzalloc
    vmalloc_node          -> vzalloc_node

Semantic patch example

    @@
    expression root,e,e1;
    local idexpression child;
    iterator name for_each_child_of_node;
    @@
      for_each_child_of_node(root, child) {
        ... when != of_node_put(child)
            when != e = child
    (
        return <+...child...+>;
    |
    +   of_node_put(child);
    ?   return e1;
    )
        ...
      }

Semantic patch example

    @h@
    identifier f;
    position p;
    type t;
    @@
    (
    inline t f(...) { ... }
    |
    t f@p(...) { ... }
    )

    @script:ocaml@
    p << h.p;
    f << h.f;
    @@
    if in_header p then add_def f p

    @@
    identifier f;
    position p : script:ocaml(f) { in_c p && make_local f };
    @@
    f@p(...) { ... }

    @@
    identifier f;
    position p : script:ocaml(f) { in_c p && add_call f p };
    @@
    f(...)@p

How does it work?

Requirements

Reason about possible execution paths.
* Suggests matching against paths in a control-flow graph.
* Define the language by translation to CTL [Lacey et al POPL 03].

Keep track of different variables.
* Multiple kmallocs before any memset.
* for_each iterators can be nested.
* Our contribution (CTL-V model checking algorithm).

Collect transformation information
* Where to transform?
* What transformation to carry out?
* Our contribution (CTL-VW).


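Coccinelle architecture

[Figure: processing pipeline]
* Semantic patch -> CTL formula, e.g. AX(A[(ϕ1 ϕ2) U ϕ3] ...)
* C code -> Control Flow Graph
* CTL formula + Control Flow Graph -> Model checking algorithm
* Model checking algorithm -> Identification of the nodes to be modified
* Identification of the nodes to be modified -> Modification of the identified code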

CTL translation and model checking

Semantic patch:

      f(...);
    - g(...);

Control-flow graph:

    s1: f(1)    successors: s2, s3
    s2: g(1,2)
    s3: g(1,3)

CTL representation:

    f(...) ∧ AX g(...)

Model checking algorithm:

    SAT(g(...)) = {s2, s3}
    SAT(AX g(...)) = {s1}
    SAT(f(...) ∧ AX g(...)) = {s1}
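The three SAT sets can be recomputed with a few lines of C. This is a minimal sketch of plain CTL checking on the example graph, not Coccinelle's implementation: state sets are bitmasks, and an extra EXIT state with a self-loop is an assumption added here so that every state has a successor (otherwise s2 and s3 would satisfy AX vacuously). All names are hypothetical.

```c
#include <assert.h>

/* States of the example graph; EXIT is an assumed sink with a self-loop. */
enum { S1, S2, S3, EXIT, NSTATES };

typedef unsigned stateset;            /* bit i set <=> state i is in the set */
#define BIT(s) (1u << (s))

/* Function called at each state; 'x' marks EXIT. Arguments are ignored here. */
static const char label[NSTATES] = { 'f', 'g', 'g', 'x' };

static const stateset succ[NSTATES] = {
    [S1]   = BIT(S2) | BIT(S3),
    [S2]   = BIT(EXIT),
    [S3]   = BIT(EXIT),
    [EXIT] = BIT(EXIT),
};

/* SAT of an atomic pattern fn(...): all states that call fn. */
stateset sat_call(char fn)
{
    stateset result = 0;

    for (int s = 0; s < NSTATES; s++)
        if (label[s] == fn)
            result |= BIT(s);
    return result;
}

/* SAT(AX phi): states all of whose successors satisfy phi. */
stateset sat_ax(stateset phi)
{
    stateset result = 0;

    for (int s = 0; s < NSTATES; s++) {
        assert(succ[s] != 0);         /* the graph must be total */
        if ((succ[s] & ~phi) == 0)
            result |= BIT(s);
    }
    return result;
}

/* SAT(f(...) AND AX g(...)): conjunction is set intersection. */
stateset sat_f_then_g(void)
{
    return sat_call('f') & sat_ax(sat_call('g'));
}
```

Here sat_call('g') is {s2, s3}, sat_ax of that set is {s1}, and intersecting with sat_call('f') leaves {s1}, matching the three lines above.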


Adding an environment (CTL-FV)

Semantic patch:

      f(x);
    - g(x,y);

Control-flow graph:

    s1: f(1)    successors: s2, s3
    s2: g(1,2)
    s3: g(1,2)

CTL representation:

    f(x) ∧ AX g(x, y)

Model checking algorithm:

    SAT(g(x, y)) = {(s2, [x ↦ 1, y ↦ 2]); (s3, [x ↦ 1, y ↦ 2])}
    SAT(AX g(x, y)) = {(s1, [x ↦ 1, y ↦ 2])}
    SAT(f(x)) = {(s1, [x ↦ 1])}
    SAT(f(x) ∧ AX g(x, y)) = {(s1, [x ↦ 1, y ↦ 2])}

Problem: y has to be the same everywhere.


Example

Semantic patch:

    @@
    expression root,e,e1;
    local idexpression child;
    iterator name for_each_child_of_node;
    @@
      for_each_child_of_node(root, child) {
        ... when != of_node_put(child)
            when != e = child
    (
        return <+...child...+>;
    |
    +   of_node_put(child);
    ?   return e1;
    )
        ...
      }

C code:

    for_each_child_of_node(np, child) {
        ...
        ret = of_property_read_u32(child, "reg", &reg);
        if (ret) {
            dev_err(dev, "Failed to get reg property");
            return ret;
        }
        if (reg >= MX25_NUM_CFGS) {
            dev_err(dev, "reg value is greater than the...");
            return -EINVAL;
        }
        ...
    }

Adding existential quantification (CTL-V)

Semantic patch:

      f(x);
    - g(x,y);

Control-flow graph:

    s1: f(1)    successors: s2, s3
    s2: g(1,2)
    s3: g(1,3)

CTL representation:

    ∃x.(f(x) ∧ AX ∃y. g(x, y))

Model checking algorithm:

    SAT(g(x, y)) = {(s2, [x ↦ 1, y ↦ 2]); (s3, [x ↦ 1, y ↦ 3])}
    SAT(∃y. g(x, y)) = {(s2, [x ↦ 1]); (s3, [x ↦ 1])}
    SAT(AX ∃y. g(x, y)) = {(s1, [x ↦ 1])}
    SAT(f(x) ∧ AX ∃y. g(x, y)) = {(s1, [x ↦ 1])}
    SAT(∃x.(f(x) ∧ AX ∃y. g(x, y))) = {(s1, [])}
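The effect of the quantifier can be checked on the same graph with a brute-force sketch in C. It is not the CTL-V algorithm from the talk, which computes sets of (state, environment) pairs; it simply enumerates candidate bindings over an assumed value domain 0..3 and asks whether any of them works. The EXIT state and every name below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Example graph: s1 calls f(1), s2 calls g(1,2), s3 calls g(1,3).
 * EXIT is an assumed sink with a self-loop so that the graph is total. */
enum { S1, S2, S3, EXIT, NSTATES };

struct call { char fn; int a; int b; };      /* fn(a) or fn(a, b); 'x' marks EXIT */

static const struct call label[NSTATES] = {
    [S1] = { 'f', 1, 0 }, [S2] = { 'g', 1, 2 },
    [S3] = { 'g', 1, 3 }, [EXIT] = { 'x', 0, 0 },
};

static const bool edge[NSTATES][NSTATES] = {
    [S1]   = { [S2] = true, [S3] = true },
    [S2]   = { [EXIT] = true },
    [S3]   = { [EXIT] = true },
    [EXIT] = { [EXIT] = true },
};

/* A binding environment; y_bound == false means y was quantified away. */
struct env { int x; bool y_bound; int y; };

static bool sat_f(int s, struct env e)
{
    return label[s].fn == 'f' && label[s].a == e.x;
}

/* g(x, y) under e; with y unbound this is "exists y. g(x, y)". */
static bool sat_g(int s, struct env e)
{
    return label[s].fn == 'g' && label[s].a == e.x &&
           (!e.y_bound || label[s].b == e.y);
}

/* AX: every successor satisfies g under the same environment. */
static bool sat_ax_g(int s, struct env e)
{
    for (int t = 0; t < NSTATES; t++)
        if (edge[s][t] && !sat_g(t, e))
            return false;
    return true;
}

/* CTL-FV style: one y must fit all successors. */
bool matches_fv(int s)
{
    assert(s >= 0 && s < NSTATES);
    for (int x = 0; x <= 3; x++)
        for (int y = 0; y <= 3; y++) {
            struct env e = { x, true, y };
            if (sat_f(s, e) && sat_ax_g(s, e))
                return true;
        }
    return false;
}

/* CTL-V style: y is existentially quantified inside AX. */
bool matches_v(int s)
{
    assert(s >= 0 && s < NSTATES);
    for (int x = 0; x <= 3; x++) {
        struct env e = { x, false, 0 };
        if (sat_f(s, e) && sat_ax_g(s, e))
            return true;
    }
    return false;
}
```

With y free, no single value fits both g(1,2) and g(1,3), so matches_fv(S1) is false. With y quantified under AX, each successor may pick its own y, so matches_v(S1) is true, in line with the result {(s1, [])} above.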


Adding witnesses (CTL-VW)

Goal: collect information about how and where to transform

Semantic patch:

      f(x);
    - g(x,y);

Control-flow graph:

    s1: f(1)    successors: s2, s3
    s2: g(1,2)
    s3: g(1,3)

CTL representation:

    ∃x.(f(x) ∧ AX ∃y. g(x, y))

Model checking algorithm:

    SAT(g(x, y)) = {(s2, [x ↦ 1, y ↦ 2], ()); (s3, [x ↦ 1, y ↦ 3], ())}
    SAT(∃y. g(x, y)) = {(s2, [x ↦ 1], (⟨s2, [y ↦ 2], ()⟩)); (s3, [x ↦ 1], (⟨s3, [y ↦ 3], ()⟩))}
    SAT(AX ∃y. g(x, y)) = {(s1, [x ↦ 1], (⟨s2, [y ↦ 2], ()⟩, ⟨s3, [y ↦ 3], ()⟩))}
    SAT(∃x.(f(x) ∧ AX ∃y. g(x, y))) = {(s1, [], (⟨s1, [x ↦ 1], (⟨s2, [y ↦ 2], ()⟩, ⟨s3, [y ↦ 3], ()⟩)⟩))}

Witnessing transformations

Semantic patch:

      f(x);
    - g(x,y);

Control-flow graph:

    s1: f(1)    successors: s2, s3
    s2: g(1,2)
    s3: g(1,3)

CTL representation:

    ∃x.(f(x) ∧ AX (∃y. g(x, y) ∧ ∃v. matches(g(x, y), v)))

Model checking:

    SAT(g(x, y) ∧ ∃v. matches(g(x, y), v)) =
        {(s2, [x ↦ 1, y ↦ 2], (⟨s2, [v ↦ g(x, y)], ()⟩));
         (s3, [x ↦ 1, y ↦ 3], (⟨s3, [v ↦ g(x, y)], ()⟩))}

Result:

    {(s1, [], (⟨s1, [x ↦ 1], (⟨s2, [y ↦ 2], (⟨s2, [v ↦ g(x, y)], ()⟩)⟩,
                              ⟨s3, [y ↦ 3], (⟨s3, [v ↦ g(x, y)], ()⟩)⟩)⟩))}

Practical application

Usage in the Linux kernel

[Figure: number of commits per year, 2007 to 2017, by author category: Coccinelle developers, dedicated user, kernel maintainers, Outreachy interns, 0-day, others.]

Impact: Changed lines

[Figure: removed and added lines of code (log scale) per kernel directory: arch, block, crypto, drivers, fs, include, init, ipc, kernel, lib, mm, net, samples, security, sound, tools, virt.]

Impact: Maintainer use

[Figure: number of cleanups and bug fixes per year, 2008 to 2017.]

45% of maintainers who have at least one commit touching at least 100 files have at some point used Coccinelle.

Impact: Maintainer use examples

* TTY. Remove an unused function argument. 11 affected files.
* DRM. Eliminate a redundant field in a data structure. 54 affected files.
* Interrupts. Prepare to remove the irq argument from interrupt handlers, and then remove that argument. 188 affected files.

Impact: Intel's 0-day build-testing service

59 semantic patches in the Linux kernel with a dedicated make target.

[Figure: two panels, "# with patches" and "# with message only", per year from 2013 to 2017, by semantic patch category: api, free, iterators, locks, null, tests, misc.]

Performance

[Figure: elapsed time (sec.) and number of files considered, per semantic patch, on 2 cores (4 threads).]

Based on the 59 semantic patches in the Linux kernel.

Coccinelle community

25 contributors
* Most at Inria, due to use of OCaml and PL concepts.
* Active mailing list.

Availability
* Packaged for many Linux distros.

Use outside Linux
* RIOT, systemd, qemu, etc.

Summary: Theoretical results

Semantics and model checking algorithm for CTL with environments (CTL-V)
* Soundness and completeness proved.
* Proof validated with Coq.

Semantics and model checking algorithm for CTL with environments and witnesses (CTL-VW)
* Soundness and completeness proved.
* Not yet validated with Coq.

More details in POPL 2009.

Summary: Practical results

Collateral evolutions
* Semantic patches for over 60 collateral evolutions.
* Applied to over 5800 Linux files from various versions, with a success rate of 100% on 93% of the files.

Bug finding
* Generic bug types: Null pointer dereference, initialization of unused variables, !x&y, etc.
* Bugs in the use of Linux APIs: Incoherent error checking, memory leaks, etc.

6000 patches created using Coccinelle accepted into Linux

Used by other Linux kernel developers

Conclusion

* A patch-like program matching and transformation language
* Simple and efficient extension of CTL that is useful for this domain
* Accepted by Linux developers

Future work
* Programming languages other than C
* Semantic patch inference

Coccinelle is publicly available
http://coccinelle.lip6.fr/