A concrete memory model for CompCert

Similar documents
An Abstract Stack Based Approach to Verified Compositional Compilation to Machine Code

Formal verification of program obfuscations

CompCertS: A Memory-Aware Verified C Compiler using Pointer as Integer Semantics

A Verified CompCert Front-End for a Memory Model Supporting Pointer Arithmetic and Uninitialised Data

Formal C semantics: CompCert and the C standard

A Formally-Verified C static analyzer

Formal verification of a static analyzer based on abstract interpretation

Formal verification of a C-like memory model and its uses for verifying program transformations

An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C

Verasco: a Formally Verified C Static Analyzer

INFORMATION AND COMMUNICATION TECHNOLOGIES (ICT) PROGRAMME. Project FP7-ICT-2009-C CerCo. Report n. D2.2 Prototype implementation

Formal verification of an optimizing compiler

How much is a mechanized proof worth, certification-wise?

Validating LR(1) parsers

Portable Software Fault Isolation

A Formal C Memory Model Supporting Integer-Pointer Casts

Undefinedness and Non-determinism in C

School of Computer & Communication Sciences École Polytechnique Fédérale de Lausanne

Aliasing restrictions of C11 formalized in Coq

Type checking. Jianguo Lu. November 27, slides adapted from Sean Treichler and Alex Aiken s. Jianguo Lu November 27, / 39

Introduction A Tiny Example Language Type Analysis Static Analysis 2009

Static Program Analysis Part 1 the TIP language

A Formal C Memory Model Supporting Integer-Pointer Casts

Certified Memory Usage Analysis

CSE 403: Software Engineering, Fall courses.cs.washington.edu/courses/cse403/16au/ Static Analysis. Emina Torlak

Synchronous Specification

Certified compilers. Do you trust your compiler? Testing is immune to this problem, since it is applied to target code

CMPT 379 Compilers. Parse trees

CSC C69: OPERATING SYSTEMS

The CompCert verified compiler

Formal proofs of code generation and verification tools

Œuf: Minimizing the Coq Extraction TCB

A Formally-Verified C Static Analyzer

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation

Formal Verication of C++ Object Construction and Destruction

Verifying a Lustre Compiler Part 2

18-642: Code Style for Compilers

CS Programming In C

Outline. Analyse et Conception Formelle. Lesson 7. Program verification methods. Disclaimer. The basics. Definition 2 (Specification)

Building Verified Program Analyzers in Coq

GO - OPERATORS. This tutorial will explain the arithmetic, relational, logical, bitwise, assignment and other operators one by one.

Work-in-progress: "Verifying" the Glasgow Haskell Compiler Core language

Introduction. Following are the types of operators: Unary requires a single operand Binary requires two operands Ternary requires three operands

Pointers. Chapter 8. Decision Procedures. An Algorithmic Point of View. Revision 1.0

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

Static and User-Extensible Proof Checking. Antonis Stampoulis Zhong Shao Yale University POPL 2012

Translation Validation for a Verified OS Kernel

Calculating Correct Compilers

Proving liveness. Alexey Gotsman IMDEA Software Institute

Formal semantics and other research around Spark2014

A New Verified Compiler Backend for CakeML. Yong Kiam Tan* Magnus O. Myreen Ramana Kumar Anthony Fox Scott Owens Michael Norrish

Operational Semantics of Cool

Chapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont )

Lecture 4 September Required reading materials for this class

Malloc Lab & Midterm Solutions. Recitation 11: Tuesday: 11/08/2016

Operational Semantics. One-Slide Summary. Lecture Outline

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19

Lecture Compiler Middle-End

Static Program Analysis

Mechanized Verification of SSA-based Compilers Techniques

Binary Code Analysis: Concepts and Perspectives

Structuring an Abstract Interpreter through Value and State Abstractions: EVA, an Evolved Value Analysis for Frama C

CS 61C: Great Ideas in Computer Architecture. (Brief) Review Lecture

CSCI-243 Exam 1 Review February 22, 2015 Presented by the RIT Computer Science Community

An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C

Certified Merger for C Programs Using a Theorem Prover: A First Step

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Refinements to techniques for verifying shape analysis invariants in Coq

A Study on the Preservation on Cryptographic Constant-Time Security in the CompCert Compiler

Intermediate Programming, Spring 2017*

Implementing Subprograms

A Formally-Verified Alias Analysis

Kampala August, Agner Fog

Work-in-progress: Verifying the Glasgow Haskell Compiler Core language

Frama-C Value Analysis

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

INITIALISING POINTER VARIABLES; DYNAMIC VARIABLES; OPERATIONS ON POINTERS

An experiment with variable binding, denotational semantics, and logical relations in Coq. Adam Chlipala University of California, Berkeley

At build time, not application time. Types serve as statically checkable invariants.

CS412/CS413. Introduction to Compilers Tim Teitelbaum. Lecture 21: Generating Pentium Code 10 March 08

CS153: Compilers Lecture 15: Local Optimization

Semantic Analysis Type Checking

Mechanized Semantics for the Clight Subset of the C Language

Testing, Debugging, and Verification exam DIT082/TDA567. Day: 9 January 2016 Time: Will be published mid February or earlier

Denotational Semantics. Domain Theory

Development of a Translator from LLVM to ACL2

Chapter 1. Introduction

Formal Semantics of Programming Languages

Declaring Pointers. Declaration of pointers <type> *variable <type> *variable = initial-value Examples:

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Lecture 13. Notation. The rules. Evaluation Rules So Far

Begin at the beginning

Introduction to C. Why C? Difference between Python and C C compiler stages Basic syntax in C

CSE P 501 Exam 8/5/04

COS 320. Compiling Techniques

A NEW PROOF-ASSISTANT THAT REVISITS HOMOTOPY TYPE THEORY THE THEORETICAL FOUNDATIONS OF COQ USING NICOLAS TABAREAU

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Notation. The rules. Evaluation Rules So Far.

Operational Semantics of Cool

Verified compilers. Guest lecture for Compiler Construction, Spring Magnus Myréen. Chalmers University of Technology

Review of the C Programming Language for Principles of Operating Systems

Assuring Software Protection in Virtual Machines

Transcription:

A concrete memory model for CompCert Frédéric Besson Sandrine Blazy Pierre Wilke Rennes, France P. Wilke A concrete memory model for CompCert 1 / 28

CompCert real-world C to ASM compiler used in industry (commercialised by AbsInt) proven correct in Coq: it does not introduce bugs! C Clight Cminor RTL ASM P. Wilke A concrete memory model for CompCert 2 / 28

CompCert real-world C to ASM compiler used in industry (commercialised by AbsInt) proven correct in Coq: it does not introduce bugs! C Clight Cminor RTL ASM P. Wilke A concrete memory model for CompCert 2 / 28

CompCert real-world C to ASM compiler used in industry (commercialised by AbsInt) proven correct in Coq: it does not introduce bugs! C Clight Cminor RTL ASM Each language has a Formal Semantics i.e. a mathematical meaning for programs P. Wilke A concrete memory model for CompCert 2 / 28

CompCert real-world C to ASM compiler used in industry (commercialised by AbsInt) proven correct in Coq: it does not introduce bugs! C Clight Cminor RTL ASM Each language has a Formal Semantics i.e. a mathematical meaning for programs Proof of semantic preservation For every source program S that has a defined semantics, If the compiler succeeds to generate a target program T, Then T has the same behavior as S. P. Wilke A concrete memory model for CompCert 2 / 28

CompCert real-world C to ASM compiler used in industry (commercialised by AbsInt) proven correct in Coq: it does not introduce bugs! Memory model C Clight Cminor RTL ASM Each language has a Formal Semantics i.e. a mathematical meaning for programs Proof of semantic preservation For every source program S that has a defined semantics, If the compiler succeeds to generate a target program T, Then T has the same behavior as S. P. Wilke A concrete memory model for CompCert 2 / 28

Goal: Make the semantics of C more defined Why did C leave some behaviors undefined? Portability Performance Why do we want to make it more defined? real-life programs use features that are undefined, according to C the compilation theorem will be more useful What kind of undefined behaviors do we aim at? undefined pointer arithmetic, i.e. bitwise operators use of uninitialised memory Our starting point: CompCert P. Wilke A concrete memory model for CompCert 3 / 28

An example of low-level C program in CompCert b p int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } b q b r P. Wilke A concrete memory model for CompCert 4 / 28

An example of low-level C program in CompCert b p b q int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) b b r P. Wilke A concrete memory model for CompCert 4 / 28

An example of low-level C program in CompCert b p b q int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) b 42 b r P. Wilke A concrete memory model for CompCert 4 / 28

An example of low-level C program in CompCert b p b q int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) b 42 b r Bitwise operators on pointers are undefined behavior! CompCert [JAR 09], KCC [POPL 12], Krebbers [POPL 14], Norrish [PhD 98]: undefined behavior Kang et al. [PLDI 15]: don t model bitwise operators P. Wilke A concrete memory model for CompCert 4 / 28

Contributions Previous work [APLAS 14]: A memory model for low-level programs This work: integration of the memory model inside CompCert correctness proofs of the memory model correctness proofs of the transformations of the frontend (up to Cminor) P. Wilke A concrete memory model for CompCert 5 / 28

Outline 1 CompCert s memory model 2 New features of the memory model 3 Consistency of the memory models 4 CompCert proof: Overview 5 Conclusion P. Wilke A concrete memory model for CompCert 6 / 28

Outline 1 CompCert s memory model 2 New features of the memory model 3 Consistency of the memory models 4 CompCert proof: Overview 5 Conclusion P. Wilke A concrete memory model for CompCert 7 / 28

New features of the memory model Symbolic expressions val ::= i (b, o) not expressive enough We change the semantic domain to: expr ::= val op 1 expr expr op 2 expr P. Wilke A concrete memory model for CompCert 8 / 28

New features of the memory model Symbolic expressions val ::= i (b, o) not expressive enough We change the semantic domain to: Alignment constraints expr ::= val op 1 expr expr op 2 expr We need information about some bits of the concrete address of a pointer The alloc primitive takes an extra parameter mask, such that: A(b) & mask = A(b) P. Wilke A concrete memory model for CompCert 8 / 28

Interaction with the memory model What is the semantics of reading from memory: *p? In CompCert, p is evaluated into a pointer (b,i), then we can use load(m,b,i) In our model, p is a symbolic expression. It needs to be transformed into a pointer so that we can use load. normalise : mem expr val We need to modify the semantics to include calls to normalise memory accesses (load and store) conditionnal branches P. Wilke A concrete memory model for CompCert 9 / 28

Back to the example b p int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) b 8 42 b q b r P. Wilke A concrete memory model for CompCert 10 / 28

Back to the example b p b q int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) (b,0) 5 b 8 42 b r P. Wilke A concrete memory model for CompCert 10 / 28

Back to the example b p b q b r int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) (b,0) 5 ( ((b,0) 5 ) 3 ) 3 b 8 42 P. Wilke A concrete memory model for CompCert 10 / 28

Back to the example b p b q int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) (b,0) 5 b 8 42 b r ( ((b,0) 5 ) 3 ) 3 normalise (b,0) P. Wilke A concrete memory model for CompCert 10 / 28

Back to the example b p b q int main(){ int * p = (int *) malloc (sizeof (int)); *p = 42; int * q = p 5; int * r = (q» 3) «3; return *r; } (b,0) (b,0) 5 b 8 42 b r ( ((b,0) 5 ) 3 ) 3 normalise (b,0) P. Wilke A concrete memory model for CompCert 10 / 28

Normalisation specification: concrete memories (b 2,2) 5 Abstract memory m cm 1 cm 2 Concrete memories of m cm i m cm 3 cm 4 cm 5 cm 6 range : ]0;55[ no overlap alignment 0 8 16 24 32 40 48 56 P. Wilke A concrete memory model for CompCert 11 / 28

Normalisation: example 1 e = ((( b,0) 5) 3) 3 cm 1 8 = (b,o) cm1 cm 2 8 = (b,o) cm2 cm 3 16 = (b,o) cm3 cm 4 24 = (b,o) cm4 cm 5 32 = (b,o) cm5 cm 6 32 = (b,o) cm6 0 8 16 24 32 40 48 56 e cm1 = (((cm 1 (b) + 0) 5) 3) = ((8 5) 3) = ((0b1000 5) 3) 3 = (0b1101 3) 3 = 0b0001 3 = 0b1000 = 8 = cm 1 (b) i, e cmi = cm i (b), hence e normalises into (b,0) P. Wilke A concrete memory model for CompCert 12 / 28

Normalisation: example 2 e = ( b,0) > ( b,0) cm 1 cm 2 cm 3 cm 4 cm 5 cm 6 true true true false false false 0 8 16 24 32 40 48 56 There is no v such that i, e cmi = v cmi, hence e doesn t normalise P. Wilke A concrete memory model for CompCert 13 / 28

CompCert with symbolic expressions expr ::= val op 1 expr expr op 2 expr b 1 (b 2,2) b 3 5 b 2 0 5 7 (b,o) 5 Memory model C Clight Cminor RTL ASM S S S S S P. Wilke A concrete memory model for CompCert 14 / 28

Outline 1 CompCert s memory model 2 New features of the memory model 3 Consistency of the memory models 4 CompCert proof: Overview 5 Conclusion P. Wilke A concrete memory model for CompCert 15 / 28

How does our model compare to CompCert? x(t) x(t) Behaviors in CompCert t Behaviors with symbolic expressions t We are an extension of CompCert P. Wilke A concrete memory model for CompCert 16 / 28

How does our model compare to CompCert? Formally, Lemma expr_add_ok: v 1 v 2 m v, sem_add v 1 v 2 m = v e, sem_add_expr v 1 v 2 m = e normalise m e = v. If the addition of v 1 and v 2 succeeds in CompCert, Then it should succeed in our model as well, And the expression we compute should normalise into the same value. P. Wilke A concrete memory model for CompCert 17 / 28

Discovery of bugs 2 cases where our model disagrees with CompCert Bug in CompCert 2.4: Pointer comparison to NULL (fixed in CompCert 2.5) Bug in our model: incorrect handling of pointers one past the end P. Wilke A concrete memory model for CompCert 18 / 28

Incorrect pointer comparison to NULL In CompCert: pointers are pairs (b, o) the NULL pointer is represented as the integer 0 p == 0 was incorrectly defined to always evaluate to false when p is a pointer. b 0 8 16 24 32 40 48 56 But we need to check that o is a valid offset of b (b,o) cm = cm(b) + o is not zero only in that case otherwise (b, 8) evaluates to zero P. Wilke A concrete memory model for CompCert 19 / 28

Outline 1 CompCert s memory model 2 New features of the memory model 3 Consistency of the memory models 4 CompCert proof: Overview 5 Conclusion P. Wilke A concrete memory model for CompCert 20 / 28

Overview of CompCert architecture C Clight C minor Cminor frontend backend Linear LTL RTL CminorSel Mach ASM : conserves the memory layout : modifies the memory layout P. Wilke A concrete memory model for CompCert 21 / 28

Memory injections: a generic memory transformation In CompCert C, each local variable has its own block. During the compilation these variables are merged into a stack frame. mem_inject f m m m m f b b 1 b 2 1 (b 3,o) f f 1 (b,o + δ 2 ) 37 δ 1 δ 2 b 3 37 Adapting to symbolic expressions: generalization of the injection over values lots of proofs to adapt (relation with normalisation) P. Wilke A concrete memory model for CompCert 22 / 28

Memory injections - Central theorem Theorem norm_inject: f m m e e (Minj: inject f m m ) (Einj: expr_inject f e e ), val_inject f (normalise m e) (normalise m e ). We can show that: v,val_inject f (normalise m e) v Let s now prove that: normalise m e = v cm m, e cm = v cm From the specification of the normalisation of e in m we know: cm m, e cm = normalise m e cm We need a theorem relating evaluations in cm and cm! P. Wilke A concrete memory model for CompCert 23 / 28

Memory injections - Evaluation mem_inject f m m Concrete memories of m Concrete memories of m 8 16 24 32 40 48 8 16 24 32 40 48 pre_cm(f,cm ): recovers a concrete memory as it was before injection Definition pre_cm f cm := fun (b: block) let (b, delta) := f b in cm b + delta. Theorem expr_inject_eval: f cm e e (Einj: expr_inject f e e ), e cm = e pre_cm(f,cm ). P. Wilke A concrete memory model for CompCert 24 / 28

Memory injections - Central theorem Theorem norm_inject: f m m e e (Minj: inject f m m ) (Einj: expr_inject f e e ), val_inject f (normalise m e) (normalise m e ). Concrete memories of m Concrete memories of m We are left to prove: cm m, e cm = v cm expr_inject_eval : expr_inject f e e e cm = e pre_cm(f,cm ) We rewrite both sides using expr_inject_eval, the goal becomes: cm m, e pre_cm(f,cm ) = normalise m e pre_cm(f,cm ) From the specification of the normalisation of e in m we know: which solves our goal. cm m, e cm = normalise m e cm P. Wilke A concrete memory model for CompCert 25 / 28

Outline 1 CompCert s memory model 2 New features of the memory model 3 Consistency of the memory models 4 CompCert proof: Overview 5 Conclusion P. Wilke A concrete memory model for CompCert 26 / 28

Conclusion A semantics for C more precise than CompCert s compatible with CompCert nearly as proven correct as CompCert Future directions finish the proof by adapting the last remaining unproven pass add a more concrete assembly language to the certified compilation chain plug back in optimizations at RTL level (precision improvement?, still sound?) P. Wilke A concrete memory model for CompCert 27 / 28

Questions? P. Wilke A concrete memory model for CompCert 28 / 28