Aliasing restrictions of C11 formalized in Coq

Similar documents
Undefinedness and Non-determinism in C

An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C

Formal C semantics: CompCert and the C standard

Pointers (continued), arrays and strings

arxiv: v1 [cs.lo] 10 Sep 2015

Pointers (continued), arrays and strings

Hacking in C. Pointers. Radboud University, Nijmegen, The Netherlands. Spring 2019

Pointers. 1 Background. 1.1 Variables and Memory. 1.2 Motivating Pointers Massachusetts Institute of Technology

N1793: Stability of indeterminate values in C11

Final CSE 131B Spring 2004

Understanding C/C++ Strict Aliasing

Structures, Unions Alignment, Padding, Bit Fields Access, Initialization Compound Literals Opaque Structures Summary. Structures

Bitwise Data Manipulation. Bitwise operations More on integers

Pierce Ch. 3, 8, 11, 15. Type Systems

Pointers, Dynamic Data, and Reference Types

Intermediate Code Generation Part II

Data Types. Every program uses data, either explicitly or implicitly to arrive at a result.

Outline. Computer Memory Structure Addressing Concept Introduction to Pointer Pointer Manipulation Summary

Variables Data types Variable I/O. C introduction. Variables. Variables 1 / 14

QUIZ. Can you find 5 errors in this code?

A concrete memory model for CompCert

Carnegie Mellon. Bryant and O Hallaron, Computer Systems: A Programmer s Perspective, Third Edition

A flow chart is a graphical or symbolic representation of a process.

An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C

Pointers as Arguments

C++ for Java Programmers

Programming in C++ 4. The lexical basis of C++

CSolve: Verifying C With Liquid Types

But first, encode deck of cards. Integer Representation. Two possible representations. Two better representations WELLESLEY CS 240 9/8/15

Lecture 03 Bits, Bytes and Data Types

Types and Programming Languages. Lecture 5. Extensions of simple types

CSE 307: Principles of Programming Languages

Symbol Tables. For compile-time efficiency, compilers often use a symbol table: associates lexical names (symbols) with their attributes

Introduction to C++ Introduction. Structure of a C++ Program. Structure of a C++ Program. C++ widely-used general-purpose programming language

An application: foreign function bindings

QUIZ. What is wrong with this code that uses default arguments?

Introduction to C++ with content from

Character Set. The character set of C represents alphabet, digit or any symbol used to represent information. Digits 0, 1, 2, 3, 9


A Context-Sensitive Memory Model for Verification of C/C++ Programs

CSCI-243 Exam 1 Review February 22, 2015 Presented by the RIT Computer Science Community

Basic Types, Variables, Literals, Constants

EL6483: Brief Overview of C Programming Language

The Design of Core C++ (Notes)

Data Representation and Storage. Some definitions (in C)

Integers II. CSE 351 Autumn Instructor: Justin Hsia

Integers II. CSE 351 Autumn Instructor: Justin Hsia

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

The New C Standard (Excerpted material)

Symbol Table Information. Symbol Tables. Symbol table organization. Hash Tables. What kind of information might the compiler need?

D Programming Language

G Programming Languages Spring 2010 Lecture 6. Robert Grimm, New York University

Short Notes of CS201

Integers II. CSE 351 Autumn 2018

C Language Programming

Functions, Pointers, and the Basics of C++ Classes

Algorithms & Data Structures

QUIZ. What are 3 differences between C and C++ const variables?

Clarifying the restrict Keyword. Introduction

Formal Semantics of Programming Languages

Tokens, Expressions and Control Structures

Bits, Bytes, and Integers

CS201 - Introduction to Programming Glossary By

Computer Systems Principles. C Pointers

Pointers. Chapter 8. Decision Procedures. An Algorithmic Point of View. Revision 1.0

Representing and Manipulating Integers. Jo, Heeseung

A Fast Review of C Essentials Part I

Inclusion Polymorphism

Types. What is a type?

CSE 374 Programming Concepts & Tools. Hal Perkins Spring 2010

Administrivia. Introduction to Computer Systems. Pointers, cont. Pointer example, again POINTERS. Project 2 posted, due October 6

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

Introduction Primitive Data Types Character String Types User-Defined Ordinal Types Array Types. Record Types. Pointer and Reference Types

CPSC 3740 Programming Languages University of Lethbridge. Data Types

Page 1. Agenda. Programming Languages. C Compilation Process

Under the Compiler's Hood: Supercharge Your PLAYSTATION 3 (PS3 ) Code. Understanding your compiler is the key to success in the gaming world.

Certified compilers. Do you trust your compiler? Testing is immune to this problem, since it is applied to target code

CS Programming In C

Pointer Basics. Lecture 13 COP 3014 Spring March 28, 2018

Names, Scope, and Bindings

Bits, Bytes, and Integers

Goals of this Lecture

Types and Type Inference

Chapter 2 (Dynamic variable (i.e. pointer), Static variable)

Review of the C Programming Language for Principles of Operating Systems

COMPUTABILITY THEORY AND RECURSIVELY ENUMERABLE SETS

Strict Inheritance. Object-Oriented Programming Winter

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

Types, Variables, and Constants

CS558 Programming Languages Winter 2018 Lecture 4a. Andrew Tolmach Portland State University

C BOOTCAMP DAY 2. CS3600, Northeastern University. Alan Mislove. Slides adapted from Anandha Gopalan s CS132 course at Univ.

calling a function - function-name(argument list); y = square ( z ); include parentheses even if parameter list is empty!

Appendix. Grammar. A.1 Introduction. A.2 Keywords. There is no worse danger for a teacher than to teach words instead of things.

Data Representation and Storage

ENERGY 211 / CME 211. Functions

Bits, Bytes and Integers Part 1

Proposal for Extending the switch statement

Midterm CSE 131B Spring 2005

CS162: Introduction to Computer Science II. Primitive Types. Primitive types. Operations on primitive types. Limitations

COMP 2355 Introduction to Systems Programming

Transcription:

Aliasing restrictions of C11 formalized in Coq Robbert Krebbers Radboud University Nijmegen December 11, 2013 @ CPP, Melbourne, Australia

Aliasing Aliasing: multiple pointers referring to the same object

Aliasing Aliasing: multiple pointers referring to the same object int f(int *p, int *q) { int x = *p; *q = 314; return x; } If p and q alias, the original value n of *p is returned n p q

Aliasing Aliasing: multiple pointers referring to the same object int f(int *p, int *q) { int x = *p; *q = 314; return x *p; } If p and q alias, the original value n of *p is returned n p q Optimizing x away is unsound: 314 would be returned

Aliasing Aliasing: multiple pointers referring to the same object int f(int *p, int *q) { int x = *p; *q = 314; return x; } If p and q alias, the original value n of *p is returned n p q Optimizing x away is unsound: 314 would be returned Alias analysis: to determine whether pointers can alias

Aliasing with different types Consider a similar function: int h(int *p, float *q) { int x = *p; *q = 3.14; return x; }

Aliasing with different types Consider a similar function: int h(int *p, float *q) { int x = *p; *q = 3.14; return x; } It can still be called with aliased pointers: union { int x; float y; } u; u.x = 271; return h(&u.x, &u.y); &u.x x y &u.y C89 allows p and q to be aliased, and thus requires it to return 271

Aliasing with different types Consider a similar function: int h(int *p, float *q) { int x = *p; *q = 3.14; return x; } It can still be called with aliased pointers: union { int x; float y; } u; u.x = 271; return h(&u.x, &u.y); &u.x x y &u.y C89 allows p and q to be aliased, and thus requires it to return 271 C99/C11 allows type-based alias analysis: A compiler can assume that p and q do not alias Reads/writes with the wrong type yield undefined behavior

Undefined behavior in C Garbage in, garbage out principle Programs with undefined behavior are not statically excluded Instead, these may do literally anything when executed Compilers are allowed to assume no undefined behavior occurs Allows them to omit (expensive) dynamic checks

Undefined behavior in C Garbage in, garbage out principle Programs with undefined behavior are not statically excluded Instead, these may do literally anything when executed Compilers are allowed to assume no undefined behavior occurs Allows them to omit (expensive) dynamic checks A formal C semantics should account for undefined behavior

Bits and bytes Interplay between low- and high-level Each object should be represented as a sequence of bits... which can be inspected and manipulated in C Each object can be accessed using typed expressions... that are used by compilers for optimizations Hence, the formal memory model needs to keep track of more information than present in the memory of an actual machine

Bits and bytes Interplay between low- and high-level Each object should be represented as a sequence of bits... which can be inspected and manipulated in C Each object can be accessed using typed expressions... that are used by compilers for optimizations Hence, the formal memory model needs to keep track of more information than present in the memory of an actual machine The standard is unclear on many of such difficulties Opportunities for a formal semantics to resolve this unclarity!

Contribution An abstract formal memory for C supporting Types (arrays, structs, unions,... ) Strict aliasing restrictions (effective types) Byte-level operations Type-punning Indeterminate memory Pointers one past the last element Parametrized by an interface for integer types Formalized together with essential properties in Coq

How to treat pointers Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes Our approach Pointers: pairs (x, i) where x identifies the cell, and i the offset into that cell Too little information to capture strict aliasing restrictions

How to treat pointers Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes Our approach A finite map of cells which consist of well-typed trees of bits Pointers: pairs (x, i) where x identifies the cell, and i the offset into that cell Too little information to capture strict aliasing restrictions

How to treat pointers Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes Our approach A finite map of cells which consist of well-typed trees of bits Pointers: pairs (x, i) where x identifies the cell, and i the offset into that cell Pairs (x, r) where x identifies the cell, and r the path through the tree in that cell Too little information to capture strict aliasing restrictions

How to treat pointers Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes Our approach A finite map of cells which consist of well-typed trees of bits Pointers: pairs (x, i) where x identifies the cell, and i the offset into that cell Pairs (x, r) where x identifies the cell, and r the path through the tree in that cell Too little information to capture strict aliasing restrictions A semantics for strict aliasing restrictions

Three kinds of values Our formal description has three kinds of values. For struct { short x, *p; } s = { 33; &s.x } we have: A memory value with arrays of bits as leafs: 1000010000000000

Three kinds of values Our formal description has three kinds of values. For struct { short x, *p; } s = { 33; &s.x } we have: An abstract value with machine integers and pointers as leafs: 33 A memory value with arrays of bits as leafs: toval ofval 1000010000000000

Three kinds of values Our formal description has three kinds of values. For struct { short x, *p; } s = { 33; &s.x } we have: An abstract value with machine integers and pointers as leafs: 33 A memory value with arrays of bits as leafs: toval ofval 1000010000000000 An array of bits: 1000010000000000 mtobits mofbits

Bits and memory values Bits are represented symbolically (à la bytes in CompCert): b ::= 0 1 (ptr p) i indet Gives the best of both worlds : allows bitwise hacking on integers while keeping the memory abstract

Bits and memory values Bits are represented symbolically (à la bytes in CompCert): b ::= 0 1 (ptr p) i indet Gives the best of both worlds : allows bitwise hacking on integers while keeping the memory abstract Memory values are defined as: w ::= base τb b array w struct s w union s (i, w) union s b Memory values have (unique) types

Example Consider: struct T { union U { signed char x[2]; int y; } u; void *p; } s = { {.x = {33,34} }, s.u.x + 2 } As a picture: x s w s =.0 signed char: 10000100 01000100 void : (ptr p) 0 (ptr p) 1... (ptr p) 31 p

Example Consider: struct T { union U { signed char x[2]; int y; } u; void *p; } s = { {.x = {33,34} }, s.u.x + 2 } As a picture: x s w s =.0 signed char: 10000100 01000100 void : (ptr p) 0 (ptr p) 1... (ptr p) 31 Here we have: T p = (x s, 0 U 0, 2) signed char>void p

Example Consider: struct T { union U { signed char x[2]; int y; } u; void *p; } s = { {.x = {33,34} }, s.u.x + 2 } As a picture: x s w s =.0 signed char: 10000100 01000100 void : (ptr p) 0 (ptr p) 1... (ptr p) 31 Here we have: T p = (x s, 0 U 0, 2) signed char>void mtobits w s = p 100001001000100 indet... indet (ptr p) 0 (ptr p) 1... (ptr p) 31

Memory values to bits? Let us reconsider memory values: w ::= base τb b array w struct s w union s (i, w) union s b How to do the conversion of bits?

Memory values to bits? Let us reconsider memory values: w ::= base τb b array w struct s w union s (i, w) union s b How to do the conversion of bits? Seems impossible Given bits of union U { int x; int y; } should we choose the variant x or y?

Memory values to bits? Let us reconsider memory values: w ::= base τb b array w How to do the conversion of bits? Seems impossible Given bits of struct s w union s (i, w) union s b union U { int x; int y; } should we choose the variant x or y? Solution: postpone this choice by storing it as a union U b... and change it into a union U (i, w) node on a lookup

Memory values to bits? Let us reconsider memory values: w ::= base τb b array w How to do the conversion of bits? Seems impossible Given bits of struct s w union s (i, w) union s b union U { int x; int y; } should we choose the variant x or y? Solution: postpone this choice by storing it as a union U b... and change it into a union U (i, w) node on a lookup Hard part: dealing with this choice in abstract values and the various operations

Type-punning Type-punning: reading a union using a pointer to another variant C11: vaguely mentioned in a footnote

Type-punning Type-punning: reading a union using a pointer to another variant C11: vaguely mentioned in a footnote GCC: allowed if the memory is accessed through the union type Given: union U { int x; float y; } t; Defined behavior: t.y = 3.0; return t.x; // OK

Type-punning Type-punning: reading a union using a pointer to another variant C11: vaguely mentioned in a footnote GCC: allowed if the memory is accessed through the union type Given: union U { int x; float y; } t; Defined behavior: t.y = 3.0; return t.x; // OK Undefined behavior: int *p = &t.x; t.y = 3.0; return *p; // UB

Type-punning Type-punning: reading a union using a pointer to another variant C11: vaguely mentioned in a footnote GCC: allowed if the memory is accessed through the union type Given: union U { int x; float y; } t; Defined behavior: t.y = 3.0; return t.x; // OK Undefined behavior: int *p = &t.x; t.y = 3.0; return *p; // UB Formalized by decorating pointers with annotations

Strict-aliasing Theorem Theorem (Strict-aliasing) Given: addresses m a 1 : σ 1 and m a 2 : σ 2 with annotations that do not allow type-punning σ 1, σ 2 unsigned char σ 1 not a subtype of σ 2 and vice versa Then there are two possibilities: a 1 and a 2 do not alias accessing a 1 after a 2 (and vice versa) has undefined behavior

Strict-aliasing Theorem Theorem (Strict-aliasing) Given: addresses m a 1 : σ 1 and m a 2 : σ 2 with annotations that do not allow type-punning σ 1, σ 2 unsigned char σ 1 not a subtype of σ 2 and vice versa Then there are two possibilities: a 1 and a 2 do not alias accessing a 1 after a 2 (and vice versa) has undefined behavior Corollary Compilers can perform type based alias analysis

Memory extensions To prove program transformations correct one has to relate: the memory of the original program to the memory of the the transformed program

Memory extensions To prove program transformations correct one has to relate: the memory of the original program to the memory of the the transformed program We have m 1 m 2 if m 2 allows more behaviors than m 1 : More memory content determinate indet b

Memory extensions To prove program transformations correct one has to relate: the memory of the original program to the memory of the the transformed program We have m 1 m 2 if m 2 allows more behaviors than m 1 : More memory content determinate indet b Fewer restrictions on effective types union u (i, w) union u b

Memory extensions To prove program transformations correct one has to relate: the memory of the original program to the memory of the the transformed program We have m 1 m 2 if m 2 allows more behaviors than m 1 : More memory content determinate indet b Fewer restrictions on effective types union u (i, w) union u b Theorem copy by assignment byte-wise copy

Formalization in Coq Type theory is ideal for the combination programming/proving The devil is in the details, Coq is extremely useful for debugging of definitions Useful to prove meta-theoretical properties Use of type classes for parametrization by machine integers Use of type classes for overloading of notations 8.500 lines of code

Future research Integration into our operational semantics [K, POPL 14]... and make it (reasonably efficiently) executable Memory injections à la CompCert Integration into our axiomatic semantics [K, POPL 14] Floating point numbers, bit fields, variable length arrays The const, volatile and restrict qualifier Verification Condition generator in Coq

Questions Sources: see http://robbertkrebbers.nl/research/ch2o/