Pointers. Chapter 8. Decision Procedures. An Algorithmic Point of View. Revision 1.0

Similar documents
20-CS-4003 Organization of Programming Languages Fall, An Introduction To. Pointer Logic: Formulation and Decision Procedures. Day 1.

Intermediate Code Generation

Formal Semantics of Programming Languages

Formal Semantics of Programming Languages

Motivation was to facilitate development of systems software, especially OS development.

Intro to semantics; Small-step semantics Lecture 1 Tuesday, January 29, 2013

Note that in this definition, n + m denotes the syntactic expression with three symbols n, +, and m, not to the number that is the sum of n and m.

CS558 Programming Languages

Qualifying Exam in Programming Languages and Compilers

FOR Loop. FOR Loop has three parts:initialization,condition,increment. Syntax. for(initialization;condition;increment){ body;

Functional Programming. Pure Functional Programming

CSE 504. Expression evaluation. Expression Evaluation, Runtime Environments. One possible semantics: Problem:

Question 1. Notes on the Exam. Today. Comp 104: Operating Systems Concepts 11/05/2015. Revision Lectures

Declaring Pointers. Declaration of pointers <type> *variable <type> *variable = initial-value Examples:

Notes on the Exam. Question 1. Today. Comp 104:Operating Systems Concepts 11/05/2015. Revision Lectures (separate questions and answers)

Comp 204: Computer Systems and Their Implementation. Lecture 25a: Revision Lectures (separate questions and answers)

Motivation was to facilitate development of systems software, especially OS development.

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

Data Representation and Storage. Some definitions (in C)

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

G Programming Languages - Fall 2012

CS Programming In C

Manual Allocation. CS 1622: Garbage Collection. Example 1. Memory Leaks. Example 3. Example 2 11/26/2012. Jonathan Misurda

Pointers, Dynamic Data, and Reference Types

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

Compiler Construction

LECTURE 17. Expressions and Assignment

Programming Languages Third Edition

Partitioned Memory Models for Program Analysis

More On Syntax Directed Translation

Semantic Analysis and Type Checking

Run-time Environments

The compilation process is driven by the syntactic structure of the program as discovered by the parser

Run-time Environments

Computer Science Department Carlos III University of Madrid Leganés (Spain) David Griol Barres

CS2104 Prog. Lang. Concepts

Data Storage. August 9, Indiana University. Geoffrey Brown, Bryce Himebaugh 2015 August 9, / 19

Application: Programming Language Semantics

Outline. What is semantics? Denotational semantics. Semantics of naming. What is semantics? 2 / 21

Lecture Outline. COOL operational semantics. Operational Semantics of Cool. Motivation. Lecture 13. Notation. The rules. Evaluation Rules So Far

CS 314 Principles of Programming Languages

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

CSC 467 Lecture 13-14: Semantic Analysis

CS558 Programming Languages

Homework I - Solution

Goals: Define the syntax of a simple imperative language Define a semantics using natural deduction 1

Program Syntax; Operational Semantics

Storage. Outline. Variables and Updating. Composite Variables. Storables Lifetime : Programming Languages. Course slides - Storage

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8

Tail Calls. CMSC 330: Organization of Programming Languages. Tail Recursion. Tail Recursion (cont d) Names and Binding. Tail Recursion (cont d)

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

CS558 Programming Languages Winter 2018 Lecture 4a. Andrew Tolmach Portland State University

CS558 Programming Languages

3.7 Denotational Semantics

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

CS558 Programming Languages

Denotational Semantics. Domain Theory

CS241 Computer Organization Spring Data Alignment

Informal Semantics of Data. semantic specification names (identifiers) attributes binding declarations scope rules visibility

Lecture Notes on Memory Layout

Compiler Construction

CS558 Programming Languages. Winter 2013 Lecture 3

Programming Languages Third Edition. Chapter 7 Basic Semantics

Contents. Chapter 1 SPECIFYING SYNTAX 1

SE352b: Roadmap. SE352b Software Engineering Design Tools. W3: Programming Paradigms

Memory and Addresses. Pointers in C. Memory is just a sequence of byte-sized storage devices.

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

These notes are intended exclusively for the personal usage of the students of CS352 at Cal Poly Pomona. Any other usage is prohibited without

CS558 Programming Languages

struct _Rational { int64_t Top; // numerator int64_t Bottom; // denominator }; typedef struct _Rational Rational;

Types, Expressions, and States

Comp 411 Principles of Programming Languages Lecture 7 Meta-interpreters. Corky Cartwright January 26, 2018

Question Bank. 10CS63:Compiler Design

struct Properties C struct Types

Memory and C++ Pointers

Propositional Calculus: Boolean Functions and Expressions. CS 270: Mathematical Foundations of Computer Science Jeremy Johnson

MIDTERM EXAM (Solutions)

5. Semantic Analysis!

Introduction Primitive Data Types Character String Types User-Defined Ordinal Types Array Types. Record Types. Pointer and Reference Types

CMSC 330: Organization of Programming Languages. Operational Semantics

Foundations: Syntax, Semantics, and Graphs

Operational Semantics. One-Slide Summary. Lecture Outline

SYSC 2006 C Winter 2012

Chapter 3 (part 3) Describing Syntax and Semantics

Weeks 6&7: Procedures and Parameter Passing

Lectures Basics in Procedural Programming: Machinery

Reminder of the last lecture. Aliasing Issues: Call by reference, Pointer programs. Introducing Aliasing Issues. Home Work from previous lecture

This is already grossly inconvenient in present formalisms. Why do we want to make this convenient? GENERAL GOALS

Hoare Logic and Model Checking

COSC252: Programming Languages: Semantic Specification. Jeremy Bolton, PhD Adjunct Professor

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Example. program sort; var a : array[0..10] of integer; procedure readarray; : function partition (y, z :integer) :integer; var i, j,x, v :integer; :

Chapter 5 Names, Binding, Type Checking and Scopes

CPSC 3740 Programming Languages University of Lethbridge. Data Types

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Principles of Compiler Design

Lecture 12: Executing Alpha

Values (a.k.a. data) representation. Advanced Compiler Construction Michel Schinz

Aliasing restrictions of C11 formalized in Coq

Values (a.k.a. data) representation. The problem. Values representation. The problem. Advanced Compiler Construction Michel Schinz

Transcription:

Pointers Chapter 8 Decision Procedures An Algorithmic Point of View D.Kroening O.Strichman Revision 1.0

Outline 1 Introduction Pointers and Their Applications Dynamic Memory Allocation Analysis of Programs with Pointers 2 A Simple Pointer Logic Syntax Semantics Axiomatization of the Memory Model Adding Structure Types 3 Modeling Lists Trees 4 Using the Semantic Translation Applying the Memory Model Axioms Pure Variables Partitioning the Memory Decision Procedures Pointers 2

Pointers and Their Applications Pointer: a program variable that refers to some other program construct This other construct may be another variable, including a pointer, a function or method. Decision Procedures Pointers 3

Motivation Pointers to other variables allow code fragments to operate on different sets of data This avoids inefficient copying of data Pointers enable dynamic data structures But: Many bugs relate to the (ab-)use of pointers Decision Procedures Pointers 4

Implementation Memory cells of a computer have addresses, i.e., each cell has a unique number The value of a pointer is such a number memory model: the way the memory cells are addressed Decision Procedures Pointers 5

Formalization Definition (Our Memory Model) Set of addresses A is a subinterval of the integers {0,..., N 1} Each address corresponds to a memory cell that is able to store one data word. The set of data words is denoted by D. Memory valuation M : A D (this is a continuous, uniform address space) Decision Procedures Pointers 6

Arrays and Structs A variable may require more than one data word to be stored in memory Examples: structs, arrays, double-precision floating-point Decision Procedures Pointers 7

Arrays and Structs A variable may require more than one data word to be stored in memory Examples: structs, arrays, double-precision floating-point Let σ(v) with v V denote the size (in data words) of v. Decision Procedures Pointers 7

The Memory Layout Let V denote the set of variables. Definition (memory layout) A memory layout L : V A is a mapping from V to an address A. The address of v V is also called the memory location of v. The memory locations of the statically allocated variables are usually non-overlapping The memory layout is not necessarily continuous (e.g., due to alignment restrictions) Decision Procedures Pointers 8

The Memory Layout: Example int var_a, var_b, var_c; struct { int x; int y; } S; int array[4]; int *p = &var_c; int main() { *p=100; } var a var b var c S.x S.y array[0] array[1] array[2] array[3] p 0 1 2 3 4 5 6 7 8 9 Decision Procedures Pointers 9

Dynamic Memory Allocation There is an area of memory (called heap) for objects that are created at run time A library maintains a list of the memory regions that are unused Some function allocates a memory region of a given size and returns a pointer to it malloc() in C, new in C++, C#, and Java. Decision Procedures Pointers 10

Example from Program Analysis Program analysis tools often need to reason about pointers void f(int *sum) { *sum = 0; } for(i=0; i<10; i++) *sum = *sum + array[i]; Decision Procedures Pointers 11

Example from Program Analysis Program analysis tools often need to reason about pointers void f(int *sum) { *sum = 0; } for(i=0; i<10; i++) *sum = *sum + array[i]; This program does not obey the obvious specification if the address held by sum is equal to the address of i Aliasing not anticipated by the programmer is a common source of problems Decision Procedures Pointers 11

A Simple Pointer Logic Definition (Pointer Logic) Syntax: formula : formula formula formula (formula) atom atom : pointer = pointer term = term pointer < pointer term < term pointer : pointer identifier pointer + term (pointer) &identifier & pointer pointer NULL term : identifier pointer term op term (term) integer constant identifier [ term ] op : + Warning: = is equality here, not assignment Decision Procedures Pointers 12

Example Let p, q denote pointer identifiers, and let i, j denote integer identifiers. The following formulas are well-formed according to the grammar: (p + i) = 1, (p + p) = 0, p = q p = 5, p = 1, p < q. Decision Procedures Pointers 13

Example The following formulas are not permitted by the grammar: p + i, p = i, (p + q), 1 = 1, p < i. Decision Procedures Pointers 14

Semantics We define the semantics by referring to a specific memory layout L and a specific memory valuation M. Pointer logic formulas are predicates on M, L pairs We obtain a reduction to integer arithmetic and array logic Decision Procedures Pointers 15

Semantics We define a semantics using the function : L P L D L P : language of pointer expressions L D : expressions over variables with values from D Decision Procedures Pointers 16

Semantics Defined recursively. Boolean connectives: f 1 f 2 = f 1 f 2 f = f Predicates: p 1 = p 2 = p 1 = p 2 where p 1, p 2 are pointer expressions p 1 < p 2 = p 1 < p 2 where p 1, p 2 are pointer expressions t 1 = t 2 = t 1 = t 2 where t 1, t 2 are terms t 1 < t 2 = t 1 < t 2 where t 1, t 2 are terms Decision Procedures Pointers 17

Semantics Non-pointer terms: v = M[L[v]] where v V is a variable with σ(v) = 1 t 1 op t 2 = t 1 op t 2 where t 1, t 2 are terms c = c where c is an integer constant v[t] = M[L[v] + t ] where v is an array identifier, t is a term Decision Procedures Pointers 18

Semantics Pointer-related expressions: p = M[L[p]] where p is a pointer identifier p + t = p + t where p is a pointer expression, t is a term &v = L[v] where v V is a variable & p = p where p is a pointer expression NULL = 0 p = M[ p ] where p is a pointer expression Decision Procedures Pointers 19

Notation A pointer p points to a variable x if M[L[p]] = L[x] Shorthand: p z for p = z Warning: the meaning p + i does not depend on the type of p Decision Procedures Pointers 20

Example I Let a be an array identifier: ((&a) + 1) = a[1] The definition expands as follows: ((&a) + 1) = a[1] Decision Procedures Pointers 21

Example I Let a be an array identifier: ((&a) + 1) = a[1] The definition expands as follows: ((&a) + 1) = a[1] ((&a) + 1) = a[1] Decision Procedures Pointers 21

Example I Let a be an array identifier: ((&a) + 1) = a[1] The definition expands as follows: ((&a) + 1) = a[1] ((&a) + 1) = a[1] M[ (&a) + 1 ] = M[L[a] + 1 ] Decision Procedures Pointers 21

Example I Let a be an array identifier: ((&a) + 1) = a[1] The definition expands as follows: ((&a) + 1) = a[1] ((&a) + 1) = a[1] M[ (&a) + 1 ] = M[L[a] + 1 ] M[ &a + 1 ] = M[L[a] + 1] Decision Procedures Pointers 21

Example I Let a be an array identifier: ((&a) + 1) = a[1] The definition expands as follows: ((&a) + 1) = a[1] ((&a) + 1) = a[1] M[ (&a) + 1 ] = M[L[a] + 1 ] M[ &a + 1 ] = M[L[a] + 1] M[L[a] + 1] = M[L[a] + 1] The last formula is obviously valid. Decision Procedures Pointers 21

Example II The translated formula must evaluate to true for any L and M! The following formula is not valid: p = 1 x = 1 For p &x, this formula evaluates to false. Decision Procedures Pointers 22

Axiomatization of the Memory Model It is possible to exploit assumptions made about the memory model. Depends highly on the architecture! Here: we formalize properties that most architectures comply with. Decision Procedures Pointers 23

Example On most architectures, the following two formulas are valid: 1 &x NULL 2 &x &y (1) translates into L[x] 0 and relies on the fact that no object has address 0. (2) relies on non-overlapping addresses Decision Procedures Pointers 24

Memory Axiom 1 Memory Model Axiom ( No object has address 0 ) v V. L[v] 0 Decision Procedures Pointers 25

Overlapping Objects How do we address (2)? Suggestion: v 1, v 2 V. v 1 v 2 L[v 1 ] L[v 2 ] Decision Procedures Pointers 26

Memory Axioms 2 and 3 The following two conditions together are stronger: Memory Model Axiom ( Objects have size at least one ) v V. σ(v) 1 Memory Model Axiom ( Objects do not overlap ) v 1, v 2 V. v 1 v 2 {L[v 1 ],..., L[v 1 ] + σ(v 1 ) 1} {L[v 2 ],..., L[v 2 ] + σ(v 2 ) 1} =. Decision Procedures Pointers 27

More? Some code relies on additional, architecture-specific guarantees, e.g., byte ordering endianness, alignment, structure layout. Some program analysis tools allow adding such rules. Decision Procedures Pointers 28

Structure Types Convenient way to implement data structures We add this as a syntactic extension Notation: s.f to denote the value of the field f in the structure s Decision Procedures Pointers 29

Mapping to Array Types Each field of the structure is assigned a unique offset: o(f) Meaning of s.f: s.f. = ((&s) + o(f)) Following PASCAL and ANSI-C syntax: p->f. = ( p).f Adopted from separation logic: p a, b, c,.... = (p + 0) = a (p + 1) = b (p + 2) = c.... Decision Procedures Pointers 30

Modeling Lists Simplest dynamically allocated data structure Realized by means of a structure type that contains fields for a next pointer Decision Procedures Pointers 31

List Example p t e x t... 0 p t, p 1 p 1 e, p 2 p 2 x, p 3 p 3 t, NULL. Decision Procedures Pointers 32

List Example Define a recursive shorthand for the i-th member of a list:. list-elem(p, 0) = p,. list-elem(p, i) = list-elem(p, i 1)->n for i 1 We use n as the next pointer field. Now define the shorthand list(p, l): list(p, l). = list-elem(p, l) = NULL Decision Procedures Pointers 33

Cyclic Lists A linked list is cyclic if the pointer of the last element points to the first one: p t e x t.... Decision Procedures Pointers 34

Cyclic Lists A linked list is cyclic if the pointer of the last element points to the first one: p t e x t.... Would this work? my-list(p, l). = list-elem(p, l) = p. Decision Procedures Pointers 34

Cyclic Lists Unfortunately, the following satisfies my-list(p, 4): p t. need to rule out sharing Decision Procedures Pointers 35

Cyclic Lists Define a shorthand overlap as follows: overlap(p, q). = p = q p + 1 = q p = q + 1 Use to state that all list elements are pairwise disjoint:. list-disjoint(p, 0) = true,. list-disjoint(p, l) = list-disjoint(p, l 1) 0 i < l 1. overlap(list-elem(p, i), list-elem(p, l 1)) Grows quadratically in l! Decision Procedures Pointers 36

Trees p 5.. 3.. 8 0 0 1 0 0 4 0 0 Goal: model binary search tree Pointer to the left-hand child: l Pointer to the right-hand child: r Decision Procedures Pointers 37

Trees Idea: (n.l NULL n.l->x < n.x) (n.r NULL n.r->x > n.x). Decision Procedures Pointers 38

Trees Idea: (n.l NULL n.l->x < n.x) (n.r NULL n.r->x > n.x). Not strong enough for O(h) lookup! Decision Procedures Pointers 38

Transitive Closure Let us first define the transitive closure of a relation R: TC 1 R (p, q). = R(p, q) TC i R (p, q). = p. TC i 1 R (p, p ) R(p, q). TC(p, q) = i. TC i R (p, q) Decision Procedures Pointers 39

Trees Now define a predicate tree-reach(p, q): tree-reach(p, q). = p NULL q NULL (p = q p->l = q p->r = q) Use the transitive closure: tree-reach*(p, q) TC tree-reach(p,q) Decision Procedures Pointers 40

Trees New definition: ( p. tree-reach*(n.l, p) p->x < n.x) ( p. tree-reach*(n.r, p) p->x > n.x). Decision Procedures Pointers 41

Using the Semantic Translation is a decision procedure! 1 Define ϕ. = ϕ 2 Pass ϕ to procedure for integers and arrays Decision Procedures Pointers 42

Example I Let x be a variable, and p be a pointer. p = &x x = 1 p = 1 Decision Procedures Pointers 43

Example I Let x be a variable, and p be a pointer. p = &x x = 1 p = 1 Use semantic definition: p = &x x = 1 p = 1 p = &x x = 1 p = 1 p = &x x = 1 p = 1 M[L[p]] = L[x] M[L[x]] = 1 M[M[L[p]]] = 1. The last formula is obviously valid. Decision Procedures Pointers 43

Example II p x p = &x p x p = &x p = x p = &x p = x M[L[p]] = L[x] M[M[L[p]]] = M[L[x]] M[L[p]] = L[x] Decision Procedures Pointers 44

Example II p x p = &x p x p = &x p = x p = &x p = x M[L[p]] = L[x] M[M[L[p]]] = M[L[x]] M[L[p]] = L[x] Counterexample: L[p] = 1, L[x] = 2, M[1] = 3, M[2] = 10, M[3] = 10 3 10 10 0 1 2 3 p x Decision Procedures Pointers 44

Applying the Memory Model Axioms What if the formula relies on a memory model axiom? Example: σ(x) = 2 &y &x + 1 The semantic translation yields: σ(x) = 2 L[y] L[x] + 1 This needs the no-overlapping axiom: {L[x],..., L[x] + σ(x) 1} {L[y],..., L[y] + σ(y) 1} = Decision Procedures Pointers 45

Applying the Memory Model Axioms 1 Transform into linear arithmetic over the integers as follows: (L[x] + σ(x) 1 < L[y]) (L[x] > L[y] + σ(y) 1) 2 Using σ(x) = 2 and σ(y) 1: (L[x] + 1 < L[y]) (L[x] > L[y]) 3 Now strong enough to imply L[y] L[x] + 1 Decision Procedures Pointers 46

Pure Variables x = y y = x x = y y = x M[L[x]] = M[L[y]] M[L[y]] = M[L[x]]. Unnecessary burden for the array decision procedure! Decision Procedures Pointers 47

Pure Variables x = y y = x x = y y = x M[L[x]] = M[L[y]] M[L[y]] = M[L[x]]. Unnecessary burden for the array decision procedure! Should have done: x = y y = x Decision Procedures Pointers 47

Pure Variables Obvious idea: if the address of a variable x is not referred to, translate it to a new variable Υ x instead of M[L[x]] Decision Procedures Pointers 48

Partitioning the Memory Observation: the run time of a decision procedure for array logic depends on the number of different expressions that are used to index a particular array Decision Procedures Pointers 49

Example p = 1 q = 1 This is M[Υ p ] = 1 M[Υ q ] = 1 p and q might alias But there is no reason why they have to! Let s assume they don t! Decision Procedures Pointers 50

Partitioning the Memory We partition M into M 1 and M 2 : M 1 [Υ p ] = 1 M 2 [Υ q ] = 1 This increases the number of array variables But: the number of different indices per array decreases! Typically improves performance Decision Procedures Pointers 51

Partitioning the Memory Cannot always be applied: p = q p = q Obviously valid If we partition as before, the translated formula is no longer valid: Υ p = Υ q M 1 [Υ p ] = M 2 [Υ q ] Decision Procedures Pointers 52

A Partitioning Heuristic Deciding if the optimization is applicable is in general as hard as deciding ϕ itself Do an approximation based on a syntactic test Decision Procedures Pointers 53

A Partitioning Heuristic Deciding if the optimization is applicable is in general as hard as deciding ϕ itself Do an approximation based on a syntactic test Definition Two pointer expressions p and q are related if both p and q are used inside the same relational expression Write p q for TC related Partition according to! Decision Procedures Pointers 53