Harvard School of Engineering and Applied Sciences
CS 152: Programming Languages
Lecture 21, Tuesday, April 15, 2014

1 Static program analyses

For the last few weeks, we have been considering type systems. The key advantage of type systems is that they provide a static semantics for a program, allowing us to reason about all possible executions of a program before actually running it. However, in order to achieve this, type systems must approximate the runtime behavior of a program. Type systems are (typically) flow-insensitive: the type of a variable (i.e., the static information associated with the runtime value of the variable) is the same for the entire execution of the program, regardless of which control-flow path the program takes.

Abstract interpretation is another form of static semantics, allowing us to reason about the behavior of a program before executing it. Like type systems, abstract interpretation can be used to prove statically that various facts hold true for all possible dynamic executions of a program. In this regard, it can provide greater assurance than testing: testing can verify the behavior of a program on one, or several, possible executions, but it is difficult to use tests to provide assurance that all possible executions of a program will be acceptable.

2 Abstract interpretation

In abstract interpretation we define an abstract domain (an abstraction, or approximation, of the real values manipulated by a program) and define a semantics for the language that uses the abstract domain. The key idea is that the abstract semantics for the language (which manipulates values from the abstract domain) is a faithful approximation of the concrete semantics for the language (which manipulates values from the concrete domain). By requiring the abstract domain to be suitably finite, we can ensure that the abstract interpretation of a program is guaranteed to terminate.

We consider abstract interpretation in IMP, and we will introduce the key ideas by considering an analysis that computes the signs of variables. We start by defining an abstract domain for integers that captures their sign:

    AbsInt = {pos, zero, neg, anyint}

In contrast to the concrete domain Int of possible integer values, the abstract domain AbsInt is clearly finite. Furthermore, we can give a partial order to the elements of AbsInt, shown in the following Hasse diagram:

                anyint
               /   |   \
            neg  zero  pos

In this diagram, if two elements are connected by an edge, then the element higher up is a conservative approximation of the other element. We say that the element lower in the diagram is more precise than the element higher up, and write x ⊑ y to denote this. For example, pos is more precise than anyint; indeed, knowing that an integer is positive is more precise than just knowing it is some integer. The ordering ⊑ is a partial order: it is reflexive, anti-symmetric, and transitive. Note that some elements of a partial order may be incomparable. Indeed, pos and neg are incomparable: neither is more precise than the other. A partial order in which every pair of elements has a least upper bound is called a join semi-lattice. The least upper bound of a pair of elements a and b is called the join of a and b, and is written a ⊔ b.
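
To make the lattice structure concrete, here is a minimal OCaml sketch of the sign domain, its ordering, and its join. The names (absint, leq, join) are our own and are not part of the lecture's formal development; this is an illustration under those assumptions, not a prescribed implementation.

    (* The sign domain AbsInt and its join semi-lattice structure. *)
    type absint = Neg | Zero | Pos | AnyInt

    (* leq x y  corresponds to  x ⊑ y : x is at least as precise as y. *)
    let leq (x : absint) (y : absint) : bool =
      x = y || y = AnyInt

    (* join x y  is the least upper bound  x ⊔ y.  In this flat lattice,
       joining two distinct signs loses all information. *)
    let join (x : absint) (y : absint) : absint =
      if x = y then x else AnyInt

    let () =
      assert (leq Pos AnyInt);            (* pos ⊑ anyint *)
      assert (not (leq Pos Neg));         (* pos and neg are incomparable *)
      assert (join Pos Neg = AnyInt);     (* pos ⊔ neg = anyint *)
      assert (join Zero Zero = Zero)      (* zero ⊔ zero = zero *)

Both functions rely on the lattice being flat: every element other than anyint sits directly below the top, so the join of two distinct signs is always anyint.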

When we said earlier that the abstraction needs to be suitably finite, what we really meant is that the join semi-lattice of abstract values should have finite height; it may have an infinite number of elements, but its height needs to be finite.

Given the abstract values AbsInt, we define abstract stores AbsStore to be functions from variables to abstract values:

    σ̂ ∈ AbsStore = Var → AbsInt

(We write σ̂, with a hat, for abstract stores, to distinguish them from concrete stores σ.) We can define a semantics for IMP that executes a program using abstract stores and abstract values. This is abstract interpretation: the analysis forgets the concrete values of variables, and instead uses an approximation of their values; in this case, the sign of integers. Let's start defining this semantics, using denotational semantics.

2.1 Arithmetic expressions

For an arithmetic expression, the analysis uses an abstract store to derive a sign for that expression. The analysis tries to derive a precise sign for the expression. However, because it lacks knowledge of the concrete values of variables, in certain cases it must be conservative and say that the sign of the expression is unknown (represented as anyint). We express the analysis of arithmetic expressions using an abstract denotation A⟦a⟧:

    A⟦a⟧ : AbsStore → AbsInt

    A⟦n⟧σ̂ =  pos   if n > 0
             zero  if n = 0
             neg   if n < 0

    A⟦x⟧σ̂ = σ̂(x)

    A⟦a1 + a2⟧σ̂ =
        pos     if (A⟦a1⟧σ̂ = pos ∧ A⟦a2⟧σ̂ ∈ {zero, pos}) ∨ (A⟦a1⟧σ̂ = zero ∧ A⟦a2⟧σ̂ = pos)
        neg     if (A⟦a1⟧σ̂ = neg ∧ A⟦a2⟧σ̂ ∈ {zero, neg}) ∨ (A⟦a1⟧σ̂ = zero ∧ A⟦a2⟧σ̂ = neg)
        zero    if A⟦a1⟧σ̂ = zero ∧ A⟦a2⟧σ̂ = zero
        anyint  otherwise

Note that all of these evaluations use just the information in the abstract store when reasoning about variables; the evaluation has no knowledge of the concrete values of variables. In the last case, the analysis cannot precisely determine the sign of the expression, and conservatively returns anyint.
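
Before moving on to boolean expressions, here is an OCaml sketch of the arithmetic analysis A⟦a⟧ defined above. The AST type aexp, the representation of abstract stores as association lists, and the helper names (abs_of_int, abs_add, aeval) are choices made for this sketch; the absint type is repeated from the previous snippet so the example stands alone.

    type absint = Neg | Zero | Pos | AnyInt   (* as above *)

    (* Abstract syntax for a small fragment of IMP arithmetic expressions. *)
    type aexp =
      | Int of int              (* integer literal n *)
      | Var of string           (* program variable x *)
      | Add of aexp * aexp      (* a1 + a2 *)

    (* Abstract stores: finite maps Var -> AbsInt, as association lists. *)
    type absstore = (string * absint) list

    (* A[[n]] : the sign of an integer literal. *)
    let abs_of_int n = if n > 0 then Pos else if n = 0 then Zero else Neg

    (* Abstract addition, following the case analysis for A[[a1 + a2]]:
       the sign of a sum is known only when the operands agree in sign
       (or one of them is zero); otherwise the result is AnyInt. *)
    let abs_add s1 s2 =
      match s1, s2 with
      | Zero, Zero                    -> Zero
      | Pos, (Pos | Zero) | Zero, Pos -> Pos
      | Neg, (Neg | Zero) | Zero, Neg -> Neg
      | _                             -> AnyInt

    (* A[[a]] sigma *)
    let rec aeval (a : aexp) (sigma : absstore) : absint =
      match a with
      | Int n -> abs_of_int n
      | Var x -> List.assoc x sigma
      | Add (a1, a2) -> abs_add (aeval a1 sigma) (aeval a2 sigma)

For example, aeval (Add (Var "x", Int 1)) [("x", Pos)] evaluates to Pos, while aeval (Add (Var "x", Var "y")) [("x", Pos); ("y", Neg)] evaluates to AnyInt.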

2.2 Boolean expressions

We can also define a similar evaluation for boolean expressions. We use the following abstract domain for booleans:

    AbsBool = {tru, fls, anybool}

This domain includes the value anybool, indicating that the truth value of an expression cannot be precisely determined. Again, we can define an ordering between these values and form a join semi-lattice:

                anybool
               /       \
             fls       tru

    B⟦b⟧ : AbsStore → AbsBool

    B⟦true⟧σ̂ = tru
    B⟦false⟧σ̂ = fls

The evaluation B⟦a1 < a2⟧σ̂ must evaluate the signs of the subexpressions a1 and a2 (using the denotation for arithmetic expressions) and try to determine the truth value of the comparison based on those signs:

    B⟦a1 < a2⟧σ̂ =
        tru      if (A⟦a1⟧σ̂ = neg ∧ A⟦a2⟧σ̂ ∈ {zero, pos}) ∨ (A⟦a1⟧σ̂ = zero ∧ A⟦a2⟧σ̂ = pos)
        fls      if (A⟦a1⟧σ̂ = zero ∧ A⟦a2⟧σ̂ ∈ {zero, neg}) ∨ (A⟦a1⟧σ̂ = pos ∧ A⟦a2⟧σ̂ ∈ {zero, neg})
        anybool  otherwise

2.3 Commands

To finish up the analysis that determines the signs of variables, we must define how the analysis processes commands. For each command c, we want to execute c on the abstract store that holds before c, and determine an abstract store that holds after the command. At each point during the analysis of c, we keep track of the current abstraction at that point, but we have no information about the concrete store (i.e., the concrete integer values of variables). We express the analysis of a command c using an abstract denotation C⟦c⟧:

    C⟦c⟧ : AbsStore → AbsStore

Note that C⟦c⟧ is a total function, in contrast to the concrete denotational semantics of commands, which is a partial function. (Why?)

The analysis of skip, assignment, and sequencing is straightforward:

    C⟦skip⟧σ̂ = σ̂
    C⟦x := a⟧σ̂ = σ̂[x ↦ A⟦a⟧σ̂]
    C⟦c1; c2⟧σ̂ = C⟦c2⟧(C⟦c1⟧σ̂)

More interesting is the analysis of an if command. If we can determine which branch is taken given the current sign information, then we need only analyze the appropriate branch. However, if we cannot precisely determine the truth value of the condition, then we must conservatively analyze each branch in turn and then combine the results:

    C⟦if b then c1 else c2⟧σ̂ =
        C⟦c1⟧σ̂                  if B⟦b⟧σ̂ = tru
        C⟦c2⟧σ̂                  if B⟦b⟧σ̂ = fls
        (C⟦c1⟧σ̂) ⊔ (C⟦c2⟧σ̂)     if B⟦b⟧σ̂ = anybool

Note that the last case is the one we usually expect in real programs: the case where we cannot statically tell which branch is going to be executed (usually because the test depends on some input to the program). In the first two cases, one branch is always taken; we could then optimize the other branch away and replace the if command with the branch being taken.

The join operation over two abstract stores just takes the join of each variable. That is, suppose that σ̂'' = σ̂ ⊔ σ̂'; then for any variable x, we define σ̂''(x) = σ̂(x) ⊔ σ̂'(x).

Note that the semantics for a conditional yields the most precise sign of a variable given its signs on the consequent and alternative of the conditional. If a variable x is positive on both branches, then x will map to pos in the resulting abstract store; but if x has unknown sign on one branch, or if it has different signs on the two branches, then x will map to anyint in the resulting abstract store.
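
Before looking at example programs, the comparison rule B⟦a1 < a2⟧ and the conditional rule for C⟦if b then c1 else c2⟧ can be sketched in the same style. The names (absbool, abs_less, join_store, analyze_if) are again invented for this sketch; beval, ceval1, and ceval2 stand in for the analyses of the test and of the two branches, and the two stores joined by join_store are assumed to bind the same variables.

    type absint = Neg | Zero | Pos | AnyInt       (* as above *)
    type absbool = Tru | Fls | AnyBool

    (* B[[a1 < a2]], given the signs of the two operands. *)
    let abs_less s1 s2 =
      match s1, s2 with
      | Neg, (Zero | Pos) | Zero, Pos          -> Tru
      | Zero, (Zero | Neg) | Pos, (Zero | Neg) -> Fls
      | _                                      -> AnyBool

    (* Abstract stores and their pointwise join. *)
    type absstore = (string * absint) list

    let join x y = if x = y then x else AnyInt

    let join_store (s1 : absstore) (s2 : absstore) : absstore =
      List.map (fun (x, v) -> (x, join v (List.assoc x s2))) s1

    (* C[[if b then c1 else c2]] sigma.  beval analyzes the test;
       ceval1 and ceval2 analyze the two branches. *)
    let analyze_if beval ceval1 ceval2 (sigma : absstore) : absstore =
      match beval sigma with
      | Tru     -> ceval1 sigma                               (* only the true branch *)
      | Fls     -> ceval2 sigma                               (* only the false branch *)
      | AnyBool -> join_store (ceval1 sigma) (ceval2 sigma)   (* analyze both and join *)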

2.4 Example programs

Let's consider a few examples. First, try the following sequence of assignments:

    x := 4; y := -3; z := x + y

If we start with a concrete store in which all variables are initialized to zero (σ0 = [x ↦ 0, y ↦ 0, z ↦ 0]), then the concrete execution of c yields:

    C⟦c⟧σ0 = [x ↦ 4, y ↦ -3, z ↦ 1]

Now if we start with the abstract store σ̂1 = [x ↦ zero, y ↦ zero, z ↦ zero] and analyze the program using the sign abstraction, we get:

    C⟦c⟧σ̂1 = [x ↦ pos, y ↦ neg, z ↦ anyint]

Note that the analysis result C⟦c⟧σ̂1 is faithful to the concrete result C⟦c⟧σ0: the sign of each variable correctly describes the actual value of that variable. However, the analysis may yield results that, despite being correct, are not very accurate; in this case, for z. This is the price we pay for restricting ourselves to the abstract domain during the analysis.

Now consider a program with a conditional:

    x := 4; y := 3; if x > y then x := x + y else y := y - x

If we start with the same concrete store σ0, the program will take the true branch and yield:

    C⟦c⟧σ0 = [x ↦ 7, y ↦ 3]

Starting with an abstract store σ̂2 = [x ↦ zero, y ↦ zero], we get the following. The first two assignments yield a store σ̂2' = [x ↦ pos, y ↦ pos]. Based on these signs, the analysis cannot determine whether the test x > y succeeds. Therefore, it analyzes each branch. On the true branch, the analysis can determine that x + y is positive and maintain a positive sign for x:

    C⟦x := x + y⟧σ̂2' = [x ↦ pos, y ↦ pos]

On the false branch, however, the analysis cannot determine the sign of y - x, since both x and y are positive but their magnitudes are not known. Therefore, the analysis assigns an unknown sign to y:

    C⟦y := y - x⟧σ̂2' = [x ↦ pos, y ↦ anyint]

Finally, the analysis combines the results from the two branches to get the final result:

    C⟦c⟧σ̂2 = [x ↦ pos, y ↦ anyint]

So the analysis could successfully determine that x is positive at the end of the program, regardless of which branch is taken; but it could not precisely determine the sign of y. As in the previous example, the analysis result correctly models the signs of the values computed in the actual execution of the program.
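
Using the join_store sketch from the previous snippet, the final step of the second example can be replayed directly: joining the abstract stores produced by the two branches gives pos for x and anyint for y. The definitions are repeated here so the snippet stands alone.

    type absint = Neg | Zero | Pos | AnyInt

    let join x y = if x = y then x else AnyInt

    let join_store s1 s2 =
      List.map (fun (x, v) -> (x, join v (List.assoc x s2))) s1

    let () =
      let true_branch  = [ ("x", Pos); ("y", Pos)    ] in   (* after x := x + y *)
      let false_branch = [ ("x", Pos); ("y", AnyInt) ] in   (* after y := y - x *)
      assert (join_store true_branch false_branch = [ ("x", Pos); ("y", AnyInt) ])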

2.5 Analysis of while loops

Consider the analysis of a while loop, while b do c. In the concrete semantics, we would repeatedly evaluate the loop body c until the test b evaluates to false. Suppose we were executing the while loop with the abstract semantics. If, in the abstract semantics, we evaluate b to tru, then we know we need to execute the loop body. If, after executing the loop body zero or more times, b evaluates to the abstract value fls, then we know that we can stop executing the loop. But what if, after executing the loop body zero or more times, b evaluates to anybool? That doesn't tell us whether we need to execute the loop body again or not!

To get an intuition for the semantics of a while loop, let's think about what properties we want of the abstract semantics. Intuitively, suppose we have some concrete store σ0, we execute a program c, and it terminates with store σ1. Suppose also that the abstract store σ̂0 is a faithful approximation of σ0, and that σ̂1 is the result of executing our abstract semantics on σ̂0. Then σ̂1 should be a faithful approximation of σ1. The following commutative diagram shows this graphically:

                      abstract semantics C⟦c⟧
              σ̂0  ------------------------------->  σ̂1

              ^                                      ^
              |  faithful                            |  faithful
              |  abstraction                         |  abstraction

              σ0  ------------------------------->  σ1
                      concrete semantics C⟦c⟧

Let's consider instantiating this with the while loop while b do c. For the concrete semantics to get from store σ0 to σ1, the loop body is executed zero or more times. So, starting with an abstract store σ̂0, we want to find a σ̂1 such that σ̂1 is an abstraction of executing the loop body zero or more times. To make sure that it is a suitable abstraction after executing the loop body zero times, we need σ̂0 ⊑ σ̂1. To make sure it is a suitable abstraction after executing the loop body once, we need C⟦c⟧σ̂0 ⊑ σ̂1. Indeed, to make sure it is a suitable abstraction after executing the loop body n times, we need C⟦c⟧^n σ̂0 ⊑ σ̂1. Since, in a join semi-lattice, a ⊔ b ⊑ c if and only if a ⊑ c and b ⊑ c, we can state our requirements succinctly:

    C⟦c⟧^i σ̂0 ⊑ σ̂1    for all i ∈ ℕ

Note that if σ̂1 maps every variable to anyint, then this constraint is satisfied. But such an abstract store is useless: it tells us nothing about the sign of any variable. In fact, we want the most precise abstract store σ̂1 that satisfies the requirements; that is, the join over all i of the iterates, ⨆_{i∈ℕ} C⟦c⟧^i σ̂0. So we define:

    C⟦while b do c⟧σ̂ = ⨆_{i∈ℕ} C⟦c⟧^i σ̂

Note that σ̂1 = ⨆_{i∈ℕ} C⟦c⟧^i σ̂0 is a fixed point of C⟦c⟧; that is, C⟦c⟧σ̂1 = σ̂1. We have an efficient way to find this fixed point:

    C⟦while b do c⟧σ̂ =
        while (true)
            σ̂' = σ̂ ⊔ C⟦c⟧σ̂;
            if σ̂' = σ̂ then return σ̂ else σ̂ = σ̂'

This algorithm is guaranteed to terminate, since the abstract domain has finite height. Each time through the loop, either we have reached a fixed point (and σ̂' = σ̂), or σ̂' is less precise than σ̂ for at least one variable. But only a finite number of variables are used in a program, and, with a finite-height domain, any given variable can be made less precise only a finite number of times. Thus, we are guaranteed to reach a fixed point eventually.
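
The iteration above is easy to express in code. The following sketch, assuming the store representation from the earlier snippets, computes the fixed point by repeatedly joining in the effect of one more execution of the loop body; the name analyze_while is our own, and step is a parameter standing in for C⟦c⟧ applied to the loop body rather than something defined in the lecture.

    type absint = Neg | Zero | Pos | AnyInt

    let join x y = if x = y then x else AnyInt

    (* Abstract stores as association lists over a fixed set of variables;
       step is assumed to return a store over the same variables. *)
    type absstore = (string * absint) list

    let join_store (s1 : absstore) (s2 : absstore) : absstore =
      List.map (fun (x, v) -> (x, join v (List.assoc x s2))) s1

    (* C[[while b do c]] : iterate  sigma' = sigma ⊔ C[[c]] sigma  until the
       abstract store stops changing.  Termination follows from the finite
       height of the domain: each round either reaches the fixed point or
       makes at least one variable strictly less precise. *)
    let rec analyze_while (step : absstore -> absstore) (sigma : absstore) : absstore =
      let sigma' = join_store sigma (step sigma) in
      if sigma' = sigma then sigma else analyze_while step sigma'

Because the sign lattice is so shallow, this loop converges after very few rounds; the same termination argument applies to any finite-height abstract domain.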