Static Program Analysis Part 1 the TIP language

Similar documents
Introduction A Tiny Example Language Type Analysis Static Analysis 2009

Static Program Analysis

Static Program Analysis

Lecture Notes on Static Analysis

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Static Program Analysis Part 8 control flow analysis

CS558 Programming Languages

CS558 Programming Languages

A Gentle Introduction to Program Analysis

Principles of Software Construction: Objects, Design, and Concurrency (Part 2: Designing (Sub )Systems)

CS321 Languages and Compiler Design I Winter 2012 Lecture 13

Midterm II CS164, Spring 2006

Program Verification. Aarti Gupta

CS558 Programming Languages

G Programming Languages - Fall 2012

Lecture Notes on Static Analysis

Static Program Analysis Part 7 interprocedural analysis

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Compilation 2012 Static Analysis

Simple Overflow. #include <stdio.h> int main(void){ unsigned int num = 0xffffffff;

Advanced Systems Programming

CS201- Introduction to Programming Current Quizzes

CS558 Programming Languages

Compilation 2012 Static Type Checking

C#: framework overview and in-the-small features

CS558 Programming Languages. Winter 2013 Lecture 3

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Advanced Programming Methods. Introduction in program analysis

Chapter 3 (part 3) Describing Syntax and Semantics

COS 320. Compiling Techniques

Chapter 13: Reference. Why reference Typing Evaluation Store Typings Safety Notes

More Lambda Calculus and Intro to Type Systems

In Java we have the keyword null, which is the value of an uninitialized reference type

EL2310 Scientific Programming

Lecture Notes on Memory Layout

Software Security: Vulnerability Analysis

Formal Semantics of Programming Languages

Lecture 14. No in-class files today. Homework 7 (due on Wednesday) and Project 3 (due in 10 days) posted. Questions?

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

CSE 403: Software Engineering, Fall courses.cs.washington.edu/courses/cse403/16au/ Static Analysis. Emina Torlak

CS-XXX: Graduate Programming Languages. Lecture 9 Simply Typed Lambda Calculus. Dan Grossman 2012

Static Type Checking. Static Type Checking. The Type Checker. Type Annotations. Types Describe Possible Values

Why. an intermediate language for deductive program verification

Lectures 20, 21: Axiomatic Semantics

Mutable References. Chapter 1

CSCI-GA Scripting Languages

Begin at the beginning

Static Analysis in C/C++ code with Polyspace

Introduction to Programming Using Java (98-388)

Short Notes of CS201

Simply-Typed Lambda Calculus

CS558 Programming Languages

Closures. Mooly Sagiv. Michael Clarkson, Cornell CS 3110 Data Structures and Functional Programming

More Lambda Calculus and Intro to Type Systems

CS201 - Introduction to Programming Glossary By

C Review. MaxMSP Developers Workshop Summer 2009 CNMAT

Compiler Construction 2010/2011 Loop Optimizations

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

Name: CIS 341 Final Examination 10 December 2008

Static Analysis of C++ Projects with CodeSonar

Formal Semantics of Programming Languages

CMSC 330: Organization of Programming Languages. OCaml Imperative Programming

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

School of Computer & Communication Sciences École Polytechnique Fédérale de Lausanne

CS2141 Software Development using C/C++ Debugging

Today Program Analysis for finding bugs, especially security bugs problem specification motivation approaches remaining issues

Compiler Construction 2016/2017 Loop Optimizations

Assignment 4: Semantics

Computability Summary

Intermediate Code Generation

Automatic Software Verification

Type Checking. Outline. General properties of type systems. Types in programming languages. Notation for type rules.

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include

CSI33 Data Structures

CIS 341 Final Examination 3 May 2011

1. true / false By a compiler we mean a program that translates to code that will run natively on some machine.

Outline. General properties of type systems. Types in programming languages. Notation for type rules. Common type rules. Logical rules of inference

Programming Language Features. CMSC 330: Organization of Programming Languages. Turing Completeness. Turing Machine.

CMSC 330: Organization of Programming Languages

Topic 7: Intermediate Representations

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

More Lambda Calculus and Intro to Type Systems

CS4215 Programming Language Implementation. Martin Henz

CS61C : Machine Structures

Homework I - Solution

QUIZ. 1. Explain the meaning of the angle brackets in the declaration of v below:

Lecture Notes on Arrays

Complexity, Induction, and Recurrence Relations. CSE 373 Help Session 4/7/2016

Lecture 3 Notes Arrays

CS 261 Fall C Introduction. Variables, Memory Model, Pointers, and Debugging. Mike Lam, Professor

Seminar in Software Engineering Presented by Dima Pavlov, November 2010

Compilers and computer architecture: Semantic analysis

Static Analysis in Practice

INF 212 ANALYSIS OF PROG. LANGS FUNCTION COMPOSITION. Instructors: Crista Lopes Copyright Instructors.

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8

CS201 Some Important Definitions

Kurt Schmidt. October 30, 2018

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

Tail Calls. CMSC 330: Organization of Programming Languages. Tail Recursion. Tail Recursion (cont d) Names and Binding. Tail Recursion (cont d)

Lectures 5-6: Introduction to C

Transcription:

Static Program Analysis Part 1 the TIP language http://cs.au.dk/~amoeller/spa/ Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Questions about programs Does the program terminate on all inputs? How large can the heap become during execution? Can sensitive information leak to non-trusted users? Can non-trusted users affect sensitive information? Are buffer-overruns possible? Data races? SQL injections? XSS? 2

Program points foo(p,x) { var f,q; if (*p==0) { f=1; } else { q = malloc; *q = (*p)-1; f=(*p)*(x(q,x)); } return f; } any point in the program = any value of the PC Invariants: A property holds at a program point if it holds in any such state for any execution with any input 3

Questions about program points Will the value of x be read in the future? Can the pointer p be null? Which variables can p point to? Is the variable x initialized before it is read? What is a lower and upper bound on the value of the integer variable x? At which program points could x be assigned its current value? Do p and q point to disjoint structures in the heap? Can this assert statement fail? 4

Why are the answers interesting? Increase efficiency resource usage compiler optimizations Ensure correctness verify behavior catch bugs early Support program understanding Enable refactorings 5

Testing? Program testing can be used to show the presence of bugs, but never to show their absence. [Dijkstra, 1972] Testing often takes 50% of development cost Concurrency errors are hard to (re)produce with testing ( Heisenbugs ) 6

Programs that reason about programs a program P a program analyzer A P always works correctly P fails for some inputs 7

Requirements to the perfect program analyzer SOUNDNESS (don t miss any errors) COMPLETENESS (don t raise false alarms) TERMINATION (always give an answer)

Rice s theorem, 1953 9

Rice s theorem Any non-trivial property of the behavior of programs in a Turing-complete language is undecidable! 10

Reduction to the halting problem Can we decide if a variable has a constant value? x = 17; if (TM( j )) x = 18; Here, x is constant if and only if the j th Turing machine does not halt on empty input 11

Undecidability of program correctness M e(t) e(s T ) Build e(s T ) from e(t) P Is the FAIL state unreachable in the given program (for any input)? yes no ACCEPT REJECT w S T for w moves Simulate T on input e(t) (ignoring input w) If simulation reaches ACCEPT, then goto FAIL Otherwise, just terminate or loop forever (without reaching FAIL) Does M accept input e(m)? (Note: this proof works even if we only consider programs that always terminate!) 12

Approximation Approximate answers may be decidable! The approximation must be conservative: i.e. only err on the safe side which direction depends on the client application We'll focus on decision problems More subtle approximations if not only yes / no e.g. memory usage, pointer targets 13

False positives and false negatives 14

Example approximations Decide if a given function is ever called at runtime: if no, remove the function from the code if yes, don t do anything the no answer must always be correct if given Decide if a cast (A)x will always succeed: if yes, don t generate a runtime check if no, generate code for the cast the yes answer must always be correct if given 15

Beyond yes / no problems How much memory / time may be used in any execution? Which variables may be the targets of a pointer variable p? 16

The engineering challenge A correct but trivial approximation algorithm may just give the useless answer every time The engineering challenge is to give the useful answer often enough to fuel the client application... and to do so within reasonable time and space This is the hard (and fun) part of static analysis! 17

Bug finding int main() { char *p,*q; p = NULL; printf("%s",p); q = (char *)malloc(100); p = q; free(q); *p = 'x'; free(p); p = (char *)malloc(100); p = (char *)malloc(100); q = p; strcat(p,q); } gcc Wall foo.c lint foo.c No errors! 18

Does anyone use static program analysis? For optimization: every optimizing compiler and modern JIT For verification or error detection: 19

A constraint-based approach Conceptually separates the analysis specification from algorithmic aspects and implementation details program to analyze constraint solver p = &int q = &int alloc = &int x = foo = &n = &int main = ()->int solution mathematical constraints 20

Challenging features in modern programming language Higher-order functions Mutable records or objects, arrays Integer or floating-point computations Dynamic dispatching Inheritance Exceptions Reflection 21

The TIP language Tiny Imperative Programming language Example language used in this course: minimal C-style syntax cut down as much as possible enough features to make static analysis challenging and fun Scala implementation available 22

Expressions E I X E+E E E E*E E/E E>E E==E ( E ) input I represents an integer constant X represents an identifier (x, y, z, ) input expression reads an integer from the input stream comparison operators yield 0 (false) or 1 (true) 23

Statements S X = E; output E; S S if (E) { S } [else { S }]? while (E) { S } In conditions, 0 is false, all other values are true The output statement writes an integer value to the output stream 24

Functions Functions take any number of arguments and return a single value: F X ( X,..., X ) { [var X,..., X;]? S return E; } The optional var block declares a collection of uninitialized variables Function calls are an extra kind of expressions: E X ( E,..., E ) 25

Records E { X:E,, X:E } E.X 26

E alloc E &X *E null Heap pointers S *X = E; (No pointer arithmetic) 27

Function pointers Function names denote function pointers Generalized function calls: E E( E,..., E ) Function pointers suffice to illustrate the main challenges with methods and higher-order functions 28

Programs A program is a collection of functions The function named main initiates execution its arguments are taken from the input stream its result is placed on the output stream We assume that all declared identifiers are unique P F... F 29

An iterative factorial function ite(n) { var f; f = 1; while (n>0) { f = f*n; n = n-1; } return f; } 30

A recursive factorial function rec(n) { var f; if (n==0) { f=1; } else { f=n*rec(n-1); } return f; } 31

An unnecessarily complicated function foo(p,x) { var f,q; if (*p==0) { f=1; } else { q = alloc 0; *q = (*p)-1; f=(*p)*(x(q,x)); } return f; } main() { var n; n = input; return foo(&n,foo); } 32

Beyond TIP Other common language features in mainstream languages: global variables objects nested functions 33

Control flow graphs ite(n) { var f; f = 1; while (n>0) { f = f*n; n = n-1; } return f; } false var f f=1 n>0 true f=f*n n=n-1 return f 34

Control flow graphs A control flow graph (CFG) is a directed graph: nodes correspond to program points (either immediately before or after statements) edges represent possible flow of control A CFG always has a single point of entry a single point of exit (think of them as no-op statements) Let v be a node in a CFG pred(v) is the set of predecessor nodes succ(v) is the set of successor nodes 35

CFG construction (1/3) For the simple while fragment of TIP, CFGs are constructed inductively CFGs for simple statements etc.: X = E output E return E var X 36

CFG construction (2/3) For a statement sequence S 1 S 2 : eliminate the exit node of S 1 and the entry node of S 2 glue the statements together S 1 S 1 S 2 S 2 37

CFG construction (3/3) Similarly for the other control structures: false E true E false true false true E S S 1 S 2 S 38

Normalization Sometimes convenient to assume that there are no nested expressions Normalization: flatten nested expressions, using fresh variables x = f(y+3)*5; t1 = y+3; t2 = f(t1); x = t2*5; 39