Program Analysis (www.cse.iitb.ac.in/ uday) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay January 2008
Part 1 About These Slides
CS 618 Program Analysis: About These Slides 1/19 Copyright These slides are in the series of lectures on CS638 Program Analysis course at. A large part of these slides are based on the following material: 1., Amitabha Sanyal, and Amey Karkare. Heap Reference Analysis Using Access Graphs. ACM Transactions on Programming Languages and Systems, 30(1), Nov 2007. 2., Amitabha Sanyal, and Bageshri Karkare. Data Flow Analysis: Theory and Practice. CRC Press USA. (Under Publication) 3.. Data Flow Analysis. In The Compiler Design Handbook. Editors: Priti Shankar and Y. N. Srikant. CRC Press USA. 2002. (2) is not yet published and the copyrights are reserved by the authors. The copyright for (1) is held by ACM and that for (3) is held by CRC Press USA. These slides may be used only for academic purposes.
CS 618 Program Analysis: About These Slides 2/19 Standard Memory Architecture of Programs Static Data Heap allocation provides the flexibility of Stack Variable Sizes. Data structures can grow or shrink as desired at runtime. (Not bound to the declarations in program.) Heap Code Variable Lifetimes. Data structures can be created and destroyed as desired at runtime. (Not bound to the activations of procedures.)
CS 618 Program Analysis: About These Slides 3/19 Managing Heap Memory Decision 1: When to Allocate? Explicit. Specified in the programs. (eg. Imperative/OO languages) Implicit. Decided by the language processors. (eg. Declarative Languages)
CS 618 Program Analysis: About These Slides 3/19 Managing Heap Memory Decision 1: When to Allocate? Explicit. Specified in the programs. (eg. Imperative/OO languages) Implicit. Decided by the language processors. (eg. Declarative Languages) Decision 2: When to Deallocate? Explicit. Manual Memory Management (eg. C/C++) Implicit. Automatic Memory Management aka Garbage Collection (eg. Java/Declarative languages)
CS 618 Program Analysis: About These Slides 4/19 State of Art in Manual Deallocation Memory leaks 10% to 20% of last development effort goes in plugging leaks Tool assisted manual plugging Purify, Electric Fence, RootCause, GlowCode, yaktest, Leak Tracer, BDW Garbage Collector, mtrace, memwatch, dmalloc etc. All leak detectors are dynamic (and hence specific to execution instances) generate massive reports to be perused by programmers usually do not locate last use but only allocation escaping a call At which program point should a leak be plugged?
CS 618 Program Analysis: About These Slides 5/19 Garbage Collection Automatic Deallocation Retain active data structure. Deallocate inactive data structure. What is an Active Data Structure?
CS 618 Program Analysis: About These Slides 5/19 Garbage Collection Automatic Deallocation Retain active data structure. Deallocate inactive data structure. What is an Active Data Structure? If an object does not have an access path, (i.e. it is unreachable) then its memory can be reclaimed.
CS 618 Program Analysis: About These Slides 5/19 Garbage Collection Automatic Deallocation Retain active data structure. Deallocate inactive data structure. What is an Active Data Structure? If an object does not have an access path, (i.e. it is unreachable) then its memory can be reclaimed. What if an object has an access path, but is not accessed after the given program point?
CS 618 Program Analysis: About These Slides 6/19 What is Garbage? Garbage 1 w = x // x points to m a 2 if (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data Garbage z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 6/19 What is Garbage? False Garbage 1 w = x // x points to m a 2 if (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data Garbage z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 6/19 What is Garbage? True Garbage 1 w = x // x points to m a 2 if (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data Garbage z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 6/19 What is Garbage? Garbage 1 w = x // x points to m a 2 if (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data Garbage z x w y Stack p a q b i c f j m Heap d e g h k l n o All white nodes are unused and should be considered garbage
CS 618 Program Analysis: About These Slides 7/19 Is Reachable Same as Live? From www.memorymanagement.org/glossary live (also known as alive, active) : Memory(2) or an object is live if the program will read from it in future. The term is often used more broadly to mean reachable. It is not possible, in general, for garbage collectors to determine exactly which objects are still live. Instead, they use some approximation to detect objects that are provably dead, such as those that are not reachable. Similar terms: reachable. Opposites: dead. See also: undead.
CS 618 Program Analysis: About These Slides 8/19 Is Reachable Same as Live? Not really. Most of us know that. Even with the state of art of garbage collection, 24% to 76% unused memory remains unclaimed Yet we have no way of distinguishing. Over a dozen reported approaches (since 1996), no real success.
CS 618 Program Analysis: About These Slides 9/19 Cedar Mesa Folk Wisdom Make the unused memory unreachable by setting references to NULL. (GC FAQ: http://www.iecc.com/gclist/gc-harder.html) z x w y p a q b i X X c f j m d e g h k l n o Stack Heap
CS 618 Program Analysis: About These Slides 9/19 Cedar Mesa Folk Wisdom Make the unused memory unreachable by setting references to NULL. (GC FAQ: http://www.iecc.com/gclist/gc-harder.html) z x w y p a q b i c f j m d e g h k l n o Stack Heap
CS 618 Program Analysis: About These Slides 10/19 Cedar Mesa Folk Wisdom Most promising, simplest to understand, yet the hardest to implement. Which references should be set to NULL? Most approaches rely on feedback from profiling. No systematic and clean solution.
CS 618 Program Analysis: About These Slides 11/19 Distinguishing Between Reachable and Live The state of art Eliminating objects reachable from root variables which are not live. Implemented in current Sun JVMs. Uses liveness data flow analysis of root variables (stack data). What about liveness of heap data?
CS 618 Program Analysis: About These Slides 12/19 Liveness of Stack Data 1 w = x // x points to m a 2 while (x.data < max) Heap 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. Stack 7 z.sum = x.data + y.data if changed to while
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z y = y. z.sum = x.data + y.data No variable is used beyond this program point
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z Current values of x, y, and z are used beyond this program point y = y. z.sum = x.data + y.data Live Dead
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) y = x. x = x. Current values of x, y, and z are used beyond this program point The value of y is different before and after the assignment to y z = New class of z y = y. z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) y = x. x = x. The current values of x and y are used beyond this program point The current value of z is not used beyond this program point z = New class of z y = y. z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z y = y. The current values of x is used beyond this program point Current values of y and z are not used beyond this program point z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z y = y. Nothing is known as of now Some information will be available in the next iteration point z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) y = x. x = x. z = New class of z Current value of x is used beyond this program point However its value is different before and after the assignment y = y. z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z Current value of x is used beyond this program point There are two control flow paths beyond this program point y = y. z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. Current value of x is used beyond this program point y = x. z = New class of z y = y. z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) Current value of x is used beyond this program point x = x. y = x. z = New class of z y = y. z.sum = x.data + y.data
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z y = y. z.sum = x.data + y.data End of iteration #1 Live Dead
CS 618 Program Analysis: About These Slides 13/19 Liveness of Stack Data w = x while (x.data < max) x = x. y = x. z = New class of z y = y. z.sum = x.data + y.data End of iteration #2 Live Dead
CS 618 Program Analysis: About These Slides 14/19 Applying Cedar Mesa Folk Wisdom to Heap Data Liveness Analysis of Heap Data If the while loop is not executed even once. 1 w = x // x points to m a 2 while (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data z x w y p a q b i c f j m d e g h k l n o Stack Heap
CS 618 Program Analysis: About These Slides 14/19 Applying Cedar Mesa Folk Wisdom to Heap Data Liveness Analysis of Heap Data If the while loop is executed once. 1 w = x // x points to m a 2 while (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data z x w y p a q b i c f j m d e g h k l n o Stack Heap
CS 618 Program Analysis: About These Slides 14/19 Applying Cedar Mesa Folk Wisdom to Heap Data Liveness Analysis of Heap Data If the while loop is executed twice. 1 w = x // x points to m a 2 while (x.data < max) 3 x = x. 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data z x w y p a q b i c f j m d e g h k l n o Stack Heap
CS 618 Program Analysis: About These Slides 15/19 The Moral of the Story Mappings between access expressions and l-values keep changing This is a rule for heap data For stack and static data, it is an exception! Static analysis of programs has made significant progress for stack and static data. What about heap data? Given two access expressions at a program point, do they have the same l-value? Given the same access expression at two program points, does it have the same l-value?
CS 618 Program Analysis: About These Slides 16/19 Our Solution 1 w = x 2 while (x.data < max) y = z = null w = null { x. = null 3 x = x. } 4 y = x. 5 z = New class of z 6 y = y. 7 z.sum = x.data + y.data x. = x.. = null x... = null x... = null x. = y. = null y.. = y.. = null z. = z. = null y. = y. = null x = y = z = null
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null While loop is not executed even once z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null z x w y Stack While loop is executed once p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 17/19 Our Solution y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null z x w y Stack While loop is executed twice p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 18/19 Some Observations y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null Node i is live but link a i is nullified z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 18/19 Some Observations y = z = null 1 w = x w = null 2 while (x.data < max) { x. = null 3 x = x. } x. = x.. = null x... = null x... = null 4 y = x. x. = y. = null y.. = y.. = null 5 z = New class of z z. = z. = null 6 y = y. y. = y. = null 7 z.sum = x.data + y.data x = y = z = null New access expressions are created. Can they cause exceptions? z x w y Stack p a q b i c f j m Heap d e g h k l n o
CS 618 Program Analysis: About These Slides 19/19 Goals of This Tutorial Main Goal: Explain how to get the modified program Side goals:
CS 618 Program Analysis: About These Slides 19/19 Goals of This Tutorial Main Goal: Explain how to get the modified program Side goals: Data Flow Values Acceptable Operations Desired Solutions Practical Algorithms Explain the intuitions Relate them to mathematical rigour Highlight the frontiers