Reference Analyses
- Variable Type Analysis (VTA) for Java
- Related points-to analyses for C
  - Steensgaard
  - Andersen
- Field-sensitive points-to for Java
- Object-sensitive points-to for Java
- Other analysis approaches

VTA - Variable Type Analysis
- Analysis included in the McGill SOOT system
- How does it work? It follows type propagation from a new site through plausible chains of assignments to reference variables
  - Builds a type propagation graph, using a CHA call graph
    - Nodes are reference variables, fields, and parameters
    - Edges represent reference-to-reference assignments
  - Initializes types for some reference nodes with the type from an associated object creation site
  - Propagates types along directed edges
- Effectively, this is a points-to analysis using inclusion relations and abstract objects with fields, which traces flow through reference assignments
cf. V. Sundaresan et al., Practical Virtual Method Call Resolution for Java, OOPSLA '00

VTA
- Uses a separate program-wide representative for each named reference
- Propagates one abstract object per class to represent all created objects of that class
- Is a flow-insensitive, context-insensitive analysis

    // B extends A
    A a1, a2, a3;
    B b3;
    a2 = new A();
    a1 = a2;
    a3 = a1;
    a3 = b3;
    b3 = (B) a3;

[Figure: initial type propagation graph over nodes a1, a2, a3, b3, with the type set {A} taken from the creation site]

VTA Example
- Same program as above
[Figure: graph after propagation, with the strongly connected component collapsed; a1, a2, and a3 carry the type set {A}]
- Empirical results report that VTA removes about 30% of CHA call graph edges in most cases
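The propagation step on the VTA example can be sketched in a few lines. This is a minimal illustration rather than the SOOT implementation: the class name `VtaSketch` and its methods are hypothetical, the sketch uses a plain worklist instead of collapsing strongly connected components, and it treats the cast as an ordinary assignment with no declared-type filtering, so `b3` also ends up with `{A}`.

```java
import java.util.*;

public class VtaSketch {
    // Propagation graph: an edge src -> dst for every assignment "dst = src".
    private final Map<String, Set<String>> succ = new HashMap<>();
    // Current type set of every reference node.
    private final Map<String, Set<String>> types = new TreeMap<>();

    void node(String v) {
        succ.computeIfAbsent(v, k -> new LinkedHashSet<>());
        types.computeIfAbsent(v, k -> new TreeSet<>());
    }

    // "v = new T()": seed v with the class of the creation site.
    void alloc(String v, String type) { node(v); types.get(v).add(type); }

    // "dst = src" (assumption: casts are handled as plain assignments).
    void assign(String dst, String src) { node(dst); node(src); succ.get(src).add(dst); }

    // Push type sets along directed edges until nothing changes.
    Map<String, Set<String>> solve() {
        Deque<String> work = new ArrayDeque<>(types.keySet());
        while (!work.isEmpty()) {
            String n = work.poll();
            for (String m : succ.get(n)) {
                if (types.get(m).addAll(types.get(n))) work.add(m);
            }
        }
        return types;
    }

    // The program from the slide; B extends A.
    static Map<String, Set<String>> example() {
        VtaSketch g = new VtaSketch();
        g.alloc("a2", "A");    // a2 = new A();
        g.assign("a1", "a2");  // a1 = a2;
        g.assign("a3", "a1");  // a3 = a1;
        g.assign("a3", "b3");  // a3 = b3;
        g.assign("b3", "a3");  // b3 = (B) a3;
        return g.solve();
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

A call site `a3.m()` would then only need targets for receiver class `A`, which is how the type sets prune CHA call graph edges.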
Points-to Analyses for C
- Popular flow- and context-insensitive formulations of points-to analysis
  - Scalable
  - Good enough for ensuring safety of some optimizations
  - Good for program understanding applications
  - Not great for applications needing precise def-use information (e.g., program slicing, testing)
- General approach is unification or inclusion constraints
- Newer versions keep track of individual struct fields as pointer targets
- Extended to points-to analyses for OOPL reference variables

Points-to Analyses for C
- Steensgaard's algorithm (POPL '96)
  - Uses unification constraints, so that for a pointer assignment p = q the algorithm makes Pts-to(p) = Pts-to(q)
    - This union operation is done recursively for multiple-level pointers
  - Reduces the size of the points-to graph (in terms of both nodes and edges)
  - Almost linear solution time in terms of program size, O(n α(n)), where α is the inverse Ackermann function
    - Uses a fast union-find algorithm
  - Imprecision stems from merging points-to sets
  - One points-to set per pointer variable over the entire program
cf. M. Shapiro and S. Horwitz, Fast and Accurate Flow-insensitive Points-to Analysis, POPL '97

Steensgaard - Example

    1. a = &b
    2. b = &c
    3. d = &e
    4. a = &d

[Figure: points-to graph before and after unification; edges are labeled with the statement numbers that create them]
- Points-to sets found: a -> {b, d} -> {c, e}

Steensgaard Solution Procedure - At a glance
- Find all pointer assignments in the program
- Form the set of points-to graph nodes from pointer variables/fields and variables (on the heap or whose address has been taken)
- Examine each statement, in arbitrary order, and construct points-to edges
  - Merge nodes (and edges) where indicated by unification constraints
- After a linear pass over these assignments, the points-to graph is complete
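The unification procedure above can be sketched with a union-find structure in which every equivalence class has at most one pointee class. This is an illustrative sketch, not Steensgaard's original formulation: the class `SteensgaardSketch`, its method names, and the `*`-prefixed placeholder nodes (created so that a class always has a pointee to unify) are assumptions made for the example.

```java
import java.util.*;

public class SteensgaardSketch {
    private final Map<String, String> parent = new HashMap<>();   // union-find forest
    private final Map<String, String> pointee = new HashMap<>();  // class representative -> pointee node

    String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) { p = find(p); parent.put(x, p); }       // path compression
        return p;
    }

    // Unify two classes and, recursively, the classes they point to.
    void join(String a, String b) {
        a = find(a); b = find(b);
        if (a.equals(b)) return;
        String pa = pointee.remove(a), pb = pointee.remove(b);
        parent.put(b, a);
        if (pa == null) { if (pb != null) pointee.put(a, pb); }
        else { pointee.put(a, pa); if (pb != null) join(pa, pb); }
    }

    // Class that x points to; a placeholder node is created on demand.
    String target(String x) {
        String r = find(x);
        String t = pointee.get(r);
        if (t == null) { t = "*" + r; find(t); pointee.put(r, t); }
        return find(t);
    }

    void addressOf(String p, String x) { join(target(p), x); }         // p = &x
    void copy(String p, String q)      { join(target(p), target(q)); } // p = q

    // Named variables in the class that x points to.
    SortedSet<String> pointsTo(String x) {
        SortedSet<String> out = new TreeSet<>();
        String t = pointee.get(find(x));
        if (t == null) return out;
        String rep = find(t);
        for (String v : new ArrayList<>(parent.keySet()))
            if (!v.startsWith("*") && find(v).equals(rep)) out.add(v);
        return out;
    }

    // The four statements from the slide.
    static SteensgaardSketch example() {
        SteensgaardSketch s = new SteensgaardSketch();
        s.addressOf("a", "b");  // 1. a = &b
        s.addressOf("b", "c");  // 2. b = &c
        s.addressOf("d", "e");  // 3. d = &e
        s.addressOf("a", "d");  // 4. a = &d
        return s;
    }

    public static void main(String[] args) {
        SteensgaardSketch s = example();
        System.out.println("a -> " + s.pointsTo("a") + ", b -> " + s.pointsTo("b"));
    }
}
```

Statement 4 forces `b` and `d` into one class, which in turn merges their pointees `c` and `e`; that recursive merge is where the imprecision comes from.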
Points-to Analysis for C
- Andersen's analysis
  - Uses inclusion constraints, so that for a pointer assignment p = q the algorithm makes Pts-to(q) ⊆ Pts-to(p)
  - Points-to graph is larger than Steensgaard's and more precise
  - Cubic complexity in program size, O(n^3)
  - One points-to set per pointer variable over the entire program

Andersen - Example

    1. a = &b
    2. b = &c
    3. d = &e
    4. a = &d
    5. d = &f
    6. g = d
    7. g = *a

[Figure: Andersen points-to graph for the statements above, with edges labeled by statement number, shown next to the Steensgaard solution]
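The inclusion-based example can be checked with a small fixpoint solver. This is a naive sketch that simply re-applies every constraint until nothing changes, not an optimized Andersen implementation; the class `AndersenSketch` and the statement encoding (`"addr"`, `"copy"`, `"load"`, `"store"`) are hypothetical names chosen for the illustration.

```java
import java.util.*;

public class AndersenSketch {
    private final Map<String, Set<String>> pts = new TreeMap<>();
    private final List<String[]> stmts = new ArrayList<>();

    Set<String> pts(String v) { return pts.computeIfAbsent(v, k -> new TreeSet<>()); }

    void add(String kind, String lhs, String rhs) { stmts.add(new String[] {kind, lhs, rhs}); }

    // Re-apply all inclusion constraints until a fixpoint is reached.
    Map<String, Set<String>> solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                String lhs = s[1], rhs = s[2];
                switch (s[0]) {
                    case "addr":   // lhs = &rhs
                        changed |= pts(lhs).add(rhs);
                        break;
                    case "copy":   // lhs = rhs: Pts-to(rhs) is included in Pts-to(lhs)
                        changed |= pts(lhs).addAll(new ArrayList<>(pts(rhs)));
                        break;
                    case "load":   // lhs = *rhs
                        for (String t : new ArrayList<>(pts(rhs)))
                            changed |= pts(lhs).addAll(new ArrayList<>(pts(t)));
                        break;
                    case "store":  // *lhs = rhs
                        for (String t : new ArrayList<>(pts(lhs)))
                            changed |= pts(t).addAll(new ArrayList<>(pts(rhs)));
                        break;
                }
            }
        }
        return pts;
    }

    // The seven statements from the slide.
    static Map<String, Set<String>> example() {
        AndersenSketch a = new AndersenSketch();
        a.add("addr", "a", "b");  // 1. a = &b
        a.add("addr", "b", "c");  // 2. b = &c
        a.add("addr", "d", "e");  // 3. d = &e
        a.add("addr", "a", "d");  // 4. a = &d
        a.add("addr", "d", "f");  // 5. d = &f
        a.add("copy", "g", "d");  // 6. g = d
        a.add("load", "g", "a");  // 7. g = *a
        return a.solve();
    }

    public static void main(String[] args) {
        System.out.println(example());  // {a=[b, d], b=[c], d=[e, f], g=[c, e, f]}
    }
}
```

Unlike unification, `b` and `d` keep separate points-to sets here, so `b` is not reported as pointing to `e` or `f`.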
Andersen's Solution Procedure - At a glance
- Find all pointer assignments in the program
- Form the set of points-to graph nodes from pointer variables/fields and variables on the heap or whose address is taken
- Examine each statement, in arbitrary order, and construct points-to edges
  - More edges must be created for assignments of the form p = q, so that all outgoing points-to edges from q are copied to be outgoing from p (i.e., processing inclusion constraints)
  - If new outgoing edges are added to q during the algorithm, they must also be copied to p
- This work results in O(n^3) worst-case cost

Points-to Analysis for Java
- Which objects may reference variable x point to?
- Builds a points-to graph to encapsulate points-to relations between objects, and between references and objects
[Figure: points-to graph with reference variables x and y, objects o1 and o2, and a field edge labeled f]
Field-sensitive Points-to Analysis
- Based on Andersen's points-to analysis
- Define and solve a system of annotated set-inclusion constraints
  - Handles virtual calls by simulation of run-time method lookup
  - Models the fields of objects
- Extended BANE (UC Berkeley) constraint solver
- Analyzes only possibly executed code
  - Ignores unreachable code from libraries
- A. Rountev, A. Milanova, B. G. Ryder, Points-to Analysis for Java Using Annotated Constraints, OOPSLA '01

Points-to Analysis in Action

    class A { void m(X p) {..} }
    class B extends A {
        X f;
        void m(X q) { this.f = q; }
    }
    B b = new B();
    X x = new X();
    A a = b;
    a.m(x);

- A.m() is not analyzed because it is unreachable.
[Figure: points-to graph in which a, b, and this (of B.m) point to o1; x and q point to o2; and field f of o1 points to o2]
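The "simulation of run-time method lookup" for the call `a.m(x)` can be sketched as follows. This is an illustrative sketch, not the BANE-based solver: the class `VirtualCallSketch` and the three lookup tables are hypothetical and are hard-coded for the classes A and B of the example, with o1 the `new B()` object and o2 the `new X()` object.

```java
import java.util.*;

public class VirtualCallSketch {
    // Hard-coded model of the example program (assumption for this sketch).
    static final Map<String, String> SUPER = Map.of("B", "A");            // B extends A
    static final Map<String, Set<String>> DECLARES =
        Map.of("A", Set.of("m"), "B", Set.of("m"));                        // methods declared per class
    static final Map<String, String> CLASS_OF = Map.of("o1", "B", "o2", "X");

    // Simulated run-time lookup: walk up the hierarchy from the receiver's class.
    static String lookup(String cls, String method) {
        for (String c = cls; c != null; c = SUPER.get(c)) {
            if (DECLARES.getOrDefault(c, Set.of()).contains(method)) return c + "." + method;
        }
        return null;  // no implementation found
    }

    // Possible targets of a call, given the points-to set of the receiver variable.
    static Set<String> targets(Set<String> receiverObjects, String method) {
        Set<String> out = new TreeSet<>();
        for (String o : receiverObjects) {
            String t = lookup(CLASS_OF.get(o), method);
            if (t != null) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        // a may only point to o1, so a.m(x) resolves to B.m and A.m stays unreachable.
        System.out.println(targets(Set.of("o1"), "m"));
    }
}
```

A class-hierarchy-based call graph would have to include both `A.m` and `B.m` for this call site; using the receiver's points-to set leaves only `B.m`.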
Annotated Constraints
- Form: L ⊆_a R
  - L and R denote set expressions
  - Annotation a: additional information (e.g., object fields)
- Kinds of set expressions L and R
  - Set variables: represent points-to sets
  - ref terms: represent objects
  - Other kinds of expressions

Set variables and ref terms
- Set variables represent points-to sets
  - For each reference variable p: V_p
  - For each object o: V_o
- Object o is denoted by the term ref(o, V_o)
- p points to o: ref(o, V_o) ⊆ V_p
- Field f of o1 points to o2: ref(o2, V_o2) ⊆_f V_o1
Example: Accessing Fields

    p = new A();
    q = new B();
    p.f = q;

- Constraint generation
  - ref(o1, V_o1) ⊆ V_p
  - ref(o2, V_o2) ⊆ V_q
  - V_p ⊆ proj(ref, W)
  - V_q ⊆_f W
[Figure: p points to o1, q points to o2, and field f of o1 points to o2]

Example: Solving Constraints
- Generated constraints
  - ref(o1, V_o1) ⊆ V_p
  - ref(o2, V_o2) ⊆ V_q
  - V_p ⊆ proj(ref, W)
  - V_q ⊆_f W
- Constraint resolution
  - W ⊆ V_o1
  - V_q ⊆_f V_o1
  - ref(o2, V_o2) ⊆_f V_o1, i.e., o1.f points to o2
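The effect of these constraints can be reproduced with a direct fixpoint over per-object field sets. This sketch does not implement annotated constraints or the BANE solver; it only computes the same points-to facts under simplified assumptions. The class `FieldSensitiveSketch`, its methods, and the extra load `r = p.f` (which is not on the slide) are hypothetical and added for illustration.

```java
import java.util.*;

public class FieldSensitiveSketch {
    final Map<String, Set<String>> varPts = new TreeMap<>();    // variable -> abstract objects
    final Map<String, Set<String>> fieldPts = new TreeMap<>();  // "object.field" -> abstract objects
    private final List<String[]> stmts = new ArrayList<>();

    private static Set<String> set(Map<String, Set<String>> m, String key) {
        return m.computeIfAbsent(key, k -> new TreeSet<>());
    }

    void alloc(String v, String obj)              { stmts.add(new String[] {"new", v, obj, ""}); }
    void store(String base, String f, String rhs) { stmts.add(new String[] {"store", base, f, rhs}); }
    void load(String lhs, String base, String f)  { stmts.add(new String[] {"load", lhs, base, f}); }

    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                switch (s[0]) {
                    case "new":    // s[1] = new ...; the creation site is named s[2]
                        changed |= set(varPts, s[1]).add(s[2]);
                        break;
                    case "store":  // s[1].s[2] = s[3]
                        for (String o : new ArrayList<>(set(varPts, s[1])))
                            changed |= set(fieldPts, o + "." + s[2]).addAll(set(varPts, s[3]));
                        break;
                    case "load":   // s[1] = s[2].s[3]
                        for (String o : new ArrayList<>(set(varPts, s[2])))
                            changed |= set(varPts, s[1])
                                .addAll(new ArrayList<>(set(fieldPts, o + "." + s[3])));
                        break;
                }
            }
        }
    }

    static FieldSensitiveSketch example() {
        FieldSensitiveSketch a = new FieldSensitiveSketch();
        a.alloc("p", "o1");      // p = new A();
        a.alloc("q", "o2");      // q = new B();
        a.store("p", "f", "q");  // p.f = q;
        a.load("r", "p", "f");   // r = p.f;  (extra statement, not on the slide)
        a.solve();
        return a;
    }

    public static void main(String[] args) {
        FieldSensitiveSketch a = example();
        System.out.println(a.fieldPts + " " + a.varPts);
    }
}
```

The entry `o1.f -> [o2]` corresponds to the resolved constraint ref(o2, V_o2) ⊆_f V_o1.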
Example: Virtual Calls

    p.m(x);

- Call constraint: V_p ⊆_m lam(V_x)
- Receiver object o: ref(o, V_o) ⊆ V_p
- Actual method called, A.m(z):
  - V_x ⊆ V_z
  - ref(o, V_o) ⊆ V_this(A.m)

Experiments
- 23 Java programs: 14-677 user classes
  - Added the necessary library classes
  - Machine: 360 MHz, 512 MB Sun Ultra-60
- Cost measured in time and memory
- Precision (with respect to usage in client analyses and transformations)
  - Object read-write information
  - Call graph construction
  - Synchronization removal and stack allocation
Analysis Time
[Figure: bar chart of analysis time in seconds (axis 0 to 400) for proxy, compress, db, jb, echo, raytrace, mtrt, jtar, jlex, javacup, rabbit, jack, jflex, jess, mpegaudio, jjtree, sablecc, javac, creature, mindterm, soot, muffin, javacc]

Resolution of Virtual Call Sites
[Figure: bar chart of % resolved call sites (axis 0 to 90) per benchmark, comparing Points-to and RTA]
Thread-local new sites
[Figure: bar chart of % thread-local sites (axis 0% to 70%) per benchmark]

Imprecision of Context-insensitive Analysis
- Does not distinguish contexts for instance methods and constructors
  - States of distinct objects are merged
- Common OOPL features and idioms result in imprecision
  - Encapsulation
  - Inheritance
  - Containers, maps and iterators
Example: Imprecision

    class Y extends X { }
    class A {
        X f;
        void m(X q) { this.f = q; }
    }
    A a = new A();
    a.m(new X());
    A aa = new A();
    aa.m(new Y());

[Figure: context-insensitive points-to graph; a points to o1 and aa points to o3, while this (of A.m) points to both o1 and o3 and q points to both o2 and o4, so field f of o1 and of o3 each point to both o2 and o4]

Context Sensitivity
- Keeping calling contexts distinct during the analysis
- Classically two approaches (Sharir, Pnueli 1981)
  - Call string: distinguish the analysis result by the (truncated) call stack on which it is obtained, e.g., k-CFA
  - Functional: distinguish the analysis result by the (partial) program state at the call, e.g., receiver identity, argument types
Object-sensitive Points-to Analysis
- Object sensitivity
  - Form of functional context sensitivity for flow-insensitive points-to analysis of OO languages
- Object-sensitive Andersen's analysis
  - Object sensitivity is also applicable to other analyses
- Parameterization framework
  - Cost vs. precision tradeoff
- Empirical evaluation
  - Vs. field-sensitive analysis
- A. Milanova, A. Rountev, B. G. Ryder, Practical Points-to Analyses for Java, ISSTA '02

Object-sensitive Analysis
- Instance methods and constructors are analyzed for different contexts
- Receiver objects are used as the calling context
- Multiple copies of reference variables
  - For receiver o1, the statement this.f = q is analyzed as this_A.m^o1.f = q^o1
Example: Object-sensitive Analysis

    class A {
        X f;
        void m(X q) { this.f = q; }
    }
    A a = new A();
    a.m(new X());
    A aa = new A();
    aa.m(new Y());

- this.f = q is analyzed once per receiver object:
  - this_A.m^o1.f = q^o1
  - this_A.m^o3.f = q^o3
[Figure: object-sensitive points-to graph; a and this_A.m^o1 point to o1, whose field f points only to o2; aa and this_A.m^o3 point to o3, whose field f points only to o4]

Implementation
- Implemented one instance of the parameterization framework
  - this, formals and return variables are (effectively) replicated
- Optimized constraint-based analysis using the previous technique
- Comparison with field-sensitive analysis
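The difference between the two analyses on this example can be sketched by replicating `this` and `q` per receiver object. This is a toy illustration, not the parameterized framework: the class `ObjectSensitivitySketch` is hypothetical, and the two calls are hard-coded as (receiver, argument) pairs, with o1 and o3 the two A objects, o2 the X object, and o4 the Y object.

```java
import java.util.*;

public class ObjectSensitivitySketch {
    // Each call is {receiver object, argument object}: a.m(new X()) and aa.m(new Y()).
    static final String[][] CALLS = { {"o1", "o2"}, {"o3", "o4"} };

    // Analyzes "void m(X q) { this.f = q; }" and returns the points-to set of field f per object.
    static Map<String, Set<String>> analyze(boolean objectSensitive) {
        Map<String, Set<String>> thisPts = new TreeMap<>();  // context -> points-to set of this
        Map<String, Set<String>> qPts = new TreeMap<>();     // context -> points-to set of q
        for (String[] call : CALLS) {
            // The context is the receiver object, or one shared context when insensitive.
            String ctx = objectSensitive ? call[0] : "*";
            thisPts.computeIfAbsent(ctx, k -> new TreeSet<>()).add(call[0]);
            qPts.computeIfAbsent(ctx, k -> new TreeSet<>()).add(call[1]);
        }
        Map<String, Set<String>> fieldF = new TreeMap<>();
        for (String ctx : thisPts.keySet()) {
            for (String o : thisPts.get(ctx)) {  // this.f = q, evaluated in context ctx
                fieldF.computeIfAbsent(o, k -> new TreeSet<>()).addAll(qPts.get(ctx));
            }
        }
        return fieldF;
    }

    public static void main(String[] args) {
        System.out.println("context-insensitive: " + analyze(false));  // {o1=[o2, o4], o3=[o2, o4]}
        System.out.println("object-sensitive:    " + analyze(true));   // {o1=[o2], o3=[o4]}
    }
}
```

With a single shared copy of `this` and `q`, the states of o1 and o3 are merged; with one copy per receiver, each object's field keeps only the argument passed on its own call.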
Empirical Results
- 23 Java programs: 14-677 user classes
  - Added the necessary library classes
  - Machine: 360 MHz, 512 MB, Sun Ultra-60
- Object-sensitive vs. field-sensitive points-to
  - Found comparable cost with better precision
    - Modification side-effect analysis
    - Virtual call resolution

Analysis Time
[Figure: bar chart of analysis time in seconds (axis 0 to 250) per benchmark, comparing Object Sensitive with OOPSLA '01; benchmarks: javacc, muffin, soot, mindterm, creature, sablecc, jjtree, mpegaudio, jess, jflex, jack, rabbit, javacup, jlex, jtar, mtrt, raytrace, echo, jb, db, compress]
Side-effect Analysis: Modified Objects Per Statement
[Figure: stacked bar chart (0% to 100%) comparing Object Sensitive with OOPSLA '01 for jb, jess, sablecc, raytrace, and the average; categories: one, two or three, four to nine, more than nine modified objects per statement]

Improvement in Resolved Calls
[Figure: bar chart of improvement in percent (axis 0 to 60) for the average and for javacc, muffin, soot, mindterm, creature, javac, sablecc, jjtree, mpegaudio, jess, jflex, jack, rabbit, javacup, jlex, jtar, mtrt, raytrace, echo, jb, db, compress, proxy]
Recent Context-insensitive Points-to Analyses
- Liang et al., PASTE '01
  - Empirical comparison of flow- and context-insensitive analyses
    - Steensgaard- and Andersen-based analyses for Java
    - Static call graph (CHA, RTA) with on-the-fly construction
    - Experiments with instance fields and abstract class fields
  - Found Andersen (inclusion) analyses significantly more precise than Steensgaard (unification)
- Whaley and Lam, SAS '02

More Reference Analyses
- Context-sensitive reference analyses
  - Palsberg and Schwartzbach, OOPSLA '91
  - Oxhøj, Palsberg, Schwartzbach, ECOOP '92
  - Plevyak and Chien, OOPSLA '94
  - Agesen, ECOOP '95
  - Chatterjee, Ryder, Landi, POPL '99
  - Ruf, PLDI '00
  - Grove and Chambers, TOPLAS '01
- Most judged too expensive
Algorithm Construct Choices
- Representations
  - Static call graph versus on-the-fly construction
  - Abstract object or object instantiations
  - Fields or no fields
  - Abstract reference (by class), or reference representatives per method, or references program-wide by name
- Directionality
  - Unification
  - Inclusion constraints
- Accounting for flow
  - Flow sensitivity
  - Context sensitivity

Examples
- Representations
  - Static call graph (VTA) versus on-the-fly construction (RTA)
  - Abstract object (XTA) or object instantiations (field-sensitive)
  - Fields or no fields
  - Abstract reference by class (RTA), or reference representatives per method (XTA), or references program-wide by name (VTA, field-sensitive)
- Directionality
  - Unification: Hendren 00, Liang PASTE '01
  - Inclusion constraints: field-sensitive, object-sensitive
- Accounting for flow
  - Flow sensitivity: Chatterjee POPL '99
  - Context sensitivity: object-sensitive