Pointer Analysis in the Presence of Dynamic Class Loading Martin Hirzel, Amer Diwan University of Colorado at Boulder Michael Hind IBM T.J. Watson Research Center 1
Pointer analysis motivation Code a = new C( ); // G b = new C( ); // H a = b; a.f = b; Clients Tools Optimizations What is it good for? What does it do? Code browsing Code transformations Error detection Devirtualization Load elimination Parallelization Connectivity-based garbage collection Points-to sets pointsto(a) == {G,H} pointsto(b) == {H} pointsto(g.f ) == {H} pointsto(h.f ) == {H} 2
Static flow- and context-insensitive pointer analysis by Andersen Analysis components Analysis data structures Call graph builder Code Method compilation Constraint finder Constraint propagator Clients Points-to sets Constraint graph 3
Static analysis can not deal with all of Java Class loading may be implicitly triggered by any Constructor call Static field access Static method call Classes may come from the web or be generated on the fly Pretending a static world fails for most real-world applications 4
Java challenges Analysis components Call graph builder Analysis data structures 1. Online call graph building Method compilation 4. Reflection and native code Constraint finder 3. Unresolved types 2. Re-propagation Constraint propagator Clients Constraint graph 5
1. Online call graph building caller a = x.m(b); callee A::m(c) { return d; } 6
1. Online call graph building caller a = x.m(b); e = y.m(f ); callee A::m(c) { return d; } 7
1. Online call graph building caller a = x.m(b); e = y.m(f ); callee A::m(c) { return d; } B::m(g) { return h; } 8
Architecture for online call graph building Analysis components Call graph builder Analysis data structures Caller/callee look-up Method compilation Constraint finder Constraint propagator Clients Constraint graph 9
Java challenges Analysis components Call graph builder Analysis data structures Caller/callee look-up Method compilation 4. Reflection and native code Constraint finder 3. Unresolved types 2. Re-propagation Constraint propagator Clients Constraint graph 10
2. Focused re-propagation Code a = new C( ); // G b = new C( ); // H a.f = b; a = b; Points-to sets pointsto(a) == {G} pointsto(b) == {H} pointsto(g.f ) == {H} pointsto(h.f ) == { } 11
2. Focused re-propagation Code a = new C( ); // G b = new C( ); // H a.f = b; a = b; Points-to sets pointsto(a) == {G,H} pointsto(b) == {H} pointsto(g.f ) == {H} pointsto(h.f ) == { } 12
2. Focused re-propagation Code a = new C( ); // G b = new C( ); // H a.f = b; a = b; Points-to sets pointsto(a) == {G,H} pointsto(b) == {H} pointsto(g.f ) == {H} pointsto(h.f ) == {H} 13
Architecture for focused re-propagation Analysis components Call graph builder Analysis data structures Caller/callee look-up Method compilation Constraint finder Propagator worklist Constraint propagator Clients Constraint graph 14
Java challenges Analysis components Call graph builder Analysis data structures Caller/callee look-up Method compilation 4. Reflection and native code Constraint finder 3. Unresolved types Constraint propagator Clients Propagator worklist Constraint graph 15
3. Unresolved types caller X x = ; a = x.m(b);? Can X have a subclass that inherits m from Y? callee Y::m(c) { return d; }! Cannot tell before X is resolved! 16
Architecture for managing unresolved types Virtual machine events Analysis components Analysis data structures Call graph builder Caller/callee look-up Method compilation Type resolution Constraint finder Resolution manager Constraint propagator Clients Deferred constraints Propagator worklist Constraint graph 17
Java challenges Virtual machine events Analysis components Analysis data structures Call graph builder Caller/callee look-up Method compilation 4. Reflection and native code Type resolution Constraint finder Resolution manager Constraint propagator Clients Deferred constraints Propagator worklist Constraint graph 18
4. Reflection and native code Reflection Field f = B.class.getField( ); B b = ; f.set(b,v); Java-side code a = b.m(c); Native code Object VM_JNIFunctions. CallObjectMethod(method, args) { return method.invoke(args); } Native-side code 00100101 01001110 10010011 01001001 10001111 10001111 00100101 19
Architecture for dealing with reflection and native code Virtual machine events Analysis components Analysis data structures Call graph builder Caller/callee look-up Method compilation Reflection execution Native code execution Type resolution Constraint finder Resolution manager Constraint propagator Clients Deferred constraints Propagator worklist Constraint graph 20
Other events leading to constraints Virtual machine events Analysis components Analysis data structures Building and start-up Bytecode attributes Method compilation Reflection execution Native code execution Type resolution Call graph builder Constraint finder Resolution manager Constraint propagator Clients Caller/callee look-up Deferred constraints Propagator worklist Constraint graph 21
Clients using our pointer analysis Virtual machine events Analysis components Analysis data structures Building and start-up Bytecode attributes Method compilation Reflection execution Native code execution Type resolution Call graph builder Constraint finder Resolution manager Constraint propagator Clients Caller/callee look-up Deferred constraints Propagator worklist Constraint graph 22
Dealing with invalidated results Many techniques from prior work Guard optimized code (extant analysis) Pre-existence based inlining On-stack replacement and more Connectivity-based garbage collection Trigger propagator only before collection Merge partitions if necessary 23
Evaluation methodology Java virtual machine Jikes RVM from IBM, is itself written in Java Benchmarks SPECjvm98 suite, xalan, hsql Results not comparable to static analysis Analyze more code: Jikes RVM adds a lot of Java code Analyze less code: Not all application classes get loaded 24
Propagation cost Eager At GC At End Count Avg. Total Count Avg. Total Total hsql 391 10.1s 1h06m 6 1m17s 7m40s 7m07s jess 734 16.8s 3h26m 3 1m58s 5m53s 3m02s javac 1,103 12.5s 3h50m 5 1m54s 9m32s 6m27s xalan 1,726 11.2s 5h22m 1 2m01s 2m01s 7m45s Eagerness trades off average cost against total cost On average, focused re-propagation is much cheaper than full propagation Total cost is a function of code size and propagator eagerness 25
How long does a program have to run to amortize the analysis cost? Eager At GC At End Count Avg. Total Count Avg. Total Total hsql 391 10.1s 1h06m 6 1m17s 7m40s 7m07s jess 734 16.8s 3h26m 3 1m58s 5m53s 3m02s javac 1,103 12.5s 3h50m 5 1m54s 9m32s 6m27s xalan 1,726 11.2s 5h22m 1 2m01s 2m01s 7m45s Analysis cost to amortize Application runtime 5m 15m 1h 5h Overall analysis overhead 10% 5% 2.5% 50m 1h40m 3h20m 2h30m 5h 10h 10h 20h 1d16h 1d16h 4d04h 8d08h Long-running applications can amortize not-too-eager analysis cost 26
Validation Virtual machine events Analysis components Analysis data structures Building and start-up Bytecode attributes Method compilation Reflection execution Native code execution Type resolution Call graph builder Constraint finder Resolution manager Constraint propagator Clients Validation Caller/callee look-up Deferred constraints Propagator worklist Constraint graph 27
Validation Piggy-back validation on garbage collection For each pointer, check consistency with analysis results Incorrect analysis would lead to tricky bugs in clients 28
Related work Andersen s analysis for static Java [RountevMilanovaRyder 01] [LiangPenningsHarrold 01] [WhaleyLam 02] [LhotakHendren 03] Weaker analyses with dynamic class loading DOIT [PechtchanskiSarkar 01] XTA [QianHendren 04] Ruf s escape analysis [BogdaSingh 01, King 03] Demand-driven / incremental analysis 29
Conclusions 1 st non-trivial pointer analysis for all of Java Identified and solved the challenges: 1. Online call graph building 2. Focused re-propagation 3. Managing unresolved types 4. Reflection and native code Evaluated efficiency Validated correctness 30