MethodHandlesArrayElementGetterBench.testCreate Analysis
Overview
Benchmark: nom.indy.methodhandlesarrayelementgetterbench.testcreate

Results with JDK8 (ops/us):
    Platform                          ops/us   Intel relative to this result
    Intel                             234
    T8                                90       2.5x
    T8 with -XX:FreqInlineSize=325    189      1.2x

-XX:FreqInlineSize=<size>: integer specifying the maximum bytecode size, in bytes, of a frequently executed method that can be inlined.
    Default value for Intel is 325
    Default value for SPARC is 175
MethodHandlesArrayElementGetterBench.testCreate
Benchmark source code

    @Benchmark
    public MethodHandle testCreate() {
        return MethodHandles.arrayElementGetter(int[].class);
    }
    /* Call the same function with the same argument multiple times */

    MethodHandle arrayElementGetter(Class<?> arrayClass) throws IllegalArgumentException {
        return MethodHandleImpl.makeArrayElementAccessor(arrayClass, false);
    }
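For readers unfamiliar with the API under test, the following is a minimal sketch of what the returned handle does once it has been created. The class name ArrayGetterDemo and the helper readElement are hypothetical and are not part of the benchmark; only MethodHandles.arrayElementGetter comes from the slide.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical demo class, not part of the benchmark suite.
final class ArrayGetterDemo {

    // Reads data[index] through the same kind of handle the benchmark creates.
    static int readElement(int[] data, int index) {
        MethodHandle getter = MethodHandles.arrayElementGetter(int[].class);
        try {
            // The handle type is (int[], int)int, so invokeExact needs
            // exactly these static argument types and an int cast on the result.
            return (int) getter.invokeExact(data, index);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable; wrap anything checked.
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(readElement(new int[] {10, 20, 30}, 1));
    }
}
```

The benchmark itself measures only the creation call, not the invocation shown here.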
MethodHandlesArrayElementGetterBench.testCreate
Benchmark source code (continued)

    static MethodHandle makeArrayElementAccessor(Class<?> arrayClass, boolean isSetter) {
        if (arrayClass == Object[].class)
            return (isSetter ? ArrayAccessor.OBJECT_ARRAY_SETTER : ArrayAccessor.OBJECT_ARRAY_GETTER);
        if (!arrayClass.isArray())
            throw newIllegalArgumentException("not an array: " + arrayClass);
        MethodHandle[] cache = ArrayAccessor.TYPED_ACCESSORS.get(arrayClass);
        int cacheIndex = (isSetter ? ArrayAccessor.SETTER_INDEX : ArrayAccessor.GETTER_INDEX);
        MethodHandle mh = cache[cacheIndex];
        if (mh != null) return mh;
        mh = ArrayAccessor.getAccessor(arrayClass, isSetter);
        MethodType correctType = ArrayAccessor.correctType(arrayClass, isSetter);
        if (mh.type() != correctType) {
            assert(mh.type().parameterType(0) == Object[].class);
            assert((isSetter ? mh.type().parameterType(2) : mh.type().returnType()) == Object.class);
            assert(isSetter || correctType.parameterType(0).getComponentType() == correctType.returnType());
            // safe to view non-strictly, because element type follows from array type
            mh = mh.viewAsType(correctType, false);
        }
        mh = makeIntrinsic(mh, (isSetter ? Intrinsic.ARRAY_STORE : Intrinsic.ARRAY_LOAD));
        // Atomically update accessor cache.
        synchronized(cache) {
            if (cache[cacheIndex] == null) {
                cache[cacheIndex] = mh;
            } else {
                // Throw away newly constructed accessor and use cached version.
                mh = cache[cacheIndex];
            }
        }
        return mh;
    }

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/lang/invoke/methodhandleimpl.java#methodhandleimpl.makearrayelementaccessor%28java.lang.class%2cboolean%29
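The caching pattern in the listing (unsynchronized fast-path read, construction outside the lock, then a synchronized publish that keeps whichever value arrived first) can be illustrated in isolation. This is a simplified sketch, not the JDK code: SlotCache, getOrCreate, and the use of String values are assumptions made for the example.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the per-array-class accessor cache.
final class SlotCache {
    private final String[] cache = new String[2];

    String getOrCreate(int index, Supplier<String> factory) {
        // Fast path: plain read without locking. This is acceptable here only
        // because the cached values are immutable.
        String value = cache[index];
        if (value != null) return value;

        // Slow path: build a candidate outside the lock. Several threads may
        // do this at once; all but one candidate are discarded below.
        value = factory.get();

        synchronized (cache) {
            if (cache[index] == null) {
                cache[index] = value;      // first writer wins
            } else {
                value = cache[index];      // discard our candidate
            }
        }
        return value;
    }

    public static void main(String[] args) {
        SlotCache c = new SlotCache();
        String first = c.getOrCreate(0, () -> new String("getter"));
        String second = c.getOrCreate(0, () -> new String("other"));
        System.out.println(first == second);
    }
}
```

Once the slot is populated, every benchmark iteration takes the fast path, so the cost being measured is dominated by the lookup and the early return rather than by handle construction.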
Intel disassembly (JDK8)
[Figure: disassembly listings, before inline optimization and with inline optimization]
T8 disassembly (JDK8)
By default, T8 does not inline the method; it issues a call and a ret instead.
The called method is the hot spot and is not optimized. It contains redundant branch instructions.
[Figure: disassembly listings, default and after forcing inline optimization]
Analysis
Inlining a method gives the compiler more opportunity to optimize.
Inlining saves the call, the ret, and the stack allocation, and results in fewer branches.
-XX:FreqInlineSize controls the maximum size of a method that can be inlined; the default value is 325 on x86 and 175 on SPARC.
The hot method in this case is 285 bytes, so x86 inlined it and SPARC did not.
Adding -XX:FreqInlineSize=325 on T8 doubled the performance (from 90 to 189 ops/us).
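To confirm which threshold a given JVM is actually running with, the flag can be read at run time. The sketch below assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name FreqInlineSizeProbe is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper; assumes a HotSpot JVM.
final class FreqInlineSizeProbe {

    // Returns the value of -XX:FreqInlineSize in effect for this JVM.
    static int currentFreqInlineSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag is unknown.
        return Integer.parseInt(bean.getVMOption("FreqInlineSize").getValue());
    }

    public static void main(String[] args) {
        System.out.println("FreqInlineSize = " + currentFreqInlineSize());
    }
}
```

Launching it with java -XX:FreqInlineSize=325 FreqInlineSizeProbe should report the overridden value, and launching it without the flag should report the platform default.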
SPARC: -XX:FreqInlineSize=325 vs. -XX:FreqInlineSize=175
Run on java 1.8.0_77

    -XX:FreqInlineSize=325      # of benchmarks   Percentage
    Performance gain >= 0       859               73.2%
    Performance gain > 10%      153               13.0%
    Performance loss            314               26.7%
    Performance loss > 10%      59                5.0%

All Benchmark Performance Gain Chart

    Performance gain    # of benchmarks
    -0.5 to -0.4        1
    -0.4 to -0.3        5
    -0.3 to -0.2        10
    -0.2 to -0.1        43
    -0.1 to 0           255
    0 to 0.1            706
    0.1 to 0.2          49
    0.2 to 0.3          21
    0.3 to 0.4          20
    0.4 to 0.5          13
    0.5 to 0.6          10
    0.6 to 0.7          6
    0.7 to 0.8          20
    0.8 to 0.9          6
    0.9 to 1            3
    1 to 1.1            1
    1.1 to 1.2          1
    1.4 to 1.5          1
    1.5 to 1.6          1
    1.9 to 2            1
    Grand Total         1173
Intel: -XX:FreqInlineSize=175 vs. -XX:FreqInlineSize=325
Run on java 1.8.0_77

    -XX:FreqInlineSize=325      # of benchmarks   Percentage
    Performance gain >= 0       734               62.6%
    Performance gain > 10%      55                4.7%
    Performance loss            439               37.4%
    Performance loss > 10%      39                3.3%

All Benchmark Performance Gain Chart

    Performance gain    # of benchmarks
    -0.6 to -0.5        3
    -0.4 to -0.3        2
    -0.3 to -0.2        10
    -0.2 to -0.1        24
    -0.1 to 0           400
    0 to 0.1            679
    0.1 to 0.2          28
    0.2 to 0.3          11
    0.3 to 0.4          5
    0.4 to 0.5          1
    0.5 to 0.6          2
    0.6 to 0.7          1
    0.8 to 0.9          1
    1 to 1.1            1
    1.6 to 1.7          2
    1.7 to 1.8          2
    2.8 to 2.9          1
    Grand Total         1173
Performance Gain for Each Benchmark on SPARC
[Chart: X axis: benchmark ID; Y axis: performance gain]

Performance Gain for Each Benchmark on x86
[Chart: X axis: benchmark ID; Y axis: performance gain]
Comparison Analysis
x86: with -XX:FreqInlineSize=325, 62.6% of benchmarks improved and 37.4% lost performance.
SPARC: with -XX:FreqInlineSize=325, 73.2% of benchmarks improved and 26.7% lost performance.
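The summary counts quoted for both platforms follow directly from the histogram bins, and the arithmetic can be checked with a short program. The bin counts below are copied from the two gain charts; the class and method names are hypothetical, and the code assumes the loss bins are ordered from the largest loss to the "-0.1 to 0" bin and the gain bins start with the "0 to 0.1" bin.

```java
// Hypothetical helper that re-derives the summary rows from the histogram bins.
final class GainSummary {
    static final int[] SPARC_LOSS = {1, 5, 10, 43, 255};
    static final int[] SPARC_GAIN = {706, 49, 21, 20, 13, 10, 6, 20, 6, 3, 1, 1, 1, 1, 1};
    static final int[] INTEL_LOSS = {3, 2, 10, 24, 400};
    static final int[] INTEL_GAIN = {679, 28, 11, 5, 1, 2, 1, 1, 1, 2, 2, 1};

    // Sum of all bin counts.
    static int total(int[] bins) {
        int sum = 0;
        for (int count : bins) sum += count;
        return sum;
    }

    // Gains above 10%: every gain bin except the first ("0 to 0.1").
    static int gainsOver10(int[] gainBins) {
        return total(gainBins) - gainBins[0];
    }

    // Losses above 10%: every loss bin except the last ("-0.1 to 0").
    static int lossesOver10(int[] lossBins) {
        return total(lossBins) - lossBins[lossBins.length - 1];
    }

    static void report(String name, int[] loss, int[] gain) {
        int all = total(loss) + total(gain);
        System.out.printf(
                "%s: gain %d (%.1f%%), gain >10%% %d, loss %d (%.1f%%), loss >10%% %d, total %d%n",
                name, total(gain), 100.0 * total(gain) / all, gainsOver10(gain),
                total(loss), 100.0 * total(loss) / all, lossesOver10(loss), all);
    }

    public static void main(String[] args) {
        report("SPARC", SPARC_LOSS, SPARC_GAIN);
        report("Intel", INTEL_LOSS, INTEL_GAIN);
    }
}
```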
Conclusion
Should -XX:FreqInlineSize=325 be made the default JVM configuration for SPARC?