Java method call performance
I have this code to do range minimum query When t = 100000, I and j always change in each input line, and its execution time in Java 8u60 is about 12 seconds
for (int a0 = 0; a0 < t; a0++) { String line = reader.readLine(); String[] ls = line.split(" "); int i = Integer.parseInt(ls[0]); int j = Integer.parseInt(ls[1]); int min = width[i]; for (int k = i + 1; k <= j; k++) { if (min > width[k]) { min = width[k]; } } writer.write(min + ""); writer.newLine(); }
When I extract a new method to find the minimum value, the execution time is 4 times faster (about 2.5 seconds)
for (int a0 = 0; a0 < t; a0++) { String line = reader.readLine(); String[] ls = line.split(" "); int i = Integer.parseInt(ls[0]); int j = Integer.parseInt(ls[1]); int min = getMin(i,j); writer.write(min + ""); writer.newLine(); } private int getMin(int i,int j) { int min = width[i]; for (int k = i + 1; k <= j; k++) { if (min > width[k]) { min = width[k]; } } return min; }
I always thought method calls were slow But this example shows the opposite Java 6 also demonstrates this, but in both cases (17 seconds and 10 seconds), the execution time is much slower Can anyone provide some insight?
Solution
I've tried to reproduce the problem with fewer test cases There are no I / O or string operations involved, only two nested loops with array access
public class NestedLoop { private static final int ARRAY_SIZE = 5000; private static final int ITERATIONS = 1000000; private int[] width = new java.util.Random(0).ints(ARRAY_SIZE).toArray(); public long inline() { long sum = 0; for (int i = 0; i < ITERATIONS; i++) { int min = width[0]; for (int k = 1; k < ARRAY_SIZE; k++) { if (min > width[k]) { min = width[k]; } } sum += min; } return sum; } public long methodCall() { long sum = 0; for (int i = 0; i < ITERATIONS; i++) { int min = getMin(); sum += min; } return sum; } private int getMin() { int min = width[0]; for (int k = 1; k < ARRAY_SIZE; k++) { if (min > width[k]) { min = width[k]; } } return min; } public static void main(String[] args) { long startTime = System.nanoTime(); long sum = new NestedLoop().inline(); // or .methodCall(); long endTime = System.nanoTime(); long ms = (endTime - startTime) / 1000000; System.out.println("sum = " + sum + ",time = " + ms + " ms"); } }
The inline variant is indeed 3-4 times slower than methodcall
I have used the following JVM options to confirm that the two benchmarks are compiled at the highest level, and OSR (on stack replacement) will occur successfully in both cases
-XX:-TieredCompilation -XX:CompileOnly=NestedLoop -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+TraceNMethodInstalls
'inline' compilation log:
251 46 % NestedLoop::inline @ 21 (70 bytes) Installing osr method (4) NestedLoop.inline()J @ 21
'methodcall 'compilation log:
271 46 NestedLoop::getMin (41 bytes) Installing method (4) NestedLoop.getMin()I 274 47 % NestedLoop::getMin @ 9 (41 bytes) Installing osr method (4) NestedLoop.getMin()I @ 9 314 48 % NestedLoop::methodCall @ 4 (30 bytes) Installing osr method (4) NestedLoop.methodCall()J @ 4
This means that the JIT performs its tasks, but the generated code must be different Let's analyze - XX: printassembly
'inline' disassembly (hottest segment)
0x0000000002df4dd0: inc %ebp ; OopMap{r11=Derived_oop_rbx rbx=Oop off=114} ;*goto ; - NestedLoop::inline@53 (line 12) 0x0000000002df4dd2: test %eax,-0x1d64dd8(%rip) # 0x0000000001090000 ;*iload ; - NestedLoop::inline@21 (line 12) ; {poll} 0x0000000002df4dd8: cmp $0x1388,%ebp 0x0000000002df4dde: jge 0x0000000002df4dfd ;*if_icmpge ; - NestedLoop::inline@26 (line 12) 0x0000000002df4de0: test %rbx,%rbx 0x0000000002df4de3: je 0x0000000002df4e4c 0x0000000002df4de5: mov (%r11),%r10d ;*getfield width ; - NestedLoop::inline@32 (line 13) 0x0000000002df4de8: mov 0xc(%r10),%r9d ; implicit exception 0x0000000002df4dec: cmp %r9d,%ebp 0x0000000002df4def: jae 0x0000000002df4e59 0x0000000002df4df1: mov 0x10(%r10,%rbp,4),%r8d ;*iaload ; - NestedLoop::inline@37 (line 13) 0x0000000002df4df6: cmp %r8d,%r13d 0x0000000002df4df9: jg 0x0000000002df4dc6 ;*if_icmple ; - NestedLoop::inline@38 (line 13) 0x0000000002df4dfb: jmp 0x0000000002df4dd0
'methodcall 'disassembly (also the hottest part)
0x0000000002da2af0: add $0x8,%edx ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2af3: cmp $0x1381,%edx 0x0000000002da2af9: jge 0x0000000002da2b70 ;*iload_1 ; - NestedLoop::getMin@16 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2afb: mov 0x10(%r9,%rdx,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b00: cmp %r11d,%ecx 0x0000000002da2b03: jg 0x0000000002da2b6b ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b05: mov 0x14(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b0a: cmp %r11d,%ecx 0x0000000002da2b0d: jg 0x0000000002da2b5c ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b0f: mov 0x18(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b14: cmp %r11d,%ecx 0x0000000002da2b17: jg 0x0000000002da2b4d ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b19: mov 0x1c(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b1e: cmp %r11d,%ecx 0x0000000002da2b21: jg 0x0000000002da2b66 ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b23: mov 0x20(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b28: cmp %r11d,%ecx 0x0000000002da2b2b: jg 0x0000000002da2b61 ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b2d: mov 0x24(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b32: cmp %r11d,%ecx 0x0000000002da2b35: jg 0x0000000002da2b52 ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b37: mov 0x28(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b3c: cmp %r11d,%ecx 0x0000000002da2b3f: jg 0x0000000002da2b57 ;*iinc ; - NestedLoop::getMin@33 (line 36) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b41: mov 0x2c(%r9,%r11d ;*iaload ; - NestedLoop::getMin@22 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b46: cmp %r11d,%ecx 0x0000000002da2b49: jg 0x0000000002da2ae6 ;*if_icmple ; - NestedLoop::getMin@23 (line 37) ; - NestedLoop::methodCall@11 (line 27) 0x0000000002da2b4b: jmp 0x0000000002da2af0
The compiled code is completely different; Methodcall is optimized better
The loop has 8 iterations; > There is no array boundary check; > The width field is cached in a register
In contrast, inline variants
>No cyclic expansion; Load width array from memory every time; > Perform an array boundary check for each iteration
OSR compilation methods are not always well optimized because they must maintain the state of interpreting stack frames at the transition point This is the same problem as another example
Stack substitution usually occurs at the backward branch (i.e., the bottom of the loop) The inline method has two nested loops. OSR occurs within the inner loop, while methodcall has only one outer loop OSR conversion in external loops is more advantageous because the JIT compiler has more freedom to optimize internal loops This is exactly what happened to you