Java method call performance

I have this code to do range minimum query When t = 100000, I and j always change in each input line, and its execution time in Java 8u60 is about 12 seconds

for (int a0 = 0; a0 < t; a0++) {
    String line = reader.readLine();
    String[] ls = line.split(" ");
    int i = Integer.parseInt(ls[0]);
    int j = Integer.parseInt(ls[1]);
    int min = width[i];
    for (int k = i + 1; k <= j; k++) {
        if (min > width[k]) {
            min = width[k];
        }
    }
    writer.write(min + "");
    writer.newLine();
}

When I extract a new method to find the minimum value, the execution time is 4 times faster (about 2.5 seconds)

for (int a0 = 0; a0 < t; a0++) {
        String line = reader.readLine();
        String[] ls = line.split(" ");
        int i = Integer.parseInt(ls[0]);
        int j = Integer.parseInt(ls[1]);
        int min = getMin(i,j);
        writer.write(min + "");
        writer.newLine();
    }

private int getMin(int i,int j) {
    int min = width[i];
    for (int k = i + 1; k <= j; k++) {
        if (min > width[k]) {
            min = width[k];
        }
    }
    return min;
}

I always thought method calls were slow But this example shows the opposite Java 6 also demonstrates this, but in both cases (17 seconds and 10 seconds), the execution time is much slower Can anyone provide some insight?

Solution

I've tried to reproduce the problem with fewer test cases There are no I / O or string operations involved, only two nested loops with array access

public class NestedLoop {
    private static final int ARRAY_SIZE = 5000;
    private static final int ITERATIONS = 1000000;

    private int[] width = new java.util.Random(0).ints(ARRAY_SIZE).toArray();

    public long inline() {
        long sum = 0;

        for (int i = 0; i < ITERATIONS; i++) {
            int min = width[0];
            for (int k = 1; k < ARRAY_SIZE; k++) {
                if (min > width[k]) {
                    min = width[k];
                }
            }
            sum += min;
        }

        return sum;
    }

    public long methodCall() {
        long sum = 0;

        for (int i = 0; i < ITERATIONS; i++) {
            int min = getMin();
            sum += min;
        }

        return sum;
    }

    private int getMin() {
        int min = width[0];
        for (int k = 1; k < ARRAY_SIZE; k++) {
            if (min > width[k]) {
                min = width[k];
            }
        }
        return min;
    }

    public static void main(String[] args) {
        long startTime = System.nanoTime();
        long sum = new NestedLoop().inline();  // or .methodCall();
        long endTime = System.nanoTime();

        long ms = (endTime - startTime) / 1000000;
        System.out.println("sum = " + sum + ",time = " + ms + " ms");
    }
}

The inline variant is indeed 3-4 times slower than methodcall

I have used the following JVM options to confirm that the two benchmarks are compiled at the highest level, and OSR (on stack replacement) will occur successfully in both cases

-XX:-TieredCompilation
-XX:CompileOnly=NestedLoop
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompilation
-XX:+TraceNMethodInstalls

'inline' compilation log:

251   46 %           NestedLoop::inline @ 21 (70 bytes)
Installing osr method (4) NestedLoop.inline()J @ 21

'methodcall 'compilation log:

271   46             NestedLoop::getMin (41 bytes)
Installing method (4) NestedLoop.getMin()I 
    274   47 %           NestedLoop::getMin @ 9 (41 bytes)
Installing osr method (4) NestedLoop.getMin()I @ 9
    314   48 %           NestedLoop::methodCall @ 4 (30 bytes)
Installing osr method (4) NestedLoop.methodCall()J @ 4

This means that the JIT performs its tasks, but the generated code must be different Let's analyze - XX: printassembly

'inline' disassembly (hottest segment)

0x0000000002df4dd0: inc    %ebp               ; OopMap{r11=Derived_oop_rbx rbx=Oop off=114}
                                              ;*goto
                                              ; - NestedLoop::inline@53 (line 12)

0x0000000002df4dd2: test   %eax,-0x1d64dd8(%rip)        # 0x0000000001090000
                                              ;*iload
                                              ; - NestedLoop::inline@21 (line 12)
                                              ;   {poll}
0x0000000002df4dd8: cmp    $0x1388,%ebp
0x0000000002df4dde: jge    0x0000000002df4dfd  ;*if_icmpge
                                              ; - NestedLoop::inline@26 (line 12)

0x0000000002df4de0: test   %rbx,%rbx
0x0000000002df4de3: je     0x0000000002df4e4c
0x0000000002df4de5: mov    (%r11),%r10d       ;*getfield width
                                              ; - NestedLoop::inline@32 (line 13)

0x0000000002df4de8: mov    0xc(%r10),%r9d     ; implicit exception
0x0000000002df4dec: cmp    %r9d,%ebp
0x0000000002df4def: jae    0x0000000002df4e59
0x0000000002df4df1: mov    0x10(%r10,%rbp,4),%r8d  ;*iaload
                                              ; - NestedLoop::inline@37 (line 13)

0x0000000002df4df6: cmp    %r8d,%r13d
0x0000000002df4df9: jg     0x0000000002df4dc6  ;*if_icmple
                                              ; - NestedLoop::inline@38 (line 13)

0x0000000002df4dfb: jmp    0x0000000002df4dd0

'methodcall 'disassembly (also the hottest part)

0x0000000002da2af0: add    $0x8,%edx          ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2af3: cmp    $0x1381,%edx
0x0000000002da2af9: jge    0x0000000002da2b70  ;*iload_1
                                              ; - NestedLoop::getMin@16 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2afb: mov    0x10(%r9,%rdx,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b00: cmp    %r11d,%ecx
0x0000000002da2b03: jg     0x0000000002da2b6b  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b05: mov    0x14(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b0a: cmp    %r11d,%ecx
0x0000000002da2b0d: jg     0x0000000002da2b5c  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b0f: mov    0x18(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b14: cmp    %r11d,%ecx
0x0000000002da2b17: jg     0x0000000002da2b4d  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b19: mov    0x1c(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b1e: cmp    %r11d,%ecx
0x0000000002da2b21: jg     0x0000000002da2b66  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b23: mov    0x20(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b28: cmp    %r11d,%ecx
0x0000000002da2b2b: jg     0x0000000002da2b61  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b2d: mov    0x24(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b32: cmp    %r11d,%ecx
0x0000000002da2b35: jg     0x0000000002da2b52  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b37: mov    0x28(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b3c: cmp    %r11d,%ecx
0x0000000002da2b3f: jg     0x0000000002da2b57  ;*iinc
                                              ; - NestedLoop::getMin@33 (line 36)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b41: mov    0x2c(%r9,%r11d  ;*iaload
                                              ; - NestedLoop::getMin@22 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b46: cmp    %r11d,%ecx
0x0000000002da2b49: jg     0x0000000002da2ae6  ;*if_icmple
                                              ; - NestedLoop::getMin@23 (line 37)
                                              ; - NestedLoop::methodCall@11 (line 27)

0x0000000002da2b4b: jmp    0x0000000002da2af0

The compiled code is completely different; Methodcall is optimized better

The loop has 8 iterations; > There is no array boundary check; > The width field is cached in a register

In contrast, inline variants

>No cyclic expansion; Load width array from memory every time; > Perform an array boundary check for each iteration

OSR compilation methods are not always well optimized because they must maintain the state of interpreting stack frames at the transition point This is the same problem as another example

Stack substitution usually occurs at the backward branch (i.e., the bottom of the loop) The inline method has two nested loops. OSR occurs within the inner loop, while methodcall has only one outer loop OSR conversion in external loops is more advantageous because the JIT compiler has more freedom to optimize internal loops This is exactly what happened to you

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>