Java – jmh means M1 is faster than m2, but M1 stands for M2
I wrote a jmh benchmark involving two methods: M1 and M2 M1 calls m2, but jmh claims that M1 is faster than m2 for some reason
The following is the benchmark source code:
import java.util.concurrent.TimeUnit; import static org.bitbucket.cowwoc.requirements.Requirements.assertThat; import static org.bitbucket.cowwoc.requirements.Requirements.requireThat; import org.openjdk.jmh.annotations.Benchmark; import org.openjdk.jmh.annotations.BenchmarkMode; import org.openjdk.jmh.annotations.Mode; import org.openjdk.jmh.annotations.OutputTimeUnit; import org.openjdk.jmh.runner.Runner; import org.openjdk.jmh.runner.RunnerException; import org.openjdk.jmh.runner.options.Options; import org.openjdk.jmh.runner.options.OptionsBuilder; @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) public class MyBenchmark { @Benchmark public void assertMethod() { assertThat("value","name").isNotNull().isNotEmpty(); } @Benchmark public void requireMethod() { requireThat("value","name").isNotNull().isNotEmpty(); } public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include(MyBenchmark.class.getSimpleName()) .forks(1) .build(); new Runner(opt).run(); } }
In the above example, M1 is assertthat(), and M2 is requirethat() It means that assertthat() calls requirethat()
This is the reference output:
# JMH 1.13 (released 8 days ago) # VM version: JDK 1.8.0_102,VM 25.102-b14 # VM invoker: C:\Program Files\Java\jdk1.8.0_102\jre\bin\java.exe # VM options: -ea # Warmup: 20 iterations,1 s each # Measurement: 20 iterations,1 s each # Timeout: 10 min per iteration # Threads: 1 thread,will synchronize iterations # Benchmark mode: Average time,time/op # Benchmark: com.mycompany.jmh.MyBenchmark.assertMethod # Run progress: 0.00% complete,ETA 00:01:20 # Fork: 1 of 1 # Warmup Iteration 1: 8.268 ns/op # Warmup Iteration 2: 6.082 ns/op # Warmup Iteration 3: 4.846 ns/op # Warmup Iteration 4: 4.854 ns/op # Warmup Iteration 5: 4.834 ns/op # Warmup Iteration 6: 4.831 ns/op # Warmup Iteration 7: 4.815 ns/op # Warmup Iteration 8: 4.839 ns/op # Warmup Iteration 9: 4.825 ns/op # Warmup Iteration 10: 4.812 ns/op # Warmup Iteration 11: 4.806 ns/op # Warmup Iteration 12: 4.805 ns/op # Warmup Iteration 13: 4.802 ns/op # Warmup Iteration 14: 4.813 ns/op # Warmup Iteration 15: 4.805 ns/op # Warmup Iteration 16: 4.818 ns/op # Warmup Iteration 17: 4.815 ns/op # Warmup Iteration 18: 4.817 ns/op # Warmup Iteration 19: 4.812 ns/op # Warmup Iteration 20: 4.810 ns/op Iteration 1: 4.805 ns/op Iteration 2: 4.816 ns/op Iteration 3: 4.813 ns/op Iteration 4: 4.938 ns/op Iteration 5: 5.061 ns/op Iteration 6: 5.129 ns/op Iteration 7: 4.828 ns/op Iteration 8: 4.837 ns/op Iteration 9: 4.819 ns/op Iteration 10: 4.815 ns/op Iteration 11: 4.872 ns/op Iteration 12: 4.806 ns/op Iteration 13: 4.811 ns/op Iteration 14: 4.827 ns/op Iteration 15: 4.837 ns/op Iteration 16: 4.842 ns/op Iteration 17: 4.812 ns/op Iteration 18: 4.809 ns/op Iteration 19: 4.806 ns/op Iteration 20: 4.815 ns/op Result "assertMethod": 4.855 �(99.9%) 0.077 ns/op [Average] (min,avg,max) = (4.805,4.855,5.129),stdev = 0.088 CI (99.9%): [4.778,4.932] (assumes normal distribution) # JMH 1.13 (released 8 days ago) # VM version: JDK 1.8.0_102,time/op # Benchmark: com.mycompany.jmh.MyBenchmark.requireMethod # Run progress: 50.00% complete,ETA 00:00:40 # Fork: 1 of 1 # Warmup Iteration 1: 7.193 ns/op # Warmup Iteration 2: 4.835 ns/op # Warmup Iteration 3: 5.039 ns/op # Warmup Iteration 4: 5.053 ns/op # Warmup Iteration 5: 5.077 ns/op # Warmup Iteration 6: 5.102 ns/op # Warmup Iteration 7: 5.088 ns/op # Warmup Iteration 8: 5.109 ns/op # Warmup Iteration 9: 5.096 ns/op # Warmup Iteration 10: 5.096 ns/op # Warmup Iteration 11: 5.091 ns/op # Warmup Iteration 12: 5.089 ns/op # Warmup Iteration 13: 5.099 ns/op # Warmup Iteration 14: 5.097 ns/op # Warmup Iteration 15: 5.090 ns/op # Warmup Iteration 16: 5.096 ns/op # Warmup Iteration 17: 5.088 ns/op # Warmup Iteration 18: 5.086 ns/op # Warmup Iteration 19: 5.087 ns/op # Warmup Iteration 20: 5.097 ns/op Iteration 1: 5.097 ns/op Iteration 2: 5.088 ns/op Iteration 3: 5.092 ns/op Iteration 4: 5.097 ns/op Iteration 5: 5.082 ns/op Iteration 6: 5.089 ns/op Iteration 7: 5.086 ns/op Iteration 8: 5.084 ns/op Iteration 9: 5.090 ns/op Iteration 10: 5.086 ns/op Iteration 11: 5.084 ns/op Iteration 12: 5.088 ns/op Iteration 13: 5.091 ns/op Iteration 14: 5.092 ns/op Iteration 15: 5.085 ns/op Iteration 16: 5.096 ns/op Iteration 17: 5.078 ns/op Iteration 18: 5.125 ns/op Iteration 19: 5.089 ns/op Iteration 20: 5.091 ns/op Result "requireMethod": 5.091 �(99.9%) 0.008 ns/op [Average] (min,max) = (5.078,5.091,5.125),stdev = 0.010 CI (99.9%): [5.082,5.099] (assumes normal distribution) # Run complete. Total time: 00:01:21 Benchmark Mode Cnt score Error Units MyBenchmark.assertMethod avgt 20 4.855 � 0.077 ns/op MyBenchmark.requireMethod avgt 20 5.091 � 0.008 ns/op
Reproduce locally:
>Create a maven project that contains the above benchmarks. > Add the following dependencies:
<dependency> <groupId>org.bitbucket.cowwoc</groupId> <artifactId>requirements</artifactId> <version>2.0.0</version> </dependency>
>Or, from https://bitbucket.org/cowwoc/requirements/ Download library
I have the following questions:
Can you reproduce this result? > What, if anything, is the benchmark wrong?
Update: I released updated benchmark source code, benchmark output, jmh test output and xperfasm output to https://bitbucket.org/cowwoc/requirements/downloads Every Aleksey shipilev's suggestion Due to the 30K character limit of the problem, I can't publish it to stackoverflow
Update 2: I finally got consistent and meaningful results
Benchmark Mode Cnt score Error Units MyBenchmark.assertMethod avgt 60 22.552 ± 0.020 ns/op MyBenchmark.requireMethod avgt 60 22.411 ± 0.114 ns/op
By consistency, I mean I get almost the same value in the run
This means that assertmethod() is slower than requiremethod()
I made the following changes:
>Lock CPU clock (in Windows power option, min / max CPU is set to 99%) > Add JVM option - XX: - tieredcompilation - XX: - profileinterpreter
Can anyone achieve these results without doubling the running time?
Update3: disabling inlining will produce the same results without significant performance degradation I sent a more detailed answer here
Solution
In this special case, because of the registration allocation problem, assertmethod does compile better than requiremethod
The benchmark looks right. I can reproduce your results all the time In order to analyze my problems, the simplified benchmark:
package bench; import com.google.common.collect.ImmutableMap; import org.openjdk.jmh.annotations.*; @State(Scope.Benchmark) public class Requirements { private static boolean enabled = true; private String name = "name"; private String value = "value"; @Benchmark public Object assertMethod() { if (enabled) return requireThat(value,name); return null; } @Benchmark public Object requireMethod() { return requireThat(value,name); } public static Object requireThat(String parameter,String name) { if (name.trim().isEmpty()) throw new IllegalArgumentException(); return new StringRequirementsImpl(parameter,name,new Configuration()); } static class Configuration { private Object context = ImmutableMap.of(); } static class StringRequirementsImpl { private String parameter; private String name; private Configuration config; private ObjectRequirementsImpl asObject; StringRequirementsImpl(String parameter,String name,Configuration config) { this.parameter = parameter; this.name = name; this.config = config; this.asObject = new ObjectRequirementsImpl(parameter,config); } } static class ObjectRequirementsImpl { private Object parameter; private String name; private Configuration config; ObjectRequirementsImpl(Object parameter,Configuration config) { this.parameter = parameter; this.name = name; this.config = config; } } }
First of all, I have verified that the whole benchmark is a big method through - XX: printinlining Obviously, this compilation unit has many nodes and there are not enough CPU registers to hold all intermediate variables That is, the compiler needs to spin some of them
>In assertmethod 4, registers overflow into the stack before calling trim(). > After requiremethod 7 registers, it will be overflowed after calling the new configuration()
-20: Printassembly output:
assertMethod | requireMethod -------------------------|------------------------ mov %r11d,0x5c(%rsp) | mov %rcx,0x20(%rsp) mov %r10d,0x58(%rsp) | mov %r11,0x48(%rsp) mov %rbp,0x50(%rsp) | mov %r10,0x30(%rsp) mov %rbx,0x48(%rsp) | mov %rbp,0x50(%rsp) | mov %r9d,0x58(%rsp) | mov %edi,0x5c(%rsp) | mov %r8,0x60(%rsp)
Apart from if (enable) checking, this is almost the only difference between the two compilation methods Therefore, the performance difference is explained by more variables overflowing into memory
Why are smaller methods not compiled well? Then the registration allocation problem is NP complete Since the problem cannot be solved ideally in a reasonable time, compilers usually rely on some heuristics In a large method, something like an extra small thing may significantly change the result of the register allocation algorithm
But you don't have to worry The effect we see does not mean that requiremethod is always compiled worse In other use cases, the compiled diagram will be completely different due to inlining In any case, a nanosecond difference has no impact on real application performance