In-depth understanding of loop unrolling and lock coarsening in JIT compilation optimization

Introduction

When we talked about JIT compilation, I mentioned two optimizations performed during compilation: loop unrolling and lock coarsening. Today my younger martial sister and I will verify these two optimizations from the perspective of the generated assembly. Let's have a look.

Loop unrolling and lock coarsening

Younger martial sister: Elder martial brother F, last time you mentioned that some optimizations are carried out during JIT compilation, including loop unrolling and lock coarsening. I am very interested in these two optimizations. Can you explain them?

Of course. Let's first review what loop unrolling is.


Loop unrolling: take the following loop as an example:

for (int i = 0; i < 1000; i++) {
    x += 0x51;
}

Because a jump operation is required for every iteration, the above code can be optimized as follows to improve efficiency:

for (int i = 0; i < 250; i++) {
    x += 0x144; // 0x51 * 4
}
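
As a quick sanity check (the UnrollCheck class below is my own illustrative snippet, not part of the original benchmark), the two loops really do compute the same value:

public class UnrollCheck {
    public static void main(String[] args) {
        int a = 0;
        for (int i = 0; i < 1000; i++) {
            a += 0x51;      // original loop: 1000 additions of 0x51
        }

        int b = 0;
        for (int i = 0; i < 250; i++) {
            b += 0x144;     // unrolled by 4: 250 additions of 0x51 * 4
        }

        // both print 81000 (1000 * 0x51)
        System.out.println(a + " " + b);
    }
}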

Note that we use hexadecimal numbers above. Why? Because hexadecimal constants are easy to spot later in the assembly output.

OK, now let's wrap x += 0x51 in a synchronized block and see whether the lock gets coarsened together with the loop unrolling.

for (int i = 0; i < 1000; i++) {
    synchronized (this) {
        x += 0x51;
    }
}
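
Before running anything, it helps to recall what lock coarsening means in general: when the JIT compiler sees several adjacent synchronized blocks on the same monitor, it may merge them into one larger block so the lock is acquired and released only once. A hand-written sketch of the idea (for intuition only; the real transformation happens on the compiled code, not on your Java source):

class CoarseningSketch {
    int x;

    // before coarsening: two adjacent blocks on the same monitor,
    // each paying the cost of acquiring and releasing the lock
    void beforeCoarsening() {
        synchronized (this) { x += 1; }
        synchronized (this) { x += 2; }
    }

    // after coarsening, the JIT effectively treats the code like this:
    void afterCoarsening() {
        synchronized (this) {
            x += 1;
            x += 2;
        }
    }
}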

Everything is ready except the code to run it. Here we will again use JMH to execute the benchmark.

The relevant code is as follows:

package com.flydean; // matches the class name used in the CompileCommand flag below

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, timeUnit = TimeUnit.SECONDS)
@Fork(value = 1, jvmArgsPrepend = {
        // disable biased locking and print the JIT-compiled code of test()
        "-XX:-UseBiasedLocking",
        "-XX:CompileCommand=print,com.flydean.LockOptimization::test"
})
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LockOptimization {

    int x;

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void test() {
        for (int i = 0; i < 1000; i++) {
            synchronized (this) {
                x += 0x51;
            }
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(LockOptimization.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

In the above code we disable biased locking with -XX:-UseBiasedLocking. Why disable this option? Because with biased locking, once a thread has acquired the lock and no other thread contends for it afterwards, the thread holding the biased lock does not need to perform the full synchronization on later acquisitions.

To observe the synchronized behavior more clearly, we disable biased locking here.

Everything else is the routine JMH setup we have covered before.

Next comes the moment to witness the miracle.

Analyzing the assembly log

Running the above program produces a large amount of output. Since this article is not meant to teach assembly language, we will only look at the output at a high level and will not go into the assembly in detail. If you want to know more about assembly, you can leave a message at the end of the article. (Note that printing the compiled code with CompileCommand=print generally requires the hsdis disassembler plugin to be available to the JVM.)

Analyzing the assembly output, we can see that it is divided into a C1-compiled nmethod and a C2-compiled nmethod.

First, let's look at the C1-compiled nmethod:

The first line is monitorenter, which marks the entry into the locked region; it is followed by the code of the loop body.

The last line is monitorexit, which marks the exit from the locked region.

In between there is an add $0x51,%eax instruction, which corresponds to the add operation in our code.

You can see that loop unrolling is not performed in the C1-compiled nmethod.

Now let's look at the C2-compiled nmethod:

The structure is similar to C1. The difference is that the value being added has become 0x144, which shows that the loop has been unrolled (four iterations merged into one), and the lock range has been expanded accordingly: the lock now covers the combined addition instead of a single original iteration.
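
At the source level, the C2-compiled code is effectively behaving like the hand-written version below (a sketch for intuition only; C2 transforms the compiled code directly and never produces this Java source):

class CoarsenedAndUnrolled {
    int x;

    // roughly what C2 ends up doing: unroll the loop by 4
    // and coarsen the four adjacent locks into one per unrolled iteration
    void test() {
        for (int i = 0; i < 250; i++) {
            synchronized (this) {
                x += 0x144; // 0x51 * 4
            }
        }
    }
}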

Finally, let's look at the benchmark results:


Benchmark              Mode  Cnt     Score     Error  Units
LockOptimization.test  avgt    5  5601.819 ± 620.017  ns/op

Good score.

Disabling loop unrolling

Next, let's look at the results if loop unrolling is disabled.

To disable loop unrolling, just set -XX:LoopUnrollLimit=1.
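
For example, the @Fork annotation on the benchmark class above can be extended as follows (only the extra LoopUnrollLimit flag differs from the original code; the rest of the class stays unchanged):

@Fork(value = 1, jvmArgsPrepend = {
        "-XX:-UseBiasedLocking",
        "-XX:LoopUnrollLimit=1",  // effectively disables loop unrolling in C2
        "-XX:CompileCommand=print,com.flydean.LockOptimization::test"
})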

Let's run the above program again:

You can see that the constant in the C2-compiled nmethod has changed back to the original 0x51, indicating that loop unrolling is no longer performed.

Let's look at the running results:

Benchmark              Mode  Cnt      Score      Error  Units
LockOptimization.test  avgt    5  20846.709 ± 3292.522  ns/op

You can see that the running time is roughly 4 times that of the optimized version (about 20847 ns/op versus about 5602 ns/op). This shows that loop unrolling is indeed very useful.
