Java – why was the last thread not interrupted?
I'm trying to demonstrate an "anytime algorithm": an algorithm that can be stopped at any time and return its current result. The demo algorithm just returns some mathematical function of i, where i keeps increasing. It checks whether it has been interrupted and, if so, returns the current value:
    static int algorithm(int n) {
        int bestSoFar = 0;
        for (int i = 0; i < n; ++i) {
            if (Thread.interrupted())
                break;
            bestSoFar = (int) Math.pow(i, 0.3);
        }
        return bestSoFar;
    }
In the main program, I use it like this:
    // Requires: import java.time.Duration; import java.time.Instant;
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            Instant start = Instant.now();
            int bestSoFar = algorithm(1000000000);
            double durationInMillis = Duration.between(start, Instant.now()).toMillis();
            System.out.println("after " + durationInMillis + " ms,the result is " + bestSoFar);
        };

        // Start four threads in turn and interrupt each after a longer delay.
        Thread t = new Thread(task);
        t.start();
        Thread.sleep(1);
        t.interrupt();

        t = new Thread(task);
        t.start();
        Thread.sleep(10);
        t.interrupt();

        t = new Thread(task);
        t.start();
        Thread.sleep(100);
        t.interrupt();

        t = new Thread(task);
        t.start();
        Thread.sleep(1000);
        t.interrupt();
    }
When I run this program, I get the following output:

    after 0.0 ms,the result is 7
    after 10.0 ms,the result is 36
    after 100.0 ms,the result is 85
    after 21952.0 ms,the result is 501
That is, the first three threads were indeed interrupted when I told them to be, but the last thread was not interrupted after 1 second: it kept working for almost 22 seconds. Why is that?
EDIT: I get similar results using Future.get with a timeout. In this code:
    Instant start = Instant.now();
    ExecutorService executor = Executors.newCachedThreadPool();
    Future<?> future = executor.submit(task);
    try {
        future.get(800, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
        // The task did not finish in time: interrupt it and report how long we waited.
        future.cancel(true);
        double durationInMillis = Duration.between(start, Instant.now()).toMillis();
        System.out.println("Timeout after " + durationInMillis + " [ms]");
    }
If the timeout is at most 800, everything works as expected and something like "Timeout after 806.0 [ms]" is printed. However, if the timeout is 900, it prints "Timeout after 5084.0 [ms]".
EDIT 2: My computer has 4 cores. The program runs on OpenJDK 8.
Solution
I can confirm that this is a HotSpot JVM bug. Here is my initial analysis of the problem.
@Adam Skywalker was absolutely right to assume that the problem is related to the safepoint elimination optimization in the HotSpot JIT compiler. Although bug JDK-8154302 looks similar, it is in fact a different problem.
What is the safepoint problem
A safepoint is a JVM mechanism for stopping application threads in order to perform an operation that requires a stop-the-world pause. Safepoints in HotSpot are cooperative, that is, application threads periodically check whether they need to stop. This check usually happens at method exits and inside loops.
Of course, this check is not free. So, for performance reasons, the JVM tries to eliminate redundant safepoint polls. One such optimization is to remove safepoint polls from counted loops, that is, loops of the form
for (int i = 0; i < N; i++)
or an equivalent. Here N is a loop invariant of type int.
Usually such loops are short-running, but in some cases they can take a long time, for example when N = 2_000_000_000. A safepoint operation requires that all Java threads (excluding those running native methods) be stopped. That is, a single long-running counted loop may delay the entire safepoint operation, and all other threads will wait for this loop to stop.
This is exactly what happens in JDK-8154302. Note that

    int l = 0; while (true) { if (++l == 0) ... }

is just another way to express a counted loop of 2^32 iterations. When Thread.sleep returns from a native function and finds that a safepoint operation has been requested, it stops and waits until the long-running counted loop also completes. That is where the strange delays come from.
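The cooperative stop protocol described in this section can be imitated in plain Java. The sketch below is only an analogy and not HotSpot's actual implementation: the class `CooperativeStopDemo`, the `stopRequested` flag and the `runDemo` helper are all hypothetical names. Each worker polls a shared flag on every loop iteration, and the requester waits until every worker has acknowledged the stop.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class CooperativeStopDemo {
    // Hypothetical stand-in for the JVM's "safepoint requested" state.
    static volatile boolean stopRequested = false;

    /**
     * Starts `workers` polling threads, requests a stop, and returns true
     * if all of them acknowledged it within `timeoutMillis`.
     */
    static boolean runDemo(int workers, long timeoutMillis) {
        stopRequested = false;
        CountDownLatch stopped = new CountDownLatch(workers);
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                long work = 0;
                while (!stopRequested) { // the "poll" on every iteration
                    work++;
                }
                stopped.countDown();     // acknowledge the stop request
            });
            t.setDaemon(true);
            t.start();
        }
        try {
            Thread.sleep(10);            // let the workers run for a moment
            stopRequested = true;        // request the pause
            // The requester cannot proceed until every worker has stopped.
            return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("all workers stopped: " + runDemo(3, 5000));
    }
}
```

If one of the workers ran a long loop without reading the flag, `await` would block until that loop finished. That is the same effect as a counted loop whose safepoint poll has been removed.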
There is a task to fix this problem: JDK-8186027. The idea is to split one long loop into two parts:
    for (int i = 0; i < N; i += step) {
        for (int j = 0; j < step; j++) {
            // loop body
        }
        safepoint_poll();
    }
It is not yet implemented, but the fix is targeted for JDK 10. In the meantime there is a workaround: the JVM flag -XX:+UseCountedLoopSafepoints will also force safepoint checks inside counted loops.
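The loop-splitting idea above can be tried out by hand in plain Java. This is a minimal sketch under stated assumptions, not the JVM's implementation: `StripMiningSketch`, `plainLoop` and `stripMinedLoop` are hypothetical names, the loop body is a simple sum, and the inner bound is clamped so that N does not have to be a multiple of step.

```java
class StripMiningSketch {
    /** Plain counted loop: sums 0 .. n-1. */
    static long plainLoop(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    /** The same work, split into strips of at most `step` iterations. */
    static long stripMinedLoop(int n, int step) {
        if (step <= 0) {
            throw new IllegalArgumentException("step must be positive");
        }
        long sum = 0;
        // The outer index is a long in this sketch only so that
        // i += step cannot overflow when n is close to Integer.MAX_VALUE.
        for (long i = 0; i < n; i += step) {
            // Clamp the last strip when n is not a multiple of step.
            int limit = (int) Math.min(step, n - i);
            for (int j = 0; j < limit; j++) {
                sum += i + j; // loop body, using the original index i + j
            }
            // This is where a safepoint poll would be placed: once per strip.
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(plainLoop(1000) == stripMinedLoop(1000, 64));
    }
}
```

The inner loop stays a short counted loop with no poll, and the wait for a stop request is bounded by one strip rather than by the whole loop.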
What's wrong with Thread.interrupted()
I'm pretty sure the Thread.sleep bug will be closed as a duplicate of the loop strip mining issue. You can verify that this bug goes away with the -XX:+UseCountedLoopSafepoints option.
Unfortunately, this option does not help with the original question. I caught the moment when the algorithm from the original question hangs and looked at the code being executed under gdb:
    loop_begin:
      0x00002aaaabe903d0: mov    %ecx,%r11d
      0x00002aaaabe903d3: inc    %r11d              ; i++
      0x00002aaaabe903d6: cmp    %ebp,%r11d         ; if (i >= n)
      0x00002aaaabe903d9: jge    0x2aaaabe90413     ;   break;
      0x00002aaaabe903db: mov    %ecx,%r8d
      0x00002aaaabe903de: mov    %r11d,%ecx
      0x00002aaaabe903e1: mov    0x1d0(%r15),%rsi   ; rsi = Thread.current();
      0x00002aaaabe903e8: mov    0x1d0(%r15),%r10   ; r10 = Thread.current();
      0x00002aaaabe903ef: cmp    %rsi,%r10          ; if (rsi != r10)
      0x00002aaaabe903f2: jne    0x2aaaabe903b9     ;   goto slow_path;
      0x00002aaaabe903f4: mov    0x128(%r15),%r10   ; r10 = current_os_thread();
      0x00002aaaabe903fb: mov    0x14(%r10),%r11d   ; isInterrupted = r10.interrupt_flag;
      0x00002aaaabe903ff: test   %r11d,%r11d        ; if (!isInterrupted)
      0x00002aaaabe90402: je     0x2aaaabe903d0     ;   goto loop_begin
This is how the loop in the algorithm method is compiled. There is no safepoint poll here, even if -XX:+UseCountedLoopSafepoints is set.
It looks like the safepoint checks were incorrectly eliminated because of the Thread.isInterrupted call, which was supposed to check for a safepoint itself. However, Thread.isInterrupted is a HotSpot intrinsic method. This means there is no real native method call; instead, the JIT replaces the call to Thread.isInterrupted with a sequence of machine instructions that contains no safepoint check.
I will report this bug to Oracle shortly. Meanwhile, the workaround is to change the type of the loop counter from int to long. If you rewrite the loop as

    for (long i = 0; i < n; ++i) { ...

there will be no more strange delays.
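For completeness, here is what the algorithm from the question looks like with that one change applied. This is a minimal sketch: the class name `LongCounterWorkaround` and the small `main` driver are assumptions added for illustration, and only the type of the loop counter differs from the original method.

```java
class LongCounterWorkaround {
    static int algorithm(int n) {
        int bestSoFar = 0;
        // A long counter means this is no longer an int counted loop,
        // so the safepoint poll is not removed from it.
        for (long i = 0; i < n; ++i) {
            if (Thread.interrupted())
                break;
            bestSoFar = (int) Math.pow(i, 0.3);
        }
        return bestSoFar;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(
                () -> System.out.println("the result is " + algorithm(1000000000)));
        t.start();
        Thread.sleep(1000);
        t.interrupt();
        t.join();
    }
}
```

Note that Thread.interrupted() clears the interrupt flag when it returns true, so a caller that still needs to know about the interruption has to record it separately.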