Monitoring thread pool metrics in real time with Micrometer
Background
A recent project involved file upload and download and used JUC's ThreadPoolExecutor. In production, the thread pool ran at full load at certain times. Because the CallerRunsPolicy rejection policy was in use, the threads calling the application's interfaces ended up running the rejected tasks themselves, so under full load the interface calls stopped responding and appeared to hang. Since we had already built a monitoring stack with Micrometer + Prometheus + Grafana, we decided to use Micrometer to actively collect thread pool metrics and display them on a Grafana panel in near real time.
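To make the hang concrete, here is a minimal JDK-only sketch of what CallerRunsPolicy does once both the pool and its queue are full. The class name CallerRunsDemo and the tiny pool (1 thread, 1 queue slot) are assumptions chosen for illustration, not the production configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class CallerRunsDemo {

    /**
     * Saturates a 1-thread pool with a 1-slot queue, submits one more task,
     * and returns the name of the thread that ended up running that task.
     */
    static String threadThatRunsOverflowTask() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch workerBusy = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            workerBusy.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        AtomicReference<String> runner = new AtomicReference<>();
        try {
            pool.execute(blocker);   // occupies the only worker thread
            workerBusy.await();      // wait until the worker is really busy
            pool.execute(blocker);   // fills the only queue slot
            // Pool and queue are full, so the policy runs this task on the submitting thread.
            pool.execute(() -> runner.set(Thread.currentThread().getName()));
            return runner.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller:   " + Thread.currentThread().getName());
        System.out.println("ran task: " + threadThatRunsOverflowTask());
    }
}
```

Both lines print the same thread name. In a web application the submitting thread is typically a request-handling thread, so a slow rejected task ties it up for the task's full duration.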
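Practice process
The following simulated example replays what we did in the real project.
Code changes
First, map the measurements that ThreadPoolExecutor exposes to Micrometer meter names:
getCorePoolSize() -> thread.pool.core.size (core pool size)
getLargestPoolSize() -> thread.pool.largest.size (largest number of threads ever in the pool)
getMaximumPoolSize() -> thread.pool.max.size (maximum allowed pool size)
getActiveCount() -> thread.pool.active.size (threads actively executing tasks)
getPoolSize() -> thread.pool.thread.count (current number of threads in the pool)
getQueue().size() -> thread.pool.queue.size (tasks waiting in the queue)
Then write the code, which does the following:
1. Creates a ThreadPoolExecutor with 10 core threads, a maximum of 10 threads, a task queue of length 10, and the AbortPolicy rejection policy.
2. Provides two methods that use the pool to simulate a short-running task and a long-running task.
3. Provides a method that clears the pool's task queue.
4. Uses a single-threaded scheduled executor to collect the measurements listed above every 5 seconds and record them in Micrometer.
Because these values fluctuate over time, the gauge meter type is a natural fit for recording them.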
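The idea behind a gauge can be sketched without any library: it keeps a state object and a function that reads the current value on demand, rather than accumulating a count. SampledGauge below is a hand-rolled stand-in written for this illustration. It is not Micrometer's Gauge class, and the pool sizes are arbitrary assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.ToDoubleFunction;

/** Hand-rolled stand-in for a gauge: a state object plus a function that reads it on demand. */
final class SampledGauge<T> {
    private final T state;
    private final ToDoubleFunction<T> valueFunction;

    SampledGauge(T state, ToDoubleFunction<T> valueFunction) {
        this.state = state;
        this.valueFunction = valueFunction;
    }

    /** Nothing is stored between reads; the current value is computed each time. */
    double value() {
        return valueFunction.applyAsDouble(state);
    }
}

class SampledGaugeDemo {

    /** Reads a queue-size gauge on an idle pool, then again after three tasks are queued. */
    static double[] queueSizeBeforeAndAfter() {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
        SampledGauge<ThreadPoolExecutor> queueSize =
                new SampledGauge<>(executor, pool -> pool.getQueue().size());
        CountDownLatch release = new CountDownLatch(1);
        try {
            double before = queueSize.value();
            // The first task is handed straight to the only worker and blocks it.
            executor.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // With the worker busy, these three tasks wait in the queue.
            for (int i = 0; i < 3; i++) {
                executor.execute(() -> { });
            }
            double after = queueSize.value();
            return new double[] {before, after};
        } finally {
            release.countDown();
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] values = queueSizeBeforeAndAfter();
        System.out.println("queue size before: " + values[0] + ", after: " + values[1]);
    }
}
```

The same gauge object reports 0.0 and then 3.0, because each read samples the executor at that moment.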
// ThreadPoolMonitor
import io.micrometer.core.instrument.Metrics;
import io.micrometer.core.instrument.Tag;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.stereotype.Service;

import java.util.Collections;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * @author throwable
 * @version v1.0
 * @description
 * @since 2019/4/7 21:02
 */
@Service
public class ThreadPoolMonitor implements InitializingBean {

    private static final String EXECUTOR_NAME = "ThreadPoolMonitorSample";
    private static final Iterable<Tag> TAG = Collections.singletonList(Tag.of("thread.pool.name", EXECUTOR_NAME));
    private final ScheduledExecutorService scheduledExecutor = Executors.newSingleThreadScheduledExecutor();

    // 10 core threads, 10 max threads, keep-alive time 0, bounded queue of 10, AbortPolicy on rejection
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(10, 10, 0, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(10), new ThreadFactory() {

        private final AtomicInteger counter = new AtomicInteger();

        @Override
        public Thread newThread(Runnable r) {
            Thread thread = new Thread(r);
            thread.setDaemon(true);
            thread.setName("thread-pool-" + counter.getAndIncrement());
            return thread;
        }
    }, new ThreadPoolExecutor.AbortPolicy());

    private final Runnable monitor = () -> {
        // An exception must be caught here. None is expected in practice, but an uncaught
        // exception would stop the scheduled executor from running this task again.
        try {
            Metrics.gauge("thread.pool.core.size", TAG, executor, ThreadPoolExecutor::getCorePoolSize);
            Metrics.gauge("thread.pool.largest.size", TAG, executor, ThreadPoolExecutor::getLargestPoolSize);
            Metrics.gauge("thread.pool.max.size", TAG, executor, ThreadPoolExecutor::getMaximumPoolSize);
            Metrics.gauge("thread.pool.active.size", TAG, executor, ThreadPoolExecutor::getActiveCount);
            Metrics.gauge("thread.pool.thread.count", TAG, executor, ThreadPoolExecutor::getPoolSize);
            // Note: if the blocking queue is unbounded, its size should not be read directly here
            Metrics.gauge("thread.pool.queue.size", TAG, executor, e -> e.getQueue().size());
        } catch (Exception e) {
            // ignore
        }
    };

    @Override
    public void afterPropertiesSet() throws Exception {
        // Run every 5 seconds, starting immediately
        scheduledExecutor.scheduleWithFixedDelay(monitor, 0, 5, TimeUnit.SECONDS);
    }

    public void shortTimeWork() {
        executor.execute(() -> {
            try {
                // 5 seconds
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                // ignore
            }
        });
    }

    public void longTimeWork() {
        executor.execute(() -> {
            try {
                // 500 seconds
                Thread.sleep(5000 * 100);
            } catch (InterruptedException e) {
                // ignore
            }
        });
    }

    public void clearTaskQueue() {
        executor.getQueue().clear();
    }
}
// ThreadPoolMonitorController
import club.throwable.smp.service.ThreadPoolMonitor;
import lombok.RequiredArgsConstructor;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

/**
 * @author throwable
 * @version v1.0
 * @description
 * @since 2019/4/7 21:20
 */
@RequiredArgsConstructor
@RestController
public class ThreadPoolMonitorController {

    private final ThreadPoolMonitor threadPoolMonitor;

    // Submits a task that sleeps for 5 seconds
    @GetMapping(value = "/shortTimeWork")
    public ResponseEntity<String> shortTimeWork() {
        threadPoolMonitor.shortTimeWork();
        return ResponseEntity.ok("success");
    }

    // Submits a task that sleeps for 500 seconds
    @GetMapping(value = "/longTimeWork")
    public ResponseEntity<String> longTimeWork() {
        threadPoolMonitor.longTimeWork();
        return ResponseEntity.ok("success");
    }

    // Discards every task still waiting in the pool's queue
    @GetMapping(value = "/clearTaskQueue")
    public ResponseEntity<String> clearTaskQueue() {
        threadPoolMonitor.clearTaskQueue();
        return ResponseEntity.ok("success");
    }
}
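The try/catch inside the monitor task matters because of how ScheduledExecutorService treats failures: if one run of a periodic task throws, later runs are suppressed. The JDK-only sketch below illustrates this. The class name, the 20 ms delay, and the simulated failure on the second run are assumptions made for the demonstration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ScheduledTaskFailureDemo {

    /**
     * Schedules a task that fails on its second run and returns how often it ran
     * within about half a second. When guarded is true, the failure is caught
     * inside the task, as the monitor task in this article does.
     */
    static int countRuns(boolean guarded) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable failing = () -> {
            if (runs.incrementAndGet() == 2) {
                throw new IllegalStateException("simulated collection failure");
            }
        };
        Runnable task = guarded
                ? () -> {
                    try {
                        failing.run();
                    } catch (Exception e) {
                        // swallow the failure so the schedule stays alive
                    }
                }
                : failing;
        try {
            scheduler.scheduleWithFixedDelay(task, 0, 20, TimeUnit.MILLISECONDS);
            Thread.sleep(500);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded runs: " + countRuns(false));
        System.out.println("guarded runs:   " + countRuns(true));
    }
}
```

The unguarded task stops after its second run, while the guarded one keeps running until the scheduler is shut down. In real code it is usually better to log the exception than to swallow it silently.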
The configuration is as follows:
server:
  port: 9091
management:
  server:
    port: 9091
  endpoints:
    web:
      exposure:
        include: '*'
      base-path: /management
The scrape frequency of the Prometheus job can also be tuned as needed. Here the default applies: Prometheus pulls the /prometheus endpoint every 15 seconds, so each scrape spans three 5-second collection cycles. After the project starts, call /management/prometheus to view the data exposed by the endpoint:
Because ThreadPoolMonitorSample is our custom tag value, seeing it in the output indicates that data collection is working. If the Prometheus job is configured correctly, check the Prometheus console once the local Spring Boot project is up:
Everything looks good, so we can move on to the next step.
Grafana panel configuration
Once you have confirmed that the JVM application and the Prometheus scrape job are both working, the next important step is to configure the Grafana panel. If you do not want to study Prometheus's query language, PromQL, in depth for now, you can look up a sample expression on the /graph page of the Prometheus console and copy it into the Grafana configuration. It is still worth reading the Prometheus documentation to learn how to write PromQL properly.
The query configuration is as follows:
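When looking for the metrics in Prometheus, it helps to know what the dotted meter names look like on the Prometheus side. The sketch below assumes that plain gauge names and tag keys are exposed with dots replaced by underscores. That is an approximation of the naming convention, so check the result against the actual output of /management/prometheus. The helper names are hypothetical.

```java
class PrometheusNames {

    /** Approximates the Prometheus-side form of a plain dotted name: dots become underscores. */
    static String toPrometheusName(String dottedName) {
        return dottedName.replace('.', '_');
    }

    /** Builds a selector for one metric, filtered by the pool-name tag used in this article. */
    static String selector(String metricName, String poolName) {
        return toPrometheusName(metricName)
                + "{" + toPrometheusName("thread.pool.name") + "=\"" + poolName + "\"}";
    }

    public static void main(String[] args) {
        String[] metrics = {
            "thread.pool.core.size",
            "thread.pool.largest.size",
            "thread.pool.max.size",
            "thread.pool.active.size",
            "thread.pool.thread.count",
            "thread.pool.queue.size"
        };
        for (String metric : metrics) {
            System.out.println(selector(metric, "ThreadPoolMonitorSample"));
        }
    }
}
```

Each printed line is a candidate query expression for one Grafana panel series, for example the active thread count of the sample pool.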
Final effect
Call the endpoints provided in the example a few times to get a chart that shows the state of the thread pool:
Summary
Monitoring a ThreadPoolExecutor helps you spot problems early in the interfaces that depend on it. When the pool is saturated, the quickest way to recover is to clear the backlog in its task queue. One approach is to register the ThreadPoolExecutor as a bean in the IoC container and expose a method that clears its task queue as a REST endpoint. Connection pools of HTTP clients, such as Apache HttpClient or OkHttp, can be monitored in a similar way.
Collecting the data costs a little performance, because some getters (for example getActiveCount() and getPoolSize()) acquire the pool's internal lock, but the cost is usually negligible. If you are really concerned about the impact, you can read the corresponding fields of the ThreadPoolExecutor instance directly through the reflection API, which avoids the locking overhead.
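As a variation on clearing the queue, the backlog can be drained into a list so the caller learns how many tasks were discarded. This is a minimal JDK-only sketch; the class and method names are hypothetical and the pool sizes are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class QueueDrainDemo {

    /** Removes all tasks still waiting in the queue and returns how many were discarded. */
    static int discardBacklog(ThreadPoolExecutor executor) {
        List<Runnable> discarded = new ArrayList<>();
        executor.getQueue().drainTo(discarded);
        return discarded.size();
    }

    /** Blocks a 1-thread pool with one running task, queues more tasks (at most 10), then drains. */
    static int demo(int queued) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            executor.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();
            for (int i = 0; i < queued; i++) {
                executor.execute(() -> { });
            }
            // Only waiting tasks are removed; the task already running is unaffected.
            return discardBacklog(executor);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown();
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("discarded tasks: " + demo(5));
    }
}
```

Keep in mind that discarded tasks never run. If they were submitted with submit(), callers waiting on the returned Future will not get a result unless the drained tasks are cancelled or handled explicitly.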
Original link of personal blog: http://www.throwable.club/2019/04/14/jvm-micrometer-thread-pool-monitor
(end of this article c-2-d 20190414)