Java – How does the NUMA architecture affect the performance of ActivePivot?
We are migrating an ActivePivot application to a new server (4 sockets, Intel Xeon, 512 GB of memory). After deployment, we ran our application benchmark (a mix of large OLAP queries running concurrently with real-time transactions). The measured performance is almost two times slower than on our previous server, which has similar processors but half as many cores and half the memory.
We investigated the differences between the two servers, and it appears that the larger one has a NUMA (non-uniform memory access) architecture. Each CPU socket is physically close to 1/4 of the memory but farther away from the rest. The JVM running our application allocates one large global heap, and a random fraction of that heap ends up on each NUMA node. Our analysis is that the memory access pattern is fairly random and that CPU cores frequently waste time accessing remote memory.
We are looking for more feedback on running ActivePivot on NUMA servers. Can we configure ActivePivot cubes or thread pools, change our queries, or configure the operating system?
Solution
Peter described the general JVM options available today to reduce the performance impact of NUMA architectures. In short, a NUMA-aware JVM partitions the heap with respect to the NUMA nodes, and when a thread creates a new object, the object is allocated on the NUMA node of the core running that thread (so if the same thread uses it later, the object will be in local memory). In addition, when compacting the heap, a NUMA-aware JVM can avoid moving large chunks of data between nodes (which shortens stop-the-world events).
Therefore, the -XX:+UseNUMA option should be enabled on any NUMA hardware, for any Java application.
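As a minimal sketch of what enabling that option looks like on the command line: the jar name below is a placeholder, not something from the original post, and the pairing with the parallel collector is an assumption (HotSpot has long supported NUMA-aware allocation with that collector, and G1 gained it in JDK 14). The script only prints the command, so it can be inspected before use.

```shell
# Placeholder jar name (an assumption); replace with the real launcher.
APP_JAR="activepivot-app.jar"

# Build the JVM command line with NUMA-aware heap allocation enabled.
# -XX:+UseParallelGC is an assumed collector choice, not a requirement.
jvm_cmd() {
  echo "java -XX:+UseNUMA -XX:+UseParallelGC -jar ${APP_JAR}"
}

# Dry run: print the command instead of executing it.
jvm_cmd

# To check whether the flag is accepted by a given JVM, one can run:
#   java -XX:+UseNUMA -XX:+PrintFlagsFinal -version | grep UseNUMA
```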
But this does not help ActivePivot much: ActivePivot is an in-memory database. There are real-time updates, but the bulk of the data resides in main memory for the life of the application. Whatever the JVM options, the data will be split among the NUMA nodes, and the threads executing a query will access memory randomly. Since most of the ActivePivot query engine runs as fast as memory can be fetched, the impact of NUMA is particularly visible.
So how do you get the most out of an ActivePivot solution on NUMA hardware?
There is a simple solution when an ActivePivot application uses only a fraction of the server's resources (we find this is often the case when several ActivePivot solutions run on the same server). For example, an ActivePivot solution may use only some of the 64 cores and 256 GB out of a terabyte of memory. In this case, you can restrict the JVM process itself to a NUMA node.
On Linux, you prefix the JVM launch command with the following (http://linux.die.net/man/8/numactl):
numactl --cpunodebind=xxx
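A slightly fuller sketch of that prefix, under stated assumptions: node 0 is the target, the jar name is a placeholder, and `--membind` is added here (it is not in the command above) so that memory allocation is restricted to the same node as the CPUs. `numactl --hardware` lists the nodes available on a machine. The script prints the command rather than running it.

```shell
# Assumptions: NUMA node 0 is the target; the jar name is a placeholder.
NODE=0
APP_JAR="activepivot-app.jar"

# Compose the launch command: pin both CPU scheduling and memory
# allocation to the same NUMA node, then start the JVM.
numa_launch_cmd() {
  n="$1"
  echo "numactl --cpunodebind=${n} --membind=${n} java -jar ${APP_JAR}"
}

# Dry run: print the command instead of executing it.
numa_launch_cmd "${NODE}"
```

Note that with `--membind` the heap must fit in the memory of the chosen node, so the JVM heap size should be set accordingly.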
If the entire server is dedicated to a single ActivePivot solution, you can partition the data using the ActivePivot distributed architecture. With four NUMA nodes, you start four JVMs hosting four ActivePivot nodes, each bound to its own NUMA node. With this deployment, queries are distributed among the nodes, and each node performs its share of the work at maximum performance within the right NUMA node.
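The one-JVM-per-node deployment could be scripted roughly as follows. This is only a sketch: it assumes four NUMA nodes numbered 0 to 3 and a placeholder jar name, and it leaves out the ActivePivot distribution settings entirely, since those depend on the product configuration. The commands are printed, not executed.

```shell
# Assumptions: 4 NUMA nodes numbered 0-3; the jar name is a placeholder.
NODE_COUNT=4
APP_JAR="activepivot-node.jar"

# Command for one ActivePivot node, bound (CPU and memory) to NUMA node $1.
node_cmd() {
  n="$1"
  echo "numactl --cpunodebind=${n} --membind=${n} java -jar ${APP_JAR}"
}

# Print one launch command per NUMA node.
all_cmds() {
  i=0
  while [ "${i}" -lt "${NODE_COUNT}" ]; do
    node_cmd "${i}"
    i=$((i + 1))
  done
}

# Dry run: review the four commands before launching them for real.
all_cmds
```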