If the variables modified by the threads are marked volatile, why is there still a false sharing problem?
I've been reading Martin Thompson's article explaining false sharing:
http://mechanical-sympathy.blogspot.co.uk/2011/07/false-sharing.html
public final class FalseSharing implements Runnable {
    public final static int NUM_THREADS = 4; // change to match the number of cores
    public final static long ITERATIONS = 500L * 1000L * 1000L;
    private final int arrayIndex;
    private static VolatileLong[] longs = new VolatileLong[NUM_THREADS];

    static {
        for (int i = 0; i < longs.length; i++) {
            longs[i] = new VolatileLong();
        }
    }

    public FalseSharing(final int arrayIndex) {
        this.arrayIndex = arrayIndex;
    }

    public static void main(final String[] args) throws Exception {
        final long start = System.nanoTime();
        runTest();
        System.out.println("duration = " + (System.nanoTime() - start));
    }

    private static void runTest() throws InterruptedException {
        Thread[] threads = new Thread[NUM_THREADS];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new FalseSharing(i));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    public void run() {
        long i = ITERATIONS + 1;
        while (0 != --i) {
            longs[arrayIndex].value = i;
        }
    }

    public final static class VolatileLong {
        public volatile long value = 0L;
        public long p1, p2, p3, p4, p5, p6; // padding: comment out to see false sharing
    }
}
This example demonstrates the slowdown experienced when multiple threads invalidate each other's cache lines, even though each thread updates only its own variable.
My question is this: if all the variables being updated are volatile, why does the padding produce a performance improvement? My understanding is that volatile variables are always written to and read from main memory. I would therefore assume that every write and read of any variable in this example already results in the current core's cache line being invalidated.
So, according to my understanding: if thread 1 invalidates thread 2's cache line, thread 2 cannot proceed until it re-reads the value into its own cache line. Since the value it reads is volatile, this effectively dirties the cache all the time anyway, resulting in a read from main memory.
Where does my understanding go wrong?
Thank you.
Solution
There are two things going on here:
> We are dealing with an array of VolatileLong objects, and each thread works on its own VolatileLong (see private final int arrayIndex).
> Each VolatileLong object has a single volatile field.
A volatile access means that the thread must invalidate the cache line that holds its volatile long value, and it needs exclusive ownership of that cache line in order to update it. As the article mentions, a cache line is typically 64 bytes.
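As a rough sketch of the arithmetic: without padding, two VolatileLong objects can land on the same 64-byte line. The header size below is an assumption for a typical 64-bit HotSpot JVM (real layouts vary; a tool such as JOL reports the exact layout), and the class name is made up for illustration.

```java
// Back-of-envelope layout math for VolatileLong (sizes are assumptions).
public class CacheLineMath {
    static final int CACHE_LINE_BYTES = 64;    // typical x86 cache line
    static final int OBJECT_HEADER_BYTES = 16; // assumed HotSpot object header
    static final int LONG_BYTES = 8;

    public static void main(String[] args) {
        // Unpadded: header + one volatile long.
        int unpadded = OBJECT_HEADER_BYTES + LONG_BYTES;   // 24 bytes
        // Padded: header + value + six padding longs.
        int padded = OBJECT_HEADER_BYTES + 7 * LONG_BYTES; // 72 bytes
        System.out.println("unpadded VolatileLongs per cache line: "
                + (CACHE_LINE_BYTES / unpadded));          // 2
        System.out.println("padded object spans a full line: "
                + (padded >= CACHE_LINE_BYTES));           // true
    }
}
```

Under these assumed sizes, two unpadded objects share a line and invalidate each other, while a padded object covers a line by itself.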
The article says that by adding padding to the VolatileLong object, each thread's object is pushed onto a different cache line. So even though the different threads still cross a memory barrier when they assign their volatile long values, they are on different cache lines and therefore do not consume excessive cache-coherence bandwidth.
In short, the performance improvement comes from the fact that, although each thread still takes exclusive ownership of a cache line to update its volatile field, those cache lines now sit on different blocks of memory, so they no longer conflict with the other threads' lines and cause repeated invalidations.
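You can see the same effect without the padded class, using a plain array of longs: two writers hammering adjacent slots share a cache line, while slots 32 longs (256 bytes) apart certainly do not. This is a minimal, unscientific sketch (class and method names are made up); AtomicLongArray.set is a volatile write, so the stores cannot be optimized away.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicLongArray;

public class StrideDemo {
    static Runnable writer(AtomicLongArray a, int idx, long iters, CyclicBarrier gate) {
        return () -> {
            try {
                gate.await(); // release both writers at (roughly) the same time
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            for (long i = iters; i != 0; i--) {
                a.set(idx, i); // volatile store; the last value written is 1
            }
        };
    }

    static long runPair(AtomicLongArray a, int i0, int i1, long iters) throws Exception {
        CyclicBarrier gate = new CyclicBarrier(3); // two writers + main
        Thread t0 = new Thread(writer(a, i0, iters, gate));
        Thread t1 = new Thread(writer(a, i1, iters, gate));
        t0.start();
        t1.start();
        gate.await();
        long start = System.nanoTime();
        t0.join();
        t1.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        AtomicLongArray a = new AtomicLongArray(64);
        long iters = 50_000_000L;
        System.out.println("adjacent slots:  " + runPair(a, 0, 1, iters) + " ns");
        System.out.println("separated slots: " + runPair(a, 0, 32, iters) + " ns");
    }
}
```

On a machine with at least two cores, the adjacent run is usually noticeably slower; the exact ratio depends on the hardware, and on a single core the difference can disappear.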