Java – thread-safe sorted linked list
I'm trying to write a thread-safe sorted singly-linked list. I wrote two versions: one with coarse-grained synchronization and one with fine-grained synchronization. Here are the two implementations:
Fine-grained:
    public void add(T t) {
        Node curr = head;
        curr.lock.lock();
        while (curr.next != null) {
            // Invariant: curr is locked
            // Invariant: curr.data < t
            curr.next.lock.lock();
            if (t.compareTo(curr.next.data) <= 0) {
                break;
            }
            Node tmp = curr.next;
            curr.lock.unlock();
            curr = tmp;
        }
        // curr is acquired
        curr.next = new Node(curr.next, t);
        if (curr.next.next != null) {
            // old curr's next is acquired
            curr.next.next.lock.unlock();
        }
        curr.lock.unlock();
    }
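For context, the add() above only compiles inside a class that supplies a Node type with a per-node lock and a sentinel head node (so that the `curr.data < t` invariant holds without ever comparing against head's data). The snippet below is a minimal sketch of that assumed skeleton, not the asker's actual class; everything except the names Node, lock, data, next, and head is my assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Assumed skeleton around the fine-grained add(): a sentinel head node
// and one ReentrantLock per node (hand-over-hand / lock-coupling traversal).
class FineGrainedSortedList<T extends Comparable<T>> {
    final class Node {
        final ReentrantLock lock = new ReentrantLock();
        final T data;
        volatile Node next;
        Node(Node next, T data) { this.next = next; this.data = data; }
    }

    // Sentinel: its data is never compared against, so null is fine.
    private final Node head = new Node(null, null);

    public void add(T t) {
        Node curr = head;
        curr.lock.lock();
        while (curr.next != null) {
            curr.next.lock.lock();              // lock the successor first...
            if (t.compareTo(curr.next.data) <= 0) {
                break;                          // insert between curr and curr.next
            }
            Node tmp = curr.next;
            curr.lock.unlock();                 // ...then release the predecessor
            curr = tmp;
        }
        curr.next = new Node(curr.next, t);     // splice in while curr is held
        if (curr.next.next != null) {
            curr.next.next.lock.unlock();       // release the old successor, if any
        }
        curr.lock.unlock();
    }

    // Unsynchronized read for inspection after all writers have finished.
    List<T> snapshot() {
        List<T> out = new ArrayList<>();
        for (Node n = head.next; n != null; n = n.next) {
            out.add(n.data);
        }
        return out;
    }
}
```

The sentinel head is what keeps add() free of head-replacement special cases: every insertion happens after some locked predecessor.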
Coarse-grained:
    public void add(T t) {
        lock.lock();
        Node curr = head;
        while (curr.next != null) {
            if (t.compareTo(curr.next.data) <= 0) {
                break;
            }
            curr = curr.next;
        }
        curr.next = new Node(curr.next, t);
        lock.unlock();
    }
I insert 20,000 integers into each version using four threads (on four logical CPU cores). The time reported for each thread is CPU time (that is, it does not include waiting time).
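The question doesn't show the harness, but per-thread CPU time excluding lock-wait time is exactly what ThreadMXBean.getCurrentThreadCpuTime() reports. A hypothetical measurement harness along those lines (class and parameter names are mine, not the asker's):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of a harness measuring per-thread CPU time (which excludes time
// spent parked on a contended lock) alongside overall wall time.
class InsertBenchmark {
    static long run(Runnable work, int threads) throws InterruptedException {
        ConcurrentLinkedQueue<Long> cpuMs = new ConcurrentLinkedQueue<>();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                ThreadMXBean mx = ManagementFactory.getThreadMXBean();
                long start = mx.getCurrentThreadCpuTime(); // ns; -1 if unsupported
                work.run();                                // e.g. 5,000 add() calls
                cpuMs.add((mx.getCurrentThreadCpuTime() - start) / 1_000_000);
            });
        }
        long wallStart = System.nanoTime();
        for (Thread w : workers) w.start();
        for (Thread w : workers) w.join();
        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        System.out.println("per-thread CPU ms: " + cpuMs + ", wall: " + wallMs + " ms");
        return wallMs;
    }
}
```

Note that on some JVMs thread CPU-time measurement is unsupported or disabled, in which case getCurrentThreadCpuTime() returns -1.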
Fine-grained:
Worker 1 spent 1080 ms
Worker 2 spent 1230 ms
Worker 0 spent 1250 ms
Worker 3 spent 1260 ms
wall time: 1620 ms

Coarse-grained:
Worker 1 spent 190 ms
Worker 2 spent 270 ms
Worker 3 spent 410 ms
Worker 0 spent 280 ms
wall time: 1298 ms
My first thought was that lock() and unlock() were the problem, but I profiled the implementation and together they only consumed about 30% of the time. My second guess was that the fine-grained version suffers more cache misses, but I doubted it, because a singly-linked list, unlike an array, is inherently prone to cache misses anyway.
Do you know why I am not getting the expected parallelization?
Solution
Yes, this may well be due to cache misses: the cache lines containing the locks bounce back and forth between CPUs.
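That lock traffic is easy to quantify with a rough model (my own back-of-the-envelope count, not from the question). In the hand-over-hand scheme, inserting after position k acquires about k+1 node locks, and every acquisition is a write to that node's lock word, i.e. to a cache line other cores are also writing. Assuming, for simplicity, n ascending inserts into one list:

```java
// Back-of-the-envelope lock-acquisition counts for n ascending inserts.
class LockTraffic {
    // Fine-grained: the i-th insert walks past i nodes, locking each one,
    // so n inserts acquire 1 + 2 + ... + n = n(n+1)/2 node locks in total.
    static long fineGrained(long n) {
        return n * (n + 1) / 2;
    }

    // Coarse-grained: exactly one global lock acquisition per insert.
    static long coarseGrained(long n) {
        return n;
    }
}
```

For n = 20,000 that is about 200 million lock-word writes versus 20,000, so even a tiny per-acquisition cost for pulling the lock's cache line into exclusive state adds up.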
Also, note that you did gain quite a bit of parallelism:
Fine-grained:
Worker 1 spent 1080 ms
Worker 2 spent 1230 ms
Worker 0 spent 1250 ms
Worker 3 spent 1260 ms
wall time: 1620 ms

Coarse-grained:
Worker 1 spent 190 ms
Worker 2 spent 270 ms
Worker 3 spent 410 ms
Worker 0 spent 280 ms
wall time: 1298 ms
Although each individual thread takes more CPU time due to cache misses (and the added locking overhead), the whole process is only slightly slower. In the fine-grained run, the four threads did about 4820 ms of CPU work in 1620 ms of wall time (roughly 3× parallel), whereas the coarse-grained run did about 1150 ms of work in 1298 ms, i.e. essentially serialized.