Parallel Haskell – GHC GC'ing sparks
I have a program that I'm trying to parallelize (complete paste with runnable code here).
I've profiled it and found that most of the time is spent in findNearest, which is essentially a simple foldr over a large Data.Map:
    findNearest :: RGB -> M.Map k RGB -> (k, Word32)
    findNearest rgb m0 = M.foldrWithKey' minDistance (k0, distance rgb r0) m0
      where
        (k0, r0) = M.findMin m0
        minDistance k r x@(_, d1) =
          -- Euclidean distance in RGB-space
          let d0 = distance rgb r
          in if d0 < d1 then (k, d0) else x
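RGB and distance are defined in the full paste; a minimal sketch of what they could look like, assuming three strict Word8 channels and squared Euclidean distance (no square root is needed when distances are only compared):

    import Data.Word (Word8, Word32)

    data RGB = RGB !Word8 !Word8 !Word8

    -- Squared Euclidean distance in RGB-space; monotone in the true
    -- distance, so comparisons behave the same without the sqrt.
    distance :: RGB -> RGB -> Word32
    distance (RGB r1 g1 b1) (RGB r2 g2 b2) = sq r1 r2 + sq g1 g2 + sq b1 b2
      where
        sq a b = let d = fromIntegral a - fromIntegral b :: Int
                 in fromIntegral (d * d)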
parFindNearest is supposed to execute findNearest in parallel over the subtrees of the larger Map:
    parFindNearest :: NFData k => RGB -> M.Map k RGB -> (k, Word32)
    parFindNearest rgb = minimumBy (comparing snd)
                       . parMap rdeepseq (findNearest rgb)
                       . M.splitRoot
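For context, M.splitRoot decomposes a Map into pieces based on the structure of its underlying tree, which is what makes the parallel split possible. A minimal illustration (not from the original paste):

    import qualified Data.Map as M

    main :: IO ()
    main = mapM_ print (M.splitRoot (M.fromList (zip [1 .. 7 :: Int] "abcdefg")))
    -- A non-empty map yields [left subtree, singleton root, right subtree];
    -- for very small maps the left/right pieces can themselves be empty.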
Unfortunately, GHC GCs most of my sparks before they are converted into useful parallelism.
Here is the result of compiling with ghc -O2 -threaded and running with +RTS -s -N2:
         839,892,616 bytes allocated in the heap
         123,999,464 bytes copied during GC
           5,320,184 bytes maximum residency (19 sample(s))
           3,214,200 bytes maximum slop
                  16 MB total memory in use (0 MB lost due to fragmentation)

                                        Tot time (elapsed)  Avg pause  Max pause
    Gen  0      1550 colls,  1550 par    0.23s    0.11s     0.0001s    0.0004s
    Gen  1        19 colls,    18 par    0.11s    0.06s     0.0030s    0.0052s

    Parallel GC work balance: 16.48% (serial 0%, perfect 100%)

    TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

    SPARKS: 215623 (1318 converted, 0 overflowed, 0 dud, 198111 GC'd, 16194 fizzled)

    INIT    time    0.00s  (  0.00s elapsed)
    MUT     time    3.72s  (  3.66s elapsed)
    GC      time    0.34s  (  0.17s elapsed)
    EXIT    time    0.00s  (  0.00s elapsed)
    Total   time    4.07s  (  3.84s elapsed)

    Alloc rate    225,726,318 bytes per MUT second

    Productivity  91.6% of total user, 97.1% of total elapsed

    gc_alloc_block_sync: 9862
    whitehole_spin: 0
    gen[0].sync: 0
    gen[1].sync: 2103
As you can see, the majority of sparks are GC'd or fizzle before being converted. I've tried experimenting with different strictness annotations, having findNearest return a custom strict pair data type instead of a tuple, and using rdeepseq from Control.Parallel.Strategies, yet my sparks are still GC'd.
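The strict pair variant looked roughly like this (a reconstructed sketch, not the exact code from the paste; P, its NFData instance, and findNearestP are illustrative names):

    import Control.DeepSeq (NFData (..))

    -- A strict pair, so both the key and the distance are forced as soon
    -- as the constructor is.
    data P k = P !k !Word32

    instance NFData k => NFData (P k) where
      rnf (P k d) = rnf k `seq` rnf d

    findNearestP :: RGB -> M.Map k RGB -> P k
    findNearestP rgb m0 = M.foldrWithKey' minDistance (P k0 (distance rgb r0)) m0
      where
        (k0, r0) = M.findMin m0
        minDistance k r x@(P _ d1) =
          let d0 = distance rgb r
          in if d0 < d1 then P k d0 else x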
I'd like to know:
> Why are my sparks being GC'd before they are converted?
> How can I change my program to take advantage of parallelism?
Solution
I am not an expert in parallel strategies, so I may be completely wrong. But:
If you effectively disable GC by setting a large enough allocation area (e.g. with the -A20M runtime option), you will see that most sparks are fizzled, not GC'd. That means they are evaluated by the ordinary program flow before the corresponding spark fires.
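Concretely, that experiment is just a matter of passing the extra RTS flag at run time (the program name here is a placeholder):

    $ ghc -O2 -threaded program.hs
    $ ./program +RTS -s -N2 -A20M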
minimumBy forces the parMap results immediately and starts evaluating them. Meanwhile, the sparks are scheduled and executed, but by then it is too late: when a spark completes, the value has already been evaluated by the main thread. Without -A20M, the sparks are GC'd instead, because the value is evaluated and collected even before the spark is scheduled.
This is a simplified test case:
    import Control.Parallel.Strategies

    f :: Integer -> Integer
    f 0 = 1
    f n = n * f (n - 1)

    main :: IO ()
    main = do
      let l = [n..n+10]
          n = 1
          res = parMap rdeepseq f l
      print res
In this case, all the sparks fizzle:
    SPARKS: 11 (0 converted, 0 GC'd, 11 fizzled)
(Sometimes they are GC'd instead.)
But if I yield the main thread before printing the results,
    import Control.Parallel.Strategies
    import Control.Concurrent

    f :: Integer -> Integer
    f 0 = 1
    f n = n * f (n - 1)

    main :: IO ()
    main = do
      let l = [n..n+10]
          n = 1
          res = parMap rdeepseq f l
      res `seq` threadDelay 1
      print res
then all the sparks are converted:
    SPARKS: 11 (11 converted, 0 fizzled)
So, it looks like you don't have enough sparks (try setting l = [n..n+1000] in my example), and they are not heavy enough (try setting n = 1000 in my example).
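For instance, here is the test case above with both suggestions applied (a sketch using the suggested numbers):

    import Control.Parallel.Strategies

    f :: Integer -> Integer
    f 0 = 1
    f n = n * f (n - 1)

    main :: IO ()
    main = do
      let n   = 1000            -- heavier work per spark
          l   = [n .. n + 1000] -- many more sparks
          res = parMap rdeepseq f l
      print res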