Why does adding more cores beyond about 10 slow down my Java program?

My program uses fork/join to run thousands of tasks, as follows:

private static class Generator extends RecursiveTask<Long> {
    final MyHelper mol;
    final static SatChecker satCheck = new SatChecker();

    public Generator(final MyHelper mol) {
        super();
        this.mol = mol;
    }

    @Override
    protected Long compute() {
        long count = 0;
        try {
            if (mol.isComplete(satCheck)) {
                count = 1;
            }
            ArrayList<MyHelper> molList = mol.extend();
            List<Generator> tasks = new ArrayList<>();
            for (final MyHelper child : molList) {
                tasks.add(new Generator(child)); 
            }
            for (final Generator task : invokeAll(tasks)) {
                count += task.join();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return count;
    }
}

My program makes heavy use of third-party libraries in the isComplete and extend methods. The extend method also uses a native library. As far as the MyHelper class is concerned, there are no shared variables and no synchronization between tasks.

I use the taskset command on Linux to limit the number of cores my application uses. I get the best speed with about 10 cores (around 60 seconds). Using more than 10 cores slows the application down, to the point where 16 cores take as long as 6 cores (about 90 seconds).

What confuses me even more is that the selected cores are 100% busy (apart from occasional garbage collection). Does anyone know what could cause this slowdown? Where should I start looking to solve this problem?

PS: I also tried an implementation with ThreadPoolExecutor in Scala/Akka, but the results were similar (although slower than fork/join).

PPS: My guess is that something deep inside MyHelper or SatChecker crosses a memory barrier (poisoning the cache). But how can I find it, and fix it or work around it?

Solution

The overhead may come from distributing threads/tasks across the different cores. Also, are you sure your program is fully parallelizable? In practice, some programs cannot use all available CPUs with 100% efficiency, and the time spent distributing tasks can end up slowing the program down rather than helping it.
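To make the two effects above concrete, here is a rough sketch of a toy scaling model. It is not a measurement of your program: the class name ScalingSketch, the method relativeTime, and the values 0.9 (parallelizable fraction) and 0.01 (coordination cost per extra core) are all assumptions chosen for illustration. The model combines an Amdahl's-law style serial part with a hypothetical overhead term that grows with the number of cores, which is enough to produce a curve that improves up to some core count and then gets worse.

```java
public class ScalingSketch {

    // Hypothetical model, not a measurement: run time relative to a single core.
    // parallelFraction: share of the work that can run in parallel (0.0 to 1.0).
    // overheadPerCore: assumed coordination cost added by each extra core.
    static double relativeTime(double parallelFraction, double overheadPerCore, int cores) {
        double serialPart = 1.0 - parallelFraction;           // never gets faster
        double parallelPart = parallelFraction / cores;       // shrinks with more cores
        double coordination = overheadPerCore * (cores - 1);  // grows with more cores
        return serialPart + parallelPart + coordination;
    }

    public static void main(String[] args) {
        double parallelFraction = 0.9;  // assumed value, for illustration only
        double overheadPerCore = 0.01;  // assumed value, for illustration only
        for (int cores = 1; cores <= 16; cores++) {
            System.out.printf("%2d cores -> relative time %.3f%n",
                    cores, relativeTime(parallelFraction, overheadPerCore, cores));
        }
    }
}
```

With these assumed numbers the modelled time bottoms out somewhere around 9 or 10 cores and is higher again at 16, which has the same shape as what you describe. Your real parameters are unknown, so the only way to tell which term dominates is to profile the actual program.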
