Does a Java parallel stream use only one thread?

I am using Java 8 lambdas and parallel streams to process data:

ForkJoinPool forkJoinPool = new ForkJoinPool(10);
List<String> files = Arrays.asList("1.txt");
List<String> result = forkJoinPool.submit(() ->
    files.stream().parallel()
        .flatMap(x -> stage1(x)) // stage1 returns a Stream, so this stage adds more elements
        .map(x -> stage2(x))
        .map(x -> stage3(x))
        .collect(Collectors.toList())
).get();

The stream starts with one element, but more elements are added at the flatMap stage. My assumption was that the stream would run in parallel, but in this case only one worker thread is used.

If I start with two elements (that is, I add a second element to the initial list), two threads are spawned to process the stream, and so on. This also happens if I do not explicitly submit the stream to a ForkJoinPool.

The question is: is this documented behavior, or is it an implementation detail that might change? Is there a way to control this behavior and allow more threads, regardless of the size of the initial list?

Solution

You are observing implementation-specific behavior, not specified behavior.

The current JDK 8 implementation looks at the Spliterator of the outermost stream and uses it as the basis for splitting the parallel workload. Because the example has only one element in the original source stream, it cannot be split, and the stream runs single-threaded. This suits the common (but by no means only) case where the flatMap function returns zero, one, or just a few elements, but when it returns a large number of elements, they are all processed sequentially. In fact, the stream returned by the flatMap function is forced into sequential mode. See ReferencePipeline.java, line 270.
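To check this on your own JDK, a small sketch along the following lines records which threads execute the step after flatMap. The class name FlatMapThreads and the expand helper (a stand-in for stage1) are made up for illustration, and the outcome is implementation-specific, as described above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class FlatMapThreads {

    // Hypothetical stand-in for stage1: expands one file name into many records.
    static Stream<String> expand(String file) {
        return IntStream.range(0, 1_000).mapToObj(i -> file + "#" + i);
    }

    // Runs a parallel pipeline over the given sources and returns the names
    // of the threads that executed the map step following flatMap.
    static Set<String> threadsUsed(List<String> sources) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        sources.parallelStream()
                .flatMap(FlatMapThreads::expand)
                .map(s -> {
                    names.add(Thread.currentThread().getName());
                    return s;
                })
                .collect(Collectors.toList());
        return names;
    }

    public static void main(String[] args) {
        // One source element: the source cannot be split, so a single
        // thread is expected to process every expanded element.
        System.out.println("1 source:  " + threadsUsed(Arrays.asList("1.txt")));

        // Several source elements: the source can be split, so more threads
        // may show up (how many depends on the machine and the pool).
        System.out.println("8 sources: " + threadsUsed(
                Arrays.asList("1.txt", "2.txt", "3.txt", "4.txt",
                              "5.txt", "6.txt", "7.txt", "8.txt")));
    }
}
```

The number of threads reported for the multi-element case will vary between runs and machines; only the single-element case is tied to the inability to split the source.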

The "obvious" thing to do is to make this stream parallel, or at least not to force it to be sequential. This may or may not improve things. Most likely it will improve some things but make others worse. A better policy is certainly called for, but I'm not sure what it would look like.
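Until the policy changes, one possible workaround (my own suggestion, not something the JDK prescribes) is to materialize the flatMap output into a list first and then start a second, parallel stream over that list, so that splitting is based on the expanded elements rather than on the single source element. The class name and the stage1, stage2 and stage3 bodies below are hypothetical stand-ins for the ones in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class MaterializeThenParallel {

    // Hypothetical stand-ins for the stages in the question.
    static Stream<String> stage1(String file) {
        return IntStream.range(0, 1_000).mapToObj(i -> file + "#" + i);
    }

    static String stage2(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    static String stage3(String s) {
        return s + "!";
    }

    static List<String> process(List<String> files) {
        // Step 1: expand the sources sequentially and collect the result.
        List<String> expanded = files.stream()
                .flatMap(MaterializeThenParallel::stage1)
                .collect(Collectors.toList());

        // Step 2: start a new parallel stream whose source is the expanded
        // list, so the workload is split across all expanded elements.
        return expanded.parallelStream()
                .map(MaterializeThenParallel::stage2)
                .map(MaterializeThenParallel::stage3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> result = process(Arrays.asList("1.txt"));
        System.out.println(result.size() + " elements, first = " + result.get(0));
    }
}
```

The trade-off is that all expanded elements are held in memory at once and the expansion itself runs sequentially, so this only pays off when stage2 and stage3 dominate the cost.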

Also note that the technique used to force a parallel stream to run in the fork-join pool of your choice, by submitting a task that runs the pipeline to that pool, is also implementation-specific behavior. It works this way in JDK 8, but it might change in the future.
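If you rely on that technique, you can verify on your JDK that the work really stays in your pool. The following is a minimal sketch; the class name CustomPoolCheck and the method ranOnlyIn are invented for illustration, and the check only tells you what your current JDK does, not what is guaranteed.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CustomPoolCheck {

    // Returns true if every element was processed by a worker thread
    // that belongs to the given pool.
    static boolean ranOnlyIn(ForkJoinPool pool) {
        Set<Boolean> observations = ConcurrentHashMap.newKeySet();
        List<Integer> input = IntStream.range(0, 10_000)
                .boxed()
                .collect(Collectors.toList());

        // Submit the whole pipeline as a task to the pool and wait for it.
        pool.submit(() ->
                input.parallelStream()
                        .map(i -> {
                            Thread t = Thread.currentThread();
                            observations.add(t instanceof ForkJoinWorkerThread
                                    && ((ForkJoinWorkerThread) t).getPool() == pool);
                            return i;
                        })
                        .collect(Collectors.toList())
        ).join();

        return !observations.contains(false);
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            System.out.println("Ran only in the custom pool: " + ranOnlyIn(pool));
        } finally {
            pool.shutdown();
        }
    }
}
```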
