Multithreading - Clojure - Efficiently incrementing numbers in a list concurrently

Short version: what is the right way to store a list of several hundred numbers in Clojure, where each number is incremented millions of times (possibly across multiple threads)?

Long version: the program starts with a vector in which every value is initialized to 0:

[0 0 0 0 0 0 0 0 0 ...]

It then reads a file of millions of lines, line by line. After performing some arbitrary calculations on a line, the program increments some of the values in the vector. After the first line, the vector may look like:

[1 1 1 2 0 1 0 1 1 ...]

After the second line:

[2 2 3 2 2 1 0 2 2 ...]

After ~ 5000 lines, it may look like:

[5000 4998 5008 5002 4225 5098 5002 5043 ...]

Since Clojure's data structures are immutable, it seems wasteful to use assoc to increment a value in the vector, because the entire vector would be copied for each increment.

What is the correct way to perform this kind of concurrent data aggregation without spending all my CPU time copying immutable data structures? Should I have a vector in which each element is something like a ref or an atom, with all threads incrementing these shared values? Or is there some thread-local data structure that can hold the counts, with a final step that combines the counts from each thread?

This will probably not be I/O-bound on a single thread, so I expect to split the line processing across several threads. There is no limit on the length of the vector (it could be thousands of elements long), but it is likely to be about 100 elements long.
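The second option raised above (each thread keeps its own counts, combined in a final step) might look roughly like the sketch below. It is only an illustration under stated assumptions: line->indices is a made-up stand-in for the "arbitrary calculations", count-chunk and count-lines are hypothetical names, and pmap is just one of several ways to spread chunks over threads.

```clojure
;; Hypothetical stand-in for the per-line calculation: maps each character
;; of the line to an index in [0, n). Replace with the real logic.
(defn line->indices [line n]
  (map #(mod (int %) n) line))

;; Count one chunk of lines into a fresh, private vector of n zeros.
(defn count-chunk [n lines]
  (reduce (fn [v line]
            (reduce (fn [v i] (update v i inc)) v (line->indices line n)))
          (vec (repeat n 0))
          lines))

;; Split the lines into chunks, count each chunk in parallel with pmap,
;; then add the partial vectors together element by element.
(defn count-lines [n chunk-size lines]
  (->> (partition-all chunk-size lines)
       (pmap #(count-chunk n %))
       (reduce #(mapv + %1 %2) (vec (repeat n 0)))))

(count-lines 4 2 ["ab" "cd" "a"])
;; => [1 2 1 1]
```

Because no vector is shared between threads while counting, this variant needs no refs or atoms; the only coordination is the final element-wise sum.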

Solution

Clojure's vectors are persistent data structures. Updating an element of a vector does not copy the entire vector; it takes effectively constant time, namely O(log32 n).
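As a small illustration of the point about persistent vectors, the sketch below increments a few slots with update. The names counts, counts-1, counts-2 and hits are made up for the example, and the indices are arbitrary.

```clojure
;; Illustrative data only: an 8-element vector of zeros.
(def counts (vec (repeat 8 0)))

;; update returns a new vector that shares most of its structure with the
;; old one; only the path to the changed slot is copied.
(def counts-1 (update counts 3 inc))

;; Indices touched by one hypothetical line (index 3 is hit twice).
(def hits [0 1 2 3 3 5])
(def counts-2 (reduce (fn [v i] (update v i inc)) counts hits))

counts    ;; => [0 0 0 0 0 0 0 0]  (the original is unchanged)
counts-1  ;; => [0 0 0 1 0 0 0 0]
counts-2  ;; => [1 1 1 2 0 1 0 0]
```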

However, it looks as though every iteration updates almost every element in the vector. You may want to look at transient data structures.
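A minimal sketch of the transient approach, assuming the increments for a batch of lines are already available as sequences of indices. The helper name count-batch and the sample data are hypothetical. A transient should not be mutated by several threads at once, so this fits best when each thread works on its own vector.

```clojure
;; Hypothetical helper: apply every increment for a batch of lines to a
;; transient version of the vector, then freeze it with persistent!.
(defn count-batch [counts index-seqs]
  (persistent!
    (reduce (fn [tv idxs]
              ;; assoc! may return a new transient, so always thread the
              ;; returned value through the reduction.
              (reduce (fn [tv i] (assoc! tv i (inc (nth tv i)))) tv idxs))
            (transient counts)
            index-seqs)))

(count-batch (vec (repeat 8 0))
             [[0 1 2 3 3 5]     ; indices hit by line 1 (made-up data)
              [0 1 2 2 4 4]])   ; indices hit by line 2
;; => [2 2 3 2 2 1 0 0]
```

The input vector is not modified; transient makes a cheap mutable view, and the result is an ordinary persistent vector again.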
