Java – use the streams API to perform operations on N different elements in the collection

I'm trying to use the streams API in Java 8 to retrieve n unique random elements from the collection for further processing, but I don't have much or any luck

More precisely, I want something like this:

Set<Integer> subList = new HashSet<>();
Queue<Integer> collection = new PriorityQueue<>();
collection.addAll(Arrays.asList(1,2,3,4,5,6,7,8,9));
Random random = new Random();
int n = 4;
while (subList.size() < n) {
  subList.add(collection.get(random.nextInt()));
}
sublist.forEach(v -> v.doSomethingFancy());

I want to do this as efficiently as possible

Can this be done

Editor: my second attempt – though not exactly my goal:

List<Integer> sublist = new ArrayList<>(collection);
Collections.shuffle(sublist);
sublist.stream().limit(n).forEach(v -> v.doSomethingFancy());

Edit: third attempt (inspired by Holger), if coll Size () is huge and n small, which eliminates a lot of shuffle overhead:

int n = // unique element count
List<Integer> sublist = new ArrayList<>(collection);   
Random r = new Random();
for(int i = 0; i < n; i++)
    Collections.swap(sublist,i,i + r.nextInt(source.size() - i));
sublist.stream().limit(n).forEach(v -> v.doSomethingFancy());

Solution

Shuffling works reasonably, as shown in comment 0700 and zouzou answer This is a version of the shuffle method:

static <E> List<E> shuffleSelectN(Collection<? extends E> coll,int n) {
    assert n <= coll.size();
    List<E> list = new ArrayList<>(coll);
    Collections.shuffle(list);
    return list.subList(0,n);
}

I will notice that using a sublist is better than getting the stream, and then calling limit (n), as shown in other answers, because the generated stream has known size and can be more effectively segmented.

The shuffle method has several disadvantages It needs to copy all the elements, and then it needs to shuffle all the elements If the total number of components is large and the number of components to be selected is small, this can be quite expensive

The method proposed by OP and several other answers are to randomly select elements and reject repetition until the required number of unique elements are selected If the number of elements to be selected is small relative to the total number, this effect will be good, but as the number of choices increases, this situation will slow down due to the possibility of repeated rise of choices

Isn't it good if there is a way to select and select the required number separately in the space of the input element, and whether these selections are made randomly and evenly? It turns out that, as usual, the answer can be found in Knut See taocp Vol 2, SEC 3.4 2,Random Sampling and Shuffling,Algorithm S.

In short, the algorithm accesses each element and determines whether to select them according to the number of accessed elements and the number of selected elements In Knuth's symbol, suppose you have n elements, and you want to choose n elements at will The next element should be selected by probability

Where t is the number of elements accessed so far and M is the number of elements selected so far

This is not obvious, it will give a uniform distribution of the selected elements, but it is obvious The proof leaves the reader with an exercise; See exercise 3 in this section

Given this algorithm, it is very simple to implement it by looping through the collection and adding the result list to "traditional" Java according to random tests OP asked about using streams, so here's a shot

Algorithm s itself is not explicitly used for Java stream operations It is completely described in order. The decision whether to select the current element depends on the random decision plus state derived from all previous decisions This may make it look continuous in nature, but I was wrong before I'm just saying that it's not obvious how to make this algorithm run in parallel

However, there is a way to adapt the algorithm to flow What we need is stateful predicates The predicate will return a random result based on the probability determined by the current state, and update the state to yes, mutation based on the random result It seems difficult to run in parallel, but at least it's easy to make the thread safe in case it runs from a parallel stream: just synchronize it However, if the stream is parallel, it will be degraded to sequential operation

The implementation is very simple Knuth's description uses random numbers between 0 and 1, but the java random class allows us to select a random integer in a half open time interval Therefore, what we need to do is to keep the counter of how many elements are left and how many elements are left to choose from,

/**
 * A stateful predicate that,given a total number
 * of items and the number to choose,will return 'true'
 * the chosen number of times distributed randomly
 * across the total number of calls to its test() method.
 */
static class Selector implements Predicate<Object> {
    int total;  // total number items remaining
    int remain; // number of items remaining to select
    Random random = new Random();

    Selector(int total,int remain) {
        this.total = total;
        this.remain = remain;
    }

    @Override
    public synchronized boolean test(Object o) {
        assert total > 0;
        if (random.nextInt(total--) < remain) {
            remain--;
            return true;
        } else {
            return false;
        }
    }
}

Now we have our predicate, which is easy to use in the stream:

static <E> List<E> randomSelectN(Collection<? extends E> coll,int n) {
    assert n <= coll.size();
    return coll.stream()
        .filter(new Selector(coll.size(),n))
        .collect(toList());
}

Another option is also mentioned in the same part of Knuth. It is suggested to randomly select an element with an invariant probability of N / n This is useful if you do not need to select n elements exactly It will select n elements on average, but of course there will be some changes If this is acceptable, stateful predicates become simpler Instead of writing an entire class, we can simply create a random state and capture it from a local variable:

/**
 * Returns a predicate that evaluates to true with a probability
 * of tochoose/total.
 */
static Predicate<Object> randomPredicate(int total,int tochoose) {
    Random random = new Random();
    return obj -> random.nextInt(total) < tochoose;
}

To use it, replace the filter line in the above flow pipe

.filter(randomPredicate(coll.size(),n))

Finally, for the purpose of comparison, here is the implementation of the selection algorithm written in conventional Java, that is, use for loop and add it to the collection:

static <E> List<E> conventionalSelectN(Collection<? extends E> coll,int remain) {
    assert remain <= coll.size();
    int total = coll.size();
    List<E> result = new ArrayList<>(remain);
    Random random = new Random();

    for (E e : coll) {
        if (random.nextInt(total--) < remain) {
            remain--;
            result.add(e);
        }
    }            

    return result;
}

This is very simple, there is no real mistake It is simpler and more independent than the flow method Still, the flow method illustrates some interesting techniques that may be useful in other contexts

reference resources:

Knuth,Donald E. The Art of Computer Programming:Volume 2,Seminumerical Algorithms,2nd edition. Copyright 1981, 1969 Addison Wesley

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>