Java – a quick way to find unique values in a list

Given a list of keyvaluepairs, where each pair has a getValue () method, what is the fastest way to get a list (or set) of unique values?

All of the following produce acceptable results U1 seems to be the fastest than expected list size (about 1000-2000 kVp)

Can we do better (faster)?

private static Set<String> u1(List<_KVPair> pairs) {
    Set<String> undefined = new HashSet<String>();

    for (_KVPair pair : pairs) {
        undefined.add(pair.getValue());
    }

    if (undefined.size() == 1) {
        return new HashSet<String>();
    }
    return undefined;
}

private static List<String> u2(List<_KVPair> pairs) {

    List<String> undefined = new ArrayList<String>();
    for (_KVPair pair : pairs) {
        if (!undefined.contains(pair.getValue())) {
            undefined.add(pair.getValue());
        }
    }

    return undefined;
}

private static List<String> u3(List<_KVPair> pairs) {

    List<String> undefined = new LinkedList<String>();

    Iterator<_KVPair> it = pairs.iterator();
    while (it.hasNext()) {
        String value = it.next().getValue();
        if (!undefined.contains(value)) {
            undefined.add(value);
        }
    }
    return undefined;
}

About 3600 pairs, 'U3' won About 1500 pairs, 'U1' won

Solution

The first option should be faster You can adjust the collection size to make it faster before you use it. In general, if you expect a small number of repetitions:

Set<String> undefined = new HashSet<String>(pairs.size(),1);

Note that I use 1 as the load factor to prevent any resizing

Out of curiosity, I tested (the following code) – the result is (after compilation):

Test 1 (Note: warm up takes a few minutes)

Test 2

That makes sense:

>When there are many duplicates, list #contains will run fairly fast because duplicates can be found faster and the cost of allocating large collections is being affected > when there are no or few duplicates, the collection wins significantly

public class TestPerf {

    private static int NUM_RUN;
    private static Random r = new Random(System.currentTimeMillis());
    private static boolean random = false; //toggle to false for no duplicates in original list


    public static void main(String[] args) {

        List<String> list = new ArrayList<>();

        for (int i = 0; i < 30_000; i++) {
            list.add(getRandomString());
        }

        //warm up
        for (int i = 0; i < 10_000; i++) {
            method1(list);
            method2(list);
            method3(list);
        }

        NUM_RUN = 100;
        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method1(list);
        }
        long end = System.nanoTime();
        System.out.println("set: " + (end - start) / 1000000);

        sum = 0;
        start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method2(list);
        }
        end = System.nanoTime();
        System.out.println("arraylist: " + (end - start) / 1000000);

        sum = 0;
        start = System.nanoTime();
        for (int i = 0; i < NUM_RUN; i++) {
            sum += method3(list);
        }
        end = System.nanoTime();
        System.out.println("linkelist: " + (end - start) / 1000000);

        System.out.println(sum);
    }

    private static int method1(final List<String> list) {
        Set<String> set = new HashSet<>(list.size(),1);
        for (String s : list) {
            set.add(s);
        }
        return set.size();
    }

    private static int method2(final List<String> list) {
        List<String> undefined = new ArrayList<>();
        for (String s : list) {
            if (!undefined.contains(s)) {
                undefined.add(s);
            }
        }
        return undefined.size();
    }

    private static int method3(final List<String> list) {
        List<String> undefined = new LinkedList<>();

        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            String value = it.next();
            if (!undefined.contains(value)) {
                undefined.add(value);
            }
        }
        return undefined.size();
    }

    private static String getRandomString() {
        if (!random) {
            return "skdjhflkjrglajhsdkhkjqwhkdjahkshd";
        }
        int size = r.nextInt(100);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < size; i++) {
            char c = (char) ('a' + r.nextInt(27));
            sb.append(c);
        }
        System.out.println(sb);
        return sb.toString();
    }
}
The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>