How to calculate a good hash code for a large list of strings?

What is the best way to calculate a hash code from the values of these strings in a single pass?

To be specific, it needs to be:

1 – Fast: I need the hash code of a large list of short strings (10^3 to 10^8 items).

2 – Able to identify the entire list: many lists may differ in only a few strings, and those lists must still get different hash codes.

How can I do this in Java?

Maybe I could reuse the existing String hash codes, but how do I combine the many hash codes computed for the individual strings?
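For reference, the usual way to combine per-element hash codes in Java is the polynomial scheme that java.util.List.hashCode() is specified to use: start from 1 and compute h = 31 * h + e.hashCode() for each element, in order. The sketch below applies it to strings; CombinedStringHash and combine are hypothetical names for illustration, and in practice calling hashCode() on a java.util.List gives the same value directly. It is a single pass, and String caches its own hash code after the first call. Keep in mind that any 32-bit hash, this one or CRC32, can only make collisions between different lists unlikely, not impossible.

```java
import java.util.Arrays;
import java.util.List;

public class CombinedStringHash {
    // Combines per-string hash codes the way java.util.List.hashCode() is specified:
    // h = 1, then h = 31 * h + (element hash, or 0 for null) for each element in order.
    public static int combine(Iterable<String> strings) {
        int h = 1;
        for (String s : strings) {
            h = 31 * h + (s == null ? 0 : s.hashCode());
        }
        return h;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("alpha", "beta", "gamma");
        List<String> b = Arrays.asList("alpha", "gamma", "beta");
        // Same value as the JDK's own list hash code
        System.out.println(combine(a) == a.hashCode()); // prints "true"
        // The scheme is order-sensitive, so a reordered list generally hashes differently
        System.out.println(combine(a) == combine(b));
    }
}
```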

Thank you.

Solution

Create a wrapper class for your collection, then use the CRC32 class. It's simple and fast:

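import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.zip.CRC32;

public class HugeStringCollection {
    private final Collection<String> strings;

    public HugeStringCollection(Collection<String> strings) {
        this.strings = strings;
    }

    // equals() should also be overridden consistently with this hashCode()
    @Override
    public int hashCode() {
        CRC32 crc = new CRC32();
        for (String string : strings) {
            // An explicit charset keeps the result independent of the platform default
            crc.update(string.getBytes(StandardCharsets.UTF_8));
        }

        // getValue() returns the 32-bit CRC in a long; the cast keeps those 32 bits
        return (int) crc.getValue();
    }
}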
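One caveat with the code above: CRC32 only sees the concatenated bytes, so lists such as ["ab", "c"] and ["a", "bc"] produce exactly the same checksum. A minimal sketch of a fix, assuming a length prefix is acceptable for your data, is to feed each string's byte length into the checksum before its bytes. LengthPrefixedCrc, plainCrc and crcOf are hypothetical names for illustration; plainCrc reproduces the original approach for comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.CRC32;

public class LengthPrefixedCrc {
    // Same idea as the original hashCode(): only the concatenated bytes are hashed,
    // so ["ab", "c"] and ["a", "bc"] both feed CRC32 the byte stream "abc".
    public static long plainCrc(Iterable<String> strings) {
        CRC32 crc = new CRC32();
        for (String s : strings) {
            crc.update(s.getBytes(StandardCharsets.UTF_8));
        }
        return crc.getValue();
    }

    // Writes each string's UTF-8 byte length (4 bytes, big-endian) before its bytes,
    // so a different split of the same characters yields a different byte stream.
    public static long crcOf(Iterable<String> strings) {
        CRC32 crc = new CRC32();
        for (String s : strings) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            int len = bytes.length;
            // CRC32.update(int) consumes only the low 8 bits of its argument
            crc.update(len >>> 24);
            crc.update(len >>> 16);
            crc.update(len >>> 8);
            crc.update(len);
            crc.update(bytes);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("ab", "c");
        List<String> y = Arrays.asList("a", "bc");
        System.out.println(plainCrc(x) == plainCrc(y)); // prints "true"
        System.out.println(crcOf(x) == crcOf(y));
    }
}
```

The extra four bytes per string add a small constant cost per item, which is usually negligible next to hashing the string contents themselves.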

If the collection itself is immutable, you can compute the hash once and store it for reuse.
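A minimal sketch of that caching idea, assuming the wrapper takes a defensive copy so its contents really cannot change afterwards. ImmutableStringCollection is a hypothetical name; it computes the CRC32-based hash once in the constructor and also overrides equals() so the hashCode()/equals() contract holds.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.zip.CRC32;

public final class ImmutableStringCollection {
    private final List<String> strings;
    // Computed once; valid only because the contents can never change
    private final int hash;

    public ImmutableStringCollection(Collection<String> strings) {
        // Defensive copy: later changes to the caller's collection cannot invalidate the cached hash
        this.strings = Collections.unmodifiableList(new ArrayList<>(strings));
        this.hash = computeHash(this.strings);
    }

    private static int computeHash(List<String> strings) {
        CRC32 crc = new CRC32();
        for (String s : strings) {
            crc.update(s.getBytes(StandardCharsets.UTF_8));
        }
        return (int) crc.getValue();
    }

    @Override
    public int hashCode() {
        return hash; // no re-scan of the strings
    }

    @Override
    public boolean equals(Object o) {
        // Equal contents in the same order always give the same CRC, so this is consistent with hashCode()
        return o instanceof ImmutableStringCollection
            && strings.equals(((ImmutableStringCollection) o).strings);
    }
}
```

Computing the hash eagerly keeps the field final, so it is safely visible to other threads once construction finishes; a lazily computed cached field (as java.lang.String does for its own hash) is an alternative when many instances are never hashed.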
