Generating unique keys for strings in Java

We need to read more than 10 million strings and write them to a file, and the file must not contain duplicates. Because each string is flushed to the file immediately after it is read, we do not keep the strings in memory.

We cannot rely on hash codes alone: two different strings can share the same hash code, so a collision would make us wrongly discard a string as a duplicate. I found two other approaches through a Google search:

1. Use a message digest algorithm such as MD5, but the computation and storage costs may be too high.

2. Use a checksum algorithm. [I'm not sure whether this produces a unique key for each string. Can someone confirm?]

Are there any other approaches? Thank you.

Solution

If you can accept a microscopic risk of collisions, you can use a hash function such as MD5, as you suggested, and rely on the hash values alone.
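As a rough sketch of the digest approach, the class below keeps only the 16-byte MD5 digest of each string it has seen, rather than the string itself. The class name DigestDeduplicator and the method isNew are made up for this illustration, and keeping the digests in an in-memory HashSet is an assumption; they could just as well live in some other store.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: remembers MD5 digests of strings instead of the strings.
final class DigestDeduplicator {
    // ByteBuffer compares by content, so it can serve as a set key for byte[] data.
    private final Set<ByteBuffer> seen = new HashSet<>();
    private final MessageDigest md5;

    DigestDeduplicator() {
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard algorithm name, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Returns true the first time a string's digest is seen, false afterwards.
    // Two different strings with the same digest would also return false,
    // which is the (very small) collision risk mentioned above.
    boolean isNew(String s) {
        byte[] digest = md5.digest(s.getBytes(StandardCharsets.UTF_8));
        return seen.add(ByteBuffer.wrap(digest));
    }
}
```

A caller would write a string to the file only when isNew returns true. Each digest is 16 bytes, so 10 million of them come to about 160 MB of raw digest data, before HashSet and object overhead.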

Another alternative, which may have a larger memory footprint, is to store the strings you have already encountered in a trie (a special type of tree).
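A minimal trie might look like the sketch below. The names StringTrie and add are invented for this example, and using a HashMap per node is just one simple choice; it is easy to read but not memory-efficient. Unlike the hashing approaches, a trie is exact: it never confuses two different strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal trie that stores the set of strings seen so far.
final class StringTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a stored string ends at this node
    }

    private final Node root = new Node();

    // Inserts s and returns true if it was not already present.
    boolean add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            // Follow the edge for this character, creating it if needed.
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        boolean isNew = !node.terminal;
        node.terminal = true;
        return isNew;
    }
}
```

Strings that share a prefix share the nodes for that prefix, so the actual memory cost depends heavily on how similar the strings are.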

Update: another alternative is to use a Bloom filter. It still relies on hashing, but it can be tuned to make the collision probability arbitrarily small.
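To illustrate, here is a small Bloom filter sketch. The class SimpleBloomFilter, the method addIfAbsent, and the choice to derive the bit positions from two halves of an MD5 digest (double hashing) are all assumptions made for this example, not an established library API. A Bloom filter can report a new string as already seen (a false positive), but it never reports a previously added string as new.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Hypothetical minimal Bloom filter for strings.
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private final MessageDigest md5;

    SimpleBloomFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sets the bits for s. Returns true if at least one bit was previously
    // unset (s is definitely new), false if all were set (s was possibly seen).
    boolean addIfAbsent(String s) {
        ByteBuffer digest = ByteBuffer.wrap(md5.digest(s.getBytes(StandardCharsets.UTF_8)));
        int h1 = digest.getInt();
        int h2 = digest.getInt() | 1; // force odd so the step is never zero
        boolean changed = false;
        for (int i = 0; i < numHashes; i++) {
            // Double hashing: the i-th position is h1 + i * h2, wrapped into range.
            int index = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                bits.set(index);
                changed = true;
            }
        }
        return changed;
    }
}
```

With m bits, k hash positions, and n stored strings, the false-positive rate is approximately (1 - e^(-kn/m))^k. As a rough guide, about 9.6 bits per string with k = 7 gives a rate near 1%, which for 10 million strings is roughly 96 million bits, or about 12 MB.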
