Generating unique numbers for strings in Java
We need to read and write more than 10 million strings to a file, and we do not want duplicates in the file. Since each string is flushed to the file immediately after it is read, we do not keep the strings in memory.
We cannot use hash codes: a hash-code collision would make two different strings look like duplicates, and we would wrongly drop one of them. Two other approaches I found through a Google search:
1. Use a message digest algorithm such as MD5, although the computation and storage costs may be too high.
2. Use a checksum algorithm [I'm not sure whether this produces a unique key for each string; can someone confirm?]
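To make the uniqueness question concrete, here is a small sketch that computes three kinds of key for the same strings: `String.hashCode()` (32 bits), a CRC32 checksum from `java.util.zip.CRC32` (also 32 bits), and an MD5 digest from `java.security.MessageDigest` (128 bits). The class name `KeySizes` is hypothetical. A 32-bit key has only 2^32 possible values, so by the pigeonhole principle no 32-bit checksum can be unique for arbitrary strings; `"Aa"` and `"BB"` already share a `hashCode()`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

public class KeySizes {
    // 32-bit CRC32 checksum of the string's UTF-8 bytes, in [0, 2^32).
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // 128-bit (16-byte) MD5 digest of the string's UTF-8 bytes.
    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // Different strings, same hashCode: 65*31+97 == 66*31+66 == 2112.
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // prints "2112 2112"
        System.out.println(crc32("Aa") + " " + crc32("BB"));
        System.out.println(md5("Aa").length * 8 + " bits per MD5 key"); // prints "128 bits per MD5 key"
    }
}
```

A 128-bit digest is not unique either, but with 10 million strings the chance of an accidental MD5 collision is astronomically small, whereas with a 32-bit checksum collisions become likely long before 10 million strings (birthday bound around 2^16 items).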
Are there any other methods available? Thank you.
Solution
If you can accept a microscopically small risk of collisions, you can use a hash function such as MD5, as you suggested, and rely on the hash values.
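A minimal sketch of the MD5 approach, assuming it is acceptable to keep the 16-byte digests (not the strings) in memory. The class and method names (`Md5Dedup`, `addIfNew`, `writeUnique`) are hypothetical. `ByteBuffer.wrap` is used as the set key because `ByteBuffer.equals` and `hashCode` compare buffer contents, unlike raw `byte[]`.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Md5Dedup {
    // Stores 16-byte MD5 digests of strings already written, not the strings themselves.
    private final Set<ByteBuffer> seen = new HashSet<>();
    private final MessageDigest md;

    public Md5Dedup() {
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns true (and records the digest) if no string with the same MD5 was seen before. */
    public boolean addIfNew(String s) {
        // digest(byte[]) also resets the MessageDigest for the next call.
        byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
        return seen.add(ByteBuffer.wrap(digest));
    }

    /** Writes each distinct string once, one per line. */
    public void writeUnique(Iterable<String> input, Writer out) throws IOException {
        for (String s : input) {
            if (addIfNew(s)) {
                out.write(s);
                out.write('\n');
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        new Md5Dedup().writeUnique(List.of("apple", "pear", "apple"), out);
        System.out.print(out); // prints "apple" and "pear", one per line
    }
}
```

For 10 million strings the raw digests alone take 10,000,000 × 16 bytes = 160 MB, and per-object JVM overhead for the `HashSet` entries adds considerably more, which is the storage cost the question worries about.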
Another alternative, which may have a larger memory footprint, is to store the strings encountered so far in a trie (a special type of tree).
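A minimal trie sketch, assuming exact deduplication is required. Unlike hashing it has no collisions: a string is reported as a duplicate only if the identical string was inserted before. The class name `StringTrie` is hypothetical, and a `HashMap` per node is simple but memory-hungry; a production version would likely use more compact nodes (for example a radix tree).

```java
import java.util.HashMap;
import java.util.Map;

public class StringTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if some inserted string ends at this node
    }

    private final Node root = new Node();

    /** Inserts s; returns true if s was not already present (exact match, no collisions). */
    public boolean add(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            // Walk down the path for s, creating missing nodes; shared prefixes are stored once.
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        if (node.terminal) {
            return false; // the exact same string was added before
        }
        node.terminal = true;
        return true;
    }

    public static void main(String[] args) {
        StringTrie trie = new StringTrie();
        for (String s : new String[] {"car", "cart", "car", "cat"}) {
            if (trie.add(s)) {
                System.out.println(s); // prints car, cart, cat (one per line)
            }
        }
    }
}
```

Memory use grows with the total number of distinct characters along all stored paths, so it pays off mainly when many strings share long prefixes.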
Update: another alternative is to use a Bloom filter. It still depends on hashing, but it can be tuned to make the collision (false-positive) probability as small as you like.
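A minimal Bloom filter sketch using `java.util.BitSet`. The class name and the choice to derive the two base hashes from an MD5 digest are assumptions for illustration; the k index positions use the standard double-hashing trick g_i = h1 + i·h2 (mod m). It is sized with the usual formulas m = -n·ln(p) / (ln 2)^2 bits and k = (m/n)·ln 2; for n = 10 million and p = 10^-6 that works out to roughly 288 million bits (about 36 MB) and k ≈ 20.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

public class StringBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    /** Sizes the filter for n expected strings and target false-positive rate p. */
    public StringBloomFilter(long n, double p) {
        double mBits = Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
        if (mBits > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("filter too large for a single BitSet");
        }
        this.m = (int) mBits;
        this.k = Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
        this.bits = new BitSet(m);
    }

    // Two 64-bit base hashes taken from the 128-bit MD5 digest.
    private static long[] baseHashes(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            ByteBuffer buf = ByteBuffer.wrap(d);
            return new long[] {buf.getLong(), buf.getLong()};
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Adds s. Returns true if s is definitely new; returns false if s was probably
     * seen before (a false positive with probability of roughly p).
     */
    public boolean addIfNew(String s) {
        long[] h = baseHashes(s);
        boolean isNew = false;
        for (int i = 0; i < k; i++) {
            // Double hashing: index_i = (h1 + i * h2) mod m.
            int idx = (int) Math.floorMod(h[0] + i * h[1], (long) m);
            if (!bits.get(idx)) {
                bits.set(idx);
                isNew = true; // at least one bit was unset, so s was never added
            }
        }
        return isNew;
    }
}
```

Note the failure mode for this use case: a false positive means a genuinely new string is skipped, the same kind of error as a hash collision, but with a rate you choose up front in exchange for memory.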