Java: fast disk-based hash set
I need to store a large hash set that can contain up to about 200 million 40-bit values. Storing it as 200 million 64-bit values would be acceptable (even though 200 million * 16 bits are wasted).
The requirements are:
- Small memory footprint (disk space is not a problem, memory is)
- Fast contains(long l) and add(long l) methods (much faster than SQL)
- Embedded
- Free and without an annoying license (no Berkeley DB); LGPL is fine
- No false positives and no false negatives, so things like disk-based Bloom filters are not what I am after
SQL is not what I am after here, because I really think I want something fast like the following (note that the solution there is much faster than an SQL solution):
Fast disk-based hashtables?
Does Google have such a Java API?
Would a fast disk-based key/value pair implementation work if I only used the 'key'?
Or something else?
I'd rather not reinvent the wheel.
Solution
If you can afford 128 GB of disk, you can store one bit for every possible 40-bit value (2^40 bits = 2^37 bytes = 128 GiB). You can then use a random access file to check whether a bit is set, or to set it. You do not have to insert any values or maintain an index.
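As a rough illustration of the one-bit-per-value idea, here is a minimal sketch built on java.io.RandomAccessFile. The class name DiskBitSet and its layout (bit value & 7 of the byte at offset value >>> 3) are my own assumptions, not an existing library. It lets the file grow lazily instead of preallocating 128 GiB, and it assumes the file system zero-fills any gap created by writing past the end of the file, as POSIX file systems do.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch: a set of 40-bit values stored as one bit per value on disk. */
final class DiskBitSet implements AutoCloseable {
    private static final long MAX_VALUE = (1L << 40) - 1;
    private final RandomAccessFile file;

    DiskBitSet(File path) throws IOException {
        // "rw" creates the file if it does not exist; it grows as bits are set.
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Returns true if the bit for this value is set. */
    boolean contains(long value) throws IOException {
        checkRange(value);
        long pos = value >>> 3;                  // byte offset holding the bit
        if (pos >= file.length()) {
            return false;                        // region never written
        }
        file.seek(pos);
        int b = file.read();
        return b >= 0 && (b & (1 << (int) (value & 7))) != 0;
    }

    /** Sets the bit for this value; returns true if it was not set before. */
    boolean add(long value) throws IOException {
        checkRange(value);
        long pos = value >>> 3;
        int mask = 1 << (int) (value & 7);
        int b = 0;
        if (pos < file.length()) {
            file.seek(pos);
            b = Math.max(file.read(), 0);        // read() returns -1 at end of file
        }
        if ((b & mask) != 0) {
            return false;                        // already present
        }
        file.seek(pos);
        file.write(b | mask);                    // writes the low 8 bits only
        return true;
    }

    private static void checkRange(long value) {
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("not a 40-bit value: " + value);
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Each contains call costs at most one seek and one single-byte read, and each add costs one read plus one write, so memory use stays constant regardless of how many values are stored. On file systems with sparse-file support, only the regions actually touched consume disk space. The sketch is not thread-safe, and batching or memory-mapping hot regions would likely be needed for high throughput.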