Implementing a Bloom Filter in Java

The principle behind a Bloom filter is simple. Start with a long bit sequence in which every bit is 0. Hash a string to an integer key and change the bit at position key from 0 to 1. When a string comes in later, compute its hash key again; if the bit at that position is already 1, the string is taken to have been seen before.
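The single-hash idea can be sketched as follows. This is only an illustration under assumptions: the class name SingleHashFilter, the use of java.util.BitSet as the bit sequence, and String.hashCode() as the hash are choices made here, not part of the original article.

```java
import java.util.BitSet;

// Single-hash version of the idea described above (illustrative names).
class SingleHashFilter {
    private final BitSet bits;
    private final int size;

    SingleHashFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size); // every bit starts at 0
    }

    // Map the string to one position in [0, size).
    private int position(String s) {
        return Math.floorMod(s.hashCode(), size);
    }

    // Record the string by setting its bit to 1.
    void add(String s) {
        bits.set(position(s));
    }

    // true means "possibly seen"; false means "definitely not seen".
    boolean mightContain(String s) {
        return bits.get(position(s));
    }
}
```

Note that "Aa" and "BB" have the same hashCode() in Java, so after add("Aa") this filter also reports "BB" as seen. That is exactly the collision problem a single hash cannot avoid.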

Done this way, however, it is no different from an ordinary hash: two different strings can still hash to the same key and collide.

A Bloom filter instead hashes each string to several keys. The description below follows the book.

First create a vector of 1.6 billion binary bits and set all of them to 0. For each string, use 8 different random number generators (F1, F2, ..., F8) to produce 8 information fingerprints (f1, f2, ..., f8). Then use a random number generator G to map these eight fingerprints to eight natural numbers g1, g2, ..., g8, each of which is a position in the bit vector. Now set the bits at these eight positions to 1. Once every string has been processed this way, the Bloom filter is built.
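One possible way to obtain eight positions per string is sketched below. It rests on assumptions: the eight generators F1 to F8 are stood in for by simple polynomial hashes with different multipliers, and the mapping G is collapsed into a modulo by the bit count. The class name FingerprintPositions and the SEEDS values are hypothetical; hashes this simple are easy to read but weaker than what a production filter would use.

```java
class FingerprintPositions {
    // Eight arbitrary odd multipliers, one per hash function (illustrative values).
    static final int[] SEEDS = {3, 5, 7, 11, 13, 31, 37, 61};

    // Polynomial hash of the string with the given multiplier (int overflow wraps).
    static int hash(String s, int seed) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = seed * h + s.charAt(i);
        }
        return h;
    }

    // Returns one bit position in [0, bitCount) per seed.
    static int[] positions(String s, int bitCount) {
        int[] out = new int[SEEDS.length];
        for (int i = 0; i < SEEDS.length; i++) {
            out[i] = Math.floorMod(hash(s, SEEDS[i]), bitCount);
        }
        return out;
    }
}
```

Adding a string then amounts to setting the bit at each returned position to 1.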

So how do we check whether a string is already in the filter?

Use the same eight random number generators (F1, ..., F8) to generate eight information fingerprints s1, s2, ..., s8 for the string, then map these fingerprints to eight bits of the Bloom filter, t1, t2, ..., t8. If the string has been added, all of the bits t1, ..., t8 must be 1. If any one of them is 0, the string has definitely not been added.
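The lookup rule can be sketched on its own, separate from the hashing. The helper below assumes the bit positions for the string have already been computed; the class name BloomCheck and the method allSet are hypothetical names used for illustration.

```java
import java.util.BitSet;

class BloomCheck {
    // Returns true only if every one of the given positions is set to 1.
    static boolean allSet(BitSet bits, int[] positions) {
        for (int p : positions) {
            if (!bits.get(p)) {
                return false; // a single 0 bit proves the string was never added
            }
        }
        return true; // all bits are 1: the string is probably present
    }
}
```

For example, with only bits 1 and 5 set, positions {1, 5} pass the check while positions {1, 6} fail it.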

A Bloom filter is in fact an extension of hashing. Because it is hash-based, it has an inherent weakness: false positives. A string that has never been added may be reported as present, because other strings happen to have set all of its bits. The probability is small, but it is not zero.

So how can this probability be reduced? One option is to use 16 information fingerprints instead of 8. This lowers the error rate as long as the bit vector stays sparse, but each string then sets twice as many bits, so the filter can hold only about half as many strings. The other option is to choose good hash functions; there are many ways to hash a string, and some are much better than others.
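The trade-off can be checked numerically with the standard approximation for the false-positive rate, (1 - e^(-kn/m))^k, which assumes independent, uniformly distributed hash functions. In the sketch below, m is the number of bits, n the number of strings added and k the number of fingerprints per string; the class name BloomMath and any value chosen for n are assumptions, since the article does not say how many strings are stored.

```java
class BloomMath {
    // Approximate false-positive rate for m bits, n added strings and k positions per string.
    static double falsePositiveRate(long m, long n, int k) {
        double fractionOfBitsSet = 1.0 - Math.exp(-(double) k * n / m);
        return Math.pow(fractionOfBitsSet, k);
    }

    // Value of k that minimizes the rate for a given m and n: (m / n) * ln 2.
    static double optimalK(long m, long n) {
        return (double) m / n * Math.log(2);
    }
}
```

With m = 1.6 billion and a hypothetical n = 100 million, going from k = 8 to k = 16 does not help, because the vector becomes too full. Halving n to 50 million with k = 16 gives a far lower rate, which matches the remark that capacity is roughly halved.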

A common use of Bloom filters is filtering malicious websites. All known malicious sites are added to a Bloom filter, and each site a user visits is checked against it; if the filter reports a match, the user is warned. For sites that are frequently misjudged, we can also keep a whitelist: a site reported by the filter is looked up in the whitelist and let through if it is listed. The whitelist cannot grow too large, of course, and it does not need to, since the Bloom filter's error probability is very small. Interested readers can look up how the error rate of a Bloom filter is derived.
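The whitelist arrangement might look like the sketch below. The class name SiteChecker and the method shouldWarn are hypothetical, and the Bloom filter is represented only by a Predicate<String> standing in for its membership test, so that the example stays independent of any particular filter implementation.

```java
import java.util.Set;
import java.util.function.Predicate;

class SiteChecker {
    private final Predicate<String> maliciousFilter; // e.g. a Bloom filter's membership test
    private final Set<String> whitelist;             // sites known to be misjudged

    SiteChecker(Predicate<String> maliciousFilter, Set<String> whitelist) {
        this.maliciousFilter = maliciousFilter;
        this.whitelist = whitelist;
    }

    // Warn only if the filter reports the site and it is not whitelisted.
    boolean shouldWarn(String site) {
        if (whitelist.contains(site)) {
            return false; // known false positive: let it through
        }
        return maliciousFilter.test(site);
    }
}
```

Checking the whitelist first keeps the exact lookup cheap, since the whitelist only has to hold the few sites the filter gets wrong.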

The Java source code for the Bloom filter is given below:
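A minimal sketch of such a filter follows; it is not the article's original listing. It makes several assumptions: the class name SimpleBloomFilter is hypothetical, java.util.BitSet holds the bits, and instead of eight independent generators the k positions are derived from two hashes (String.hashCode() and a 32-bit FNV-1a) by double hashing, position_i = h1 + i * h2.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits (m)
    private final int hashCount; // positions set per string (k)

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.size = size;
        this.hashCount = hashCount;
        this.bits = new BitSet(size); // all bits start at 0
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second hash.
    private static int fnv1a(String s) {
        int h = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 16777619;
        }
        return h;
    }

    // i-th position for the string, derived by double hashing.
    private int position(String s, int i) {
        int h1 = s.hashCode();
        int h2 = fnv1a(s) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    // Set all k bits for the string.
    void add(String s) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(s, i));
        }
    }

    // false: definitely never added; true: probably added (may be a false positive).
    boolean mightContain(String s) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(s, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A filter matching the description above would be created with new SimpleBloomFilter(1_600_000_000, 8), which needs about 200 MB of heap for the bit vector. After add(url), mightContain(url) always returns true; a false result is only possible for strings that were never added.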

Summary: the Bloom filter is an inventive refinement of hashing that uses very little space and has a very low error rate. It is a good example of what can be done with bit-level data, and the idea is well worth learning.
