Hash algorithm and hash code
1、 Introduction
Looking at this result, a problem appears. The map clearly contains Groudhog{number = 3}, so why does the lookup report that the key was not found? The cause is that the Groudhog class does not override hashCode(), so the hashCode() inherited from Object is used. By default that method derives the hash code from the object's identity (conventionally described as its address), so the first Groudhog(3) instance, the one stored in the map, has a different hash code from the second Groudhog(3) instance used for the lookup, and the key cannot be found.
Overriding hashCode() alone is still not enough; equals() must be overridden as well. Hash codes are not unique: different objects may produce the same hashCode() value. When two hash codes are the same, equals() is used to decide whether the key being looked up is "the same" as a key that already exists in the table.
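The failed lookup described above can be reproduced with a minimal sketch. The class names PlainGroudhog, FixedGroudhog and GroudhogLookupDemo, and the stored string, are hypothetical and chosen only for this illustration; they are not the original listing.

```java
import java.util.HashMap;
import java.util.Map;

// Key class that relies on Object's identity-based hashCode() and equals().
class PlainGroudhog {
    final int number;

    PlainGroudhog(int number) {
        this.number = number;
    }
}

// Hypothetical fixed variant: equality and hash code both depend on number.
class FixedGroudhog {
    final int number;

    FixedGroudhog(int number) {
        this.number = number;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof FixedGroudhog && ((FixedGroudhog) o).number == number;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(number);
    }
}

class GroudhogLookupDemo {
    // Stores one instance, then looks it up with a second, separately created instance.
    static boolean plainLookupSucceeds() {
        Map<PlainGroudhog, String> map = new HashMap<>();
        map.put(new PlainGroudhog(3), "early spring");
        return map.containsKey(new PlainGroudhog(3));
    }

    static boolean fixedLookupSucceeds() {
        Map<FixedGroudhog, String> map = new HashMap<>();
        map.put(new FixedGroudhog(3), "early spring");
        return map.containsKey(new FixedGroudhog(3));
    }

    public static void main(String[] args) {
        System.out.println("plain key found: " + plainLookupSucceeds()); // false
        System.out.println("fixed key found: " + fixedLookupSucceeds()); // true
    }
}
```

The plain variant fails because the two instances are never equal under Object's identity-based equals(), whatever their hash codes happen to be.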
A correct equals() method must satisfy the following five conditions:
1. Reflexivity: x.equals(x) must be true.
2. Symmetry: if x.equals(y) is true, then y.equals(x) must also be true.
3. Transitivity: if x.equals(y) is true and y.equals(z) is true, then x.equals(z) must also be true.
4. Consistency: no matter how many times x.equals(y) is called, the result must stay the same as long as the objects are not modified.
5. Non-nullity: x.equals(null) must return false for any non-null x.
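As a rough illustration of these conditions, the sketch below shows one common way to write equals() and a small helper that spot-checks the five properties on sample objects. The Badge class and the holdsFor helper are hypothetical names introduced here; a spot check like this can reveal violations but does not prove the contract in general.

```java
import java.util.Objects;

// Hypothetical key class used only to illustrate the contract.
final class Badge {
    private final int number;
    private final String owner;

    Badge(int number, String owner) {
        this.number = number;
        this.owner = owner;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;               // reflexivity
        if (!(o instanceof Badge)) return false;  // false for null and for other types
        Badge other = (Badge) o;
        return number == other.number && Objects.equals(owner, other.owner);
    }

    @Override
    public int hashCode() {
        return Objects.hash(number, owner);       // equal objects give equal hash codes
    }
}

class EqualsContractCheck {
    // Spot-checks the five conditions for three sample objects.
    static boolean holdsFor(Object x, Object y, Object z) {
        boolean reflexive = x.equals(x);
        boolean symmetric = x.equals(y) == y.equals(x);
        boolean transitive = !(x.equals(y) && y.equals(z)) || x.equals(z);
        boolean consistent = x.equals(y) == x.equals(y);
        boolean nonNull = !x.equals(null);
        return reflexive && symmetric && transitive && consistent && nonNull;
    }

    public static void main(String[] args) {
        Badge a = new Badge(3, "Phil");
        Badge b = new Badge(3, "Phil");
        Badge c = new Badge(3, "Phil");
        System.out.println("contract holds for samples: " + holdsFor(a, b, c));
    }
}
```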
2、 Understanding hashCode()
The value of hashing is speed: hashing makes lookups fast. The bottleneck is finding the key, and the fastest data structure for storing a group of elements is the array, so an array is used to represent the key information. Note that the array does not store the key itself. Instead, the key object produces a number that is used as the array index. This number is the hash code, generated by the hashCode() method defined on the object (also called the hash function). Because the array's capacity is fixed, different keys can produce the same index. How, then, can several values be stored at one index? The array does not hold values directly; each slot holds a list of values, and equals() is used to search that list linearly. This part of the lookup is naturally slower, but with a good hash function each index holds only a few values, so only a few elements are compared and the lookup is much faster.
If the explanation above is not clear yet, an example should help. One question puzzled me for a long time: why would one hashCode index hold several values? Since a HashMap can contain each key only once, it seemed that each index should hold only one value. The answer lies in the concept of a hash conflict (hash collision), illustrated here with an example from the Internet:
Suppose the array has length 5 and the value 6 has to be stored. How can 6 be stored in an array of length 5? Using the modulo method, 6 % 5 = 1, so 6 goes into index 1. Likewise, 7 goes into index 2. No hash conflict has occurred so far. Now the value 11 arrives. By the modulo method, 11 % 5 = 1, which is the same index that already holds 6. Index 1 must therefore store two numbers; this is called a hash conflict. After a conflict the values have to be stored one after another, so the solution used in Java is to keep a list at that index and append an element to the list whenever the same index comes up again. This is the essence of the hashing principle!
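The arithmetic in this example can be checked with a short sketch. The class ModuloBuckets and its distribute method are hypothetical names used only here; the sketch places each value in the list at index value % length, which is the modulo method described above rather than HashMap's actual indexing.

```java
import java.util.ArrayList;
import java.util.List;

class ModuloBuckets {
    // Places each value into the list stored at index (value mod length).
    static List<List<Integer>> distribute(int[] values, int length) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < length; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int v : values) {
            // floorMod keeps the index non-negative even for negative values
            buckets.get(Math.floorMod(v, length)).add(v);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // 6 and 11 collide at index 1; 7 lands alone at index 2
        System.out.println(distribute(new int[] {6, 7, 11}, 5));
        // prints [[], [6, 11], [7], [], []]
    }
}
```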
3、 Performance factors for HashMap
Capacity: the number of buckets in the hash table.
Initial capacity: the number of buckets when creating a hash table. Both HashMap and HashSet allow you to specify the initial capacity in the constructor.
Size: the number of records in the current hash table.
Load factor: equal to size / capacity. A load factor of 0 indicates an empty hash table, 0.5 a half-full one, and so on. A lightly loaded hash table has few conflicts and is well suited to insertion and lookup (although traversal with an iterator is slower). The constructors of HashMap and HashSet let you specify a load factor: when the load reaches that value, the container automatically doubles its capacity and redistributes the existing objects into the new set of buckets (this is called "rehashing"). The default load factor of HashMap is 0.75, which balances time and space costs well.
When size > capacity * load factor in a HashMap, that is, when the number of entries exceeds the maximum the current capacity allows, the bucket array needs to be expanded. A Java array cannot grow in place, so the existing small array is replaced with a new, larger one.
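The resize rule can be made concrete with a small calculation. This is a sketch under assumptions: ResizeThresholdDemo, threshold and capacityFor are hypothetical helpers that model the "size > capacity * load factor" rule with doubling, and they do not read HashMap's internal state. The only real API used is the HashMap(int initialCapacity, float loadFactor) constructor.

```java
import java.util.HashMap;
import java.util.Map;

class ResizeThresholdDemo {
    // Largest number of entries a table of this capacity holds before it must grow.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Models doubling: returns the capacity needed to hold `size` entries.
    static int capacityFor(int size, int initialCapacity, float loadFactor) {
        if (initialCapacity <= 0 || loadFactor <= 0) {
            throw new IllegalArgumentException("capacity and load factor must be positive");
        }
        int capacity = initialCapacity;
        while (size > threshold(capacity, loadFactor)) {
            capacity *= 2; // a new, larger array replaces the old one
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Initial capacity and load factor can be passed to the real constructor.
        Map<String, Integer> map = new HashMap<>(16, 0.75f);
        map.put("a", 1);

        System.out.println(threshold(16, 0.75f));       // 12
        System.out.println(capacityFor(12, 16, 0.75f)); // 16, still fits
        System.out.println(capacityFor(13, 16, 0.75f)); // 32, one doubling
    }
}
```

With the defaults of 16 buckets and a load factor of 0.75, the 13th entry is the one that triggers the first expansion in this model.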
There is still a problem. Even with a reasonable load factor and hash algorithm, a chain (the linked list in a bucket) can become too long, and an overly long chain seriously hurts HashMap performance. In JDK 1.8 the data structure was therefore optimized further by introducing red-black trees. When a linked list grows too long (more than 8 nodes by default, provided the table has at least 64 buckets; a smaller table is resized instead), it is converted into a red-black tree, whose fast insertion, deletion, update, and search improve HashMap performance. The time complexity of these operations within a bucket drops from O(n) to O(log n).
Tip: to spread hashes evenly, Java's hash tables use an integer power of 2 as the capacity. Division and remainder are among the slowest operations on modern processors, and a hash table whose length is a power of 2 can use a bit mask instead of division. Because get() is the most frequently used operation and the % remainder operation is its most expensive part, using a power of 2 eliminates this overhead (it may also have some effect on hashCode()).
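The mask trick from the tip can be checked directly. In this sketch, MaskIndexDemo and its methods are hypothetical names; it compares hash & (capacity - 1) with the remainder for a power-of-2 capacity and shows that the two differ when the capacity is not a power of 2.

```java
class MaskIndexDemo {
    // Index computed with a bit mask; only valid when capacity is a power of 2.
    static int indexByMask(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // Index computed with the (slower) remainder operation, kept non-negative.
    static int indexByMod(int hash, int capacity) {
        return Math.floorMod(hash, capacity);
    }

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        int capacity = 16;
        for (int hash : new int[] {6, 11, 37, -1, 123456789}) {
            // For a power-of-2 capacity both columns agree
            System.out.println(hash + " -> " + indexByMask(hash, capacity)
                    + " / " + indexByMod(hash, capacity));
        }
        // With capacity 10 the mask no longer matches the remainder: 9 vs 1
        System.out.println(indexByMask(11, 10) + " vs " + indexByMod(11, 10));
    }
}
```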
4、 How to override hashCode()
Today's IDEs can generate hashCode() and equals() automatically, but the generated code may not be optimal. There are two principles for overriding hashCode():
Here is a basic guide to writing a decent hashCode():
1. Assign a non-zero constant, such as 17, to an int variable named result.
2. For each meaningful field f in the object (that is, each field taken into account by equals()), compute an int hash code c.
3. Combine the computed hash values: result = 37 * result + c;
4. Return result.
5. Check the resulting hashCode() to make sure that equal objects have equal hash codes.
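The steps above can be sketched as follows. The Person class and its fields are hypothetical, chosen to show one reference field, one int field and one boolean field; the per-field formulas for c (0 for null, 1 or 0 for a boolean) are common conventions assumed here, not something the guide itself prescribes.

```java
import java.util.Objects;

// Hypothetical class used only to illustrate the recipe.
final class Person {
    private final String name;
    private final int age;
    private final boolean active;

    Person(String name, int age, boolean active) {
        this.name = name;
        this.age = age;
        this.active = active;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age && active == other.active
                && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        int result = 17;                                               // step 1
        result = 37 * result + (name == null ? 0 : name.hashCode());  // steps 2 and 3
        result = 37 * result + age;
        result = 37 * result + (active ? 1 : 0);
        return result;                                                 // step 4
    }
}

class HashCodeRecipeDemo {
    public static void main(String[] args) {
        Person a = new Person("Ann", 30, true);
        Person b = new Person("Ann", 30, true);
        // Step 5: equal objects must report equal hash codes
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Only fields that take part in equals() are fed into hashCode(), which is what keeps equal objects on the same hash code.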
5、 A custom HashMap
Next, we will write our own HashMap to understand the underlying principle. If you can follow the code below, you will have a good grasp of how hashCode() works.
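A minimal sketch of such a map is given here under stated assumptions: SimpleHashMap, its Entry class and the fixed 0.75 load factor are hypothetical choices for illustration. It uses an array of linked lists, a power-of-2 capacity with mask indexing, and doubling on resize. It is not java.util.HashMap's implementation and leaves out removal, iteration and red-black trees.

```java
import java.util.LinkedList;
import java.util.Objects;

class SimpleHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private static final float LOAD_FACTOR = 0.75f;
    private LinkedList<Entry<K, V>>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    SimpleHashMap() {
        buckets = new LinkedList[16]; // capacity is kept at a power of 2
    }

    // Maps a key's hash code to a bucket index with a bit mask.
    private int indexFor(Object key, int length) {
        int h = (key == null) ? 0 : key.hashCode();
        return h & (length - 1);
    }

    // Stores the pair and returns the previous value, or null if the key is new.
    public V put(K key, V value) {
        int i = indexFor(key, buckets.length);
        if (buckets[i] == null) {
            buckets[i] = new LinkedList<>();
        }
        for (Entry<K, V> e : buckets[i]) {
            if (Objects.equals(e.key, key)) { // equals() settles hash conflicts
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        buckets[i].add(new Entry<>(key, value));
        size++;
        if (size > buckets.length * LOAD_FACTOR) {
            resize();
        }
        return null;
    }

    // Returns the value for the key, or null if the key is absent.
    public V get(Object key) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key, buckets.length)];
        if (bucket == null) {
            return null;
        }
        for (Entry<K, V> e : bucket) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() {
        return size;
    }

    // Doubles the bucket array and redistributes every entry (rehashing).
    @SuppressWarnings("unchecked")
    private void resize() {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = new LinkedList[old.length * 2];
        for (LinkedList<Entry<K, V>> bucket : old) {
            if (bucket == null) {
                continue;
            }
            for (Entry<K, V> e : bucket) {
                int i = indexFor(e.key, buckets.length);
                if (buckets[i] == null) {
                    buckets[i] = new LinkedList<>();
                }
                buckets[i].add(e);
            }
        }
    }
}
```

A short usage example: after `SimpleHashMap<String, Integer> m = new SimpleHashMap<>();` and `m.put("six", 6);`, the call `m.get("six")` returns 6, and `m.get("seven")` returns null because that key was never stored.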
Original work. When reprinting, please credit the source: http://www.cnblogs.com/jmcui/p/7419779.html