Analysis of indexfor method in HashMap
When sorting out the working principle of HashMap, it is found that it calls the indexfor (int h, int length) method to calculate the array index value saved by the entry object in table:
static int indexFor(int h,int length) { return h & (length-1); }
It does not remainder the length of the hash table, but uses bit operation to get the index. Why is this? I suddenly doubt it~
analysis
//获取当前table的长度 int newCapacity = table.length; //若长度小于目标长度,则扩展为之前的2倍 while (newCapacity < targetCapacity) newCapacity <<= 1;
The initial capacity and expansion of HashMap are carried out to the power of 2. If length-1 is converted into binary, all bits must be 1. For example, the third power of 2 is 8, and the binary representation of length-1 is 111. The principle of bit and calculation is that the two bits are "1" at the same time, and the result is "1", otherwise it is "0". Therefore, the H & (length-1) operation is numerically equivalent to taking the module of length, that is, H% length.
What happens if the precondition "the initial capacity and expansion of HashMap are carried out to the power of 2" is not met?
Assuming that the length of the current table is 15 and the binary representation is 1111, then the length-1 is 1110. At this time, two keys with hash values of 8 and 9 need to calculate the index value. The calculation process is as follows:
8的二进制表示:1000 8&(length-1)= 1000 & 1110 = 1000,索引值即为8; 9的二进制表示:1001 9&(length-1)= 1001 & 1110 = 1000,索引值也为8;
In this way, the same index value is generated, that is, two keys with hash values of 8 and 9 will be located at the same position in the array to form a linked list, which leads to collision.
The query needs to traverse the linked list, which reduces the efficiency of the query. At the same time, we can also find that when the array length is 15, The hash value will be the same as length-1 (1110) for bitwise and, the last bit will always be 0, and 00010011010110011111101 can never store elements, which will cause serious space waste. What's worse, in this case, the available position of the array is much smaller than the length of the array, which means that it further increases the probability of collision and slows down the efficiency of query 。
Therefore, it can be seen that only when the array length is to the nth power of 2, the probability of the same index calculated by different keys is small, the data is evenly distributed on the array, and the collision probability is small. Relatively, the linked list at a certain location does not need to be traversed during query, so the query efficiency is high.
In addition, bit operation is faster than decimal operation, and HashMap expansion is also bit expansion, which also improves the operation efficiency.