Hash table of Java data structure (organized by power node Java College)

Basic concepts

Hash table (also known as hash table) is a data structure that is accessed directly according to key value.

Specifically, it accesses records by mapping the key value to a location in the table, so as to speed up the search.

The function that implements key value mapping is called hash function

The array in which records are stored is called a hash table

The process of implementing hash is usually called hashing, which is often called hash

hash

The concept of hash here is not limited to data structure. In the field of computer science, hash hash is a processing method of information. Through a specific function / algorithm (hash function / hash () method), the items to be retrieved are associated with the index (hash value) used for retrieval to generate a data structure convenient for search - hash table. Nowadays, because the hash value calculated by the hash algorithm is irreversible (the original value cannot be inversely calculated), the hash algorithm is widely used in encryption technology.

Application of hash:

1. Encrypted hash, used in the field of information security

2. Hash table, a data structure that uses hash functions to associate key names with key values

3. Associative array, a data structure often implemented using hash tables

4. Geometric hashing is an effective method to find the same or similar geometry

Hash function

As can be seen from the above, the implementation of hash technology is based on hash function. Here is a deeper understanding of hash functions. As we know earlier, hash function is to complete the mapping between key value and position. Generally speaking, keys are mostly in the form of strings, and the position is a numeric value. It can be seen that the hash function is like the compression of information, compressing the message string into a numerical summary, reducing the amount of data and fixing the format. Working principle of hash function:

However, it should be noted that the key value does not uniquely correspond to the hash value processed by the hash function, which causes different key values to have the same index position. This phenomenon is called hash collision, also known as hash collision. The solutions to hash conflicts will be summarized later. As for the specific implementation of hash function, many encryption technologies have very nice implementation. Here, let's take a look at the hash () method implementation of HashMap in Java. HashMap is implemented by internal hash technology, in which hash () method is a hash function to complete the mapping from key value to array index position.

The above code is the concrete implementation of hash function in HashMap. JDK1. 7 here, the author shows the commonly used hash algorithms:

Hash table

After understanding the above concept of hash \ hash function, we formally enter the learning of Hash list A popular example is that in order to find someone's number in the phone book, you can create a table arranged in alphabetical order of person's name (that is, establish a functional relationship between person's name x and the initial letter F (x)). It is obviously much faster to find the phone number of "Wang" in the table with the initial letter W than directly. Here, the person name is used as the keyword, "take the initial letter" is the function rule f () of the hash function in this example, and the hash table corresponding to the initial letter is stored. Keywords and function rules can be determined arbitrarily in theory.

Construction of hash function

For the data structure of hash table, the construction of hash function is very key. Hash function realizes the mapping of key, and access records can be located faster. Generally speaking, the construction of hash function is based on two standards: simple, uniform and simple, which means that the hash algorithm is simple and fast, and the generation of hash value is simple. Uniform means that for any keyword in the key value set, the hash function can map to any index position of the array with the probability of uniform, which can reduce hash collision. Hash function construction method:

1. Direct address method:

Directly take the key value or a linear function value of the key value as the hash address. That is, hash (k) = k or hash (k) = a * k + B.

Tips: simply think about this method. There is basically no hash conflict in this method. However, we should know the size of the key set in advance. Moreover, if the linear function value is used as the hash address, it will cause a waste of space to a great extent. Hash (k) = k is more unnecessary. Hashing in this way is not as good as direct array index.

2. Digital analysis:

The so-called number analysis method is to assume that the keyword key is a number based on R, and the possible keywords in the hash table are known in advance, then several digits of the keyword can be used to form the hash address.

Tips: this method is extremely inflexible and has too many restrictions.

3. Square middle method:

First, expand the difference of similar numbers by calculating the square value of keywords, and then take the middle digits as the hash function value according to the table length.

Tips: in this way, the middle digits are related to none of the keywords, and the hash address generated is relatively uniform.

4. Folding method:

Divide the keyword into the same number of digits (the last digit can be different), and then remove the superposition and sum of these parts. The folding method is generally used together with the residual method.

5. Residual method:

The hash address is the remainder obtained after the keyword is divided by a number P not greater than the hash table length M. That is, hash (k) = k mod p, P < M. You can not only take the module directly for the keyword, but also take the module after folding method, square middle method and other operations. The choice of P is very important. It usually takes prime or M. if P is not selected well, it is easy to collide.

6. Random method:

H (key) = random (key), where random is a pseudo-random function, but ensure that the function value is between 0 and M-1. Summary: among the above methods, the combination of 3, 4 and 5 is better. Method 5 was used in previous versions of JDK.

Hash Collisions

From the above learning, we know that the key - index position obtained by the hash function does not correspond uniquely, which may cause different key values to correspond to the same index position. This is the problem we should solve. The practical solutions are generally as follows:

1. Separate connection method:

First, let's look at the separation and connection method. To put it bluntly, this method is the way of linked list array. Hash to the same value, and all elements are saved in a table to produce the same value, which is stored in the form of linked list in the hash table. The position of hash conflict is the starting position of the linked list. In JKD, HashMap solves hash conflicts in this way!

The conflict handling code in HashMap is as follows

2. Open address method

The disadvantage of the separate connection method is that it uses a linked list. Due to the time-consuming allocation of addresses to new units, the algorithm speed is slow. The solution is the open address method. There are two commonly used open address methods: linear detection method and square detection method. Open address method:

hash_ i=(hash(key) + d(i)) mod m,i=1,2... K \, (k < m-1), where hash (key) is the hash function, M is the hash table length, D (I) is the incremental sequence, and I is the number of collisions. The incremental sequence can be taken as follows:

d(i)=1,2,3... (m-1) is called linear detection; That is, D (I) = I, or other linear function. It is equivalent to probing the table storing addresses one by one until an empty cell is found, and storing the hash address in the empty cell. d(i)=1^2, 2^2,3^2... K ^ 2 (k < m / 2) is called square detection. Relative to linear detection, it is equivalent to whether the position of the detection interval D (I) = I ^ 2 cells is empty in case of collision. If it is empty, store the address. D (I) = pseudo-random number sequence, called pseudo-random detection.

Linear detection method

Next, the author will demonstrate the process of linear detection with an example, then analyze the characteristics of linear detection, and lead to the insertion of square detection keyword {89,18,49,58,69} into a hash table. At this time, the method of linear detection is to take D (I) = I. It is assumed that the remainder of keyword divided by 10 is the hash function rule.

1. At the beginning, hash (89) = 9, no conflict, direct insertion;

2. Hash (18) = 8, no conflict, direct insertion;

3. Hash (49) = 9 conflicts. Open the address. Put 49 into the next free address 0

4. Hash (58) = 8 conflicts. Open the address. Put 58 into 9 conflicts, 0 conflicts and 1 conflicts

5. Hash (69) = 9 conflicts, open address, put 69 into 0 conflicts, put 69 into 1 conflicts, and put 69 into 2 conflicts

Tips: think about its shortcomings!

The method of linear detection is very simple. It is clear that each insertion can always find an address, but a block will slowly form, and the result is called an aggregation. Any keyword needs to be detected more and more times to solve the conflict, and after completion, the block is increased by the introduction. When the filling factor is > 0.5, this method is not a good method!

Square detection method:

The square detection method can solve the primary aggregation problem of linear detection. Generally, D (I) = I ^ 2. As for its specific steps, readers can simulate it by themselves according to the above example. In this way, secondary aggregation occurs: which elements hashed to the same location will detect the same candidate unit.

3. Double hash, re hash

I won't mention the methods of double hashing and re hashing here. Readers can refer to the relevant materials. Summary: for the implementation of hash, novices don't have to care too much. The key is to understand the concepts related to hash. Understand and master the function and general implementation of hash function. Understand general hash conflicts and common solutions.

The above is the scattered list of Java data structures introduced by Xiaobian to you (organized by the power node Java College). I hope it will be helpful to you. If you have any questions, please leave me a message and Xiaobian will reply to you in time. Thank you very much for your support for the programming tips website!

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>