Implementation of red black tree with treemap and TreeSet in Java

The implementation of treemap is a red black tree data structure, that is, a self balanced sorting binary tree, which can ensure that when it is necessary to quickly retrieve the specified nodes.

Relationship between TreeSet and treemap

In order to let you understand the relationship between treemap and TreeSet, let's first look at part of the source code of TreeSet class:

As can be seen from the above code, the ① and ② constructors of TreeSet create a new treemap as the container for actually storing set elements, while the other two constructors depend on the ① and ② constructors respectively. It can be seen that the storage container actually used at the bottom of TreeSet is treemap.

Similar to HashSet, most methods in TreeSet are implemented by directly calling treemap methods. Readers can refer to the source code of TreeSet, which will not be given here.

For treemap, it uses a sort binary tree called "red black tree" to save each entry in the map -- each entry is treated as a node of the "red black tree". For example, for the following procedures:

When the program executes map put("ccc",89.0); When, the system will directly put the entry "CCC" - 89.0 into the map, and this entry is the root node of the "red black tree". Then the program executes map put("aaa",80.0); When, the program will add "AAA" - 80.0 as a new node to the existing red black tree.

In the future, every time a key value pair is put into the treemap, the system needs to treat the entry as a new node and add it into the existing red black tree. In this way, it can ensure that all keys in the treemap are always arranged from small to large. For example, when we output the above program, we will see the following results (all keys are arranged from small to large):

Add node of treemap

Red black tree

The red black tree is a self balancing sort binary tree. The value of each node in the tree is greater than or equal to the value of all nodes in its left subtree and less than or equal to the value of all nodes in its right subtree, which ensures that the required nodes can be quickly found and located in the tree when the red black tree runs.

For treemap, a "red black tree" is used at the bottom to save the entries in the collection, which means that the performance of adding and taking out elements in treemap is lower than that of HashMap: when adding elements in treemap, it needs to find the insertion position of the new entry through circulation, so it consumes more performance; When extracting elements from the treemap, you need to loop to find the appropriate entry, which also consumes performance. However, the advantages of treemap and TreeSet over HashMap and HashSet are that all entries in treemap are always in order according to the specified sorting rule by key, and all elements in TreeSet are always in order according to the specified sorting rule.

In order to understand the underlying implementation of treemap, we must first introduce two data structures: sorted binary tree and red black tree. Red black tree is a special sort binary tree.

Sort binary tree is a special structure of binary tree, which can sort and retrieve all nodes in the tree very conveniently.

The sort binary tree is either an empty binary tree or a binary tree with the following properties:

Figure 1 shows a sort binary tree:

Figure 1 Sorted binary tree

For sorted binary trees, if we traverse in middle order, we can get an ordered sequence from small to large. As shown in Figure 1, the binary tree is traversed in the middle order:

{2,3,4,8,9,9,10,13,15,18}

The steps of creating a sort binary tree are the process of constantly adding nodes to the sort binary tree. The steps of adding nodes to the sort binary tree are as follows:

After mastering the above theory, let's analyze the implementation of adding nodes to treemap (the internal class of entry is used to represent nodes in treemap). The put (k key, V value) method of treemap set implements putting entries into the sorted binary tree. The following is the source code of this method:

The bold code in the above program is the key algorithm to realize the "sorting binary tree". Whenever the program wants to add a new node, the system always starts from the root node of the tree - that is, the root node is regarded as the current node. If the new node is larger than the current node and the right child node of the current node exists, the right child node is regarded as the current node; If the new node is smaller than the current node and the left child node of the current node exists, the left child node is taken as the current node; If the new node is equal to the current node, overwrite the current node with the new node, and end the cycle - until the left and right child nodes of a node do not exist, add the new node to the child node of the node - if the new node is larger than the node, add it as the right child node; If the new node is smaller than the node, it is added as the left child node.

Delete node of treemap

When the program deletes a node from the sort binary tree, in order to keep it as a sort binary tree, the program must maintain the sort binary tree. Maintenance can be divided into the following situations:

(1) If the deleted node is a leaf node, you only need to delete it from its parent node.

(2) The deleted node P has only a left subtree, and the left subtree pl of P can be added as the left subtree of P's parent node; the deleted node P has only a right subtree, and the right subtree PR of P can be added as the right subtree of P's parent node.

(3) If the left and right subtrees of the deleted node P are not empty, there are two methods:

Set PL as the left or right child node of the parent node Q of P (depending on whether P is the left and right child nodes of its parent node q), and set PR as the right child node of the middle order leading node s of P (s is the node at the bottom right of PL, that is, the largest node in the PL subtree). Replace the node referred to by P with the middle order antecedent or successor of P node, and then delete the middle order antecedent or successor node from the original sorted binary tree. (that is, replace P node with the smallest node greater than P or the largest node less than P).

Figure 2 shows the schematic diagram of the deleted node with only the left subtree:

Figure 2 The deleted node has only the left subtree

Figure 3 shows the schematic diagram of the deleted node with only the right subtree:

Figure 3 The deleted node has only the right subtree

Figure 4 shows that the deleted node has both left and right child nodes. At this time, we use the first method for maintenance:

Figure 4 The deleted node has both left and right subtrees

Figure 5 shows that the deleted node has both left and right subtrees. At this time, we use the second method for maintenance:

Figure 5 The deleted node has both left and right subtrees

The treemap deleted node is maintained in the case on the right as shown in Figure 5 - that is, it is maintained by exchanging the smallest node in the right subtree of the deleted node with the deleted point.

The method of deleting nodes in treemap is realized by the following methods:

Red black tree

Although the sorted binary tree can be retrieved quickly, in the worst case, if the inserted node set itself is ordered, either from small to large or from large to small, Then the final sorted binary tree will become a linked list: all nodes have only left nodes (if the inserted node set itself is a large to small arrangement); or all nodes have only right nodes (if the inserted node set itself is a small to large arrangement). In this case, the sorted binary tree will become an ordinary linked list, and its retrieval efficiency will be very poor.

In order to change the shortcomings of sorted binary tree, Rudolf Bayer invented another improved sorted binary tree: red black tree in 1972. He called this sorted binary tree "symmetric binary B tree", and the name of red black tree was first proposed by Leo J. guibas and Robert Sedgewick in 1978.

Red black tree is a more efficient retrieval binary tree, so it is often used to implement associative arrays. Typically, the collection class treemap provided by JDK itself is the implementation of a red black tree.

The red black tree adds the following requirements to the original sorting binary tree:

Red black tree implemented in Java

Each leaf node of the red black tree specified in property 3 above is empty, and all leaf nodes are black. However, the red black tree implemented in Java will use null to represent empty nodes. Therefore, when traversing the red black tree, you will not see the black leaf nodes, but see that each leaf node is red.

Property 1: each node is either red or black.

Property 2: the root node is always black.

Property 3: all leaf nodes are empty (i.e. null) and black.

Property 4: the two child nodes of each red node are black. (there will not be two consecutive red nodes on the path from each leaf to the root)

Property 5: the path from any node to each leaf node in its subtree contains the same number of black nodes.

The red black tree implemented in Java may have a structure as shown in Figure 6:

Figure 6 Schematic diagram of Java red black tree

Note: white represents red in all schematic diagrams of red black tree in this paper. Black nodes are still represented in black.

According to property 5: the path from the root node to each leaf node of the red black tree contains the same number of black nodes, so the number of black nodes contained in the path from the root node to the leaf node is called the "black height" of the tree.

Property 4 guarantees that the length of the longest path from the root node to the leaf node will not exceed twice that of any other path. If there is a red black tree with a black height of 3: the shortest path length from the root node to the leaf node is 2, The path is full of black nodes (black node - Black node - Black node). The longest path can only be 4. Insert a red node between each black node (black node - red node - Black node - red node - Black node), property 4 guarantees that it is impossible to insert more red nodes. Therefore, the longest path in the red black tree is a path alternating red and black.

Red black tree and balanced binary tree

Red black tree is not a real balanced binary tree, but in practical application, the statistical performance of red black tree is higher than that of balanced binary tree, but the extreme performance is slightly worse.

Therefore, we can conclude that for a given red black tree with black height n, the shortest path length from root to leaf node is n-1 and the longest path length is 2 * (n-1).

Tip: the depth of the sorted binary tree directly affects the retrieval performance. As pointed out earlier, when the inserted nodes are arranged from small to large, the sorted binary tree will become a linked list. The retrieval performance of this sorted binary tree is the lowest: the depth of the binary tree of N nodes is n-1.

The red black tree is roughly balanced through the above restrictions - because the height of the red black tree will not increase infinitely, which ensures that the red black tree is efficient in the worst case and will not appear in the case of ordinary sorted binary tree.

Because the red black tree is only a special sort binary tree, the read-only operation on the red black tree is exactly the same as that on the ordinary sort binary tree, but the red black tree maintains a general balance, so the retrieval performance is much better than that of the sort binary tree.

However, the insertion and deletion operations on the red black tree will cause the tree to no longer meet the characteristics of the red black tree. Therefore, the insertion and deletion operations need to be maintained to ensure that the tree after inserting and deleting nodes is still a red black tree.

Repair after adding points

In the above put (k key, V value) method, fix after insertion (E) method is used at code ① to repair the red black tree -- therefore, a simple repair must be carried out after each node is inserted to make the sorted binary tree meet the requirements of the red black tree.

Insert as follows:

(1) Insert the new node by sorting the binary tree and set it to red.

(2) Color swap and tree rotation.

Repair after insertion

In the insertion operation, the properties 1 and 3 of the red black tree will never change, so it is unnecessary to consider these two properties of the red black tree.

This color call and tree rotation is more complex, which will be introduced in the following sections. In the introduction, we define the newly inserted node as N node, the parent node of N node as P node, the brother node of P node as u node, and the parent node of P node as G node.

The insertion operation is analyzed in different cases

Case 1: the new node n is the root node of the tree and has no parent node

In this case, it is directly set to black to meet property 2.

Case 2: the parent node P of the new node is black

In this case, the newly inserted node is red, so it still satisfies property 4. And because the new node n has two black leaf nodes; However, since the new node n is red, the path through each of its child nodes still maintains the same number of black nodes, so it still satisfies property 5.

Scenario 3: if the parent node P and the parent node's brother node u are both red

In this case, the program should set the P node and u node to black, The parent node of node P is set to red (to maintain property 5). Now the new node n has a black parent node P. since any path from node P and node u to the root node must pass through node g, the number of black nodes on these paths has not changed (there were two black nodes, leaf and node g, and now there are two black nodes, leaf and node P).

After the above processing, the parent node of the red g node may also be red, which violates property 4. Therefore, the whole process of G node needs to be carried out recursively (just treat g as a newly inserted node).

Figure 7 shows this process:

Figure 7 Color swap after inserting nodes

Note: Although figure 11.28 shows the new node n as the left child node of parent node P, the new node n as the right child node of parent node P is exactly the same as figure 11.28.

Case 4: the parent node P is red and its brother node u is black or missing; The new node n is the right child of the parent node P, and the parent node P is the left child of its parent node G.

In this case, we perform a left rotation on the new node and its parent node, and then process the previous parent node P according to case 5 (that is, we can treat P as a newly inserted node). This causes some paths to pass through the new node n or one of the parent nodes P they did not pass before, but both nodes are red, so property 5 will not be affected.

Fig. 8 shows the processing of case 4:

Figure 8 Tree rotation after node insertion

Note: node P in Figure 11.29 is the left child of node g. if node P is the right child of its parent node g, the above processing should be reversed.

Case 5: the parent node P is red and its brother node u is black or missing; The new node n is the left child node of its parent node, and the parent node P is the left child node of its parent node G.

In this case, a right rotation of node G is required. In the tree generated by the rotation, the previous parent node P is now the parent node of new node n and node G. Since the previous node G is black, otherwise the parent node P cannot be red. We switch the colors of the previous parent node P and node g to meet property 4, and property 5 remains satisfied, because all paths through any of the three nodes previously passed through node g, and now they all pass through the previous parent node P. In each case, this is the only black node among the three nodes.

Fig. 9 shows the process of case 5:

Figure 9 Color adjustment and tree rotation after node insertion

Note: node P in Figure 11.30 is the left child of node g. if node P is the right child of its parent node g, the above processing should be reversed.

Treemap is the repair operation after inserting a node. It is provided by the fixafterinsertion (entry < K, V > x) method. The source code of this method is as follows:

Repair after deleting nodes

Similar to the repair after adding nodes, a similar repair operation is required after deleting nodes in treemap to ensure that the sorted binary tree still meets the characteristics of red black tree. You can refer to the repair after inserting nodes to analyze the repair after deletion. The repair operation of treemap after deletion is provided by the fixafterdeletion (entry < K, V > x) method. The source code of this method is as follows:

Retrieval node

When treemap retrieves value according to key, the corresponding method of treemap is as follows:

It can be seen from the bold code of the above program that the get (object key) method is actually implemented by the getentry () method. The code of this getentry () method is as follows:

The above getentry (object obj) method also makes full use of the characteristics of the sorted binary tree to search the target entry. The program still starts from the root node of the binary tree. If the searched node is larger than the current node, the program searches the "right subtree"; If the searched node is smaller than the current node, the program searches the "left subtree"; If equal, the specified node is found.

When the comparator in treemap= Null indicates that the treemap adopts custom sorting. In the case of custom sorting, the treemap adopts the getentryusingcomparator (key) method to obtain the entry according to the key. Here is the code of this method:

In fact, the implementation ideas of getentry and getentryusingcomparator are completely similar, except that the former is effective for naturally sorted treemaps, and the latter is effective for customized sorted treemaps.

Through the above source code analysis, it is not difficult to see that the implementation of the tool class treemap is actually very simple. In other words, from the internal structure, treemap is essentially a "red black tree", and each entry of treemap is a node of the red black tree.

The above is the whole content of this article. I hope it will be helpful to your study, and I hope you can support programming tips.

The content of this article comes from the network collection of netizens. It is used as a learning reference. The copyright belongs to the original author.
THE END
分享
二维码
< <上一篇
下一篇>>