An introduction to avoiding the hidden traps of the equals method in Java programming
Abstract
This article describes a technique for overriding the equals method that preserves the equals contract even when subclasses of an existing class add new fields.
In Item 8 of Effective Java, Josh Bloch describes the difficulty of preserving the equals contract in subclasses as a fundamental problem of equivalence relations in object-oriented languages. Bloch writes:
There is no way to extend an instantiable class and add a value component while preserving the equals contract, unless you are willing to forgo the benefits of object-oriented abstraction.
Chapter 28 of Programming in Scala shows an approach that allows subclasses to add new value components while still preserving the equals contract. Although the book presents the technique for classes defined in Scala, it applies equally to classes defined in Java. The explanation in this article follows the text of Programming in Scala, with the code translated from Scala to Java.
Common equals method traps
The java.lang.Object class defines an equals method, which subclasses can override. Unfortunately, it turns out to be quite difficult to write a correct equals method in an object-oriented language. In fact, after studying a large body of Java code, the authors of a 2007 paper came to the following conclusion:
Almost all the equals methods are implemented incorrectly!
This is a problem because equality is the basis of many other things. For one, a faulty equals method for a type C may mean that you cannot reliably put objects of type C into a collection. Suppose you have two elements, elem1 and elem2, both of type C, that are equal, that is, elem1.equals(elem2) returns true. With a faulty implementation of equals, you may nevertheless see a HashSet to which elem1 has been added report that it does not contain elem2.
There are four common traps that can cause inconsistent behavior when equals is overridden:
Defining equals with the wrong signature.
Changing equals without also changing hashCode.
Defining equals in terms of mutable fields.
Failing to define equals as an equivalence relation.
The remaining sections discuss these four traps in turn.
Trap 1: Defining equals with the wrong signature
Consider adding an equals method to a simple Point class that holds two integer coordinates, x and y.
A seemingly obvious but wrong way to do this is to define an equals method that takes a Point as its parameter.
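The original listing is not reproduced here, so the following is a minimal sketch of what such a class might look like. The accessor names getX and getY are assumptions, chosen to be consistent with the rest of the article. The equals method deliberately has the wrong parameter type:

```java
// Sketch of the flawed version discussed in this section.
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int getX() { return x; }
    public int getY() { return y; }

    // Wrong: the parameter type is Point, so this method only
    // overloads Object.equals(Object) and does not override it.
    public boolean equals(Point other) {
        return this.getX() == other.getX() && this.getY() == other.getY();
    }
}
```

As long as both operands have static type Point, this definition behaves as one would hope, which is what makes the mistake easy to overlook.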
What is wrong with this method? At first glance it seems to work: for two points p1 and p2 with the same coordinates, p1.equals(p2) returns true, and points with different coordinates compare as unequal.
However, trouble starts once instances of this Point class are put into a collection: a HashSet coll to which p1 has been added reports that it does not contain p2.
Why does coll not contain p2, even though p1 was added to it and p1 and p2 are equal objects? The reason becomes clear when the precise type of one of the compared points is masked. Define p2a as a reference to the same object as p2, but with static type Object rather than Point.
Now repeat the first comparison with p2a in place of p2: p1.equals(p2a) returns false.
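The steps above can be reproduced with a short program. This is a sketch under the same assumptions as before (a Point class with getX and getY and an equals method that takes a Point); the class name OverloadTrapDemo is hypothetical:

```java
import java.util.HashSet;

class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    // Flawed: overloads equals instead of overriding it.
    public boolean equals(Point other) {
        return this.getX() == other.getX() && this.getY() == other.getY();
    }
}

class OverloadTrapDemo {
    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);

        HashSet<Point> coll = new HashSet<>();
        coll.add(p1);
        // HashSet calls equals(Object), which is still identity comparison.
        System.out.println(coll.contains(p2)); // prints false

        Object p2a = p2; // same object, static type Object
        System.out.println(p1.equals(p2));  // prints true  (calls equals(Point))
        System.out.println(p1.equals(p2a)); // prints false (calls equals(Object))
    }
}
```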
What went wrong? The version of equals given earlier does not override the equals method of Object, because its parameter type is different. Object declares the method as public boolean equals(Object obj).
Because the equals method in Point takes a Point instead of an Object as its parameter, it does not override equals in Object. Instead, it is just an overloaded alternative. Overloading in Java is resolved by the static type of the argument, not the run-time type. So as long as the static type of the argument is Point, the equals method in Point is called. However, once the static type is Object, the equals method in Object is called instead. That method has not been overridden, so it is still implemented by comparing object identity. This is why p1.equals(p2a) returns false even though p1 and p2a have the same x and y values. It is also why the contains method of HashSet returned false: since that method operates on generic sets, it calls the generic equals method in Object instead of the overloaded variant in Point.
A better, though still imperfect, definition of equals takes a parameter of type Object.
Now equals has the correct signature: it takes a parameter of type Object and returns a boolean. The implementation uses an instanceof test and a cast. It first checks whether the other object is also a Point. If so, it compares the coordinates of the two points and returns the result. Otherwise, it returns false.
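A sketch of the definition described above, assuming the same Point class as before. The @Override annotation is an addition made here: it asks the compiler to reject the method if it does not actually override a superclass method, which would have caught the mistake from Trap 1. The class intentionally still lacks hashCode, which is the subject of the next section.

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    // Better, but still not complete: hashCode is not redefined yet.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false; // also covers other == null
        }
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY();
    }
}
```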
Trap 2: Changing equals without also changing hashCode
If you repeat the comparison of p1 and p2a with the latest definition of Point, you get true, as expected. However, if you repeat the HashSet.contains() test, you will probably still get false.
In fact, this outcome is not 100% certain, and you may also get true. If you do, try points with other coordinate values, and eventually you will find one that the set reports as not contained. The cause is that Point redefined equals without also redefining hashCode.
Note that the collection in the example above is a HashSet, which means that its elements are put into hash buckets determined by their hash codes. The contains method first determines a hash bucket from the hash code and then compares the given element with all elements in that bucket. Now, the last version of the Point class did redefine equals, but it did not redefine hashCode at the same time. So hashCode is still the version from Object, typically some transformation of the address of the allocated object. The hash codes of p1 and p2 are therefore almost certainly different, even though the coordinates of the two points are the same. Different hash codes mean, with high probability, different hash buckets in the set. The contains method looks for a matching element in the bucket that corresponds to the hash code of p2. In most cases p1 is in another bucket, so it is never found. Of course, p1 and p2 might occasionally end up in the same bucket, in which case contains could return true.
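The mismatch described above can be observed directly. The following sketch assumes the Point class with equals(Object) but without hashCode; the class name MissingHashCodeDemo is hypothetical. Only the first printed value is certain, because the inherited hash codes vary from run to run:

```java
import java.util.HashSet;

class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY();
    }
    // hashCode is deliberately NOT overridden here.
}

class MissingHashCodeDemo {
    public static void main(String[] args) {
        Point p1 = new Point(1, 2);
        Point p2 = new Point(1, 2);

        HashSet<Point> coll = new HashSet<>();
        coll.add(p1);

        System.out.println(p1.equals(p2)); // prints true
        // The next two results depend on identity hash codes and are
        // not guaranteed; both are almost always false.
        System.out.println(p1.hashCode() == p2.hashCode());
        System.out.println(coll.contains(p2));
    }
}
```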
The problem is that the last implementation of Point violates the contract on hashCode as defined for the Object class:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
In fact, it is well known in Java that hashCode and equals should always be redefined together. Furthermore, hashCode may depend only on fields that equals depends on. For the Point class, a suitable hashCode can be computed as 41 * (41 + getX()) + getY().
This is just one of many possible implementations of hashCode. It adds the constant 41 to the x field, multiplies the result by the prime number 41, and adds the y field to that. This gives a reasonable distribution of hash codes at a low cost in running time and code size.
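A sketch of the Point class with the hashCode definition described above, again assuming the getX and getY accessors used throughout:

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY();
    }

    // Depends only on the fields that equals compares.
    @Override
    public int hashCode() {
        return 41 * (41 + getX()) + getY();
    }
}
```

With this definition, equal points produce equal hash codes, so a HashSet that contains p1 also reports that it contains an equal point p2.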
Adding hashCode fixes the equality problems of classes like Point. However, there are still other trouble spots to watch out for.
Trap 3: Defining equals in terms of mutable fields
Consider a slight variation of the Point class.
The only difference is that the x and y fields are no longer final, and two setter methods have been added that allow clients to change the x and y values. The equals and hashCode methods are now defined in terms of these mutable fields, so their results change when the fields change. This can have strange effects once you put points into collections. At first, a HashSet coll to which a point p has been added contains p, as expected.
Now, if you change a field of p, does the collection still contain the point? After the x field of p is modified, coll.contains(p) returns false.
This looks strange. Where did p go? Stranger still, if you check whether p is present by iterating over the elements of the set, you find it.
So here is a set that does not contain p, yet p is among the elements of the set! What happened, of course, is that after the change to the x field, p ended up in the wrong hash bucket of the set coll. That is, its original hash bucket no longer corresponds to the new value of its hash code. In a manner of speaking, p dropped out of sight of the set coll, even though it still belongs to its elements.
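The behavior described above can be reproduced with the following sketch. The setter names setX and setY and the class name MutableFieldDemo are assumptions made for illustration:

```java
import java.util.HashSet;

class Point {
    private int x; // no longer final
    private int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }
    public void setX(int x) { this.x = x; }
    public void setY(int y) { this.y = y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY();
    }

    @Override
    public int hashCode() {
        return 41 * (41 + getX()) + getY();
    }
}

class MutableFieldDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        HashSet<Point> coll = new HashSet<>();
        coll.add(p);
        System.out.println(coll.contains(p)); // prints true

        p.setX(p.getX() + 1); // the hash code of p changes
        System.out.println(coll.contains(p)); // prints false

        // Iterating bypasses the hash lookup and still finds p.
        boolean foundByIteration = false;
        for (Point element : coll) {
            if (element.equals(p)) {
                foundByIteration = true;
            }
        }
        System.out.println(foundByIteration); // prints true
    }
}
```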
The lesson to draw from this example is that when equals and hashCode depend on mutable state, it causes problems for users. If such objects are put into collections, users must be careful never to modify the state that equals and hashCode depend on, and this is easy to get wrong. If you need a comparison that takes the current state of an object into account, you should usually give it a name other than equals. For the last definition of Point, it would have been preferable to omit the redefinition of hashCode and to name the comparison method equalsContents, or some other name different from equals. Point would then inherit the default implementations of equals and hashCode, so p would stay locatable in coll even after the modification of its x field.
Trap 4: Failing to define equals as an equivalence relation
The contract of equals in Object specifies that equals must implement an equivalence relation on non-null objects:
Reflexivity: for any non-null value x, the expression x.equals(x) should return true.
Symmetry: for any non-null values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
Transitivity: for any non-null values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should also return true.
Consistency: for any non-null values x and y, multiple invocations of x.equals(y) should consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
For any non-null value x, x.equals(null) should return false.
The definition of equals developed so far for the Point class satisfies the contract for equals. However, things become more complicated once subclasses are considered. Say there is a subclass ColoredPoint of Point that adds a field color of type Color. Assume Color is defined as an enum.
ColoredPoint overrides equals to take the new color field into account: two colored points are equal only if they have the same color and are also equal as points.
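A sketch of what this might look like. The constants of the Color enum are an assumption made here for illustration, as is the exact shape of the classes. The ColoredPoint below has the symmetry problem discussed next:

```java
enum Color { RED, ORANGE, YELLOW, GREEN, BLUE, INDIGO, VIOLET }

class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY();
    }

    @Override
    public int hashCode() { return 41 * (41 + getX()) + getY(); }
}

class ColoredPoint extends Point { // Problem: equals is not symmetric
    private final Color color;

    ColoredPoint(int x, int y, Color color) {
        super(x, y);
        this.color = color;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) other;
        return this.color.equals(that.color) && super.equals(that);
    }
}
```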
This is what many programmers would likely write. Note that in this case the ColoredPoint class does not need to override hashCode. Because the new definition of equals in ColoredPoint is stricter than the overridden definition in Point (it equates fewer pairs of objects), the contract for hashCode stays valid. If two colored points are equal, they must have the same coordinates, so their hash codes are guaranteed to be equal as well.
Taken by itself, the definition of equals in ColoredPoint looks fine. However, the contract for equals is broken once points and colored points are mixed. Consider a Point p and a ColoredPoint cp with the same coordinates.
The comparison p.equals(cp) calls the equals method defined in Point. This method takes only the coordinates of the two points into account, so the comparison returns true. On the other hand, the comparison cp.equals(p) calls the equals method defined in ColoredPoint and returns false, because p is not a ColoredPoint. So the relation defined by equals is not symmetric.
The loss of symmetry can have unexpected consequences for collections: a HashSet that contains p reports that it does not contain cp, while a HashSet that contains cp reports that it contains p.
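The following sketch illustrates both effects, assuming the Point and ColoredPoint definitions outlined in this section (the enum constants and the class name SymmetryDemo are hypothetical). Note that HashSet.contains calls equals on its argument, which is why the two sets disagree:

```java
import java.util.HashSet;
import java.util.Set;

enum Color { RED, BLUE }

class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return getX() == that.getX() && getY() == that.getY();
    }
    @Override public int hashCode() { return 41 * (41 + getX()) + getY(); }
}

class ColoredPoint extends Point {
    private final Color color;
    ColoredPoint(int x, int y, Color color) { super(x, y); this.color = color; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) other;
        return color.equals(that.color) && super.equals(that);
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        ColoredPoint cp = new ColoredPoint(1, 2, Color.RED);

        System.out.println(p.equals(cp)); // prints true
        System.out.println(cp.equals(p)); // prints false

        Set<Point> hashSet1 = new HashSet<>();
        hashSet1.add(p);
        System.out.println(hashSet1.contains(cp)); // prints false

        Set<Point> hashSet2 = new HashSet<>();
        hashSet2.add(cp);
        System.out.println(hashSet2.contains(p)); // prints true
    }
}
```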
So even though p and cp are equal according to p.equals(cp), one contains test succeeds whereas the other one fails.
How can you change the definition of equals so that it becomes symmetric? Essentially there are two ways: you can make the relation either more general or more strict. Making it more general means that a pair of objects, a and b, is taken to be equal if either comparing a with b or comparing b with a yields true.
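A sketch of the more general variant, under the same assumptions as the earlier listings. When the other object is a Point but not a ColoredPoint, the comparison is delegated to that object's own equals method:

```java
enum Color { RED, BLUE }

class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return getX() == that.getX() && getY() == that.getY();
    }
    @Override public int hashCode() { return 41 * (41 + getX()) + getY(); }
}

class ColoredPoint extends Point { // Problem: equals is not transitive
    private final Color color;

    ColoredPoint(int x, int y, Color color) { super(x, y); this.color = color; }

    @Override
    public boolean equals(Object other) {
        if (other instanceof ColoredPoint) {
            ColoredPoint that = (ColoredPoint) other;
            return this.color.equals(that.color) && super.equals(that);
        } else if (other instanceof Point) {
            // Plain point: let the other object decide, ignoring color.
            Point that = (Point) other;
            return that.equals(this);
        }
        return false;
    }
}
```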
The new definition of equals in ColoredPoint checks one more case than the old one: if the other object is a Point but not a ColoredPoint, the method forwards to the equals method of Point. This has the desired effect of making equals symmetric. Both cp.equals(p) and p.equals(cp) now return true. However, the contract for equals is still broken. The problem now is that the new relation is no longer transitive. Consider a point p and two colored points of different colors, redP and blueP, all at the same position.
Taken individually, redP is equal to p and p is equal to blueP.
However, comparing redP and blueP yields false.
Hence, the transitivity clause of the equals contract is violated.
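The violation can be checked with the following sketch, which assumes the more general ColoredPoint.equals described above (the class name TransitivityDemo is hypothetical):

```java
enum Color { RED, BLUE }

class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return getX() == that.getX() && getY() == that.getY();
    }
    @Override public int hashCode() { return 41 * (41 + getX()) + getY(); }
}

class ColoredPoint extends Point {
    private final Color color;
    ColoredPoint(int x, int y, Color color) { super(x, y); this.color = color; }
    @Override public boolean equals(Object other) {
        if (other instanceof ColoredPoint) {
            ColoredPoint that = (ColoredPoint) other;
            return color.equals(that.color) && super.equals(that);
        } else if (other instanceof Point) {
            return ((Point) other).equals(this);
        }
        return false;
    }
}

class TransitivityDemo {
    public static void main(String[] args) {
        ColoredPoint redP = new ColoredPoint(1, 2, Color.RED);
        ColoredPoint blueP = new ColoredPoint(1, 2, Color.BLUE);
        Point p = new Point(1, 2);

        System.out.println(redP.equals(p));     // prints true
        System.out.println(p.equals(blueP));    // prints true
        System.out.println(redP.equals(blueP)); // prints false
    }
}
```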
Making the equals relation more general seems to lead to a dead end. Let's try to make it stricter instead. One way to make equals stricter is to always treat objects of different classes as different. This can be achieved by modifying the equals methods of the Point and ColoredPoint classes. In Point, you can add an extra comparison that checks whether the run-time class of the other Point is exactly the same as the run-time class of this Point.
You can then revert the equals implementation of the ColoredPoint class to the version that previously failed the symmetry requirement.
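A sketch of the stricter variant, under the same assumptions as the earlier listings. Point.equals additionally compares run-time classes with getClass(), and ColoredPoint.equals is the earlier version that only accepts other colored points:

```java
enum Color { RED, BLUE }

// A technically valid, but unsatisfying, equals method
class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY()
                && this.getClass().equals(that.getClass());
    }

    @Override
    public int hashCode() { return 41 * (41 + getX()) + getY(); }
}

class ColoredPoint extends Point { // No longer violates symmetry
    private final Color color;
    ColoredPoint(int x, int y, Color color) { super(x, y); this.color = color; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) other;
        return this.color.equals(that.color) && super.equals(that);
    }
}
```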
Here, an instance of the Point class is considered equal to another object only if that object has the same coordinates and the same run-time class, meaning that getClass() returns the same value for both. The new relation satisfies symmetry and transitivity, because every comparison between objects of different classes yields false. So a colored point can never be equal to a point. This convention looks reasonable, but one could argue that the new definition is too strict.
Consider the following slightly roundabout way to define a point at coordinates (1, 2), as an instance pAnon of an anonymous subclass of Point.
Is pAnon equal to p? The answer is no, because the java.lang.Class objects associated with p and pAnon are different. For p it is Point, whereas for pAnon it is an anonymous subclass of Point. But clearly, pAnon is just another point at coordinates (1, 2). It does not seem reasonable to treat the two as different.
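The following sketch shows the effect, assuming the getClass()-based Point.equals from above. The particular anonymous subclass, which overrides getY, and the class name AnonymousPointDemo are illustrative assumptions:

```java
class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return this.getX() == that.getX() && this.getY() == that.getY()
                && this.getClass().equals(that.getClass());
    }

    @Override
    public int hashCode() { return 41 * (41 + getX()) + getY(); }
}

class AnonymousPointDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);

        // An anonymous subclass that also represents the point (1, 2).
        Point pAnon = new Point(1, 1) {
            @Override
            public int getY() { return 2; }
        };

        System.out.println(pAnon.getX() + ", " + pAnon.getY()); // prints 1, 2
        System.out.println(p.equals(pAnon)); // prints false: classes differ
    }
}
```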
The canEqual method
At this point we seem to be stuck. Is there a sane way to redefine equality on several levels of a class hierarchy while keeping its contract? In fact, there is such a way, but it requires one more method to be redefined together with equals and hashCode. The idea is that as soon as a class redefines equals (and hashCode), it should also explicitly state that objects of this class are never equal to objects of a superclass that implements a different equality method. This is achieved by adding a canEqual method to every class that redefines equals. The method has the signature public boolean canEqual(Object other).
The method should return true if the other object is an instance of the class in which canEqual is (re)defined, and false otherwise. It is called from equals to make sure that the objects are comparable both ways. The Point class gets a new and final implementation along these lines.
The equals method in this version of Point contains the additional requirement that the other object can equal this one, as determined by the canEqual method. The implementation of canEqual in Point states that all instances of Point can be equal.
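A sketch of a Point class along the lines described above, with the same assumed accessors as before. Note that equals calls canEqual on the other object, passing this as the argument:

```java
class Point {
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public int getY() { return y; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        // Ask the other object whether it is willing to equal this one.
        return that.canEqual(this)
                && this.getX() == that.getX()
                && this.getY() == that.getY();
    }

    @Override
    public int hashCode() {
        return 41 * (41 + getX()) + getY();
    }

    // Every Point can, in principle, be equal to any other Point.
    public boolean canEqual(Object other) {
        return other instanceof Point;
    }
}
```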
The corresponding implementation of ColoredPoint overrides canEqual so that only other ColoredPoint instances can be equal to a ColoredPoint.
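A sketch of both classes together, under the same assumptions as the earlier listings. The hashCode override in ColoredPoint is optional for correctness and is included here only so that the color also contributes to the hash code:

```java
enum Color { RED, BLUE, INDIGO }

class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return that.canEqual(this)
                && getX() == that.getX() && getY() == that.getY();
    }
    @Override public int hashCode() { return 41 * (41 + getX()) + getY(); }
    public boolean canEqual(Object other) { return other instanceof Point; }
}

class ColoredPoint extends Point { // No longer violates the equals contract
    private final Color color;

    ColoredPoint(int x, int y, Color color) { super(x, y); this.color = color; }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) other;
        return that.canEqual(this)
                && this.color.equals(that.color)
                && super.equals(that);
    }

    @Override
    public int hashCode() {
        return 41 * super.hashCode() + color.hashCode();
    }

    // A colored point refuses to be equal to anything but a colored point.
    @Override
    public boolean canEqual(Object other) {
        return other instanceof ColoredPoint;
    }
}
```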
The new versions of the Point and ColoredPoint classes keep the contract of equals: equality is symmetric and transitive. Comparing a Point with a ColoredPoint always yields false. Indeed, for any point p and colored point cp, p.equals(cp) returns false because cp.canEqual(p) returns false. The reverse comparison, cp.equals(p), also returns false: since p is not a ColoredPoint, the first instanceof check in the body of equals in ColoredPoint fails.
On the other hand, instances of different subclasses of Point can be equal, as long as none of the classes redefines the equality method. For instance, with the new class definitions, the comparison of p and pAnon yields true.
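The following sketch exercises these cases, assuming the canEqual-based classes outlined above. The particular colored point, the anonymous subclass, and the class name CanEqualDemo are illustrative assumptions:

```java
import java.util.HashSet;
import java.util.Set;

enum Color { RED, BLUE, INDIGO }

class Point {
    private final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    public int getX() { return x; }
    public int getY() { return y; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof Point)) return false;
        Point that = (Point) other;
        return that.canEqual(this)
                && getX() == that.getX() && getY() == that.getY();
    }
    @Override public int hashCode() { return 41 * (41 + getX()) + getY(); }
    public boolean canEqual(Object other) { return other instanceof Point; }
}

class ColoredPoint extends Point {
    private final Color color;
    ColoredPoint(int x, int y, Color color) { super(x, y); this.color = color; }
    @Override public boolean equals(Object other) {
        if (!(other instanceof ColoredPoint)) return false;
        ColoredPoint that = (ColoredPoint) other;
        return that.canEqual(this) && color.equals(that.color) && super.equals(that);
    }
    @Override public int hashCode() { return 41 * super.hashCode() + color.hashCode(); }
    @Override public boolean canEqual(Object other) { return other instanceof ColoredPoint; }
}

class CanEqualDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point cp = new ColoredPoint(1, 2, Color.INDIGO);
        // Anonymous subclass that does not redefine equality.
        Point pAnon = new Point(1, 1) {
            @Override public int getY() { return 2; }
        };

        Set<Point> coll = new HashSet<>();
        coll.add(p);

        System.out.println(coll.contains(p));     // prints true
        System.out.println(coll.contains(cp));    // prints false
        System.out.println(coll.contains(pAnon)); // prints true
    }
}
```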
These examples show that if a superclass equals implementation defines and calls canEqual, then programmers who implement subclasses can decide whether or not their subclasses may be equal to instances of the superclass. Because ColoredPoint overrides canEqual, for example, a colored point can never be equal to a plain point. But because the anonymous subclass referenced from pAnon does not override canEqual, its instances can be equal to Point instances.
One potential criticism of the canEqual approach is that it violates the Liskov Substitution Principle (LSP). For example, the technique of implementing equals by comparing run-time classes (the earlier version that uses getClass()), which makes it impossible to define a subclass whose instances can equal instances of the superclass, has been described as a violation of the LSP. The reasoning is that the LSP states you should be able to use a subclass instance wherever a superclass instance is required. In the previous example, however, coll.contains(cp) returned false even though the x and y values of cp matched those of the point in the collection. This may seem like a violation of the LSP, because you cannot use a ColoredPoint here where a Point is expected. We believe this is the wrong interpretation, though, because the LSP does not require that a subclass behave identically to its superclass, only that it behave in a way that fulfills the contract of its superclass.
The problem with writing an equals method that compares run-time classes is not that it violates the LSP, but that it gives you no way to create a subclass whose instances can equal superclass instances. For example, had we used the run-time class technique in the previous example, coll.contains(pAnon) would have returned false, which is not what we wanted. By contrast, we really did want coll.contains(cp) to return false, because by overriding equals in ColoredPoint we were basically saying that a colored point at coordinates (1, 2) is not the same thing as an uncolored point at (1, 2). Thus, in the previous example we were able to pass two different Point subclass instances to the contains method of the collection and get back two different answers, both of them correct.
Summary
The above covers how to avoid the hidden traps of the equals method in Java programming.