Every Java object inherits a set of base methods from java.lang.Object
that every client can use:
|
|
Each of these methods has a sensible default behavior that can be overridden in the subclasses (except for final
methods, marked above with F
). This article discusses overriding the equals()
and hashCode()
methods for data objects.
So, what do the equals() and hashCode() methods do?
The purpose of the equals()
method is to determine whether the argument is equal to the current instance. This method is used in virtually all of the java.util
collections classes, and many other low-level libraries (such as RMI (Remote Method Invocation), JDBC (Java Database Connectivity), etc.) implicitly depend on the correct behavior. The method should return true
if the two objects can be considered equal and false
otherwise. Of course, what data is considered equal is up to each individual class to define.
Since computing an object's equality is a time-consuming task, Java also provides a quick way of determining if an object is equal or not, using hashCode()
. This returns a small number based on the object's internal datastructure; if two objects have different hash codes, then they cannot be equal to each other. (Think of it like searching for two words in a dictionary; if they both begin with "A" then they may be equal; however, if one begins with "A" and the other begins with "B" then they cannot be equal.)
The purpose of computing a hash code is that the hash should be quicker to calculate and compare than computing full object equality. Datastructures such as the HashMap
implicitly use the hash code to avoid computing equality of objects where possible. One of the reasons why a HashMap
looks up data faster than a List
is because the list has to search the entire datastructure for a match, whereas the HashMap
only searches those that have the same hash value.
Importantly, it is an error for a class to have an equals()
method without overriding the default hashCode()
method. In an inheritance hierarchy, only the top class needs to provide a hashCode()
method. This is discussed further below.
Implementing equals()
The equals()
method's signature must be:
public boolean equals(Object)
Note: Regardless of which class contains the equals()
method, the signature must always declare an Object
argument type. Since Java's libraries look for one with an Object
argument, if the signature is not correct, then the java.lang.Object
method will be called instead, leading to incorrect behavior.
The equals()
method's Javadoc declares that it must be:
- Reflexive
- An object must always be equal to itself; i.e.,
a.equals(a)
- Symmetric
- If two objects are equal, then they should be equal in both directions; i.e., if
a.equals(b)
, thenb.equals(a)
- Transitive
- If an object is equal to two others, then they must both be equal; i.e., if
a.equals(b)
andb.equals(c)
, thena.equals(c)
- Non-null
- An object can never be equal to
null
; i.e.,a.equals(null)
is always false
Fortunately, it is fairly easy to write a method that has this behavior. The equals method should compare:
- If the argument is
this
; if so, returntrue
(reflexivity) - If the argument is
null
; if so, returnfalse
(non-null) - If the argument is of a different type; if so, return
false
(symmetry) - Non-
static
and non-transient
fields are equal, or bothnull
in the case of reference types (symmetry, transitivity)
The reason why static
fields are not compared is because they are the same for all class instances and thus clearly identical in an instance-based comparison. transient
fields are not compared because the transient
keyword's purpose is to prevent fields from being written during serialization; and these fields should therefore not play a part in equality testing. Otherwise, if an object is serialized and then de-serialized, then it will not be equal to itself.
Let's consider a simple point in 2D space to see how a simple equals()
method looks; the comparison is shown graphically in Figure 1:
public class Point { private static double version = 1.0; private transient double distance; private int x, y; public boolean equals(Object other) { if (other == this) return true; if (other == null) return false; if (getClass() != other.getClass()) return false; Point point = (Point)other; return (x == point.x && y == point.y); } }
The code above shows a simple example of an equals()
method. Note that only the two non-static
and non-transient
fields (x
and y
) are being compared; the others (distance
and version
) are not relevant to instance-based comparison.
Note: In this case, the getClass()
method is being used to determine whether the class is the correct type; this is the more correct choice than instanceof
, as discussed below.
To compare whether two instances are of the same type, their getClass()
method is invoked. Each instance has a single Class
instance associated with it; classes of the same instance share exactly the same Class
instance. In the VM, there is only a single Class
instance for each class name (visible within a single ClassLoader
, anyway). As a result, these Class
instances can be compared with the identity test ==
instead of having to use an equals()
on the Class
instances.
Comparing reference types
So what happens when a reference type is declared within an object? The answer is that their equals()
methods are used to compare the references directly; the only extra work involved is determining if the references are both null
. The logic is:
- If the reference is
null
inthis
, then it must benull
in the other - If the reference is non-
null
inthis
, then it must beequals()
in the other
Here's an example of a Person
class that performs equality checking on two reference types, name and birth:
public class Person { private String name; private Date birth; public boolean equals(Object other) { if (other == this) return true; if (other == null) return false; if (getClass() != other.getClass()) return false; Person person = (Person)other; return ( (name == person.name || (name != null && name.equals(person.name))) && (birth == person.birth || (birth != null && birth.equals(person.birth))) ); } }
In this case, the identity check (name == person.name)
captures whether both references are null
. It also removes the equality test when the two fields are identical, thus not requiring a recursive call to equals()
in the other instance.
Note: It is possible to ignore the test for null
in the second part if you know the value can never be null
; i.e., it is assigned non-null
in the constructor. However, it is better to be safe than have numerous NullPointerException
s coming out of datastructures because your equals()
method does not handle null
data correctly.
A similar implementation holds for other reference types, such as arrays. However, since arrays lack an equals()
implementation, you either have to write the for
loop manually to test for element equality or use the Arrays
class to perform the equality check (see Resources for link to the Javadoc).
Implementing hashCode()
The hash code allows collections such as HashMap
to organize and access data quicker than searching through the entire datastructure. It does this by partitioning the data into different buckets, then searching through the single bucket corresponding to the hash code required. As a result, if an object's hash code differs, it won't be searched; just like you wouldn't expect the word "Banana" to appear in a dictionary's "A" section. Figure 2 shows a simplified example of how a hash table is structured, using initial letters as the hash:
The hash code is just an int
calculated from the instance data. Importantly, if two instances are considered equal by equals()
, then they must have the same hash code. As a consequence, the hash code can only be computed on those fields compared in the equals()
method. It does not have to use all of the equality fields:
public class Point { private int x, y; public boolean equals(Object other) { ... Point point = (Point)other; return (x == point.x && y == point.y); } public int hashCode() { return x; } }
The code above is a correct (if not optimal) implementation of hashCode()
, since it only relies on fields compared by the equals()
method. In this case, all points that have the same x
value have the same hash code and are deposited into the same bucket for hash comparisons. Thus, when searching for a point in a Map
, only the Point
s with the same x
value will be compared.
Of course, it is desirable for hash codes to be distributed, which means that where possible, two different objects should have different hash codes. Additionally, if two objects are "close" to each other, they should have a very different hash code.
We can improve on our hash function by using other variables in the hash computation:
public class Point { private int x, y; public boolean equals(Object other) { ... } public int hashCode() { return x + y; } }
Now, our hash function only returns the same for points that lie on a diagonal line. However, this is still not very distributed, because most likely, it may occur in some applications. We can improve the function by using some of the following options:
- Use multiplication instead of addition
- Use bitwise xor to combine values (the
^
operator) instead of addition - Multiply integral values with a prime number to distribute the values
We can now add a hashCode()
method to the Point
class:
public class Point { private int x, y; public boolean equals(Object other) { ... } public int hashCode() { return x*31 ^ y*37; } }
Of course, the hash code's computation should be quick, which may often negate multiplication's use.
Using hashCode() with reference types
If the class uses a reference type, then the hashCode()
method can delegate the work to the enclosed reference type. However, care must be taken when the reference may be null
, because as with equals()
, it would be undesirable to generate NullPointerException
s. In the case of the reference type being null
, a fixed number can return. (The number should be non-zero and non-negative, since these values may have special meanings in some hash map implementations.)
Although the method can be implemented with multiple if
blocks, it is more compact using the ternary if operator:
public class Person { private String name; private Date birth; public boolean hashCode() { return (name == null ? 17 : name.hashCode()) ^ (birth == null ? 31 : name.hashCode()); } }
Default behavior of equals() and hashCode()
The default behavior for these two methods gives answers that work only for simple cases. The equals()
method returns true
only when it is being compared to itself (i.e., the identity check). The hashCode()
method returns an int
based on a unique instance hash (such as the object's location in memory and may be calculated differently for different VM implementations).
Because the default hashCode()
gives an answer based on the instance's identity, it is incorrect to implement equals()
without implementing a corresponding hashCode()
method. Otherwise, two objects may possibly be equal, but have different hash codes—a violation of the equality contract, which will manifest itself in a number of odd ways. It is much better, if a suitable hash code cannot be calculated, to return a constant value (such as 7) rather than use the default implementation. Of course, using a constant value will degrade the performance of a Map
into that of a List
, so even a simple implementation that returns the value of an integer field will prove beneficial.
Advanced strategies
The computation of both equals()
and hashCode()
must be as quick as possible, since they will be called repeatedly on objects. In a List
with 1,000 elements, it is likely that 10,000 comparisons will be done in sorting the list. For each of these comparisons, equals()
is called once. Thus, optimizing the comparison speed is a highly desirable goal. Although dynamic comparison utilities (such as the Apache EqualsBuilder
) make writing equals()
methods easier, because they are based on dynamic field comparisons using introspection, the execution is much slower and less beneficial than an implemented method.
Since some parts of the equals method run faster than others, it makes sense to order the statements so the quicker ones are tested before the slower ones. For example, comparing the values of a primitive is much faster than invoking the equals()
method; so primitives are compared first. Similarly, if the object is of the wrong type, there is no point in comparing any of the fields first.