Object equality

Writing equals and hashCode methods for data objects

Every Java object inherits a set of base methods from java.lang.Object that every client can use:

  • Creational methods
    Object()
    Default no-argument constructor
    clone()
    Returns a new instance of the class
  • Synchronizing methods
    notify()F
    Sends a signal to a waiting thread (on the current instance)
    notifyAll()F
    Sends a signal to all waiting threads (on the current instance)
    wait()F
    Forces the current thread to wait for a signal (on the current instance)
  • Equality methods
    equals(Object)
    Returns true if this instance is equal to the argument
    hashCode()
    Returns a hash code based on the instance data
  • Other methods
    toString()
    Returns a string representation of the object
    finalize()
    Performs garbage-collection duties
    getClass()F
    Returns the Class object associated with the instance

Each of these methods has a sensible default behavior that can be overridden in the subclasses (except for final methods, marked above with F). This article discusses overriding the equals() and hashCode() methods for data objects.

So, what do the equals() and hashCode() methods do?

The purpose of the equals() method is to determine whether the argument is equal to the current instance. This method is used in virtually all of the java.util collections classes, and many other low-level libraries (such as RMI (Remote Method Invocation), JDBC (Java Database Connectivity), etc.) implicitly depend on the correct behavior. The method should return true if the two objects can be considered equal and false otherwise. Of course, what data is considered equal is up to each individual class to define.

Since computing an object's equality is a time-consuming task, Java also provides a quick way of determining if an object is equal or not, using hashCode(). This returns a small number based on the object's internal datastructure; if two objects have different hash codes, then they cannot be equal to each other. (Think of it like searching for two words in a dictionary; if they both begin with "A" then they may be equal; however, if one begins with "A" and the other begins with "B" then they cannot be equal.)

The purpose of computing a hash code is that the hash should be quicker to calculate and compare than computing full object equality. Datastructures such as the HashMap implicitly use the hash code to avoid computing equality of objects where possible. One of the reasons why a HashMap looks up data faster than a List is because the list has to search the entire datastructure for a match, whereas the HashMap only searches those that have the same hash value.

Importantly, it is an error for a class to have an equals() method without overriding the default hashCode() method. In an inheritance hierarchy, only the top class needs to provide a hashCode() method. This is discussed further below.

Implementing equals()

The equals() method's signature must be:

public boolean equals(Object)

Note: Regardless of which class contains the equals() method, the signature must always declare an Object argument type. Since Java's libraries look for one with an Object argument, if the signature is not correct, then the java.lang.Object method will be called instead, leading to incorrect behavior.

The equals() method's Javadoc declares that it must be:

Reflexive
An object must always be equal to itself; i.e., a.equals(a)
Symmetric
If two objects are equal, then they should be equal in both directions; i.e., if a.equals(b), then b.equals(a)
Transitive
If an object is equal to two others, then they must both be equal; i.e., if a.equals(b) and b.equals(c), then a.equals(c)
Non-null
An object can never be equal to null; i.e., a.equals(null) is always false

Fortunately, it is fairly easy to write a method that has this behavior. The equals method should compare:

  1. If the argument is this; if so, return true (reflexivity)
  2. If the argument is null; if so, return false (non-null)
  3. If the argument is of a different type; if so, return false (symmetry)
  4. Non-static and non-transient fields are equal, or both null in the case of reference types (symmetry, transitivity)

The reason why static fields are not compared is because they are the same for all class instances and thus clearly identical in an instance-based comparison. transient fields are not compared because the transient keyword's purpose is to prevent fields from being written during serialization; and these fields should therefore not play a part in equality testing. Otherwise, if an object is serialized and then de-serialized, then it will not be equal to itself.

Let's consider a simple point in 2D space to see how a simple equals() method looks; the comparison is shown graphically in Figure 1:

public class Point {
  private static double version = 1.0;
  private transient double distance;
  private int x, y;
  public boolean equals(Object other) {
    if (other == this) return true;
    if (other == null) return false;
    if (getClass() != other.getClass()) return false;
    Point point = (Point)other;
    return (x == point.x && y == point.y);
  }
}
Figure 1. Comparing fields in the Point class

The code above shows a simple example of an equals() method. Note that only the two non-static and non-transient fields (x and y) are being compared; the others (distance and version) are not relevant to instance-based comparison.

Note: In this case, the getClass() method is being used to determine whether the class is the correct type; this is the more correct choice than instanceof, as discussed below.

To compare whether two instances are of the same type, their getClass() method is invoked. Each instance has a single Class instance associated with it; classes of the same instance share exactly the same Class instance. In the VM, there is only a single Class instance for each class name (visible within a single ClassLoader, anyway). As a result, these Class instances can be compared with the identity test == instead of having to use an equals() on the Class instances.

Comparing reference types

So what happens when a reference type is declared within an object? The answer is that their equals() methods are used to compare the references directly; the only extra work involved is determining if the references are both null. The logic is:

  1. If the reference is null in this, then it must be null in the other
  2. If the reference is non-null in this, then it must be equals() in the other

Here's an example of a Person class that performs equality checking on two reference types, name and birth:

public class Person {
  private String name;
  private Date birth;
  public boolean equals(Object other) {
    if (other == this) return true;
    if (other == null) return false;
    if (getClass() != other.getClass()) return false;
    Person person = (Person)other;
    return (
      (name == person.name || 
        (name != null && name.equals(person.name))) &&
      (birth == person.birth || 
        (birth != null && birth.equals(person.birth)))
    );
  }
}

In this case, the identity check (name == person.name) captures whether both references are null. It also removes the equality test when the two fields are identical, thus not requiring a recursive call to equals() in the other instance.

Note: It is possible to ignore the test for null in the second part if you know the value can never be null; i.e., it is assigned non-null in the constructor. However, it is better to be safe than have numerous NullPointerExceptions coming out of datastructures because your equals() method does not handle null data correctly.

A similar implementation holds for other reference types, such as arrays. However, since arrays lack an equals() implementation, you either have to write the for loop manually to test for element equality or use the Arrays class to perform the equality check (see Resources for link to the Javadoc).

Implementing hashCode()

The hash code allows collections such as HashMap to organize and access data quicker than searching through the entire datastructure. It does this by partitioning the data into different buckets, then searching through the single bucket corresponding to the hash code required. As a result, if an object's hash code differs, it won't be searched; just like you wouldn't expect the word "Banana" to appear in a dictionary's "A" section. Figure 2 shows a simplified example of how a hash table is structured, using initial letters as the hash:

Figure 2. Simplified example of how a hash table works

The hash code is just an int calculated from the instance data. Importantly, if two instances are considered equal by equals(), then they must have the same hash code. As a consequence, the hash code can only be computed on those fields compared in the equals() method. It does not have to use all of the equality fields:

public class Point {
  private int x, y;
  public boolean equals(Object other) {
    ...
    Point point = (Point)other;
    return (x == point.x && y == point.y);
  }
  public int hashCode() {
    return x;
  }
}

The code above is a correct (if not optimal) implementation of hashCode(), since it only relies on fields compared by the equals() method. In this case, all points that have the same x value have the same hash code and are deposited into the same bucket for hash comparisons. Thus, when searching for a point in a Map, only the Points with the same x value will be compared.

Of course, it is desirable for hash codes to be distributed, which means that where possible, two different objects should have different hash codes. Additionally, if two objects are "close" to each other, they should have a very different hash code.

We can improve on our hash function by using other variables in the hash computation:

public class Point {
  private int x, y;
  public boolean equals(Object other) {
    ...
  }
  public int hashCode() {
    return x + y;
  }
}

Now, our hash function only returns the same for points that lie on a diagonal line. However, this is still not very distributed, because most likely, it may occur in some applications. We can improve the function by using some of the following options:

  • Use multiplication instead of addition
  • Use bitwise xor to combine values (the ^ operator) instead of addition
  • Multiply integral values with a prime number to distribute the values

We can now add a hashCode() method to the Point class:

public class Point {
  private int x, y;
  public boolean equals(Object other) {
    ...
  }
  public int hashCode() {
    return x*31 ^ y*37;
  }
}

Of course, the hash code's computation should be quick, which may often negate multiplication's use.

Using hashCode() with reference types

If the class uses a reference type, then the hashCode() method can delegate the work to the enclosed reference type. However, care must be taken when the reference may be null, because as with equals(), it would be undesirable to generate NullPointerExceptions. In the case of the reference type being null, a fixed number can return. (The number should be non-zero and non-negative, since these values may have special meanings in some hash map implementations.)

Although the method can be implemented with multiple if blocks, it is more compact using the ternary if operator:

public class Person {
  private String name;
  private Date birth;
  public boolean hashCode() {
    return
     (name  == null ? 17 : name.hashCode()) ^ 
     (birth == null ? 31 : name.hashCode());
  }
}

Default behavior of equals() and hashCode()

The default behavior for these two methods gives answers that work only for simple cases. The equals() method returns true only when it is being compared to itself (i.e., the identity check). The hashCode() method returns an int based on a unique instance hash (such as the object's location in memory and may be calculated differently for different VM implementations).

Because the default hashCode() gives an answer based on the instance's identity, it is incorrect to implement equals() without implementing a corresponding hashCode() method. Otherwise, two objects may possibly be equal, but have different hash codes—a violation of the equality contract, which will manifest itself in a number of odd ways. It is much better, if a suitable hash code cannot be calculated, to return a constant value (such as 7) rather than use the default implementation. Of course, using a constant value will degrade the performance of a Map into that of a List, so even a simple implementation that returns the value of an integer field will prove beneficial.

Advanced strategies

The computation of both equals() and hashCode() must be as quick as possible, since they will be called repeatedly on objects. In a List with 1,000 elements, it is likely that 10,000 comparisons will be done in sorting the list. For each of these comparisons, equals() is called once. Thus, optimizing the comparison speed is a highly desirable goal. Although dynamic comparison utilities (such as the Apache EqualsBuilder) make writing equals() methods easier, because they are based on dynamic field comparisons using introspection, the execution is much slower and less beneficial than an implemented method.

Since some parts of the equals method run faster than others, it makes sense to order the statements so the quicker ones are tested before the slower ones. For example, comparing the values of a primitive is much faster than invoking the equals() method; so primitives are compared first. Similarly, if the object is of the wrong type, there is no point in comparing any of the fields first.

1 2 Page 1
Page 1 of 2