Using threads with collections, Part 2

New design issues arise when making collections larger

The collection model provides an abstraction for grouping related objects, which proves useful when writing programs. As the size of a collection grows, however, manipulating it in its entirety can become unwieldy. At this point it becomes desirable to support subsets. But the ability to subset a collection imposes new restrictions on the design of the collection classes themselves.

This column is a continuation of the discussion I started in the March Java In Depth column. In that first part I discussed building a class that implements the collection abstraction called SynchroList. In this column I'll update that class so that I can subset the collection. Then we'll go through the issues that arise when multiple threads use a subsetted collection. Finally, I'll show you an applet I wrote that helps demonstrate what is going on inside the collection when multiple threads are accessing it.

Collections that support subsets

The class that I created in part 1 of this series is called a SynchroList, and it is an ordered collection. Being ordered means that the objects within the collection have some notion of order associated with them. For example, words may be sorted into alphabetical order, numbers into numerical order, and so on. In the case of my class, the ordering is determined by a helper class called a Comparator. This ordering is similar to the way in which collections are set up in the 1.2 release of Java.

Once a collection is ordered, a user of the collection can specify a subset of elements in terms of example objects. While this may be confusing at first, it turns out to be a pretty simple concept.

Consider a collection of words (which for the purpose of this example will consist of an arbitrary number of characters and letters, tokens, not necessarily English words) that are ordered alphabetically using the convention that letters that appear earlier in the alphabet are considered to have a value less than letters that appear later in the alphabet. Within the collection, you may wish to select all tokens that start with the letter a. You make your selections from the list by comparing all of the tokens in the list with the tokens "a" and "b." If the list element is greater than or equal to the first token, that element is included in the subset as long as it is also less than the second token. The tokens selected become the subset. That subset can be expressed as ("a," "b"]. The difference between the ( symbol and the ] symbol is that a parenthesis implies greater than or equal to, while the bracket implies less than. The set is said to be inclusive of "a" and exclusive of "b."

To implement subsets in the SynchroList class I changed the inner class LinkEnumerator. Why? The original LinkEnumerator (for a review, see the original source in part 1 of this series) had only the null constructor. This set an internal index variable named current to point to the head of the sorted list of Link objects. In the new version of this class we add two constructors for returning subsets of the list. Further, we've added a pointer to the last object to be returned, named last, that tells the class when it has reached the end of the list. The original constructor was modified as follows:

   
LinkEnumerator( ) { current = head; last = null }

Next we've added two additional constructors to return subsets of the list. The first returns the subset from and object referenced by the variable from to the end of the collection. This constructor is shown next:

LinkEnumerator(Object from) {
    last = null;
    for (current = head; current != null; current = current.next()) {
        if (cmp.lessThan(from, current.getData()))
           break;
    }
}

The code above simply walks through the list, using the Comparator object to compare objects, until the testing method lessThan fails or reaches the end of the list. The second constructor is shown below:

LinkEnumerator(Object from, Object to) {
    last = null;
    if (cmp.greaterThan(from, to)) {
        Object tmp = from;
        from = to;
        to = tmp;
    }
    for (current = head; current != null; current = current.next()) {
        if (cmp.lessThan(from, current.getData()))
           break;
    }
    for (last = current; last != null; last = last.next()) {
        if (! cmp.greaterThan(to, last.getData()))
           break;
    }
}

As in the previous code sample, the constructor is pretty straightforward, however, it does add two important elements. First, it checks to see that object from and object to are ordered from lower-valued object to higher-valued object. Next, it finds the first element exactly the same way the previous constructor did. Finally, it locates the first object that is larger than the to object and sets the variable last to reference it.

The method nextElement is modified to check to see if the enumeration has reached the object referenced by last and, if it has, it stops returning elements. I should note here that I made the somewhat arbitrary decision at the beginning of the design process that the range returned would be inclusive of the lower element and exclusive of the upper element. You may want to check to see how you would go about changing the code to be inclusive of both elements, upper and lower, or exclusive of those elements. If you choose to investigate different versions, be aware that there are at least two valid solutions.

After modifying the LinkEnumerator class, additional methods are added to the SynchroList class to return a subsetted enumeration. These methods are part of the SynchroList class and are shown next.

public synchronized Enumeration elements() {
    return new LinkEnumerator();
}
public synchronized Enumeration elements(Object fr) {
    return new LinkEnumerator(fr);
}
public synchronized Enumeration elements(Object fr, Object to) {
    return new LinkEnumerator(fr, to);
}

The important point to note in the methods above is that these methods are synchronized. This synchronization is unavoidably necessary as the list must be stable when you are constructing an enumeration. However, as I'll show in the next session, once constructed you can still provide unsynchronized access to the list. I have included a working example of the SynchroList with subset enumerations in the Resources section.

Adding threads to the picture

Let us take a minute to review where we started and where we have arrived. I started with the assertion that a collection that was thread aware would be a useful tool to have in the toolbox, and described where the Java 1.0 collections Vector and Hashtable fell short of this capability. So far what we've accomplished is to create an ordered collection that can be subsetted, but we haven't yet addressed the aspects relating to threads.

I prefer to differentiate between threadsafe collections and thread hot collections in the following way:

  • A threadsafe collection is one that works correctly when accessed by multiple threads

  • A thread hot collection is one that works correctly and efficiently when accessed by multiple threads

To build a thread hot collection, it is necessary to be able to update the collection with one thread while it is being enumerated by another thread. Further, updates to one region shouldn't be blocked by enumerating another region if the regions don't overlap.

To achieve this objective, the SynchroList class has to keep state concerning the LinkEnumeration classes that have been created. This is where this class begins to differ most significantly from collection classes that you may have already used.

An enumeration of the collection held by a SynchroList class is a series of Link objects whose next and prev pointers identify the next and previous links in the chain. To guarantee consistency of this chain of links, it is required that the SynchroList class not change the previous or next pointers while an enumeration is outstanding. This piece of state, whether or not a Link object is participating in an enumeration, can be maintained in several ways. I chose to implement it with a bit set.

The reason I chose a bit set is because the information required was boolean, that is one bit, and in Java the java.util.BitSet class can be used relatively efficiently to create a dynamically sized array of bits. What I didn't realize was that the BitSet class was missing a couple of key methods that I would need to make my design work. Further, like many classes in the core Java packages, the java.util.BitSet class was declared final so I could not easily extend it to add the methods I needed. The saving grace was that the source to the class is included with the JDK so I could create a modified version of the class that met my needs. I proceeded to create a modified class, called a ChuckBitSet, and added two methods to the standard class:

  • isZero -- This method returns true if the set currently contains no "one" bits.

  • lastBit -- This method returns the index of the last (bit with the highest index) one-bit set (the extent of the bit set).

Perhaps, with luck, these might make it into the Sun version of BitSet some day.

I chose to use the term in play to represent Link objects that were currently part of an outstanding enumeration and the term player to represent an instance of the LinkEnumerator class that had been created but still had elements that it had not yet returned to its client.

Changes made in the Link object to support this design are straightforward I've added a new private variable, inplay, which is a ChuckBitSet and the following methods to manipulate this set:

   
/** return true if this link is on one or more Enumerations */
boolean inPlay() { return ! inplay.isZero(); }
/** Return true if this link is on the passed Enumeration */
boolean inPlay(int who) { return inplay.get(who); }
/** Mark this link as being in an enumeration */
void setPlayer(int who) { inplay.set(who); }
/** Remove this link from being on a particular enumeration */
synchronized void release(int who) {
    inplay.clear(who);
    notifyAll();
}
/** return the identity of the last active enumeration */
int lastPlayer() { return inplay.lastBit(); }

The first method, inPlay, has two versions. In its first version it returns true if any enumeration has this Link object on its list; the second version returns true if a specific enumeration, identified by the parameter who, has this link on its list. The third method, setPlayer, adds the player identifier in who to the list of enumerations that have this link on their list, and the next method release resets the bit associated with a particular enumeration. The final method lastPlayer returns the largest player ID that the link is associated with. All of these methods are very straightforward.

With the addition of these five methods and the new bit set, the Link object is able to maintain information about the lists in which it has been selected. These new methods are used by the LinkEnumerator inner class when a client invokes one of the elements methods within LinkEnumerator.

The LinkEnumerator class is modifed to update these bits in the Link objects by adding a private helper method named markLinks shown below. The private field id stores an integer value that represents the current enumeration, the method newPlayerID is defined in SynchroList, and I'll get to that in a minute. This method works by simply invoking the setPlayer method on every link selected for the enumeration.

     
private int id;
private void markLinks(Link from, Link to) {
    Link tmp;
    if (from == null)
        from = head;
    id = newPlayerID();
    for (tmp = from; tmp != to; tmp = tmp.next()) {
        tmp.setPlayer(id);
    }
}

The next change is that when an object is returned from the method nextElement in LinkEnumerator, the link used is no longer in play (currently being used) by this enumeration. Thus, at the end of the nextElement method I've added a call to the new release method in the Link class. In this way, as the enumeration is used up (elements are removed), the number of links currently in use also is reduced. When the last element in an enumeration is returned, the nextElement method resets the bit associated with the current enumeration in the players bit set. This allows that bit to be reused for a future enumeration.

The final change to LinkEnumerator is to add a finalizer method. Shown below, this method is called just before the object is garbage collected. If the enumeration has not been "used up" -- that is to say, the object that instantiated it did not call the nextElement method until it returned null -- the finalize method releases the remaining links associated with this enumeration and resets the player ID bit in the containing SynchroList object.

public void finalize() {
    boolean didRelease = false;
    if (current == null)
       return;
    for (Link l = current; l != null; l = l.next()) {
        if (l.inPlay(id)) {
            l.release(id);
            didRelease = true;
        }
    }
    if (didRelease)
        players.clear(id);
}

At this point all of the groundwork has been laid, and all that remains is to put together the new pieces that allow us to maintain thread specific state in the SynchroList class.

1 2 Page 1
Page 1 of 2
How to choose a low-code development platform