Using threads with collections, Part 1

Threads add a new degree of complexity to otherwise straightforward design issues

Threads are an integral part of the Java language. Using threads, many algorithms, such as queue management systems, are easier to access than they are using polling and looping techniques. Recently, while writing a Java class, I found I needed to use threads while enumerating lists, and this uncovered some interesting issues associated with thread-aware collections.

This Java In Depth column describes the issues that I uncovered in my attempt to develop a thread-safe collection. A collection is called "thread-safe" when it can be used safely by multiple clients (threads) at the same time. "So what's the problem?" you ask. The problem is that, in typical usage, a program both changes a collection (called mutating), and reads it (called enumerating).

Some people simply do not register the statement, "The Java platform is multithreaded." Sure, they hear it, and they nod their heads. But they don't grasp that, unlike C or C++, in which threading was bolted on from the side through the OS, threads in Java are basic language constructs. This misunderstanding, or poor grasp, of the inherently threaded nature of Java, inevitably leads to two common flaws in programmers' Java code: Either they fail to declare a method as synchronized that should be (because the object is in an inconsistent state during the method's execution) or they declare a method as synchronized in order to protect it, which causes the rest of the system to operate inefficiently.

I came across this problem when I wanted a collection that multiple threads could use without needlessly blocking the execution of the other threads. None of the collection classes in the 1.1 version of the JDK are thread-safe. Specifically, none of the collection classes will allow you to enumerate with one thread while mutating with another.

Non-thread-safe collections

My basic problem was as follows: Assuming you have an ordered collection of objects, design a Java class such that a thread can enumerate all or part of the collection without worrying about the enumeration becoming invalid due to other threads that are changing the collection. As an example of the problem, consider Java's Vector class. This class is not thread-safe and causes many problems for new Java programmers when they combine it with a multithreaded program.

The Vector class provides a very useful facility for Java programmers: namely, a dynamically-sized array of objects. In practice, you might use this facility to store results where the final number of objects you will be dealing with isn't known until you are done with them all. I constructed the following example to demonstrate this concept.

01 import java.util.Vector;
02 import java.util.Enumeration;
03 public class Demo {
04     public static void main(String args[]) {
05         Vector digits = new Vector();
06         int     result = 0;
07 
08         if (args.length == 0) {
09             System.out.println("Usage is java demo 12345");
10             System.exit(1);
11         }
12 
13         for (int i = 0; i < args[0].length(); i++) {
14             char c = args[0].charAt(i);
15             if ((c >= '0') && (c <= '9'))
16                 digits.addElement(new Integer(c - '0'));
17             else
18                 break;
19         }
20         System.out.println("There are "+digits.size()+" digits.");
21         for (Enumeration e = digits.elements(); e.hasMoreElements();) {
22             result = result * 10 + ((Integer) e.nextElement()).intValue();
23         }
24         System.out.println(args[0]+" = "+result);
25         System.exit(0);
26     }
27 }

The simple class above uses a Vector object to collect digit characters from a string. The collection is then enumerated to compute the integer value of the string. There is nothing wrong with this class except that it is not thread-safe. If another thread happened to hold a reference to the digits vector, and that thread inserted a new character into the vector, the results of the loop in lines 21 through 23 above would be unpredictable. If the insertion occurred before the enumeration object had passed the insertion point, the thread computing result would process the new character. If the insertion happened after the enumeration had passed the insertion point, the loop would not process the character. The worst-case scenario is that the loop might throw a NoSuchElementException if the internal list was compromised.

This example is just that -- a contrived example. It demonstrates the problem, but what is the chance of another thread running during a short five- or six-digit enumeration? In this example, the risk is low. The amount of time that passes when one thread starts an operation at risk, which in this example is the enumeration, and then finishes the task is called the thread's window of vulnerability, or window. This particular window is known as a race condition because one thread is "racing" to finish its task before another thread uses the critical resource (the list of digits). However, when you start using collections to represent a group of several thousand elements, such as with a database, the window of vulnerability increases because the thread enumerating will spend much more time in its enumeration loop, and that makes the chance of another thread running much higher. You certainly don't want some other thread changing the list beneath you! What you want is an assurance that the Enumeration object you are holding is valid.

One way to look at this problem is to note that the Enumeration object is separate from the Vector object. Because they are separate, they are unable to retain control over each other once they are created. This loose binding suggested to me that perhaps a useful path to explore was an enumeration that was more tightly bound to the collection that produced it.

Creating collections

To create my thread-safe collection, I first needed a collection. In my case, a sorted collection was needed, but I didn't bother going the full binary tree route. Instead, I created a collection that I called a SynchroList. This month, I'll look at the core elements of the SynchroList collection and describe how to use it. Next month, in Part 2, I'll take the collection from a simple, easy-to-understand Java class to a complex multithreaded Java class. My goal is to keep the design and implementation of a collection distinct and understandable relative to the techniques used to make it thread-aware.

I named my class SynchroList. The name "SynchroList," of course, comes from the concatenation of "synchronization" and "list." The collection is simply a doubly-linked list as you might find in any college textbook on programming, though through the use of an inner class named Link, a certain elegance can be achieved. The inner class Link is defined as follows:

    class Link {
    private Object data;
    private Link nxt, prv;
    Link(Object o, Link p, Link n) {
        nxt = n; prv = p; data = o;
        if (n != null)
            n.prv = this;
        if (p != null)
            p.nxt = this;
    }
    Object getData() { return data; }
    Link next() { return nxt; }
    Link next(Link newNext) { Link r = nxt; nxt = newNext; return r;}
    Link prev() { return prv; }
    Link prev(Link newPrev) { Link r = prv; prv = newPrev; return r;}
    public String toString() { return "Link("+data+")"; }
}

As you can see in the code above, a Link object encapsulates the linking behavior that the list will use to organize its objects. To implement the doubly-linked list behavior, the object contains references to its data object, a reference to the next link in the chain, and a reference to the previous link in the chain. Further, the methods next and prev are overloaded to provide a means of updating the object's pointer. This is necessary as the parent class will need to insert and delete links into the list. The link constructor is designed to create and insert a link at the same time. This saves a method call in the implementation of the list.

Another inner class is used in the list -- in this case, an enumerator class named ListEnumerator. This class implements the java.util.Enumeration interface: the standard mechanism that Java uses to iterate over a collection of objects. By having our enumerator implement this interface, our collection will be compatible with any other Java classes that use this interface to enumerate the contents of a collection. The implementation of this class is shown in the code below.

        class LinkEnumerator implements Enumeration {
        private Link current, previous;
        LinkEnumerator( ) {
            current = head;
        }
        public boolean hasMoreElements() { return (current != null); }
        public Object nextElement() {
            Object result = null;
            Link tmp;
            if (current != null) {
                result = current.getData();
                current = current.next();
            }
            return result;
        }
    }

In its present incarnation, the LinkEnumerator class is pretty straightforward; it will become more complicated as we modify it. In this incarnation, it simply walks through the list for the calling object until it comes to the last link in the internal linked list. The two methods required to implement the java.util.Enumeration interface are hasMoreElements and nextElement.

Of course, one of the reasons we aren't using the java.util.Vector class is because I needed to sort the values in the collection. We had a choice: to build this collection to be specific to a particular type of object, thus using that intimate knowledge of the object type in order to sort it, or to create a more generic solution based on interfaces. I chose the latter method and defined an interface named Comparator to encapsulate the methods necessary to sort objects. That interface is shown below.

  public interface Comparator {
  public boolean lessThan(Object a, Object b);
  public boolean greaterThan(Object a, Object b);
  public boolean equalTo(Object a, Object b);
  void typeCheck(Object a);
}

As you can see in the above code, the Comparator interface is pretty simple. The interface requires one method for each of the three basic comparison operations. Using this interface, the list can compare the objects that are being added or removed with objects that are already on the list. The final method, typeCheck, is used to ensure type safety of the collection. When the Comparator object is used, the Comparator can be used to insure that objects in the collection are all of the same type. The value of this type checking is that it saves you from seeing object casting exceptions if the object in the list was not of the type you expected. I've got an example later on that uses a Comparator, but before we get to the example, let's look at the SynchroList class directly.

    public class SynchroList {
    class Link {
    ... this was shown above ...
    }
    class LinkEnumerator implements Enumeration {
        ... the enumerator class ...
    }
    /* An object for comparing our elements */
    Comparator cmp;
    Link head, tail;
    public SynchroList() { }
    public SynchroList(Comparator c) { cmp = c; }
    private void before(Object o, Link p) {
        new Link(o, p.prev(), p);
    }
    private void after(Object o, Link p) {
        new Link(o, p, p.next());
    }
    private void remove(Link p) {
        if (p.prev() == null) {
            head = p.next();
            (p.next()).prev(null);
        } else if (p.next() == null) {
            tail = p.prev();
            (p.prev()).next(null);
        } else {
            p.prev().next(p.next());
            p.next().prev(p.prev());
        }
    }
    public void add(Object o) {
        // if cmp is null, always add to the tail of the list.
        if (cmp == null) {
            if (head == null) {
                head = new Link(o, null, null);
                tail = head;
            } else {
                tail = new Link(o, tail, null);
            }
            return;
        } 
        cmp.typeCheck(o);
        if (head == null) {
            head = new Link(o, null, null);
            tail = head;
        } else if (cmp.lessThan(o, head.getData())) {
           head = new Link(o, null, head);
        } else {
            Link l;
            for (l = head; l.next() != null; l = l.next()) {
                if (cmp.lessThan(o, l.getData())) {
                    before(o, l);
                    return;
                }
            }
            tail = new Link(o, tail, null);
        }
        return;
    }
    public boolean delete(Object o) {
        if (cmp == null)
            return false;
        cmp.typeCheck(o);
        for (Link l = head; l != null; l = l.next()) {
            if (cmp.equalTo(o, l.getData())) {
               remove(l);
               return true;
            }
            if (cmp.lessThan(o, l.getData()))
               break;
        }
        return false;
    }
    public synchronized Enumeration elements() {
        return new LinkEnumerator();
    }
    public int size() {
        int result = 0;
        for (Link l = head; l != null; l = l.next()) result++;
        return result;
    }
}
1 2 Page 1
Page 1 of 2