Critique of the Java 1.2(beta) Collection Class Model

Joseph Bergin
Pace University
berginf@pace.edu

Abstract

The Java 1.2 Collection Classes are seriously flawed. This paper examines some background, the design, discusses the problems, and provides an alternate design. The less important and easier to fix errors involve weaker support for linked linear collections than for dense linear ones. The more serious flaw involves the new concept of an optional method of an interface. The proposed solution to the latter is to use semantic tags to make information about designer intentions available to clients.

Background

In Java 1.0 and 1.1, the java.util package contained a few collection classes such as Dictionary and Vector. These classes were quite simple, and were adequate for many purposes, but were generally seen as incomplete. Java 1.2 [1] has offered a new Collection class hierarchy to address these problems. This set of classes and interfaces, while much more complete, has a number of unfortunate properties. The design seems to have been driven by a need to solve rare problems that do not occur in the vast majority of software projects. The design also has unfortunate side effects relating to the use of interfaces that will unfortunately corrupt best practice with Java interfaces. To explain the problems fully, requires a bit of a look at Java exceptions, Java interfaces, and Java concurrency.

Java exceptions fall into two broad groups. The Exceptions which should be caught and the Errors and Runtime Exceptions which should not be. Errors are generally errors internal to the run-time system. Runtime Exceptions, such as IndexOutOfBounds and IllegalArgumentException should never occur in a correct program. The Exceptions, such as the various IOExceptions, can occur in a correct program and therefore must be

handled by the programmer as handling them can prevent bad things from happening in the absence of required resources. To handle Runtime Exceptions, however, would be a waste of machine cycles in a correct program. Therefore the general rule of programming in Java is do not handle Runtime Exceptions. Program so that runtime exceptions cannot

occur; but handle all Exceptions. In fact the compiler enforces this latter rule.

Java interfaces are intended to define first class types. Variables may be declared to have an interface type, as can function parameters and return values. Interfaces can not be directly instantiated. One only instantiates a class implementing an interface. An interface variable can refer to any object of a class that explicitly implements that interface. Such a variable could be cast to any class that implements the interface, though not without a runtime check. Programmers are encouraged to write code in which variables and parameters are interface types rather than class types to obtain generality in the code. Indeed the Java 1.1 event model relies completely on this idea. An interface is a contract that a variable defined to have this type will support certain functionality. Part of this contract is informal and is stated at the meta-level (comments) in the interface definitions. Programmers can thus rely on both the syntax and the semantics when working with variables of interface type. Java interfaces provide some of the functionality of templates, though with runtime rather than compile time checking.

Java concurrency is based on the monitor model. Any object can act as a monitor. Synchronized methods are only activated by an object when the thread invoking the corresponding method holds the lock on that object. Since individual method invocations are the normal granularity of locking, and the locks are obtained on individual objects, a LinkedList object, composed of Node Objects can still be modified by one thread even though another thread holds a lock on the LinkedList itself. Only the LinkedList object and not the objects to which it holds references (its own Nodes) is locked when a synchronized method is invoked on the list. Java also contains a statement level concurrency control mechanism, that still depends on obtaining a lock on an individual object. This of course, means that atomicity of operations in concurrent programs is not always easy to guarantee. One principle of server design in the presence of concurrent clients is that the atomicity should be guaranteed by the server and not forced onto the clients.

The atomicity desired for safe concurrent operation at odds with collections (linked list in particular) that are designed to support algorithm development. Consider any algorithm that must look at a value and then conditionally modify/replace that value. To meet the atomicity needs and desires of concurrency, it would imply that this algorithm should be developed within the class itself (behind the abstraction boundary) so that the developer can guarantee atomicity. On the other hand, this means that the developer of the list or whatever has to perfectly anticipate all algorithms that will ever be needed, unless programmers are to continuously modify the implementations.

The Java 1.2 Collections Design

Designers of the Java 1.2 collection classes sought a lightweight design, much simpler than the Standard Template Library of C++ (STL), for example. They also, obviously, looked for a design that would fit well with other Java features, such as interfaces and concurrency. Finally, they were after a design that would not impede the work currently going forward that seeks to bring generic (a.k.a. template) programming to Java. It was also thought undesirable to have a large number of interfaces to cover the large number of variations that collection classes might take. These include mutability and synchronization, among others.

The Java 1.2 collection classes, in java.util consist primarily of four interfaces, Collection, List, Map, and Iterator, along with a number of classes that support and some that implement the interfaces. There is also a Set interface, extending Collection, but with no additional functional requirements. It does have meta level requirements, however as it (informally) imposes set-like behavior on implementations. This is not enforced, however, nor can it be. Map is separate from Collection and does not extend it. List, meaning any linear collection, does extend collection. Iterators are used to iterate or "pass over" Collection (not Maps). One can get a collection view of a Map, however, and iterate over it. Iterators can be used to modify and replace elements in a Collection, but not to extend a Collection. A Map is conceptually a container (but not a Collection) of key-value pairs. It could be backed (implemented) by a hash table or a binary search tree, for example. Maps are functional and do not contain duplicate keys. There is no provision here for multi-maps as in the STL.

List is a catchall covering both dense and linked implementations (i.e. vector and linked list). The methods support vectorlike collections quite well, but linked structures not so well. Many of the methods of List assume an indexing model, appropriate for vector/array type structures but not efficient for linked implementations. There is no facility in the library anywhere for relinking nodes of one linked list into another (or even the same list), though with a reference model of computation, this is a minor defect as the nodes contain only references, rather than blocks of data in almost all cases. It is not possible to use the methods of the interfaces supplied to sort a linked list efficiently in place. Indeed, the provided implementation of the sort routine for List, which will sort any object implementing List, copies the contents of the List to be sorted into an array, sorts the array, and copies the result back into the original. The sort is stable (merge sort). While sorting linked lists may be rare, the lack of support for so basic an operation does not give one confidence that other algorithms will be able to be implemented efficiently with this model.

The Iterator interface is intended as a replacement for Enumeration of Java 1.0. It supports shorter names of methods but little increase in functionality. It does have a remove method to remove the current element, however. It is extended by ListIterator, a bidirectional iterator, intended for Lists. This latter interface also provides add, which adds an element at roughly the current location: the location before the one that would next be returned by next. Note that although the List interface is intended for both dense and linked lists, its iterator supports only movement by one item at a time. It is also not possible to retrieve the current element at the point of the iterator without also moving the iterator, making many algorithms difficult and inefficient (e.g., sort). Therefore, the iterator interface does not support any algorithm in which an element must be examined and then conditionally replaced unless done so immediately. Making iterators cloneable would help with this problem quite cheaply.

The design is quite different from that of the STL and the roughly equivalent Java Generic Library, making iterator ranges ineffective as collections to be acted upon. Indeed, it is part of the design philosophy to disallow such "poor man’s collections," in the words of the designers. Algorithms are to be applied to collections directly, not to sub-ranges of their elements. Thus, when an algorithm must be applied to part of a given collection, that part must first be copied into a new collection before the algorithm is to be applied. There is no easy means of copying it back, however. A constructor to create a Collection from an iterator range and a method to replace an iterator range with an arbitrary collection would help with this problem.

Serious Problems

A few of the less serious problems have been discussed above. The most serious one, however, concerns a new concept which this proposal introduces to Java interfaces called the optional method. The List interface has twenty-six methods, of which eleven are marked optional. This means that an implementation of the interface may throw an exception UnsupportedOperationException, rather than implement the method. All of the mutators of the List are optional. This new exception is a RuntimeException and therefore is not expected to be caught by programmers. Thus, the List interface really defines over 2000 different combinations of methods that can be implemented, though the optional methods fall into different categories that would likely all be implemented together or not at all. In reality, there are probably about ten or so reasonable combinations that would be likely to occur. The library itself provides some scaffolding for the implementation of unmodifiable collections and maps and synchronized collections and maps. An unmodifiable collection would return a runtime exception if any of the mutators defined in the Interface were applied. These are all optional methods.

The design was apparently chosen to support something more elaborate (Doug Lea, private correspondence). Some Collections/Maps might like to be modifiable at times and then unmodifiable at other times. The locking associated with synchronization comes to mind as one possible reason for this, but there could be others, such as a database being backed up, but still available for query but not update.

The optional methods in the collections interfaces were put there to support transient mutability as well as concurrent atomicity. This, on the other hand, makes it impossible to give rich enough interfaces to support algorithm development without either continuously trapping run time exceptions, or dipping behind the abstraction boundary to implement new algorithms.

For normal server objects that must operate in a concurrent environment, atomicity control should almost certainly be up to the server itself. It does not seem, however, that a linked list can properly be said to fall into this category, unless it has within itself all possible algorithms that you might want to implement on it. This seems to be both impossible and undesirable. But this puts concurrency control back in the lap of the user.

However, this design also makes interfaces unreliable and unusable as first class parameter and return value types. You can no longer write a method of a class that takes List as a parameter type and apply the methods of List to the object passed in. You can only apply non-optional methods, unless you are willing to test for the presence of each of the methods you require. The only test available is to try them and catch the resulting exception if they fail. This would be understandable, perhaps, if these were not runtime exceptions which are supposed to signal program logic errors and should not occur in a correct program. However, what is incorrect about applying methods of List to a variable whose type is List? If anything is, then our type system is badly flawed. It may well be that the function in question and the List in question were written by different programmers, at different times, and are implemented on different systems, connected via the serialization API. If this design holds, then programmers will not be able to rely on interfaces to give semantic level information, but will be required to rely only on the information on the classes specifically implementing the interfaces. This would be the end of the usefulness of interfaces. Programmers will need knowledge beyond the definition of the interface into some of the implementation details of the classes used.

The same problem occurs when a function returns a List. What "flavor" of List, you will have to ask somehow. In a large and dynamic project this question may be difficult to answer, other than by trying the methods and catching the resulting runtime exceptions. As suggested above, the fact that a method worked once on a given List does not mean that it will work again in the future.

To end this section, let me emphasize that if programmers find optional methods throwing runtime exceptions in the core Java libraries, then they will take it as a point of accepted Java style to do likewise in their own projects. Over time, this will utterly destroy both interfaces as first class types and the basic idea of runtime exceptions as signaling avoidable logic errors. These were both excellent ideas in the original Java architecture. To achieve a dubious and rarely needed result, the Java 1.2 Collection interfaces are destroying a good design.

 

Alternate Design

Here I present an alternate design for collections that is not remarkably different from the Java 1.2 proposal, except for the absence of optional methods. This design is based on the idea of an "semantic tag." This is an interface with no functional requirements other than at the meta level. The Serializable interface of Java 1.1 is such a tag:

interface Serializable{
}

Serializable defines no methods. However, all of the methods of the serialization API require Serializable data to work. An interface tag is trivial to implement, can be checked, and serves as semantic information being passed from one part of a program to another.

This design relies on (at least) seven semantic tags: FixedSize, Immutable, UniqueElements, Ordered, Sync, Random, and Bidirectional. The latter two inform a user about the kind of iterator that the Collection returns. Additionally there is an interface TransientlyMutable that has a single method to determine if a collection is currently mutable or not. This latter must be used carefully in the presence of multiple threads, of course.

This design also does not utilize iterator ranges as first class collections, though it provides for the initialization of a collection from an iterator range. It also provides a range of iterator types, not unlike the STL. Both dense and linked linear lists are combined into a single interface: Linear. They are distinguished by their iterators. A dense implementation such as a vector will have an iterator that behaves like an index. A linked implementation one that is more like a pointer.

The functionality is split between the iterator classes and the collections themselves. For example, to retrieve a value without advancing an iterator in a List, you can invoke the List method get(Iterator).

In other ways this design is much like the Java 1.2 proposal. Its main difference is in replacing optional methods with semantic tags. In doing this you lose the possibility of guaranteeing atomicity of some operations in the presence of concurrency, but gain a richer interface that better supports algorithm development. This is an important consideration for the flexible use of Collections if the designer can not anticipate all uses to which a List, for example, might be put.

A large number of implementations are possible in this framework. Some of the more important ones are:


HashBag implements Collection
DenseBag implements Collection, Random
// or Linear, Random
TreeBag implements Collection, Ordered
HashSet implements Collection, UniqueElements
DenseSet implements Linear, Random, UniqueElements
TreeSet implements Collection, Ordered, UniqueElements
Vector implements Linear, Random
Deque implements Linear, Random
SingleList implements Linked
DoubleList implements Linked, Bidirectional
CircularList implements Linked //,Bidirectional (opt)
HashMap implements Map, UniqueElements
HashMultiMap implements Map
ArrayMap implements Map, UniqueElements
TreeMap implements Map, Ordered, UniqueElements
TreeMultiMap implements Map, Ordered
ImmutableSet implements Collection, Immutable
FixedSizeVector implements Linear, Random, FixedSize
ConstantVector implements Linear, Random, Immutable

Other Work

Doug Lea has built a rather complete collections library for Java [3]. It has a large number of interfaces and supports mutability/ immutability directly. Tim Budd [4] has also developed a proposal, simpler than either Lea’s or this one.

Conclusion

Optional methods in Java interfaces are a seriously flawed idea. They will make interfaces unreliable generally. Java Collections design must support the large majority of applications more directly. This implies that the interfaces must be rich enough to support algorithm development. If compromises need to be made, they should be made in favor of providing solutions for the most common problems, rather than for the rarest. This paper has suggested semantic tags as an alternative to optional methods. This will make concurrent programming somewhat more difficult, but otherwise better support algorithm development with the Collections.

Bibliography

[1] Java 1.2, http://java.sun.com/products/jdk/1.2/

[2] Joseph Bergin, http://csis.pace.edu/~bergin/java/collection/

[3] Doug Lea, http://gee.cs.oswego.edu/dl/classes/collections/

[4] Tim Budd, http://www.cs.orst.edu/~budd/java/collect/

 

Appendix


Collection, Linear, and the Iterator interfaces are presented here.


   interface Collection 
   {	boolean add (Object o); 
   		// Returns true if the Collection changed as a result. 
   	boolean addAll(Collection c); 
   		// Returns true if the Collection changed as a result. 
   		// NOTE: addAll is here for efficiency. Internally the collection
 		//    could take advantage of its own implementation to make allocations 		//    efficient. This is also true of removeAll and some others here. 
   	boolean clear(); // Returns true if the Collection changed as a result.    
   	boolean contains(Object o); 
   	boolean containsAll(Collection c); 
   	boolean equals(Object o); 
   	int hashCode(); 
   	Iterator begin(); 
   	Iterator end(); // hasNext returns false. "After the end." 
   	boolean remove(Object o); // Returns true if the Collection changed. 
   	boolean removeAll(Collection c); 
   		// Returns true if the Collection changed as a result. 
   	boolean retainAll(Collection c); 
   		// Returns true if the Collection changed as a result. 
   	int size(); 
   	Object[] toArray(); 
   		// Allocate and return a new array with the contents of this. 
   	int toArray(Object [] a, int start); 
   		// Copy this into a starting at start. If the length of this more
 		//    than the remaining portion of a, it is filled only to the end. 
   		// Returns the number of entries actually placed into a. 
   		// The standard use of this is something like: 
   		// Foo[] fa = new Foo[fc.size()]; 
   		//	int size = fc.toArray(fa, 0); 
   		// where fc is a Collection of Foo's. Note that this is not
 		//    thread safe even if both size and toArray are Synchronized. To 
   		// guarantee thread safety you need to write something like:
   		//	synchronized(fc) 
   		//	{	Foo[] fa = new Foo[fc.size()]; 
   		//		int size = fc.toArray(fa, 0); 
   		//	} 
   	boolean addAll(Iterator begin, Iterator end); 
   		// insert into this all elements between begin and end 
   		// including begin but not including end. 
   		// Returns true if the Collection changed as a result. 
   	boolean removeAll(Iterator begin, Iterator end); 
   		// Remove the elements [begin, end), including begin, but not 
   		// including end. 
   		// If begin and end are iterators into this Collection, then only
		// those elements are removed, though the values might occur
		// elsewhere. begin and end are iterators into some other Collection,
		// then values defined by those positions are removed from this 
		// Collection. Returns true if the Collection changed    as a result.
 		// begin and end are    not invalidated. end is not relocated. 
   }  
    
   interface Linear extends Collection 
   {	boolean add(Iterator i, Object o); 
   		// Returns true if the Collection changed as a result. 
   		// Insert o before i. i is not invalidated. 
   	boolean addAll(Iterator i, Collection c); 
   		// Returns true if the Collection changed as a result. 
   		// insert all elements of c before i in c's natural (iterator) 
   		// order. Iterator i is not invalidated. 
   	boolean addAll(Iterator here, Iterator begin, Iterator end); 
   		// insert into this at location here all elements between begin 
   		// and end including begin but not including end. 
   		// Returns true if the Collection changed as a result. 
   	Object get(Iterator i); // Constant time 
   	Iterator locationOf(Object o); 
   		// Linear time. May be better for Ordered implementations. 
   	Iterator lastLocationOf(Object o); 
   		// Linear time. May be better for Ordered implementations. 
   	boolean removeFrom(Iterator i); 
   		// Returns true if the Collection changed as a result. 
   		// i is not invalidated. 
   	boolean setValue(Iterator i, Object o); 
   		// Returns true if the Collection changed as a result. 
   }  
   interface Iterator extends Cloneable 
   {	boolean hasNext(); Does not change the location 
	Object next(); // Returns the current value and moves to the next.  
	boolean equals(Object o); 
		// Iterator o references same place in same collection as this does. 
	Object clone(); 
	void to(Iterator i);  
		// Move to loc of i within the same Collection/Map only. 
	void reset();  
		// Permit the same iterator to scan over its collection again.  
		// Note that this method precludes anything like the input  
		// iterators of STL 
   }  

   interface BidirectionalIterator extends Iterator 
   {	boolean hasPrevious(); 
	Object previous(); 
   }  

   interface RandomIterator extends BidirectionalIterator 
   {	boolean hasNext(int n); // Does not change the location. 
	Object next(int n);  
		// Move n locations "forward" and returns the value there. 
		//  n may be negative.  
	boolean hasPrevious(int n); // hasPrevious(k) == hasNext(-k) 
	Object previous(int n);  
		// Move n locations "backward" and returns the value there. 
		//  n may be negative.  
	int distance(RandomIterator i);  
		// Distance between this and i: "i - this", if i  
		// follows this, the result is positive.  
	void to(int n); // to absolute position n (zero based).  
	int indexOf(); 
   }