Lists with "Current" Considered Harmful

On the Importance of Showing Only Good Deigns to Students

Lists with "Current" Considered Harmful.

Joseph Bergin
Pace University

Novices should see good designs because they trust us, the professors, to show them good designs that they can emulate in their own work. Psychologically, it seems to me, it is natural for a student to look back with favor on what they learned early as a novice, especially if it made an important impression on them. Educators have a responsibility to be careful in what they show students so that short cuts that we would not like them to make, we don't make either--especially at first.

In a number of textbooks and in a number of courses, when "object-oriented" lists are presented, instructors need to present a design of lists that meets several criteria. First it must faithfully teach the principles of linked storage management, as this is likely the students first introduction to it. Secondly, if the object-oriented paradigm is being used in the course, the instructor needs to provide an interface adequate to solve interesting problems using lists and to provide an appropriate abstraction boundary and hide the implementation behind that boundary. Finally the design needs to be one that is sound and can and should be emulated by students. Doing this well will advance the student understanding on several levels simultaneously. Doing this poorly may mislead them in one or more ways. Certainly a design that leaves the pointers themselves in the interface is generally seen as a bad design. But other mistakes can be made, and some of them are a bit subtle.

One list design that has occurred frequently in recent textbooks is, I believe, deeply flawed. Some authors provide a list abstraction in which there is a notion of "current position" as part of the list abstraction itself. Some authors call this a "cursor". Using C++ template syntax, the interface for such a list might look like the following:

template <class E>
class list
{	public:
        list()
        ... // destructor, etc.
        void reset(); // reset current to beginning
        void advance(); // advance current location
        void insert(E); // insert E at current location;
        void delete(); // delete the current item
        ... // additional members and implementation details
};

While this may be a useful abstraction for some problems, I hesitate to name it list or to present it as the first try at a list abstraction. It might better be called a "rolodex" because of the central position that the current position holds in the abstraction. As a list abstraction, however, I believe that this design is deeply flawed, and should not be presented to students. I justify this claim, which may disturb the reader, on the following grounds. I will outline a better design toward the end of this discussion.

When doing object oriented design with proper information hiding and encapsulation, the fundamental job is to partition responsibilities between objects. Here we have given responsibility for maintaining position within the list to the list itself. The design criteria for the interface of an abstraction, however, is to provide an interface that will be rich enough so that all of the needed operations may be applied to the structure. Said another way, and thinking in terms of the client-server nature of most object oriented programs and systems, the class/object must supply all needed services.

But what are the services that we wish a list abstraction to fulfill for us? This is difficult to specify completely, since lists are one of the basic building blocks of other structures and one of the flexible data structures for maintaining data for use by algorithms. I think this is fundamental to the CS2 course, which is why I believe that this is an important issue.

Insertion and removal services of lists are fundamental, of course, and are easily provided as shown above. We will certainly need to search the list, and here the current position serves well as linear search is the only really feasible mechanism. Of course, if the list is doubly linked, we assume that "current" can be retreated as well as advanced.

Sorting, however is much more problematical. The above design requires that the sort be place within the abstraction boundary as hardly any sort algorithm can be expressed that uses only a single location into the structure to be sorted. Even bubble sort seems to require at least two locations be maintained for even the most naive and inefficient algorithm. Quick sort probably needs three locations. But then, which sort do you provide--certainly not all known sorts that are possible on lists. Do you provide the fastest possible sort (an adaptive quick sort perhaps) or do you provide a stable sort (merge sort) that is known to be a constant multiple slower than the fastest possible? What then happens when the choice that we make turns out to have been the wrong choice for a particular application? Do we require that the structure be modified and the interface extended? Or do we break abstraction, by giving some external algorithm access to the internals of the structure via something like the C++ friend facility? Both of these methods are extremely undesirable, both in practice and as structures for students to emulate.

Do we really want to show them a structure, early in their learning, that has the requirement that it be frequently modified so that it can be used in new problems. Or worse, that they should throw this class away a build a different one for another problem to be solved? Think about using the above list to implement a queue. We need to maintain locations to the beginning and the end (two locations). While this can be done as a special case, independent of "current," how many special cases should there be? And what if we guess wrong?

Or is it a better design to provide a common and generally useful interface that is neutral as to the possible algorithms that might be implemented. I strongly believe that the responsibilities should be refactored. List position is a

separate abstraction from list itself. It is related to list and subsidiary to list, but nevertheless it is a separate "idea" and so should be modeled as a separate class of objects. There is a 1-many relationship between a list and its positions. Many algorithms, not just sorting, depend on this 1-many relationship.

A list position (often called an iterator) encapsulates the idea of a movable position in a list (or other structure). The implementation of a position/iterator can be a pointer to a list node, or it can be somewhat more elaborate. Keeping a pointer to the list itself permits a position/iterator to be reset to the beginning of its own list, though this extra pointer has some negative consequences as well.

List positions/iterators can be created directly with their own constructors, or via a member function/method of the list over which they will iterate. The latter is preferred. A single pointer implementation is useful in that it guarantees that when a list is modified by removing items, only those positions that actually refer to the item deleted will be made invalid. It is also possible with this simple implementation to permit splicing of lists to create longer ones from smaller parts, and any position to an involved node will be maintained automatically. While this is possible with other implementations, it is a bit more involved.

The advantage of this refactoring is that any algorithm can be developed external to the list abstraction boundary without moving that boundary. This increases flexibility greatly and leads to stable designs. The list class might still want to provide a few algorithms within the boundary, such as a stable sort (the most generally useful kind, but not the most efficient) for the convenience of the user, but if the chosen algorithm is infeasible in a given application, the user/client can provide an alternate algorithm outside the boundary, since the interface is rich enough to permit it.

Some will object that a design that permits multiple positions is problematical in that many positions could be inadvertently invalidated when the list changes and it might be difficult to know which ones are still valid and which encapsulate dangling pointers. This problem can be solved if the list can provide a "registration" service for its positions. When a position in a list is created, it registers with the list. The list maintains information about all registered positions. When the list changes it broadcasts the impending change to the registered positions so that they may update their own location if necessary. The details of doing this are somewhat complex, but the list itself need not provide this registration service. It could be provided externally when needed, or it could be provided within a subclass of list. This latter method shows an appropriate use of inheritance. It also shows a number of efficiency/ safety tradeoffs and so is pedagogically useful.

In conclusion, teaching object oriented programming well requires that we think hard about abstraction boundaries and factoring responsibilities. One principle that can nearly always be relied on to give at least approximately correct designs, is that different "things" should be treated differently. "Things" should be interpreted very broadly. In the above factoring the "things" were really ideas. Position is a different idea than List. Therefore consider implementing it differently and providing different abstractions for it.

The STL does a particularly good job of this, in which the iterators are all external to the container classes. Iterators are generated from the containers directly. The algorithms interface to the containers via iterators, and hardly any algorithms are provided within the container abstractions. In my "Data Abstraction" text, a List is composed from four separate abstractions: List, Position, Iterator, and Node. The node is hidden and used for implementation. I separated out the "motion" aspects of locality into iterator and the stable aspects into position. The iterators are easily used with while loops and positions are used for more ad-hoc things. I had earlier decided that in the rewrite of the text that Position and Iterator should be combined into a single abstraction. The STL design convinced me of the correctness of this decision.

If you have committed to a design such as the one objected to here, I apologize if I have offended you. Such was not my intent. I hope that I can convince you of the soundness of my wider goal, if not in this specific case, that we need to have a higher than expected standard of quality in the designs that we show students. We should not trade correctness of design for a supposed simplicity that we fear is all the novice can absorb. If we want them to use industrial strength tools we need to give them industrial strength mechanisms for their use as well.