Run-time Design for Object-Oriented Extensions to Pascal

Joseph Bergin
Computer Science
Pace University
One Pace Plaza
New York, NY 10038

A version of this paper was presented at the 1995 Computer Science Conference and appears in the proceedings.

1. Introduction

This paper proposes a possible design for the run-time object and class layouts for the language described in Object-Oriented Extensions to Pascal [1]. This language has a brief description in [2]. It is defined to be an extension to the languages Pascal [3] and Extended Pascal [4]. A brief and helpful description of the latter is found in [5]. In short, the Object Extensions to Pascal (here abbreviated OOPascal) describes an object oriented language with reference semantics for objects. The type system includes an asymmetric form of multiple inheritance. In this form of inheritance, all instantiated objects have a built in class Root as one of their ancestors. Other classes (Property classes) from independently rooted trees may be "mixed in" to the main (concrete) hierarchy to add functionality. The author describes a simple design for object and class layout in the run-time system of such a language. It should be emphasized at the beginning that this is in no way to be interpreted as a prescribed method, and that implementors are free to implement the abstract notion of a "reference variable" in any appropriate way. It is assumed that the reader is familiar with the Object Extensions to Pascal and is generally familiar with the main features of Extended Pascal. Papers [2] and [5] form an elementary introduction and reports [1] and [4] are definitive.

2. The Problem to be Solved

The Object-Oriented Extensions to Pascal is a hybrid object-oriented language exhibiting multiple inheritance, reference semantics for objects, run-time type inquiry, safe coercion, and a novel type view system. In contrast to C++ and Eiffel, which have symmetric multiple inheritance systems, in OOPascal there are two different kinds of classes, concrete and property. All instantiated objects have as one of their types the class Root, which forms the single root of the concrete part of the class structure. A class may be formed using inheritance from many other classes but only one of them may be a concrete class, and if the new class is to be a property class then none of the parents may be concrete.

Classes defining properties of objects, but not objects themselves may be defined and "mixed in" to the concrete hierarchy to provide services in a uniform way. The property classes mixed in must all come from independently rooted hierarchies. Thus, one may define a class List, derived from Root, as a typical linked list structure and may define a property class Sortable, providing element comparison methods. Then the class Sortable_List may be defined using multiple inheritance from the concrete class List and the Property class Sortable. Since the expectation is that the features of a class defined in the concrete hierarchy will be accessed more frequently than those from the property parts, it is desirable to devise an implementation methodology that makes access to the former as fast as possible.

Objects are referenced via "reference variables," which are similar in some ways to pointers in that they imply indirect semantics and a "heap-like" implementation. The implementation is not revealed to the programmer however, and the notions of pointers and references are kept completely distinct. The semantics of a reference variable is that it references the value "as a whole", rather than a sub-object as is the case in C++ in some situations.

The language also provides an operator IS to query an object as to its actual (polymorphic) type. We may thus inquire if anObject IS List,and get a Boolean reply. We may also inquire if anObject IS Sortable. In general these tests may only be performed at runtime as a variable declared to be of a class type may hold a reference to a value of any derived type.

We may also coerce a reference variable to a derived type. This is necessary if, for example, our List class was declared to hold Root objects and we have been inserting Expression objects into a value of type List. When these objects are retrieved from the list the only type known to the compiler is Root, but the programmer knows them to be of type Expression. To regain the full protocol of Expression for these objects the programmer may coerce them to type Expression. The compiler will insert run-time checks as necessary to guard against errors.

Finally, implementations may be complicated by the fact that we may define variables to be of a property class type and objects having these properties may be referred to by these variables. We may have a Sortable_List object, myList, and a variable sortableItem of type Sortable. The assignment


	sortableItem := myList;

is legal.

The above features detail the main features complicating the implementation. On the other hand, the builder is aided by the fact that the notion of a reference variable, which is the only means of accessing a value of a class type, is completely abstract. The implementation of reference is not revealed to the programmer, unlike the case with pointer variables, which are otherwise similar. Thus the implementor is free to use whatever means are required to represent the features of objects.

3. Goals for the Solution

One of the major goals in the run-time design of object-oriented languages is to minimize the time required to access object features, both the data elements and the methods. If objects utilize reference semantics then the best that can be expected in general is a fixed (compile-time) offset from a dereferenced pointer. Many systems employ more elaborate, albeit slower, mechanisms. In particular, multiple-inheritance complicates the design of the run-time system as it is difficult to place an object feature at a particular offset and guarantee that the offset will be unchanged in objects instantiated from subclasses.

In seeking a simple implementation strategy, we would like to use fixed, compile-time offsets as much as possible. This is the typical mechanism with ordinary records, though the presence of schemata in Extended Pascal complicates this to some extent. Since schemata permit structures to have a size that is known only at run-time, some implementations do not use fixed offsets for all record structures, but employ run-time computations via something like a dope vector. It is, however, possible to put the dope vectors at fixed offsets from some base, which might, in extreme cases, need to be computed at runtime.

In any case, we would like to provide a solution that does not employ searches of run-time records with associated comparisons made to guide further steps of the search. While it may be necessary to traverse run-time links, we would like to do so only when the number of steps along the chain is known to the compiler. While we must, of course, make simple comparisons to do so, we need not fetch information on which to base the comparisons.

As mentioned in Section 2, it is also desirable to optimize access to features defined in the concrete part of the hierarchy, perhaps at the expense of slowing down access to features defined in property classes. The paper [6] by Brown describes a scheme for implementation of symmetric multiple inheritance, in which the language does not guide the implementor in which of the parents of a class is more important. Similarly, [7] by Stroustrup, gives implementation details for multiple inheritance in C++. Here we seek something more specialized.

4. The Solution

I will make the following assumptions concerning the implementation as well as the probable style of programming in OOPascal. First, it is assumed that, in general, a given object will be referenced by several distinct variables (aliases), but most of these variables will be declared to have concrete class types with relatively few property class variables. Second, the general style of programming will put most of the features of most classes into the concrete part of the hierarchy, rather than the property parts. Third, I assume that the dynamic storage used to hold object values is such that values will not physically move during a single run of the program. This last assumption is, of course, not met for generational garbage collectors and some heap management systems, but relatively simple modifications of the following may be made in these cases. Note that garbage collection is not mandated by the language extensions, though it may be implemented if desired.

4.1 Simple Layout

Consider the Example 1, which is a simple, abstract, case that will be used and extended to illustrate the implementation strategy. We assume that property class P and concrete class A have been previously declared. Note that we have not listed instance variable types nor method parameters.


TYPE
A = CLASS(...) (* A may have a parent *)
	PROCEDURE method_a...;
	END;

P = PROPERTY CLASS
	PROCEDURE method_p...;
	END;

Also, while we have used only procedures, methods may be functions as well. Note that the class AA inherits a constructor create from Root. We also assume that it inherits a method method_a from its concrete parent A and that P declaresmethod_p. Finally, we assume that class A has been constructed without use of any property classes anywhere in its hierarchy.


TYPE
PP = PROPERTY CLASS(P)
	PROCEDURE method_q...;
	END;

AA = CLASS(A, PP)
	field_a:...
	field_b:...
	PROCEDURE method_b...;
	END;

VAR
	anObject: AA;
	aProperty: PP

BEGIN
	anObject:= AA.create;
	aProperty := anObject;
...

Example 1

The solution here employs two types of run-time records. The first (object-records) represents the objects themselves. There is one or several such records for each object, one for each independent hierarchy of which its class is composed. If an object is created from a class that employs multiple inheritance, then the object's run time record is composed of several segments, linked together. The first segment represents all instance variables, but not the methods, derived from the concrete part of the hierarchy and the others represent the various property classes.

The second type of run time record (class-records) represents run-time information associated with classes themselves. These class records hold pointers to the code implementing methods (including constructors and destructors) of the class as well as a pointer to the record representing one of the parents of the class. If the class represented is a concrete class this pointer is to the unique concrete parent. If it is a property class then this may be to any of its parents. The layout of each record is fixed by the compiler and does not vary at runtime. Likewise the links between the records do not vary at runtime for any given object. The class-records logically exist for the life of a run-unit. Object-records are created when a concrete class is sent a constructor message as in


	myList := List.create;

They become undefined when the object is sent a destructor message as:


	myList.destroy;

The first segment of the object-record also contains a pointer to the concrete class of which the object is a member (its own-class), and a pointer to one of the property classes which was mixed-in to form the own-class.

Each other segment of the object-record chain contains a pointer to a property class associated with that segment as well as a pointer to the next segment of the object-record.

Note that in this scheme it is not necessary to link the class-records together except for the parent links. It is also unnecessary to duplicate most class-records. Thus, in general there will be only one record for each class. This is complicated, however, by the ability to override property class methods, as will be shown below in Section 4.3.

See Figure 1 for a possible layout for Example 1. Note that the fact that class A and property class PP were mixed to from class AA is not directly represented in the class records. It is, however represented in the pointers from the object records of the object anObject. In this scheme, the inherited method_a occupies a slot at the same offset in the class records for both A and AA. This is similar for method_p in the class records.

4.2 Reference Variables

In OOPascal reference variables may be declared to have either a concrete class type or a property (or abstract) class type. We require different implementations of the references in the case of a concrete and abstract reference from that of a property reference. In the latter case a reference is more than a simple pointer.

The (concrete) reference variable anObject is represented by a pointer variable to the first object-record. Since the compiler has knowledge of the layout, it knows that field_a is at a certain offset in that first segment and that field_p, which was inherited from property class P is in some fixed offset in the second segment. Likewise the compiler knows that a pointer to code for method_a may be found by following a pointer to the class record and then applying a fixed offset and that for method_q may be found by a similar computation from the second object-record.

Note that field_a will always occupy the same slot in its object-record, as will method_q. Also, in any class derived from class AA, the segment for class PP will always occupy the second place.

Variables referencing property classe values are somewhat more complex. Here we represent them with a pair of pointers, with the first (object-pointer) pointing to the first object-record of the object as a whole. The other pointer (property-pointer) points to that object-record segment that represents the property class in question: namely the property class whose type is the declared type of the reference variable. We note that the actual type may have a more refined type than this declared type, so that if aProperty was declared to have type P, the second pointer would still point to the same location shown here. In some circumstances, in which the property class is formed from multiple inheritance of property classes, the picture is somewhat more complex as the class in question will require several segments, with the first representing the property class as a whole, and the other representing various of the properties mixed in to form it. This is discussed later.

It is important to note a few things about this layout. First, it is possible that a smart linker could decide, having seen all of the components of a run-unit, that some classes are never instantiated, and strip their class-records from the executable. It is also possible to decide in certain cases that the links to property records can be reduced to offsets and so avoid the chasing of links implied by the above. This is hard for the compiler, however, because of the possibility of cross-module references. In some cases, as when a class is not exported, the compiler itself may make such optimizations.

4.3 Inheritance

From Figure 1 many of the properties of inheritance may be derived. First, when fields are added in a new class declaration, they are appended to the object-records of the parent class. When methods are added, pointers to the code are appended to the class-records of the parent class. When a new class is mixed in, a new object-segment is appended to the far end of the chain of object segments required for objects of the new type. Since all methods are polymorphic (or "virtual") when a method is overridden, we must replace the pointer to code in the corresponding slot of the parent class with a new pointer in the class of the derived class record. There are some subtleties, however, as shall be shown.

Example 2 and the corresponding Figure 2 show the effect of such simple inheritance, in which a new class is derived from AA and a new property is mixed in. Also shown here is an override of a method of the concrete hierarchy. Overriding property class methods is somewhat more complex. In this example, the property class R is defined starting a new property hierarchy, and AAA mixes it in with AA.


TYPE
R = PROPERTY CLASS 
				{ New root.} 
	field_r:...;
	field_s:...;
	PROCEDURE method_r...;
	END;

AAA = CLASS(AA, R)
	field_c:...
	field_d:...
	PROCEDURE method_a...;
				 override;
	PROCEDURE method_c...;
	END;

Example 2

As we see, the layouts implied by this have new fields in the first segment, a new method in the class-record, and a replacement (indicated by *) in the method_a slot in that class-record. Likewise, a new property segment is required in the object-record chain. This segment contains the instance variables declared in the new property class, as well as a pointer to the property class itself so that methods may be accessed. Note that since property PP is still an ancestor of this new class, the object-record representing PP is still second in the chain. Thus, a reference variable of type AA may refer to an object of this new type without modification of the access protocols established above for class AA.

As mentioned above, the situation is more complex when property class methods are overridden. In particular, a class-record representing a property class may have to be replicated to hold the pointer to the new method body. This is illustrated in Example 3 and Figure 3. Here the situation is as in Example 2, except that we also wish to override method_p which was inherited (indirectly) from PP.


TYPE
R = PROPERTY CLASS { New root.} 
	field_r:...;
	field_s:...;
	PROCEDURE method_r...;
	END;
AAA = CLASS(AA, R)
	field_c:...
	field_d:...
	
PROCEDURE method_a...; override;
	PROCEDURE method_c...;
	PROCEDURE method_p...; override;
	END;

Example 3

In figure 3 we show both a reference to an object of type AA and to one of type AAA. Note that the class-record for class PP has to be duplicated in the run-time system to be able to hold a pointer to the new method body. Again, the knowledge that this must be done resides in the compiler, and is discovered when the override of a property method appears in the class declaration of AAA. This new class-record does not represent a new property class, but only a segment of a new concrete class. However, the duplicated PP record points to the original PP record as if it were its parent. This is required for type inquiry, when the program tests if anAAAObject IS PP.

Notice importantly, that we may rely on the declared type of a reference variable to define the access to all features accessible via that reference. We need not know the run-time value to which the variable refers, relying on the fact that if it refers to a value of a type other than the declared type of the reference, it must be a subtype of it, and therefore uses the same layouts as far as defined in all directions by the class of this reference.

Next we note that although we have been describing the above in which one of the classes mixed in is a concrete class the same layouts are implied if all of the classes are property classes. The only difference is that the resulting structure does not describe any object as a whole, since property classes are not instantiable, but only describes a sequence of consecutive segments of object-records in objects of classes into which this property class structure is mixed. In either case, there is exactly one object-record per hierarchy used to form the own-class of the reference.

It should also be obvious to implementors, that there is no need to allocate the various object-records making up an object in separate operations. In fact it is desirable to do so all at once. Thus, Figure 4, describes a more typical physical layout. Note that we still need pointers to the various segments (unless an optimizer can determine that they are to the same offsets in all instantiated objects) because the same property class may be mixed in to several different concrete classes, and the several property classes mixed in to different parts may be in different orders. Therefore, it is not possible in general to associate a fixed offset in this structure with a given property class variable, except relative to its segment.

4.4 Representing Self.

Within a method of a class, the object in control of the process is represented by a protected (implicit) parameter self. Protection of

the parameter implies that the reference may not be changed to another object. If the class of the method is a concrete class, then the parameter may be represented by a pointer to the first segment of the object-record of the object that receives this message at run time. For a property class, however, the reference to self is a pair of pointers as is the case for other property class references. The property-pointer of this pair must be set by the caller of the method, to point to the segment representing the property class in which this method was originally declared rather than the declared type of the reference variable mentioned in the message statement or expression. If this is a property class variable, then an adjustment may need be made. If it is a concrete class variable, then a property-pointer will need to be generated.

4.5 Type Inquiry and Coercion

If we assume that objects do not move in the run-time system, type inquiry may be implemented by pointer comparison between some fixed addresses and the class pointers in each object. The same may be done with inquiry as to property classes. Again, the layouts permit the declared types of the variables to be used, rather than the run-time values to which they may refer. Note that since an object is a member of the ancestor types of its own-class, it is necessary to search parent chains when seeking a match.

If objects must be permitted to move during a computation, a slight refinement maintains a pointer to each class-record, which must be updated when the class-records move. Thus, again, we can implement type inquiry with simple pointer comparisons.

Type coercion is also quite simple for references of concrete class type, relying on the declared type of the reference variable being coerced and the type to which it is being coerced. If they both represent concrete classes, then nothing need be done except insert a run-time check that the class of the reference is indeed the class of the coercion. If one of the references is concrete and the other not, then we either throw-away the property-pointer, or generate one from knowledge of the types of the two references.

When both references are property classes the situation is a bit more complex in that the property-pointer must be adjusted. This adjustment could cause a property-pointer to move either way in its chain. When we assign a reference of a derived property class to one of one of its ancestors, we must move the property pointer down the chain a fixed number of steps, depending on the two types. Coercion in the opposite direction is only legal when one property class has been derived from others. References of the ancestor classes may be coerced to the derived class(but not to one another, as they may come from different hierarchies). If the coercion is legal the compiler knows from its defined layouts how many steps along the object-record chain must be stepped to obtain a pointer to the object-record of the coerced type. Unfortunately these steps are upward along the chain and as we have not doubly linked the chain, we need to traverse it from the origin, for which we have the object-pointer. These steps and the result type must be guarded with run-time checks, however, since a downward reference may always be made in error.

4.6 Schemata

Run-time sized schemata complicate things only a bit. If we assume that dope vectors may be used to describe the actual types, and that these dope vectors may have fixed size, then these may be placed where we have indicated instance variables. The data in the dope vectors may then be used to allocate storage for the indicated fields, either at the end of the object-segments indicated above, or elsewhere in the heap or stack.

5. Conclusion

Here we have outlined a simple object layout model for the Object Extensions to Pascal [1]. While not optimal, it is relatively simple to implement and makes access to the main features of a class (the concrete features) as fast as is typical in single inheritance systems. This model may be modified slightly or drastically to meet the needs of a specific computing environment.

6. Acknowledgments

I would like to thank Tony Hetherington and Tom Turba for reading and commenting on an earlier draft of this paper. Tony points out that two implementations of Extended Pascal (his own at Prospero Software Ltd. for PC's, and that of Digital Equipment Corp. for workstation and larger systems) use slightly different layouts for schemata than suggested here, which may complicate this design slightly.

Bibliography

[1] Object-Oriented Extensions to Pascal, ANSI/X3-TR-13:1994.

[2] Joseph Bergin, A Report on Object-Oriented Extensions to Pascal, SIGPLAN Notices, Volume 24, Number 11.

[3] Programming Language Pascal, ISO/IEC 7185:1990.

[4] Programming Language Extended Pascal, ANSI/IEEE770X3.160-1989, ISO/IEC 10206:1991.

[5] Tony Hetherington, An Introduction to Extended Pascal, ACM SIGPLAN Notices, Volume 28, Number 11, November 1993, pp. 42-51.

[6] Bob Brown, Non-Linear Type Extensions, ACM SIGPLAN Notices, Volume 29, Number 2, February 1994, pp. 39-43.

[7] Bjarne Stroustrup, The Design and Evolution of C++, Addison-Wesley, 1994.