Exheritance - Class Generalisation Revived

Markku Sakkinen
Information Technology Research Institute
University of Jyväskylä
P.O.Box 35 (Agora)
FIN-40351 Jyväskylä, Finland
sakkinenm@acm.org or sakkinen@cs.jyu.fi

Abstract. We develop further the old idea that object-oriented languages could also support the inverse of inheritance (specialisation): generalisation or exheritance. This is easy as far as only interfaces are concerned, but attributes and method implementations cause problems. Renaming appears to be a very desirable language feature for exheritance. Combinations of inheritance and exheritance can be interesting and useful.

1. The basic idea

Already in 1989, Claus H. Pedersen suggested that a generalisation mechanism be added to object-oriented languages, as a converse of inheritance, which typically means specialisation [Pedersen 1989]. The idea immediately looked nice and logical, but I have neither seen it developed further in the literature nor noted its adoption into any concrete language. On reading the paper again, I noticed a rather significant flaw (see Section 3), but it can be remedied.

The paper [Pedersen 1989] is not explicitly aimed only at statically typed languages, but some of its suggestions are mainly relevant to these. Here, static typing will be assumed, but some points are relevant also to dynamically typed languages.

One of Pedersen's main arguments was that it is often natural to define more concrete and specialised classes first, and only later note commonalities that could be refactored into common superclasses. Because a generalisation construct does not exist in current languages, this is not possible without modifying all original class definitions. It was not emphasised in [Pedersen 1989] that the problem is more severe when the affected classes come from libraries whose source code might not even be available.
Today class libraries and frameworks are very heavily utilised in OO software development, and so the importance of the problem has grown.

Pedersen's basic idea is briefly explained as follows: A class G can be defined as a generalisation of one or more previously defined classes A1, A2, ..., An. By default it will then have all features (methods and attributes) that are common to all those classes, but some features can be explicitly excluded. Even single generalisation can be meaningful: in [Pedersen 1989] a Stack class is defined as a generalisation of a Deque (double-ended queue) class, where those methods not appropriate for the stack abstraction are excluded.

When class G is defined by generalisation from A1, A2, ..., An, it becomes a superclass of all these classes. This effect is exactly the same as if G had been defined first and A1, A2, ..., An as its subclasses. Also in an arbitrarily complicated inheritance hierarchy, it does not affect the semantics which relations have been defined by specialisation and which by generalisation. One can thus say that generalisation does not add conceptual baggage to a language.

It is not always self-evident which features in several unrelated classes are to be regarded as common, i.e., as the same feature. We will study this issue in Section 4.

Some languages allow the type of an inherited feature to be redefined in a subclass according to some restrictions (usually to a subtype). This issue was not treated in [Pedersen 1989]; clearly, a feature F must be excluded from the generalised class if there is no type for F that satisfies the restrictions with respect to every subclass. Conflicting method preconditions in Eiffel can cause a similar situation, because the precondition in the superclass must imply the precondition in every subclass. Different visibilities (accessibilities) may also have a similar effect in some languages.
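The Stack and Deque example can be approximated in Java, with the caveat that Java has no generalisation construct. The sketch below therefore writes the hierarchy in the conventional order (superclass first), which by the argument above has the same effect. The method names and the list-based representation are assumptions made for illustration; they are not taken from [Pedersen 1989].

```java
import java.util.ArrayList;
import java.util.List;

// What "Stack generalises Deque" would yield: only the features that
// the generalisation keeps. Method names are illustrative assumptions.
interface Stack<T> {
    void push(T item);   // add at the top (the back end of the deque)
    T pop();             // remove from the top
    boolean isEmpty();
}

// The previously existing class. In Java it has to be edited to say
// "implements Stack<T>"; with exheritance it would remain untouched.
class Deque<T> implements Stack<T> {
    private final List<T> items = new ArrayList<>();

    @Override public void push(T item) { items.add(item); }
    @Override public T pop() { return items.remove(items.size() - 1); }
    @Override public boolean isEmpty() { return items.isEmpty(); }

    // Features excluded from the generalisation: access at the front end.
    public void pushFront(T item) { items.add(0, item); }
    public T popFront() { return items.remove(0); }
}

class StackDemo {
    // Client code that depends only on the stack abstraction.
    static String reverse(String s, Stack<Character> stack) {
        for (char c : s.toCharArray()) stack.push(c);
        StringBuilder out = new StringBuilder();
        while (!stack.isEmpty()) out.append(stack.pop());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("abc", new Deque<>()));  // prints "cba"
    }
}
```

The point of the sketch is the edit to Deque: when Deque comes from a library, that edit is exactly what a generalisation construct would make unnecessary.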

If generalisation were to be added to C++, additional problems would arise from public, protected or private inheritance on the one hand, and virtual (sharing) or non-virtual (duplicating) inheritance on the other. As argued in [Sakkinen 1992], these two aspects had better not be independent, but in C++ they are.

For convenience, I will use the term "exherit" (generalise) as the converse of "inherit" (specialise). Such a word is needed at least for speaking about exherited features.

2. The most abstract cases

It is noted in [Pedersen 1989] that generalisation is simplest when only class interfaces are involved, and no implementation. Today, it would thus look like a very easy enhancement to Java to allow interfaces to be defined by generalisation from classes and other interfaces. Also in other languages, generalising into fully abstract classes (in Java such things can be declared either as classes or as interfaces) would be easiest. Non-public methods do not belong to the class interface, but we need not distinguish between public and non-public in the exheritance of method interfaces.

For each concrete (effective) method that is exherited, it should be specifiable whether its implementation is exherited as well, or only its interface. In the latter case, the method becomes abstract (deferred) in the generalising class. An abstract method is necessarily virtual, even if the original method is not. (Most statically typed OOPLs, e.g., Simula, Eiffel, C++ and Java, allow non-virtual methods.)

In multiple generalisation, every exherited method must be virtual in the superclass, whether its implementation is exherited or not, except in some special cases. This may appear surprising, but the reason will be explained in Section 3. Unfortunately, such a change of virtuality is not possible in current C++ and some other OOPLs. There is no problem in Eiffel or Java, where an inherited virtual method can be made non-virtual (frozen, resp. final) in a subclass.
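The Java half of the last remark can be sketched as follows. Since Java has no exheritance, the generalising class is again written first; the class names G, A and B and the method name m are placeholders chosen for this illustration. Only the interface of m is taken into G, so it is abstract there, while each subclass keeps its own body and closes it with final.

```java
// The generalising class. Only the interface of m() is exherited, so
// the method is abstract (and therefore virtual) in G.
abstract class G {
    abstract String m();
}

// A keeps its own body. "final" makes the method non-overridable in any
// subclass of A, which is how a method that was non-virtual in A can
// coexist with a virtual m() in G.
class A extends G {
    @Override final String m() { return "A::M"; }
}

class B extends G {
    @Override final String m() { return "B::M"; }
}

class VirtualityDemo {
    // Calls through G are dispatched dynamically to A's or B's body.
    static String viaG(G g) { return g.m(); }

    public static void main(String[] args) {
        System.out.println(viaG(new A()) + " " + viaG(new B()));
    }
}
```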
The most abstract case of generalisation was not explicitly noted in [Pedersen 1989]: all features can be excluded from the generalisation class. Such classes would sometimes be useful in languages that do not support union types for (reference) variables. Of course, the language must then offer an unrestricted possibility to test the dynamic type of a variable, so this would not work in C++ without other enhancements.

As an example of the above, suppose that we are defining classes for a lottery, and there already exist classes for the things that will be given out as prizes, say, Car, Flight_ticket, Camera, Comb, etc. The class Prize could then be defined as a generalisation of these, without any features. In current languages, the only possibilities would be to redefine all the other classes as subclasses of Prize or to define a whole set of new classes, such that Car_Prize has Car as a component class, etc. Both ways are obviously very clumsy and inconvenient.

In general, it would seem sensible for the generalisation class to be abstract by default, even if the exherited classes are concrete and implementation is exherited. This would also conform to the rather common recommendation that superclasses in ordinary inheritance should be abstract.

3. Exheriting implementation

The implementation of a class consists mainly of attributes and method bodies. Attributes are unproblematic, except for possible type or visibility conflicts: simply, only those attributes common to all subclasses are exherited. Most OOPLs allow attributes of a class to be declared public, although this is usually strongly discouraged in the literature. Such attributes belong also to the class interface, not just to the implementation. However, that does not affect exheritance.

In [Pedersen 1989] the situation where a generalised class G has no virtual methods is discussed first.
It is suggested that it should get the whole implementation of one subclass; in multiple generalisation, the programmer should explicitly select which one. We can call this class the principal subclass¹ of G. Alternatively, a complete new implementation can be provided if the programmer prefers that. (It is mentioned that this is not the only possible approach.)

¹ This term does not appear in [Pedersen 1989].

Unfortunately, Pedersen's proposal would break the desired inverse relationship between specialisation and generalisation in all common object-oriented languages! The simple reason is that subclasses inherit all features of their superclasses, so G, carrying the whole implementation of its principal subclass, would pass on features that the other exherited classes do not have. It is obvious that none of the classes which G exherits, except its principal subclass, would be its subclasses; if a new implementation is written, no exherited class is a subclass of G.¹

If there are virtual methods, Pedersen [1989] suggests that they would also be exherited from the principal subclass as a default, assuming that they exhibit the same behaviour in all subclasses. However, the programmer could override this default for each virtual method separately, by writing a new body or by declaring it abstract.

Since this part of Pedersen's proposal is hopelessly wrong, we must either restrict method exheritance to interfaces only or find a feasible solution for implementation (body) exheritance. There is a very simple solution: the programmer should specify for each method from which subclass it should be exherited. This would not yet break the correspondence between specialisation and generalisation, because virtual methods can be freely overridden in all common languages.

Method body exheritance has an essential problem, however. Consider a method M of subclass S that should be exherited to class G. The compiler must check that all attributes and other methods² of S that M uses are also exherited to G. This probably means in practice that the exheritance of most method implementations is impossible in multiple generalisation. This problem was not present in Pedersen's approach: exheriting "the whole implementation" means that all attributes and also all non-exherited methods of the principal subclass are available inside the class, although they are not visible outside.
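The dependency check can be made concrete with a small Java sketch, written superclass-first because Java has no exheritance. All names here (Shape, Rectangle, Circle, area, describe) are hypothetical and serve only as an illustration. Shape plays the role of G and Rectangle that of S. The body of describe() could be exherited from Rectangle, because it uses nothing but area(), whose interface is available in Shape. The body of area() could not, because it reads width and height, which Circle does not have and which are therefore not exherited.

```java
// Plays the role of G, generalised from Rectangle and Circle.
abstract class Shape {
    // Interface only: Rectangle's body depends on attributes that are
    // not common to both subclasses.
    abstract double area();

    // Body as it might be exherited from Rectangle: it refers only to
    // area(), and a method interface in Shape is enough for that.
    String describe() { return "area=" + area(); }
}

// Plays the role of S.
class Rectangle extends Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    // Uses width and height, so this body cannot move up to Shape.
    @Override double area() { return width * height; }
}

class Circle extends Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    @Override double area() { return Math.PI * radius * radius; }
}

class BodyExheritanceDemo {
    public static void main(String[] args) {
        System.out.println(new Rectangle(2, 3).describe());  // prints "area=6.0"
        System.out.println(new Circle(1).describe());
    }
}
```

In realistic classes most method bodies resemble area() rather than describe(), which is why body exheritance is expected to succeed only rarely in multiple generalisation.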
¹ Of course, it is possible for some other exherited class to be defined so that it would already be an implicit subclass, but this is a rare coincidence.
² Method interfaces suffice; bodies need not be exherited.

It was claimed in Section 2 that every exherited method in multiple generalisation must almost always be virtual in the superclass, even if it is non-virtual in some subclass. To prove this, let us assume that classes A and B both have a method M, and class G is generalised from A and B, exheriting M. If G::M is non-virtual, then the equivalence of exheritance and inheritance requires that both A::M and B::M are also non-virtual and the same as G::M. A reasonable definition of the equality between methods requires that M is inherited to both A and B from some common ancestor. Thus, this is the special case in which G::M can be non-virtual.

A more subtle problem is the possible redefinition of attribute types and method signatures in some languages. In such languages, the exherited method must be recompiled in the context of G, and may well fail in type checking although it is type correct in the context of S.

Even subtler problems can arise from other assumptions that were made about the current object when the method code was written; usually they are implicit. In Eiffel, one such assumption is explicit but cannot be statically checked by the compiler: the class invariant. It can cause problems in exheritance because the invariant of a subclass must be the same as or stronger than that of its superclasses, so a body written under the invariant of S may rely on conditions that G does not guarantee. In contrast to invariants, method pre- and postconditions do not cause problems: their rules in Eiffel are such that a subclass method would always be valid for a superclass. Note, however, that conflicts between the preconditions of different subclasses may completely prevent the exheritance of a method, even its interface (see Section 1).

4. Name conflicts

Pedersen [1989] distinguishes two kinds of possible name conflicts in multiple generalisation, between a name (method signature) N1 from subclass A1 and a name N2 from subclass A2:

1. The names N1 and N2 are equal, but they denote different methods.
2. The names N1 and N2 are different, but they denote the same method.

It is then said that the problem is essentially the same as in multiple specialisation (inheritance), and mostly out of the scope of the paper. I believe that both kinds of name conflicts can be much more common in practice than most language designers seem to imagine. We look closer into them from the viewpoint of generalisation; it is not really quite analogous to specialisation (inheritance). First we note three things:

- Case 2 is usually not called a conflict, but it is nevertheless important and interesting.
- When virtual methods are concerned, the word "method" must be interpreted as "operation" in the same sense as in CORBA: the whole family of potential methods related by inheritance.
- The discussion is just as applicable to attributes as to methods.

Case 1 is a horizontal name conflict, or "false friends". It is clear that the feature must not be exherited, since the features are not the same [Pedersen 1989]. However, the conflict must be declared by the programmer, because it cannot be detected automatically. This can be done simply by excluding N1 (= N2) from exheritance, but a separate syntax for this situation could be useful to show that there really is a conflict.

Case 2 might be called "lost friends". It could be solved by some simple syntax to indicate the equality of the features and to choose which of the two names should remain in the exheriting class. Like case 1, this situation must be detected by the programmer.

In a language that allows features to be renamed in inheritance, such as Eiffel, there actually are situations in which both kinds of conflicts are caused by renaming and could be detected automatically. The common requirement for both cases is that A1 and A2 have a common ancestor. Case 2 is easier: the features N1 and N2 may originate from the same seed (in a class called their origin) by inheritance. They can then be recognised as the same feature by the Eiffel compiler even if they have different names because of renaming. In case 1, the features N1 and N2 must have different seeds. They may be inherited from some ancestor class A where they appear as two different features, and thus also have different names.
They could then be recognised to be different features also in A1 and A2. This is more difficult than case 2, because the features need not have a common origin class.

In Eiffel, the existing renaming facility would suffice to solve both kinds of conflicts (also when automatic detection is not possible). In fact, the same feature can be both a false friend and a lost friend, and renaming takes care of both problems at the same time.

Renaming has its problems, at least when used excessively: it may be difficult to trace the same feature in different classes. On the other hand, renaming may sometimes be desired already in single generalisation: the original name of some feature may have a (linguistic) meaning that is too restricted for the more general class (e.g., "push" in Stack, but "insert" in Collection).

5. Combination of generalisation and specialisation

It was regarded as a questionable characteristic of Eiffel in [Sakkinen 1989] that inheritance is not subobject-based, taking the "transcontinental drivers" example from [Meyer 1988]. The essence of that example is as follows: Class Driver is defined first, and then its subclasses France_driver and US_driver. Last, France_US_driver is defined by multiple inheritance from the two previous classes. Some features originating from Driver occur only once in France_US_driver, but others get duplicated by renaming them in inheriting from France_driver and US_driver.

In strictly subobject-based inheritance, e.g. in C++, either no features originating from an indirect ancestor class are duplicated ("virtual" inheritance in C++), or all are duplicated. However, if generalisation is added to the language, a kind of controlled splitting of an ancestor class becomes possible and an effect similar to the Eiffel example above can be achieved. In this case, we simply define a new class Person that exherits those features of Driver that should not be duplicated (age and birthday). Note that this is single generalisation.¹

¹ There is another problem in this example, but it is beyond the scope of the current paper.

We can build interesting combinations of generalisation and specialisation similarly to "fork-join inheritance" [Sakkinen 1989]. There are two complementary situations already in the simplest cases, for example:

1. Classes C and D both inherit A, and then B exherits both C and D. I.e., features are transferred from A to B through common subclasses.
2. Classes A and B both exherit C, and then D inherits both A and B. I.e., features are transferred from C to D through common superclasses.

Note that the resulting inheritance hierarchy is the same in both these cases. They are illustrated in the figure below, where subclasses are placed below their superclasses, but arrows show the dependencies. The direction of the arrow is chosen as in UML: from the dependent (inheriting or exheriting) class to the previously defined one.

[Figure: two diagrams, each with A and B on the upper row and C and D on the lower row. Case 1: arrows from C and D to A, and from B to C and D. Case 2: arrows from A and B to C, and from D to A and B.]

If no features are explicitly excluded in the exheritance, B will have at least all features of A in case 1, and D will have at least all features of C in case 2. D could even be declared to be a subclass of C in case 2, but B cannot be declared to be a subclass of A in case 1, unless the enhancement to be discussed next is added. There is a mirror image situation of this if no new features are added in the inheritance.

In [Pedersen 1989], a class defined by generalisation always becomes a root class (in languages like Smalltalk or Java, a direct subclass of Object).¹ This is not always desirable: one would wish to add new classes also to the middle of an inheritance hierarchy. As an example, we might have the class Cat originally defined as a subclass of Animal, but later want to add class Mammal.

We discuss this idea in the case of single specialisation and single generalisation. Either or both may also be multiple; this makes the situation more complicated but does not add anything essential.

The class Mammal must have at least all features of Animal, and may have at most all features of Cat. It is a matter of taste whether the minimum or the maximum should be the default. If type redefinitions are allowed in the language, it is again a matter of taste whether the default type for each feature should be taken from the superclass or the subclass.
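The bounds on Mammal can be pictured with a small Java sketch. Java cannot insert a class into an existing hierarchy without editing the subclass, so Cat is rewritten here to extend Mammal. The features sound, nursesYoung and purrs are invented for the illustration and do not come from the paper.

```java
class Animal {
    String sound() { return "..."; }
}

// Added later, between Animal and Cat. It must keep every feature of
// Animal (the minimum) and may take over any feature of Cat (the maximum).
class Mammal extends Animal {
    // sound() is simply inherited from Animal.

    // Taken over from Cat, where it was originally defined.
    boolean nursesYoung() { return true; }
}

// In Java, Cat has to be edited to extend Mammal instead of Animal.
class Cat extends Mammal {
    @Override String sound() { return "meow"; }

    // Excluded from the generalisation: stays in Cat only.
    boolean purrs() { return true; }
}

class MammalDemo {
    public static void main(String[] args) {
        Mammal m = new Cat();
        System.out.println(m.sound() + " " + m.nursesYoung());
    }
}
```

Here Mammal sits strictly between the two bounds: it has everything Animal has, plus one of the two features that Cat adds.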
The types may also be redefined in Mammal, so long as the language's rules are fulfilled in both directions.

Method implementations are the most interesting issue even here. As discussed in Section 3, method bodies exherited from a subclass can often be invalid, but an inherited method body will always be technically valid. Therefore, the body should be taken from the superclass as a default, but the programmer can also specify exheritance from the subclass or write a new body. For consistency with the last item, similarity with the superclass should be the default also for the set of features and for their types.

¹ It will not remain a root class if another class is further generalised from it.

6. Further aspects

A further extension would be possible: that an inheritance relationship could be declared between two already existing classes. From the viewpoint of conventional inheritance, this would be a kind of retrofitting like generalisation: defining things in the "wrong" order. Of course, such a declaration should not change either class itself in any way, but be rejected if the inheritance relationship is not possible according to the rules of the language. In a language that allows renaming in general, it would make sense to allow renaming (i.e., mapping) of features also in such an inheritance declaration. Two existing classes could also be coalesced by declaring them to be equal; that would require, of course, that they really have exactly the same features.

This facility would actually allow a compromise between the explicit inheritance of all mainstream OOPLs and Cardelli-style implicit subtyping [Cardelli 1988], which is applied in some functional languages. Implicit subtyping was applied already in Emerald [Black et al. 1986] to abstract types, which otherwise correspond exactly to interfaces in Java.

It was pointed out by Peter Grogono that with normal inheritance one cannot tell from a class definition whether it has subclasses, and with exheritance one cannot even tell whether it has superclasses. This can certainly be a problem for understanding and maintaining programs. Good development environments can help by showing all inheritance relationships of a class.

Further problems appear when classes are modified. Suppose that B is a subclass of A in one of the three possible ways:

1. B inherits A (the conventional case),
2. A exherits B,
3. A and B are defined independently, and the subclass relationship declared afterwards (as suggested above).

Modifying A or B has different implications in these cases. In case 1, B can be modified without effects on A. In contrast, modifications to A directly affect B as well, and they may also make the inheritance relationship impossible (unless B is also modified in a suitable way). This is often called the fragile superclass problem. Case 2 is the mirror image of case 1: modification of the superclass A is easy, but a modification of the subclass B causes repercussions. In case 3, modifications to one class do not affect the other class. However, the inheritance relationship can become impossible, e.g., if some new feature is added to A but not to B.

In object-oriented databases and persistent object systems, schema evolution mainly means modification of classes and their relationships, and it is a very important and difficult issue [Skarra and Zdonik 1986]. In particular, when instances of some class C already exist in the database and must be preserved, the modification of C is problematic even if C has no subclasses. Exheritance might be especially useful for databases, because by using it one could avoid some class modifications that would otherwise be necessary.

I thank Andrew Black for some useful comments on the draft version of this paper.
References

[Black et al. 1986] Andrew Black, Norman Hutchinson, Eric Jul, and Henry Levy, Object structure in the Emerald system, OOPSLA '86 Proceedings, 78-86.
[Cardelli 1988] Luca Cardelli, A semantics of multiple inheritance, Information and Computation Vol. 76 No. 2/3, 138-164.
[Pedersen 1989] Claus H. Pedersen, Extending ordinary inheritance schemes to include generalization, OOPSLA '89 Proceedings, 407-417.
[Sakkinen 1989] Markku Sakkinen, Disciplined inheritance, ECOOP '89 Proceedings, 39-56.
[Sakkinen 1992] Markku Sakkinen, A critique of the inheritance principles of C++, Computing Systems Vol. 5 No. 1, 69-110. Corrigendum, Computing Systems Vol. 5 No. 3, 361-363.
[Skarra and Zdonik 1986] Andrea H. Skarra and Stanley B. Zdonik, The management of changing types in an object-oriented database, OOPSLA '86 Proceedings, 483-495.