Measuring The Quality Of Inferred Interfaces


Measuring The Quality Of Inferred Interfaces For Statically Typed Languages

Florian Forster
Department of Computer Science, University of Hagen
florian.forster@fernuni-hagen.de

Zusammenfassung (translated from the German): Introducing interfaces into a program serves to decouple the source code and to increase its flexibility. Type inference algorithms can be used to derive interfaces from the way a declaration element is used in the source code. However, if many declaration elements share the same type, these algorithms may introduce a great number of new interfaces for that type. Developers must then rely on their intuition when judging the benefit of the new interfaces, i.e. they must weigh the additional maintenance cost of every interface against its benefit. To remedy this, this paper derives a metric which can compare sets of interfaces, as well as individual interfaces, with respect to their quality. The metric helps developers decide which new interfaces yield the highest benefit. The expressiveness of the metric is demonstrated with an example, and an outlook on an integration of the metric into Eclipse is given.

Schlüsselbegriffe (German keywords): coupling metrics, interface-based programming

Abstract: Introducing interfaces to a program serves to decouple the code and to increase its flexibility. Type inference algorithms can be used to extract the interface required from an existing type as expressed by a declaration element typed with this type. However, if many variables in a program are typed with the same type, many new interfaces are likely to be deduced by these algorithms. Unfortunately, developers have to trust their intuition when deciding whether the new interfaces proposed by the type inference algorithm are worth the trouble, i.e. whether the increased decoupling outweighs the additional maintenance effort which comes along with every new interface. Therefore, we provide a measurement to compare sets of inferred interfaces with each other, thus helping developers to select the best set of interfaces for their needs. Furthermore, we briefly evaluate our metric and provide a short sketch of the integration of the metric into the Eclipse IDE.

Keywords: Coupling Metrics, Interface-based Programming

IWSM/Metrikon 2006

1 Introduction

Interface-based programming [5] is accepted as a useful object-oriented programming technique. According to it, declaration elements in a program should be declared with interfaces, not classes, as their types. Two main benefits can be expected from this: First, flexibility is increased, as classes implementing an interface can be exchanged without notifying the client which uses the services of the class via the interface. Second, access to classes is restricted to the methods declared in the interface which is used to type a declaration element holding a reference to an instance of a class.

Several algorithms capable of inferring interfaces from source code have been developed over the last few years, as described in [1, 4, 6, 7, 8, 10, 11]. Unfortunately, there is no case study comparing these algorithms with each other. This is mainly due to the fact that no metric exists which measures the quality of a newly introduced interface. For example, if programmers use the refactoring Extract Interface in the Eclipse IDE, they are left alone with the decision whether the inclusion of an additional method in the interface is worth the trouble in terms of increased decoupling. In this paper we present such a metric, thus laying the foundation stone for a case study that compares the existing interface inference algorithms with each other. Furthermore, the presented metric makes possible the implementation of tools which guide developers through the creation of interfaces.

The rest of this paper is organized as follows. After introducing some conventions in section 2, we present approaches for the inference of interfaces from source code in section 3. Section 4 provides the necessary background for the derivation of our metric in section 5. In section 7 we derive a second metric capable of comparing sets of interfaces with each other. We present a small example for the application of the metric in section 8 and outline our vision in section 9.

2 Conventions

As mentioned above, we introduce a few conventions to make the remainder of this paper easier to read. Each program in a statically typed language like Java defines a set of types which we will call T. This set of types can be divided into the set of classes, C ⊆ T, and the set of interfaces, I ⊆ T. Note that T = C ∪ I. Whenever we use c/i/t with no or an arbitrary index, we refer to an arbitrary class/interface/type, i.e. c ∈ C, i ∈ I, t ∈ T. Furthermore, each program consists of a set of declaration elements, i.e. formal parameters, fields, local variables and methods, to which we will refer as D.

3 Inferring Interfaces

Developers of classes are responsible for providing implementations of classes and interfaces. Writing the implementation is straightforward (at least if a good specification exists), as the developer knows from the software requirements which services the class has to provide. Writing interfaces, however, is a more challenging task for which two approaches exist. First, developers could think about the different contexts in which a set of similar classes (similar in terms of performing the same task with a different implementation) might be used after being introduced to the program, and provide an interface for each of these contexts before these classes are deployed to the program. Second, once the program is finished, developers could manually derive interfaces from the usage of a specific class, or use algorithms like the ones described in [1, 4, 6, 7, 8, 10, 11].

Unfortunately, the second approach, no matter if interface inference is done manually or automatically, requires that developers examine all declaration elements UnrefinedDeclarationElements(t) = {d_1, ..., d_n} ⊆ D which are typed with a class or an interface t, for the purpose of inferring one or more new interfaces for t. Note that UnrefinedDeclarationElements(t) includes only those declaration elements typed with t which do not access non-public methods or fields. Due to the fact that declaration elements which are typed with t and make use of non-public methods or fields cannot be redeclared with a new type, we argue that they should not influence a metric measuring the benefit of such a redeclaration. Afterwards, for each declaration element d ∈ UnrefinedDeclarationElements(t), the transitive closure of all the assignments starting with d on the right side has to be deduced from the source code. Finally, all method calls which are accessed through declaration elements in the transitive closure of d have to be aggregated. This aggregation of methods, which we call the access set of a declaration element, provides information about the context in which the type t is used with regard to d.

However, as shown in [3], redeclaring every declaration element d with its least specific interface, i.e. the interface which contains only those methods which are in the access set of d, is not desirable and leads to a bloated type hierarchy. To a certain extent, the inference algorithms from [1, 4, 6, 7, 8, 10, 11] try to find least specific interfaces (or types). Therefore, to compare these algorithms with each other, we present a metric which measures the benefit of introducing a new interface to an existing program.

4 Calculating Access Sets For Declaration Elements

As mentioned in section 2, programs in statically typed languages like Java come with a set of types. Each of these types t declares a set of members, members(t) (the members of a class are all fields and methods which are explicitly defined in this class or inherited from one of its superclasses). We call the subset publicInterface(t) := {m ∈ members(t) | m is a public non-static method} the public interface of the type t. In each program, the elements of T are arranged in a hierarchy using the reflexive and transitive relation ≤ on the types. The relation t_1 ≤ t_2, i.e. t_2 is a supertype of t_1, induces the relationship publicInterface(t_1) ⊇ publicInterface(t_2) on the public interfaces of t_1 and t_2, i.e. t_1 has at least the same public non-static methods as t_2.

Each member of the set of declaration elements D in a program has a declared type t. By means of this explicit declaration, each of these declaration elements d ∈ UnrefinedDeclarationElements(t) has access to a certain set of methods, i.e. the public interface publicInterface(t) provided by t. However, not all of these methods are accessed using a specific declaration element. The access set of d, accessSet_t(d), consists of all public non-static methods accessed on a declaration element d which is typed with t. This access set is calculated by analyzing the transitive closure of all the assignments (explicit assignments, or passing the declaration element to a method) starting with d on the right side. For example, let d_1 and d_2 be two declaration elements typed with t and let d_2 := d_1 be an assignment in the program. The access set accessSet_t(d_1) consists of all the methods directly invoked on d_1, united with accessSet_t(d_2). Since we target statically typed languages, static class hierarchy analysis as described in [2] suffices for computing access sets. More details on calculating access sets for declaration elements can be found in [8].
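The access-set computation described above can be sketched as a small fixpoint iteration. The following Python sketch is ours, not the algorithm of [8]: declaration elements, their directly invoked methods, and assignments are given as plain dictionaries and pairs, and for an assignment d2 := d1 the access set of the right-hand side d1 absorbs that of d2.

```python
def access_sets(direct_calls, assignments):
    """Compute accessSet(d) for every declaration element d.

    direct_calls: dict mapping a declaration element to the set of method
                  names invoked directly on it.
    assignments:  list of (lhs, rhs) pairs for assignments lhs := rhs
                  (passing rhs as an actual argument for a formal parameter
                  lhs counts as an assignment as well).

    Per section 4, accessSet(rhs) also contains everything in accessSet(lhs),
    transitively; a fixpoint iteration handles chains and cycles alike.
    """
    acc = {d: set(ms) for d, ms in direct_calls.items()}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in assignments:
            acc.setdefault(rhs, set())
            merged = acc[rhs] | acc.get(lhs, set())
            if merged != acc[rhs]:
                acc[rhs] = merged
                changed = True
    return acc

# The paper's example: d2 := d1, so accessSet(d1) also contains accessSet(d2).
sets = access_sets({"d1": {"addRole"}, "d2": {"getRoles"}}, [("d2", "d1")])
# accessSet(d1) = {"addRole", "getRoles"}, accessSet(d2) = {"getRoles"}
```

A production implementation would extract `direct_calls` and `assignments` from source code (e.g. via static class hierarchy analysis [2]); the fixpoint itself is independent of that front end.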

5 The Quality Of An Inferred Interface

Informally spoken, a good interface i for the purpose of reducing coupling by redeclaring a set of declaration elements currently typed with a type t should meet two requirements.

First, the interface i should be a suitable and the most context-specific type for as many declaration elements currently typed with t as possible. The set DeclarationElements(i) represents all declaration elements from UnrefinedDeclarationElements(t) for which i is both a suitable and the most context-specific type, and which can therefore be retyped with i. An interface i_1 is the most context-specific one for a declaration element d ∈ DeclarationElements(i) if there is no other interface i_2 which is suitable for typing d and for which publicInterface(i_2) ⊂ publicInterface(i_1) holds, i.e. no better matching type exists in the program. An interface i is suitable for a declaration element d ∈ UnrefinedDeclarationElements(t) if accessSet_t(d) ⊆ publicInterface(i). In the following we write redeclarationPower(i) instead of |DeclarationElements(i)| to make the metrics easier to read.

Second, the new interface i should be as context-specific as possible. This means that each declaration element d from DeclarationElements(i) which is to be retyped with the interface i should make use of most of the methods available through i, i.e. for every d ∈ DeclarationElements(i) the difference |publicInterface(i)| − |accessSet_t(d)| should be minimal or 0. Therefore the average fraction of used methods among the methods available in the interface i, taken over the declaration elements from DeclarationElements(i),

    \frac{\sum_{d \in DeclarationElements(i)} \frac{|accessSet_t(d)|}{|publicInterface(i)|}}{redeclarationPower(i)}    (1)

should be maximized. Note that in this case accessSet_t(d) ⊆ publicInterface(i) also holds for every d ∈ DeclarationElements(i).

Unfortunately, these two goals are inversely correlated: the more contexts an interface i subsumes, i.e. the bigger redeclarationPower(i) is, and therefore the more methods i contains, the less likely it is that for an arbitrary declaration element d ∈ DeclarationElements(i) the value of |accessSet_t(d)| / |publicInterface(i)|, i.e. the number of methods used on d in relation to the methods available in i, is close to one. To take this correlation into account, we combine the two requirements by multiplication. The task of finding a good interface i for redeclaring declaration elements currently typed with type t thus leads to the task of maximizing the product

    quality(i, t) = redeclarationPower(i) \cdot \frac{\sum_{d \in DeclarationElements(i)} \frac{|accessSet_t(d)|}{|publicInterface(i)|}}{redeclarationPower(i)} = \sum_{d \in DeclarationElements(i)} \frac{|accessSet_t(d)|}{|publicInterface(i)|}    (2)

However, this value is still unrelated to the type t for which the interface i is created. To take this relation into account, we multiply quality(i, t) by the factor redeclarationPower(i) / |UnrefinedDeclarationElements(t)|, which represents the fraction of declaration elements typed with t which can be safely redeclared with the interface i (safely meaning that the program remains type-correct after the redeclaration). Thus the quality of an interface is redefined as

    quality(i, t) = \frac{redeclarationPower(i)}{|UnrefinedDeclarationElements(t)|} \cdot \sum_{d \in DeclarationElements(i)} \frac{|accessSet_t(d)|}{|publicInterface(i)|}    (3)

The alert reader may have noticed that up to now we have assumed that an interface is introduced to the program for redeclaring declaration elements typed with only one specific type t. Unfortunately, it is not uncommon (especially if the interface is deduced manually) that one interface i is used to redeclare declaration elements of more than one type. Hence, for a set of types {t_1, ..., t_n}, the quality of a newly created interface i is redefined as

    quality(i, \{t_1, ..., t_n\}) = \frac{1}{n} \sum_{t \in \{t_1, ..., t_n\}} \frac{redeclarationPower(i)}{|UnrefinedDeclarationElements(t)|} \cdot \sum_{d \in DeclarationElements(i)} \frac{|accessSet_t(d)|}{|publicInterface(i)|}    (4)

Hence a good interface i for a set of types is characterized by a value for quality(i, {t_1, ..., t_n}) which is relatively high compared to the status quo, i.e. compared to \sum_{t \in \{t_1, ..., t_n\}} quality(i_t, t), i_t being the implicitly available public interface of a type t.

6 Comparing Interfaces

As we have a measurement for the quality of a newly introduced interface, we are only one step away from our goal. In order to compare two interfaces i_1 and i_2, i_1 being introduced for a set of types {t_m, ..., t_n} and i_2 being introduced for {t_k, ..., t_l}, we have to normalize the quality values. We argue that the situation before the introduction is the status quo, and therefore the sum of quality(i_t, t) over the respective set of types serves as our normalization factor. For any interface i which is used to redeclare declaration elements of a set of types {t_1, ..., t_n} we measure the quality by

    normalizedQuality(i, \{t_1, ..., t_n\}) = \frac{quality(i, \{t_1, ..., t_n\})}{\sum_{t \in \{t_1, ..., t_n\}} quality(i_t, t)}    (5)

Let i_1 and i_2 be two interfaces introduced for two sets of types {t_m, ..., t_n} and {t_k, ..., t_l}. If normalizedQuality(i_1, {t_m, ..., t_n}) > normalizedQuality(i_2, {t_k, ..., t_l}), i.e.

    \frac{quality(i_1, \{t_m, ..., t_n\})}{\sum_{t \in \{t_m, ..., t_n\}} quality(i_t, t)} > \frac{quality(i_2, \{t_k, ..., t_l\})}{\sum_{t \in \{t_k, ..., t_l\}} quality(i_t, t)}

holds true, then the introduction of i_1 to the program should be preferred.

7 Comparing Decoupling Sets

The goal of most type inference algorithms and manual refactorings related to type inference is to reduce coupling for a set of types {t_1, ..., t_n} from the program by using a set of interfaces {i_1, ..., i_n} for retyping all the declaration elements typed with the types {t_1, ..., t_n}. We call the union of the implicitly available public interfaces {i_{t_1}, ..., i_{t_n}} of the types {t_1, ..., t_n} and {i_1, ..., i_n} a decoupling set for the types {t_1, ..., t_n}, and write DS_{\{t_1, ..., t_n\}} instead of {i_{t_1}, ..., i_{t_n}} ∪ {i_1, ..., i_n}. To find the optimal set of interfaces (in terms of our quality metric) for a set of types, we need means to compare the quality of decoupling sets with each other. Therefore, we define the quality of a decoupling set DS_{\{t_1, ..., t_n\}} as

    quality(DS_{\{t_1, ..., t_n\}}, \{t_1, ..., t_n\}) = \sum_{i \in DS_{\{t_1, ..., t_n\}}} normalizedQuality(i, \{t_1, ..., t_n\})    (6)

i.e. the sum of the individual qualities of each new interface and each implicitly available public interface in the decoupling set. Note that if all declaration elements of a type t ∈ {t_1, ..., t_n} can be redeclared with the new interfaces, then normalizedQuality(i_t, {t_1, ..., t_n}) is zero, due to redeclarationPower(i_t) / |UnrefinedDeclarationElements(t)| being zero for the type t.

We can use this value to compare two decoupling sets, i.e. one set of types {t_1, ..., t_n} decoupled with two different sets of interfaces DS_1 and DS_2. The relation

    quality(DS_1, \{t_1, ..., t_n\}) > quality(DS_2, \{t_1, ..., t_n\})    (7)

means that the decoupling set DS_1 provides a better decoupling for the set of types {t_1, ..., t_n} than the decoupling set DS_2 in terms of our quality metric.

Note that adding a perfectly matching interface i for a set of types {t_1, ..., t_n}, which is most likely a perfect match for only a small number of declaration elements, to a decoupling set DS_{\{t_1, ..., t_n\}} of this set of types need not result in a big improvement: quality(DS_{\{t_1, ..., t_n\}} ∪ {i}, {t_1, ..., t_n}) − quality(DS_{\{t_1, ..., t_n\}}, {t_1, ..., t_n}) is small, as quality(i, {t_1, ..., t_n}) is small due to redeclarationPower(i) / |UnrefinedDeclarationElements(t)| being small. This reflects that introducing many perfectly matching interfaces, each with a small set of declaration elements, is not necessarily better than finding interfaces which subsume several access sets, as each new interface results in additional maintenance effort [3, 5].

8 Evaluation Of The Metric

We will use the example presented in Listing 1 to illustrate the usage of access sets for inferring useful interfaces. The class Actor (taken from one of our student projects) represents an actor in a theater group who plays several roles (private Set roles). Clients can remove (deleteRole(Role)) and add (addRole(Role)) roles, and can get all the roles an actor currently plays (getRoles()). The class actually offers four more methods, which are omitted here due to space restrictions but are used later for evaluating the changes we make.

    public final class Actor {
        private Set roles;

        public void addRole(Role role) {
            roles.add(role);
        }

        public void deleteRole(Role role) {
            roles.remove(role);
        }

        public Set getRoles() {
            return Collections.unmodifiableSet(roles);
        }
        ...
    }

Listing 1: A class for an Actor

For the evaluation we assume that there are eight declaration elements. Four of them (D_1) use either deleteRole(Role) or addRole(Role). Two of them (D_2) use both deleteRole(Role) and getRoles(), and the remaining two (D_3) use addRole(Role) and getRoles().

    public interface RolePlayer {
        public void addRole(Role role);
        public void deleteRole(Role role);
        public Set getRoles();
    }

Listing 2: The context-specific interface RolePlayer

For simplicity and later usage we first calculate the normalization factor, i.e. quality(Actor, {Actor}). With |publicInterface(Actor)| = 7 (the three methods of Listing 1 plus the four omitted ones):

    quality(Actor, \{Actor\}) = \frac{1}{1} \cdot \frac{8}{8} \cdot \Big( \underbrace{\tfrac{1}{7}+\tfrac{1}{7}+\tfrac{1}{7}+\tfrac{1}{7}}_{D_1} + \underbrace{\tfrac{2}{7}+\tfrac{2}{7}}_{D_2} + \underbrace{\tfrac{2}{7}+\tfrac{2}{7}}_{D_3} \Big) = \frac{12}{7}

For evaluating the quality of the interface shown in Listing 2, we have to compare the resulting decoupling set with other decoupling sets. The first decoupling set is from the original version of the program, i.e. DS_{Actor,1} = {Actor}. The quality of this decoupling set is calculated as

    quality(DS_{Actor,1}, \{Actor\}) = normalizedQuality(Actor, \{Actor\}) = \frac{quality(Actor, \{Actor\})}{quality(Actor, \{Actor\})} = 1

which is not surprising, as we defined the status quo as our normalization factor.

Introducing the interface RolePlayer from Listing 2 results in the decoupling set DS_{Actor,2} = {Actor, RolePlayer}. The quality of this new decoupling set is

    quality(DS_{Actor,2}, \{Actor\}) = normalizedQuality(Actor, \{Actor\}) + normalizedQuality(RolePlayer, \{Actor\})
        = \frac{7}{12} \cdot \Big( \frac{0}{8} \cdot \sum_{d \in DeclarationElements(Actor)} \tfrac{|accessSet_{Actor}(d)|}{|publicInterface(Actor)|} + \frac{8}{8} \cdot \big( \underbrace{\tfrac{1}{3}+\tfrac{1}{3}+\tfrac{1}{3}+\tfrac{1}{3}}_{D_1} + \underbrace{\tfrac{2}{3}+\tfrac{2}{3}+\tfrac{2}{3}+\tfrac{2}{3}}_{D_2, D_3} \big) \Big)
        = \frac{7}{12} \cdot (0 + 4) = \frac{7}{3}

as all eight declaration elements can now be retyped with RolePlayer (so redeclarationPower(Actor) = 0), the four declaration elements from D_1 use one out of three, and the four declaration elements from D_2 and D_3 use two out of three of the methods available in the interface RolePlayer.

Algorithms for automatic type inference in existing programs, like the one presented in [8], introduce minimal interfaces for the declaration elements of a program. Using one of these algorithms on the example presented in Listing 1 results in one interface each for D_1, D_2 and D_3. Therefore DS_{Actor,3} = {Actor, Iface_{D_1}, Iface_{D_2}, Iface_{D_3}} is the decoupling set in this case. Its quality is calculated as

    quality(DS_{Actor,3}, \{Actor\}) = normalizedQuality(Actor, \{Actor\}) + normalizedQuality(Iface_{D_1}, \{Actor\}) + normalizedQuality(Iface_{D_2}, \{Actor\}) + normalizedQuality(Iface_{D_3}, \{Actor\})
        = \frac{7}{12} \cdot \Big( \frac{0}{8} \cdot \sum_{d \in DeclarationElements(Actor)} \tfrac{|accessSet_{Actor}(d)|}{|publicInterface(Actor)|} + \frac{4}{8} \cdot 4 \cdot \tfrac{1}{2} + \frac{2}{8} \cdot 2 \cdot \tfrac{2}{2} + \frac{2}{8} \cdot 2 \cdot \tfrac{2}{2} \Big)
        = \frac{7}{12} \cdot \Big( 0 + 1 + \tfrac{1}{2} + \tfrac{1}{2} \Big) = \frac{7}{6}

Even though this is better than the original version, i.e. quality(DS_{Actor,3}, {Actor}) > quality(DS_{Actor,1}, {Actor}), it is worse than the manually deduced version, i.e. quality(DS_{Actor,3}, {Actor}) < quality(DS_{Actor,2}, {Actor}).

9 The Vision

One of the reviewers of the draft of this paper noted that "[t]he Relevanz is scored low because of my doubt whether the proposed academic measurement ('scientific formulas') will be implemented and used in a non-academic environment." We admit that the formulas are very cumbersome to calculate manually. Yet our vision is to support situations like the one depicted in Figure 1. As one can see, there is no information whatsoever on whether the selected methods are a good choice for a new interface. Our vision is that the proposed metric is calculated and presented on-the-fly during the creation of new interfaces using the Extract Interface... refactoring. We are currently implementing the metric as a plugin for the Eclipse IDE. Afterwards, we will refine the refactoring Extract Interface... to present the metric, and investigate its usage in day-to-day development tasks.

Figure 1: Using Extract Interface... to create the new interface RolePlayer

Furthermore, the metric can also be used to measure the usage of already existing interfaces. We envision a tool which assumes that the selected interfaces are not yet introduced to the program and calculates the quality metric. Developers are then capable of deciding, based on the metric and domain knowledge, whether the selected interfaces should be removed from the program, i.e. whether the additional maintenance effort for the interfaces is not worth the benefit.

10 Conclusion

After defining access sets for declaration elements, we defined a measurement for the quality of a single interface. We refined this metric so that it is now capable of comparing sets of newly created interfaces with each other. A small example was used to depict the application of the metric. Finally, a brief look into the future concluded the paper. We plan to finish the implementation of the tools mentioned in section 9, and only then will the metric show whether it is useful or not. Nevertheless, at least to our knowledge, we are the first to present a way in which the decrease in coupling which comes along with interfaces might be measured.

References

[1] Alur, R., et al.: Synthesis of interface specifications for Java classes. Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 98-109, 2005.
[2] Dean, J., Grove, D., Chambers, C.: Optimization of object-oriented programs using static class hierarchy analysis. Proceedings of ECOOP, pages 77-101, 1995.
[3] Forster, F.: Cost and Benefit of Rigorous Decoupling with Context-Specific Interfaces. Principles and Practice of Programming in Java, 2006.
[4] Khedker, U.P., Dhamdhere, D.M., Mycroft, A.: Bidirectional data flow analysis for type inferencing. Computer Languages, Systems & Structures 29, pages 15-44, 2003.
[5] Löwy, J.: Programming .NET Components. O'Reilly Media, 2005.
[6] Palsberg, J., Schwartzbach, M.I.: Object-oriented type inference. Proc. of OOPSLA, pages 146-161, 1991.
[7] Snelting, G., Tip, F.: Understanding class hierarchies using concept analysis. ACM TOPLAS 22:3, pages 540-582, 2000.
[8] Steimann, F., Mayer, P., Meißner, A.: Decoupling classes with inferred interfaces. Proceedings of the 2006 ACM Symposium on Applied Computing, 2006.
[9] Steimann, F., Siberski, W., Kühne, T.: Towards the systematic use of interfaces in Java programming. Proc. 2nd Int. Conf. on the Principles and Practice of Programming in Java, pages 13-17, 2003.
[10] Tip, F., Kiezun, A., Bäumer, D.: Refactoring for generalization using type constraints. Proc. of OOPSLA, pages 13-26, 2004.
[11] Wang, T., Smith, S.F.: Precise constraint-based type inference for Java. Proc. of ECOOP, pages 99-117, 2001.