Product Line Evolution Using Source Packages

Arie van Deursen, Merijn de Jonge
CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
http://www.cwi.nl/~{arie,mdejonge}

This research was sponsored in part by the Dutch Telematica Instituut, project DSL.

Abstract

We present a language-independent approach for product line initiation and evolution. Our approach is based on a decomposition of a product line into packages. These packages constitute the build time variation points of the product line. They can be configured, selected, combined, and extended in order to yield tailor-made products for individual customers. In order to reduce the initial investment involved in product line adoption, we propose an incremental method for designing the product line package structure. At each iteration, we analyze the variability required by the existing and currently anticipated products, and we recover the variability offered by the actual product line implementation. Given these two, we design a target package decomposition offering the required variability, as well as a migration path to arrive at this target from the current product line implementation. The proposed approach is illustrated using a commercial product line in the area of documentation generation for legacy systems.

Keywords

Variability, feature description, configuration management, customization, distribution, source tree composition.

1 Introduction

In this paper, we present a light-weight, language-independent approach for initiating and evolving software product lines. Our techniques cover the full range from product line design, configuration, and validation, to product instantiation.

Our approach involves a decomposition of software systems into source packages. These packages and their configuration interfaces first of all constitute the build time variation points of a product line. They can be configured, selected, combined, and extended to form tailor-made products for individual customers.

In addition, the packages we propose have a number of attractive software development qualities. To mention a few: they can be independently developed and deployed; they can be subjected to explicit release management, making it possible to work with stable, released components; they can be bundled into composite packages; they enforce a uniform configuration and build interface; and they can easily be combined with third-party packages.

The actual way in which a product line should be decomposed into such packages depends on the variability that is required between derived products. Unfortunately, this variability is hard, if not impossible, to determine in advance, not least because the variability requirements are likely to change over time.

Therefore, we propose SIMPLE (footnote 1), an incremental method for product line evolution based on frequent refactorings of the package decomposition. To that end, we split the development of a product line into a series of small iterations. At each iteration, we analyze the variability required by the existing and currently anticipated products, and we recover the variability offered by the actual product line implementation. Given these two, we design a target package decomposition offering the required variability, as well as a migration path to arrive at this target from the current product line implementation.

One of the key benefits of our approach is its support for light-weight product line initiation. We can start by developing one or more individual products, and then factor out common elements into separate packages. Moreover, if the existing decomposition turns out to be inadequate for new customers, we have a systematic method for refactoring the package decomposition. As a result, the need to know variability requirements in advance is reduced, thus lowering the investment needed to initiate a software product line.
In this paper, we discuss techniques for modeling the feature space of product lines, as well as techniques for expressing and validating configurations. We describe techniques that automatically map configurations to corresponding implementations, including techniques for product assembly from source packages, build time reflection, and customer specializations. Moreover, we describe how to use these mechanisms in order to initiate and evolve a software product line. We illustrate our ideas using our experience with the design, implementation, and deployment of DocGen, a commercial product line in the area of documentation generation for legacy systems (see [4]).

Footnote 1: SIMPLE is a Simple Incremental Method for Product Line Evolution.

2 Composable Packages

A (source) package is a basic development unit which can be separately created, maintained, released, tested, and assigned to a team [5]. We will use a specific notion of package to address product line decomposition and product composition, based on our earlier research on source tree composition [8].

A package is described by a package definition, illustrated in Figure 1. In the example, the packages calls and summary are defined, listing their name, version, source location, and dependencies on other packages. Other entries in a package definition (not shown) cover configuration parameters, keywords, and documentation.

    package
      name=calls
      version=1.0
      location=http://www.cwi.nl/docgen
      requires
        docgen-base 1.0
        html-lib 1.0

    package
      name=summary
      version=1.0
      location=http://www.cwi.nl/docgen
      requires
        docgen-base 1.0
        html-lib 1.0

Figure 1: Two package definitions for the DocGen product line

A package has explicit build instructions. Using these instructions, the package can be compiled, tested, and distributed.

Packages can be composed using a process called source tree composition. This involves merging source trees, build instructions, and configuration processes. The result is a software bundle: a single, self-contained source tree with centralized build and configuration processes. Source tree composition is supported and automated by autobundle (footnote 2).

Packages based on source tree composition have a number of desirable qualities, making them a suitable mechanism for decomposing software product lines:

- Individual packages can be developed, maintained, and released separately;
- Package dependencies are explicitly documented;
- Product assembly can be based on package composition, thus making use of stable, versioned components only;
- Developers can work with a focused source tree, obtained by bundling only those packages actually required;
- Packages provide a uniform build and configuration view, which can also be used for the smooth integration of third-party (commercial) components.

Source tree composition is a light-weight technique. The actual implementation of autobundle relies only on automake and autoconf, two standard packages for software building and configuration. The use of packages is independent of the source languages or configuration management tools actually used. Moreover, source tree composition is complementary to other forms of configuration, such as aspect-oriented programming [10], code generation (packages relying on this need to invoke the appropriate generators in their build processes), or dynamic configuration.
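To make the information carried by a package definition concrete, the following minimal Python sketch reads definitions written in the shortened notation of Figure 1 into plain records. It is an illustration only: the function name parse_packages and the token-based grammar (the keywords package and requires, key=value entries, and name/version pairs) are assumptions derived from the shortened figure, not the actual input format or implementation of autobundle.

```python
# Hypothetical reader for the shortened notation of Figure 1 (not autobundle's real parser).
FIGURE_1 = """
package name=calls version=1.0 location=http://www.cwi.nl/docgen
  requires docgen-base 1.0 html-lib 1.0
package name=summary version=1.0 location=http://www.cwi.nl/docgen
  requires docgen-base 1.0 html-lib 1.0
"""


def parse_packages(text):
    """Return one dict per package: its key=value entries plus a 'requires' list."""
    packages = []
    current = None
    in_requires = False
    tokens = text.split()  # whitespace-insensitive, so line layout does not matter
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "package":
            current = {"requires": []}
            packages.append(current)
            in_requires = False
            i += 1
        elif current is None:
            raise ValueError(f"expected 'package', found {token!r}")
        elif token == "requires":
            in_requires = True
            i += 1
        elif in_requires:
            # Dependencies are written as pairs: package name followed by version.
            if i + 1 >= len(tokens):
                raise ValueError(f"missing version for dependency {token!r}")
            current["requires"].append((token, tokens[i + 1]))
            i += 2
        else:
            key, _, value = token.partition("=")
            current[key] = value
            i += 1
    return packages


definitions = parse_packages(FIGURE_1)
for definition in definitions:
    print(definition["name"], definition["version"], definition["requires"])
```

Because the sketch splits on whitespace, it accepts the definitions regardless of how they are laid out over lines; a real tool would also have to handle the entries that Figure 1 omits (configuration parameters, keywords, documentation).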
Footnote 2: autobundle is free software and available for download at http://www.cwi.nl/~mdejonge/autobundle/

3 Product Variability

The first step in initiating a product line architecture is to analyze the variability of different existing or anticipated software products, a process called domain analysis (see [1] for a recent overview). Thus, we identify what the common features of all products are, what the differentiating features are, and what the dependencies between these variable features are. Moreover, features are classified according to their binding time, distinguishing compile-time, activation-time, and runtime variability. In this paper, we will focus on compile-time variability only.

To explore the variability of a software product line we use the Feature Description Language FDL discussed by [3], which is based on the feature diagrams of the Feature Oriented Domain Analysis method FODA [9]. An example FDL description for part of the DocGen variability is shown in Figure 2.

    Cobol-Docgen       : all(Cobol-Input, Cobol-Presentation)
    Cobol-Input        : all(Embedded-Languages?, Dialect)
    Embedded-Languages : more-of(sql, idms, cics,...)
    Dialect            : one-of(standard-85, standard-74, microfocus, digital,...)
    Cobol-Presentation : more-of(summary, files-used, calls, called-by,...)

Figure 2: Example Feature Description

The atomic features, written in lowercase, represent distinctive user-visible characteristics of the individual products [9]. Composite features are written with an initial uppercase letter and consist of sub-features that can be atomic or composite. An FDL feature definition indicates dependencies between features, using operators such as all, one-of, and more-of, as well as ? to denote optional features.

The example definition indicates that a DocGen configuration for Cobol should consist of a selection of the expected input language and the required presentations. As an example, the list of features

    sql standard-85 summary calls called-by

is a valid configuration indicating a DocGen product which can generate documentation for Cobol programs containing embedded SQL and written according to the Cobol-85 standard. The documentation derived for these programs consists of a summary, the calls made by each program, and the way in which the program is activated.

Feature descriptions are remarkably similar to grammars: atomic features correspond to terminals, composite features to non-terminals, and feature operators to syntax operators.
Thus, we can derive a grammar from a feature description, yielding a definition of a configuration language. Each valid sentence in such a language corresponds to a valid product line configuration.

We have used this correspondence to obtain tool support for writing feature definitions and product configurations. We implemented an automatic mapping from FDL to the syntax definition formalism SDF. As a result, we could directly reuse existing grammar-based tools to obtain a syntax-directed configuration editor as well as a validating configuration parser. In addition, the derived grammar can be used to obtain a visual configuration wizard, helping product engineers to compose systems from valid feature combinations.
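The notion of a valid configuration can be illustrated with a small Python sketch that checks a set of atomic features directly against the feature description of Figure 2. This is not the FDL-to-SDF mapping described above, which derives a grammar and reuses a parser; it is a hand-written checker under stated assumptions. The class and function names (Atom, Opt, Op, status, is_valid) are hypothetical, the alternatives elided by "..." in Figure 2 are simply left out, and a composite feature is treated as unselected when none of its atomic features occur in the configuration.

```python
from dataclasses import dataclass
from typing import Tuple, Union

ABSENT, VALID, INVALID = "absent", "valid", "invalid"


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Opt:
    child: "Expr"  # corresponds to the ? operator


@dataclass(frozen=True)
class Op:
    kind: str  # "all", "one-of", or "more-of"
    children: Tuple["Expr", ...]


Expr = Union[Atom, Opt, Op]


def atoms(expr):
    """All atomic feature names occurring below expr."""
    if isinstance(expr, Atom):
        return frozenset({expr.name})
    if isinstance(expr, Opt):
        return atoms(expr.child)
    return frozenset().union(*(atoms(child) for child in expr.children))


def status(expr, config):
    """ABSENT if nothing below expr is selected, else VALID or INVALID."""
    if isinstance(expr, Atom):
        return VALID if expr.name in config else ABSENT
    if isinstance(expr, Opt):
        return status(expr.child, config)
    results = [(child, status(child, config)) for child in expr.children]
    if any(result == INVALID for _, result in results):
        return INVALID
    selected = sum(1 for _, result in results if result == VALID)
    if selected == 0:
        return ABSENT
    if expr.kind == "all":
        # Every non-optional sub-feature must be selected.
        missing = any(result == ABSENT and not isinstance(child, Opt)
                      for child, result in results)
        return INVALID if missing else VALID
    if expr.kind == "one-of":
        return VALID if selected == 1 else INVALID
    if expr.kind == "more-of":
        return VALID
    raise ValueError(f"unknown operator {expr.kind!r}")


def is_valid(root, config):
    config = frozenset(config)
    return config <= atoms(root) and status(root, config) == VALID


def feats(*names):
    return tuple(Atom(name) for name in names)


# Figure 2, without the alternatives elided by "..." in the figure.
EMBEDDED_LANGUAGES = Op("more-of", feats("sql", "idms", "cics"))
DIALECT = Op("one-of", feats("standard-85", "standard-74", "microfocus", "digital"))
COBOL_INPUT = Op("all", (Opt(EMBEDDED_LANGUAGES), DIALECT))
COBOL_PRESENTATION = Op("more-of", feats("summary", "files-used", "calls", "called-by"))
COBOL_DOCGEN = Op("all", (COBOL_INPUT, COBOL_PRESENTATION))

example = {"sql", "standard-85", "summary", "calls", "called-by"}
print(is_valid(COBOL_DOCGEN, example))
```

The example configuration from the text is accepted, while a configuration that selects two dialects, or that omits the dialect altogether, is rejected. A grammar-based implementation yields the same kind of check and additionally supports editors and wizards, which this sketch does not attempt.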

4 Package-Based Product Assembly

This section discusses how an implementation is obtained from a valid configuration, yielding a mapping from the product line's problem space to its solution space [1].

A key challenge in software product line development is to be able to deliver exactly those parts of a product line that correspond to the features of a configuration. For example, we developed many visualizations of customer-specific artifacts. Because the corresponding implementations often contain essential knowledge of a customer's business, it is essential that this code is not delivered to others.

Ideally, each feature is therefore implemented in a separate source package. This has several advantages:

- Customer-specific code will not be delivered to others, because the implementation of a particular configuration can be selected exactly;
- Product line instantiation can be based on source tree composition: a product engineer can assemble new products automatically by selecting and configuring the appropriate packages;
- Feature implementations are loosely coupled, permitting independent development and deployment.

Some features, however, cannot be implemented in isolation. An example is switching logging on or off, which affects many different implementation components. Such features are called crosscutting. In source packages, they correspond to a configuration switch shared by all packages affected by the crosscutting feature.

As an example of automated product assembly, consider Figure 1, which contains two (shortened) package definitions for the DocGen features calls and summary. Each definition provides the URL where the implementation of the feature can be downloaded. Two additional source packages (docgen-base and html-lib) are shared by both implementations. Using autobundle to combine the calls and summary features yields a new single source tree containing both packages as well as the ones on which they depend.
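The package selection step of this example can be sketched in Python as a transitive closure over the requires relation. The sketch only computes which packages end up in the bundle; it does not merge source trees or build processes, and it is not how autobundle is implemented. The function name bundle and the table DEFINITIONS are hypothetical, versions are ignored, and docgen-base and html-lib are assumed to have no further dependencies because their definitions are not shown in Figure 1.

```python
# Hypothetical dependency table; only calls and summary are taken from Figure 1.
DEFINITIONS = {
    "calls": ["docgen-base", "html-lib"],
    "summary": ["docgen-base", "html-lib"],
    "docgen-base": [],  # assumption: no further dependencies
    "html-lib": [],     # assumption: no further dependencies
}


def bundle(selected, definitions):
    """Return the selected packages plus everything they (transitively) require."""
    result = set()
    pending = list(selected)
    while pending:
        name = pending.pop()
        if name in result:
            continue  # already bundled; shared dependencies are included once
        if name not in definitions:
            raise KeyError(f"no package definition for {name!r}")
        result.add(name)
        pending.extend(definitions[name])
    return sorted(result)


print(bundle(["calls", "summary"], DEFINITIONS))
```

Selecting only calls gives a smaller bundle that leaves out summary, which is the property that allows customer-specific packages to be excluded from products delivered to other customers.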
4.1 Configuration-Aware Product Instances

After a source tree is assembled by autobundle, a product instance has to be built from it that behaves according to its configuration. Correct behavior of a product instance requires that all features are contained in the running application, and that they are activated.

The exact contents of a product instance are not known at development time because they are configuration-dependent. Also, implementations of optional and alternative features cannot know in advance which sub-features will be selected. Therefore, a dynamic mechanism is needed that allows the implementation of a composite feature to discover which sub-features are contained in a configuration. This is essential to determine which functionality should be linked into the application, and to be able to access this functionality from within the application.

We implemented such a mechanism based on feature registries. A source package that implements an optional or alternative feature queries a specific registry to obtain the list of those sub-features that have been selected in a configuration. The registry is filled during the build process, where each sub-feature registers itself in the registry of its containing feature. The registry can be accessed at runtime to configure an application dynamically, or at build time to generate configuration-specific applications.

For example, the exact contents of a Cobol presentation depend on the presentation features that have been selected. Since we support customer-specific artifact visualization, the complete list of features is not fixed. If the component that implements the composite Cobol-Presentation feature had to be adapted whenever a new (customer-specific) visualization was developed, this would lead to extra and complicated maintenance effort. Instead, each artifact visualization registers itself as being part of the Cobol presentation. During the build process of the Cobol-Presentation feature, the registry is accessed and a configuration-specific Cobol presentation is generated that contains all desired visualizations.

Table 1: Customer Factories (see [2] for a complete description).

    Customer factories are an extension of the abstract factory design pattern [6], which provides an interface for creating families of related or dependent objects without specifying their concrete classes. The abstract factory serves to easily create customer-specific objects. However, since each customer requires a separate concrete factory, the overhead for instantiating a new customer is not optimal and may become problematic when the number of customers grows. Customer factories solve this problem by having just one customer factory, which uses a customer name to find customer-specific classes. Reflection is then used to create an instance of the appropriate customer-specific class.

4.2 Customizing Product Instances

Each customer has specific requirements.
This may result in customer-specific features which extend the configuration space of the product line and which are implemented in separate packages. However, customer wishes cannot always be implemented this way because they are often concerned with adapting the behavior of already existing features.

For example, the artifacts visualized by DocGen require code analysis to retrieve the desired information. It turned out that the extraction is influenced by customer-specific coding conventions. For instance, some of our customers do not call programs directly, but indirectly via an assembly utility. In order to support the calls and called-by features, the analysis of the call relation had to be adapted to this customer-specific convention.

For optimal flexibility, the behavior of existing features should therefore be adaptable according to the customer's needs. We use customer factories (see Table 1), an object-oriented approach, to easily deal with such variants and specializations for different customers. This approach requires that customer-specific code is put in a separate source code component, and that it is always bundled during source tree composition.

New customers can easily be added with this approach. Because the customer factory automatically searches customer Java packages for relevant class specializations, developing additional concrete factories for new customers is not necessary. This approach also makes it easy to turn an existing class into a variation point permitting customer-specific overriding. This requires only a local change to the class; the abstract factory that is used by all customers does not need to be adapted.

5 Restructuring Variability using SIMPLE

The previous sections described general techniques for building product lines. In this section we discuss how to use these techniques for incremental product line initiation. To that end, we propose the method SIMPLE, which consists of the following steps:

1. Analyze the required variability of existing and anticipated products;
2. Analyze the actual variability as implemented in the current product line;
3. Restructure the actual variability until it implements the required variability by applying a series of small refactorings.

In SIMPLE, these steps are conducted iteratively, for example whenever a feature is added to any product or to the product line, but at least at every product and product line release.

5.1 Recovering Product Variability

The differences between existing products define the initial configuration space of the product line. To get more fine-grained control over the functionality of a product, new variation points can be introduced so that functionality can be selected more explicitly (for example, to eliminate undesired commonalities between products). The resulting variability is expressed using FDL, as described in Section 3.
These features lead to a target package decomposition, following the principles discussed in Section 4.

While analyzing the variability of DocGen, we identified the source languages supported and the presentations required for them as the two main variability categories (see Figure 2). The presentations differ in visualization (logo, color,...) as well as contents (show database dependencies, show call relations, etc.). We proposed more variation points than needed by just the current product set, as we anticipated a new business model requiring a pay-per-feature approach.

5.2 Recovering Implementation Variability

Modifying variability is constrained by the current system implementation. The purpose of this step is to understand how each variable feature is implemented and what the configuration options for each feature are. This step involves an analysis of source files, source trees, and the build processes. We distinguish the following cases:

1. The feature is implemented in a separate focused package already and can be adequately configured (the ideal case);
2. The feature's code is contained in a single package, which also includes code for other features;
3. The feature's code is scattered over multiple packages, but there is no inherent need for this;
4. The feature's code is scattered over multiple packages, but this is a consequence of the feature's crosscutting nature.

In addition to this, the existing system must be analyzed for required variability that is not yet implemented and for variability currently offered, but not needed anymore in the target architecture. Implementing required variability in separate packages should be planned on the migration path described in Section 5.3. Superfluous features may be costly to maintain, in which case they can be safely eliminated. Finally, variability in the existing system that is still needed, but not yet covered in the configuration space, should be added to the FDL definition of the product line.

While analyzing the implementation of DocGen, we discovered that its structure complicated configuring different DocGen products. For instance, customer-specific code was often scattered throughout the DocGen kernel. Such code is hard to maintain and hard to isolate in order to prevent it from being shipped to other customers. Furthermore, DocGen was structured per processing phase (extraction, query, presentation), which makes it difficult to select the features of a single language.

5.3 Restructuring Variability

The next step in the SIMPLE method is to migrate from the existing architecture to the target variability. The best way to keep this step manageable is to apply small variability refactorings, one at a time. For this, we need an approach that allows us to conduct the restructuring incrementally.

[Figure 3: Variability restructuring by incrementally splitting off components that implement individual features from a monolithic source tree. Panels (a), (b), and (c) each show the features f1 to f4 and the package definitions p1 to p4; the implementing components are C in (a); F1, C, and F4 in (b); and F1 to F4 in (c).]

From the variability recovery step, we know which features are implemented and which source packages they are implemented in. Our approach starts by creating package definitions for each implementing source package C_i and for each feature f_j. Each package definition p_j corresponding to a feature f_j includes a dependency on the package C_i that implements the feature.

At this point, we have obtained a virtual splitting, making it possible to conduct phantom configurations: a configuration in the problem space maps via package definitions to the initial set of monolithic source packages in the solution space. This is depicted in Figure 3(a), where the monolithic component C implements all features f1 to f4. Clearly, exact selection of the code that implements a configuration is not yet possible at this point.

The next step is to move variation points implemented in a monolithic component C_i to a feature-specific component F_j, one at a time. This involves moving the code to a separate component, and dropping the dependency on C_i in the feature's package definition p_j. This is depicted in Figure 3(b). If other parts of the system need to be explicitly aware of the fact that this feature is chosen, they should be adapted to use the feature registry mechanism described in Section 4.1 to check the availability of feature f_j.

With this incremental approach, we resolved the problems of DocGen that we discovered during variability recovery as follows. Customer code that was scattered throughout the DocGen core is collected and moved to a customer-specific source package. We use the customer factory pattern discussed earlier to override generic behavior with this customer-specific behavior.
Furthermore, DocGen's phase-oriented structure is changed into a language-oriented structure by collecting the implementation of the different phases of a single language and moving this to a separate language-specific source package. We created package definitions for each new source package to enable automatic product assembly.

Unfortunately, it is not always possible to factor out the implementation of a feature, because it may be scattered across the implementation of multiple features. The implementation will therefore remain spread over several packages, and a configuration switch is added to each of them to activate the crosscutting feature.

If the overhead of initiating separate packages for each non-crosscutting feature is too high, one may decide to decrease granularity and to implement several features in a single package. This may be the case, for instance, when delivering more functionality than exactly configured is no problem, or when several similar features are so small that separating them makes no sense. Configuration switches are then introduced to activate each individual feature.

6 Evaluation

We are using the techniques presented in this paper in practice to refactor the variability of the DocGen product line architecture. We have analyzed existing DocGen products to determine available variability and captured this in a DocGen FDL definition. We have also analyzed the implementation of DocGen to discover how the variability was implemented.

The analysis of existing products has resulted in a number of recommendations for extra variability (such as the pay-per-feature concept described earlier). The implementation analysis led to suggestions for a restructuring of DocGen to make it language-oriented rather than phase-oriented.

With this information available, we implemented a prototype product line for DocGen, which supports all the kinds of variability described above. The prototype uses all our techniques and demonstrates that our approach is feasible. We are now in the process of defining an iterative migration process to turn the real DocGen into a product line, based on our experiences with the prototype. This involves incrementally splitting off the functionality per language and moving all customer-specific code to separate customer packages.

DocGen is implemented in Java. This is by nature a compositional language, because only one visible (public) class can be defined per Java file. This property reduces coupling of features and makes it relatively easy to put the implementation of features in separate source code packages (modular decomposition). Our approach can also be used for less compositional languages (such as C), as demonstrated by the development of a product line for the Linux kernel [7]. In general, however, such languages complicate separation of features due to increased internal coupling.

Our techniques constitute a framework for assembling and configuring product instances which deals with the build and configuration processes of product lines. Since they do not deal with the implementation of features directly, they can easily be combined with (generative) techniques to adapt the implementation automatically. For instance, aspect-oriented programming [10] can be used to deal with crosscutting features by weaving aspects into multiple packages at build time.
The techniques that we presented are language-independent. Therefore, they put no restrictions on the programming languages used to implement a product line. Customer factories, as we presented them, are the only exception because they were implemented in Java (see Section 4.2). This dynamic (runtime) mechanism can be completely replaced by a static (build time) approach (for instance, by using aspects), but it can also easily be implemented in other languages.

7 Concluding Remarks

7.1 Related Work

In [12] an evolutionary process is described in which software components evolve into general reusable software components by separating reusable from specific code and by introducing abstractions for concrete solutions. This is similar to our approach, where source packages are split off as soon as customer-specific functionality is considered to be suitable for general use.

Feature logic [14] forms a formal foundation for SCM versioning concepts. Feature logic is used to denote sets of components by their features and to describe the semantics of SCM operations. We are currently investigating whether the composition and configuration operations that are performed by autobundle can be defined more formally using feature logic.

In [13], the build time architectural view is proposed as an extension of Kruchten's popular 4+1 views model [11]. We are investigating the use of source tree composition to structure the build time architecture of software systems; the current paper does so for product line architectures.

7.2 Contributions

This paper describes a language-independent, light-weight approach for evolutionary software product lines. It covers product line design, product configuration, and product assembly.

We discuss the use of the Feature Description Language FDL to explore the variability of a product line. We observed that FDL definitions and grammars are similar and that language technology can be deployed for configuration construction and configuration validation. To that end, we developed a generator that obtains a configuration language from an FDL definition.

We propose to implement non-crosscutting features in separate source packages. These provide several development advantages, including the ability for parallel development and individual release, improved reuse, and simple integration with third-party software. Features that are crosscutting correspond to global configuration switches.

We use package definitions to map features to corresponding implementing source packages. Products are assembled from the packages that correspond to the features in a configuration. This is supported by autobundle, which collects all required packages and integrates their source trees, build instructions, and configuration processes.

We propose feature registries and customer factories as general techniques to activate and access selected features, and to deviate from standard behavior according to customer-specific needs.
These techniques can be incrementally incorporated in existing software systems using SIMPLE. SIMPLE provides a systematic way of extending the variability of product lines by refactoring the package decomposition.

7.3 Future Work

Package-based product assembly is a promising direction in product line research. We are currently further investigating the use of feature descriptions for deriving product configuration tool support. Moreover, we are elaborating the theoretical foundations, via an analysis of the relation with Snelting and Zeller's work on using feature logic for formalizing configuration management [14].

In addition, we are further expanding our experience in using the techniques and methods described. In particular, we will continue to use our techniques on the DocGen product line. We invite practitioners reading this article to experiment with our tools and ideas, and to contact us to share their experiences.

About the Authors

Arie van Deursen is on the staff of CWI, the Dutch National Research Institute for Mathematics and Computer Science. His research interests include reverse engineering, software architecture, and agile processes. He is program co-chair of WCRE 2002, the IEEE Working Conference on Reverse Engineering.

Merijn de Jonge is a PhD student at CWI. His research interests include product lines, generative programming, and language-centered software engineering. He is organizer of the 2002 Workshop on Generative Programming, co-located with the International Conference on Software Reuse (ICSR).

References

[1] K. Czarnecki and U. W. Eisenecker. Generative Programming: Methods, Tools, and Applications. Addison-Wesley, 2000.

[2] A. van Deursen, M. de Jonge, and T. Kuipers. Feature-based product line instantiation using source-level packages. In Proceedings Second Software Product Line Conference (SPLC2), LNCS. Springer-Verlag, 2002.

[3] A. van Deursen and P. Klint. Domain-specific language design requires feature descriptions. Journal of Computing and Information Technology, January 2002.

[4] A. van Deursen and T. Kuipers. Building documentation generators. In Proceedings IEEE International Conference on Software Maintenance, pages 40-49. IEEE Computer Society Press, 1999.

[5] Desmond F. D'Souza and Alan C. Wills. Objects, Components and Frameworks with UML: The Catalysis Approach, chapter 7. Object Technology Series. Addison-Wesley, 1998.

[6] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1994.

[7] M. de Jonge. The Linux kernel as flexible product-line architecture. Technical Report SEN-R0205, CWI, 2002.

[8] M. de Jonge. Source tree composition. In Proceedings Seventh International Conference on Software Reuse (ICSR'02), LNCS. Springer-Verlag, 2002. To appear.

[9] K. Kang, S. Cohen, J. Hess, W. Novak, and A. Peterson. Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, November 1990.

[10] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. Videira Lopes, J.-M. Loingtier, and J. Irwin. Aspect-oriented programming. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP), Lecture Notes in Computer Science. Springer-Verlag, 1997.

[11] P. B. Kruchten. The 4+1 view model of architecture. IEEE Software, pages 42-50, November 1995.

[12] D. Roberts and R. Johnson. Evolving frameworks: A pattern language for developing object-oriented frameworks. In Proceedings of Pattern Languages of Programs (PLoP3). Addison-Wesley, September 1996.

[13] Q. Tu and M. W. Godfrey. The build-time software architecture view. In Proceedings International Conference on Software Maintenance (ICSM 2001). IEEE Computer Society, 2001.

[14] A. Zeller and G. Snelting. Unified versioning through feature logic. ACM Transactions on Software Engineering and Methodology, 6(4):398-441, 1997.