A Survey of Concern-Oriented Development Approaches

Nicolas Lopez

Advisor: André van der Hoek

Abstract

Concern-oriented development has been of significant interest to the software engineering community since Dijkstra and Parnas first articulated its importance. Many different approaches have emerged since then, and continue to emerge, that support developers when they need to reason about and make changes to code in terms of the concerns that govern the system. While much early work focused on the modularization of concerns, relying on programmatic constructs to encapsulate concerns in modules, new types of approaches have more recently emerged that address the problem from quite different directions and lead to quite different forms of support. This survey covers the breadth of approaches to date. Its main contributions are a set of definitions of concerns that precisely delineate the field, an evaluation framework that characterizes a set of capabilities that any approach to concern-oriented development ideally should offer, an articulation of how four previously disparate categories of approaches closely relate, and a set of nine observations that take stock of the field as it stands today and provide guidance as to where it should head in the future.

1. Introduction

Concern-oriented development refers to the methods, tools, and practices that enable developers to reason about, and make changes to, code in terms of concerns. Its overarching objective is to let developers abstract away from individual lines of code and programmatic constructs to the topics of interest or focus through which they see the code and for which they need to make corresponding changes. For instance, a developer may wish to change how a system addresses security, or to better understand how a particular feature works because it is behaving erratically.
Concern-oriented development finds its roots in Dijkstra's principle of separation of concerns, which advocates that the division of a system into parts should be guided by the objective of dealing with each of its concerns in isolation from the rest [37]. Parnas built upon the abstract goal laid out by Dijkstra and proposed information hiding as a principle for encapsulating the code related to each concern in modules [94]. Doing so, Parnas argued, makes the system more understandable and maintainable over time. A long, rich, and still active pursuit of concern-oriented development has resulted: first through languages and heuristics for modularizing a system through object orientation (e.g., [2, 95, 113, 114, 119]), then through language approaches that aim to accommodate crosscutting concerns (e.g., [14, 65, 91, 112]), and, more recently, through approaches that step outside of relying solely on programmatic constructs to isolate concerns, instead supporting separate views of concerns that are mapped onto the software (e.g., [7, 14, 28, 42, 46, 50, 63, 68, 77, 86, 102, 108, 109, 125]).

Supporting concern-oriented development is both challenging and important. It is challenging because the concerns through which developers typically wish to understand a system do not necessarily map neatly to the facilities of the underlying programming language [16]. This mismatch has a variety of causes, perhaps the two most important being that the concerns through which developers view the code evolve over time [86, 120] and that programming languages have inherent limitations in supporting the expression of complex sets of interrelated concerns [28, 121]. Despite these inherent challenges, providing support for concern-oriented development remains important, because virtually every activity that requires developers to interact with the code explicitly or implicitly asks them to do so in terms of one or more concerns [98]. Whether it is the need to implement a new requirement, remove a feature, or address a certain bug report, the developer has to somehow translate their high-level concern(s) into ways of accessing, navigating, examining, and eventually updating the code. Indeed, one could argue that concern-oriented development gets to the heart of software development, to the essence of what it means to be a consummate programmer: being able to work effectively and efficiently with the code in terms of concerns.

To date, only a handful of surveys have documented and compared approaches for supporting concern-oriented development. Unfortunately, each of these surveys has focused on a single category of approaches, for instance surveying the landscape of aspect-oriented programming languages [23] or examining the various ways in which code is mined to identify potential aspects that can be factored out [62].
Because of the narrow nature of these surveys, it is difficult to get a sense of the overall breadth of the field, the relative strengths and weaknesses of the many approaches that claim to support concern-oriented development, the ways in which different approaches may or may not complement each other, and the overall state of the field in terms of its accomplishments to date.

This paper contributes a comprehensive examination of concern-oriented development that is inclusive of the broad range of approaches proposed to date. It does this by returning to the essence of concern-oriented development: developers have an abstract idea or concept of what they want to examine in the code (an abstract concern) and map this abstract concept to some subset of the code (a concrete concern). The reverse is naturally also part of concern-oriented development: given a piece of code that the developer is examining, they should be supported in finding out which abstract concern(s) this code addresses. Stated informally in this way (we provide precise definitions in Section 2), this view allows us to bring together four distinct categories of approaches to concern-oriented development:

Modularization of concerns: approaches that rely on programmatic constructs to encapsulate concrete concerns in modules, essentially forgoing an independent expression of abstract concerns in favor of a solely code-oriented approach;
Extensional concern modeling: approaches that rely on separate models in which abstract concerns are explicitly defined and explicitly mapped to concrete concerns;
Intensional concern finding: approaches that rely on queries and associated semi-automated analyses to construct opportunistic mappings from abstract concerns to concrete concerns; and
Concern mining: approaches that rely on semi-automated analysis to identify previously unknown abstract and concrete concerns.

We examine representative approaches in each of these categories using a multi-dimensional framework that outlines an ideal set of capabilities that a concern-based development approach should have. Judging the support for these capabilities, in terms of both expressiveness and usability, we draw a number of conclusions about the state of the field. Specifically, we derive nine key observations that not only highlight underlying trends in the approaches and the field as a whole, but also point to important areas where more research is needed.

This survey is organized as follows. Section 2 presents the definitions that characterize our conceptual view of concern-oriented development. Section 3 introduces our evaluation framework. Section 4 introduces the four categories of approaches that we survey. Sections 5 through 8 describe approaches that are representative of the evolution and features of each category. In Section 9, we use our evaluation framework to compare the approaches and present the main insights of this work. Finally, Section 10 briefly discusses related work, and Section 11 concludes this survey.

2. Definitions

To build a common understanding, it is necessary to define the term concern precisely. Several authors have previously done so, but their definitions do not quite fit our needs here. It is useful, however, to review these definitions in order to contextualize our view of concerns.
Existing definitions can be classified into three broad groups: (1) definitions that focus on the abstract nature of concerns, (2) definitions that focus on the concrete nature of concerns, and (3) definitions that recognize both the abstract and the concrete nature of concerns.

Tarr et al. [121] describe concerns as concepts of importance in a domain. Clearly this definition is not about a function or a line of code, but about something much more abstract. In the same vein, Hilliard [55] defines a concern as expressing a specific interest in some topic pertaining to a particular system of interest, and suggests that concerns should be phrased as interrogatives rather than assertions (e.g., "Will the system satisfy a certain need or property in a desired way?"). Rozanski and Woods [104] define concerns at an architectural level: a concern about an architecture is a requirement, an objective, an intention, or an aspiration a stakeholder has for that architecture. They also point out that, even at an abstract level, concerns overlap. As a result, conflicts may arise, and resolving them requires balancing trade-offs.

Whereas the above definitions highlight the abstract nature of concerns, others focus mainly on their concrete nature. Nistor, for instance, defines a concern as a concept that relates a set of different software module fragments [86]. In this case, what characterizes a concern is its link to specific parts of the code. As such, concerns can vary greatly in granularity, from large features down to a few lines of code that realize a particular algorithm. Robillard and Murphy view concerns as anything a developer might want to consider as a conceptual unit in a program [101]. This view, despite referencing conceptual units, is commonly interpreted as equating these units with code facilities [1, 27], and is thus much more concrete in nature.

Other definitions recognize both the abstract and the concrete nature of concerns. Sutton et al. [120] differentiate between logical concerns, which represent conceptual matters of interest, and physical concerns, which refer to elements of the software system. This definition highlights the dual nature of concerns: conceptual, as they relate to developers and their way of understanding the system, and physical, as they relate to parts of the code. Harrison et al. [52] similarly define concerns as both any issue of interest and the various elements relating to an issue of interest. Using different terminology, Biggerstaff et al. [16] refer to human concepts as describing the main ways in which developers must understand a system, and to programming concepts as characterizing aspects of the code such as its syntactic structure and specific tokens. They argue that program understanding is affected by the concept assignment problem, which arises from the disparity between human and programmatic concepts. As a final example of a definition that highlights both the abstract and the concrete nature of concerns, the participants of the MACS workshop [98] recognized that in software most activities are implicitly or explicitly organized around the idea of concerns. They define a concern simultaneously as a conceptual area of interest or focus for a stakeholder of a software project and as the concrete manifestation of a conceptual concern (e.g., in source code, design diagrams, or other artifacts).

We support this dual vision of concerns and, through a series of incremental definitions, provide a conceptual view that explicitly relates abstract and concrete concerns. We first define an abstract concern.
(def 1) Abstract concern: a long-lived topic of interest or focus about which a developer reasons over the software.

This definition highlights two key properties of abstract concerns:

They represent topics that a developer needs to understand and reason about when dealing with the artifacts that constitute a piece of software. Reasoning is a broad term that covers the mental model through which a developer creates, interprets, understands, modifies, and maintains the software. Typically, at any given moment, such reasoning involves a substantially smaller set of concerns than all of the concerns that govern a system.

They refer to topics that have longevity. Abstract concerns must be of interest throughout a non-trivial part of the overall life of the software. We consider topics that a developer needs to access only once to be important, and such topics may benefit from some of the approaches, but we regard the fact that an abstract concern is revisited at a later time as the key motivation for reifying it.

As examples, a developer might want to reason about certain functionality that sometimes seems to be buggy, about data that must be exported to a new storage infrastructure, about the performance of a system because it is lower than expected, or about the memory consumption of some algorithm because it is exceeding a quota defined in the server that hosts its execution. All of these are abstract concerns. We note that abstract concerns can be planned or emergent. Planned abstract concerns include any that were identified a priori as relevant and that have been explicitly defined. Emergent abstract concerns are those that, despite not initially being considered relevant, are identified and reified as such later on.

Before we define a concrete concern, we must first define an ideal concern.

(def 2) Ideal concern: the exact set of code fragments that a developer must reason over when dealing with an abstract concern.

This definition highlights two key properties of ideal concerns:

Ideal concerns represent the ultimate goal of concern-oriented development. Given an abstract concern, its ideal concern includes only the code fragments that the developer needs in order to reason over this concern, no fewer and no more. Ideal concerns may be made up of just one code fragment or many, and the fragments may be large or small, but for each abstract concern there is exactly one ideal concern.

The mental nature of abstract concerns and the fact that many of them may be emergent, combined with the fact that a code base may consist of hundreds of thousands of lines of code or more, mean that manually finding and keeping track of an ideal concern is virtually impossible. By the same token, a solution that, given any abstract concern, can always support developers in identifying the corresponding ideal concern is far from being a reality at this time.

Hence, developers can only work with concern mapping approaches that provide them with an approximation of the ideal concern. Figure 1 presents the resulting conceptual view of concerns that forms the basis of this survey. A developer has an abstract concern for which they would like to obtain the ideal concern (left side of the figure).
In actuality, this means they must first codify what they are looking for into a specified concern, which in turn is mapped onto an approximation of the ideal concern: the concrete concern that they actually work with (right side of the figure).

Figure 1. Conceptual view of concerns in the ideal case (left), and as supported by a concern mapping approach (right).

More precisely, we use the following three definitions:

(def 3) Concern mapping approach: a process, technique, tool, or mechanism that, given a specified concern, identifies a concrete concern, and vice versa.

(def 4) Specified concern: an externalized representation of an abstract concern.

(def 5) Concrete concern: a set of code fragments, with or without additional structure.

This last definition requires some explanation. At the very least, a concrete concern provides the developer with an index into the code that tells them where to look. However, many approaches attempt to provide developers with more information than just links to the relevant fragments, for instance by relating or typing the fragments in a certain way. We also note that, although most approaches primarily support developers in one direction, from specified concerns to concrete concerns, in the ideal case a concern mapping approach operates bidirectionally: developers should be able to go to some piece of code and find out to which specified concern(s) it belongs.

Given that many concern mapping approaches exist, ranging from simple naming conventions to full-fledged concern-oriented development environments such as ArchEvol [86] and CME [52], a natural question is how one might go about comparing them. Our next definition lays a foundation for doing so.

(def 6) Concern fit: given an abstract concern, a measure of the extent to which the concrete concern produced by a concern mapping approach matches its respective ideal concern.

Concern fit forms the basis for metrics one might formulate for comparing the effectiveness of different approaches.
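To make definitions 3 through 5 more tangible, the following Python sketch models a minimal concern mapping approach in which the developer maps fragments by hand. The class names (`CodeFragment`, `SpecifiedConcern`, `ConcernMap`) are hypothetical and do not correspond to any surveyed tool; fragments are assumed to be inclusive line ranges in a file, and a concrete concern is assumed to be a plain set of fragments with no additional structure. The `concerns_at` method illustrates the reverse direction that an ideal approach should also support.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class CodeFragment:
    """A contiguous region of a source file (assumed representation)."""
    path: str
    start_line: int  # inclusive
    end_line: int    # inclusive


@dataclass
class SpecifiedConcern:
    """Externalized representation of an abstract concern (def 4)."""
    name: str
    description: str = ""


@dataclass
class ConcernMap:
    """Hypothetical concern mapping approach (def 3) with manual mapping.

    Each specified concern, keyed by name, maps to a concrete concern (def 5),
    represented here as an unstructured set of code fragments.
    """
    specified: Dict[str, SpecifiedConcern] = field(default_factory=dict)
    concrete: Dict[str, Set[CodeFragment]] = field(default_factory=dict)

    def add_concern(self, concern: SpecifiedConcern) -> None:
        self.specified[concern.name] = concern
        self.concrete.setdefault(concern.name, set())

    def map_fragment(self, concern_name: str, fragment: CodeFragment) -> None:
        # Raises KeyError if the specified concern was never added.
        self.concrete[concern_name].add(fragment)

    def concrete_concern(self, concern_name: str) -> Set[CodeFragment]:
        """Forward direction: specified concern -> concrete concern."""
        return set(self.concrete.get(concern_name, set()))

    def concerns_at(self, path: str, line: int) -> Set[str]:
        """Reverse direction: which specified concerns cover this line?"""
        return {
            name
            for name, fragments in self.concrete.items()
            for f in fragments
            if f.path == path and f.start_line <= line <= f.end_line
        }


m = ConcernMap()
m.add_concern(SpecifiedConcern("security", "authentication and access control"))
m.add_concern(SpecifiedConcern("logging"))
m.map_fragment("security", CodeFragment("auth.py", 10, 40))
m.map_fragment("logging", CodeFragment("auth.py", 35, 38))
print(sorted(m.concerns_at("auth.py", 36)))  # ['logging', 'security']
```

Line 36 falls inside fragments of both concerns, so even this naive reverse lookup already exposes the kind of overlap that definition 8 below calls tangling.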
Precision and recall are obvious candidates to describe, for some benchmark of abstract and ideal concerns, how well an approach supports developers in identifying a concrete concern that includes all the code fragments that are part of the ideal concern and excludes any code fragments that are not. However, concern fit also matches the more traditional notion of modularity and its associated metrics, which assess to what extent concrete concerns are encapsulated in modules [15, 116, 119]. Indeed, the better one modularizes different concerns into separate modules, the better the concern fit.

Two additional definitions are important when we consider different concern mapping approaches. These definitions pertain to scattered concerns and tangled concerns.

(def 7) Scattered concern: a concrete concern with code fragments that span intended boundaries imposed by programmatic constructs.

(def 8) Tangled concern: a concrete concern with code fragments that overlap, whether partially or wholly, with the code fragments of one or more other concrete concerns.

When there are scattered or tangled concerns in a system, it is common to refer to it as a system in which scattering and tangling occur.
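The precision and recall reading of concern fit, together with definitions 7 and 8, can be sketched in a few lines of Python. This is one possible operationalization under stated assumptions, not a metric prescribed by the text: a concern is approximated by the set of (file, line) pairs it covers, the ideal concern is assumed to come from some benchmark, and the file is taken as the "intended boundary" of definition 7. The helper names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: a concern is approximated by the (file path, line number) pairs it covers.
Line = Tuple[str, int]


def concern_fit(concrete: Set[Line], ideal: Set[Line]) -> Tuple[float, float]:
    """Return (precision, recall) of a concrete concern w.r.t. its ideal concern."""
    hits = len(concrete & ideal)
    precision = hits / len(concrete) if concrete else 0.0
    recall = hits / len(ideal) if ideal else 0.0
    return precision, recall


def is_scattered(concrete: Set[Line]) -> bool:
    """Def 7, assuming the intended boundary is the file: spans more than one file."""
    return len({path for path, _ in concrete}) > 1


def tangled_with(name: str, concerns: Dict[str, Set[Line]]) -> Dict[str, int]:
    """Def 8: map every other concern to the number of lines it shares with `name`."""
    mine = concerns[name]
    return {
        other: len(mine & lines)
        for other, lines in concerns.items()
        if other != name and mine & lines
    }


ideal = {("auth.py", n) for n in range(10, 21)}   # lines 10-20
found = {("auth.py", n) for n in range(12, 26)}   # lines 12-25
p, r = concern_fit(found, ideal)
print(round(p, 2), round(r, 2))  # 0.64 0.82

concerns = {
    "security": {("auth.py", n) for n in range(10, 41)},
    "logging": {("auth.py", n) for n in range(35, 39)} | {("db.py", 5)},
}
print(is_scattered(concerns["logging"]))  # True
print(tangled_with("logging", concerns))  # {'security': 4}
```

The found concern misses lines 10 and 11 and adds lines 21 to 25, so 9 of its 14 lines are correct (precision of about 0.64) and it recovers 9 of the 11 ideal lines (recall of about 0.82).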

3. Evaluation Framework

The framework with which we assess different concern mapping approaches is divided into two parts. The first part focuses on the descriptive capabilities of the various concern mapping approaches, that is, on how specified and concrete concerns are actually captured. Some approaches are minimal in these capabilities, while others are very rich. The second part defines a set of functional capabilities that a concern mapping approach should provide in support of concern-oriented development. Again, some approaches are minimal and others very rich. As we will see in Section 9, both the descriptive and the functional capabilities are evaluated using two criteria: expressiveness and usability. Below, we first introduce the descriptive capabilities, then explain the functional capabilities, and conclude with a brief look at expressiveness and usability.

Descriptive Capabilities

Table 1 shows our comparison dimensions for descriptive capabilities, broken down by specified concerns and concrete concerns.

Table 1. Descriptive Capabilities.

Specified concerns
  Structure: How can developers name, describe, or otherwise distinguish specified concerns?
  Relationships: What kinds of relationships can developers define among specified concerns?
  Persistence: Can the resulting model of specified concerns and their relationships be persisted?
Concrete concerns
  Structure: What information are developers provided with that captures and describes a concrete concern?
  Relationships: What relationships among concrete concerns are provided that organize the concrete concerns?
  Persistence: Can the concrete concerns, their structure, and relationships be persisted?

Specified Concerns

Structure
Any concern mapping approach should provide facilities through which developers can indicate and/or distinguish different abstract concerns. At the most basic level, this involves giving different specified concerns a name, but additional layers of information can enable a richer expression of the set of abstract concerns governing a system. Typing, for instance, is one way to further discriminate among abstract concerns. Other ways include organizing specified concerns along dimensions or by metatopics, or defining detailed metadata.

Relationships
A concern mapping approach should provide developers with mechanisms to express relationships among specified concerns. This may include basic relationships, such as containment (a specified concern is made up of others), but could also include other kinds of relationships that might be useful to developers, such as extension (a specified concern represents a refinement of another), dependency (one specified concern conceptually relies on others), or interaction (two concerns collaborate to achieve a common objective).

Persistence
Developers will not want to express a set of specified concerns over and over again, especially if the set is shared among a team working on a project. Therefore, it should be possible to store specified concerns so they can be accessed at any time by anyone.

Concrete Concerns

Structure
Just as with specified concerns, the more information developers have about concrete concerns, the more effective they can be in reasoning over the code. At the basic level, they are provided with a set of code fragments. More advanced approaches, however, provide them with information that describes these fragments. Typing, again, is one option (e.g., in aspect-oriented programming [65], some fragments are designated as joinpoints and others as base classes). In many cases an approach can also limit how different types of code fragments are related or structured (e.g., fragments corresponding to aspects can interact only with the base classes).

Relationships
As with specified concerns, concrete concerns can be related to one another. They may contain, extend, depend on, or interact with other concrete concerns. The stronger an approach's support for capturing and delivering this information to developers, the more effectively they can reason over the code.

Persistence
Developers will not want to identify the code fragments that are part of a concrete concern over and over again, especially if the set is shared among a team working on a project.
Therefore, it should be possible to store concrete concerns so they can be accessed at any time by anyone.

Functional Capabilities

The purpose of any concern mapping approach is to support concern-oriented development by increasing the developer's ability to use the concerns that govern a system to reason about and change the code. Part two of our framework identifies a set of functional capabilities (Table 2) that support developers in attaining this goal. The capabilities do not prescribe how a concern mapping approach should realize them; that is, the features and facilities of specific approaches may differ significantly yet achieve similar capabilities. The capabilities are grouped into four broad categories:

Identification: capabilities that support developers in identifying specified and concrete concerns.
Comprehension: capabilities that support developers in comprehending the code in terms of specified concerns.
Evolution: capabilities that support developers in evolving specified and concrete concerns.
Other uses: capabilities that support other purposes, such as code reuse and system configuration.

Table 2. Functional capabilities to be supported by concern mapping approaches.

Identification
  Specified concerns: To what extent are developers supported in identifying and capturing specified concerns?
  Concrete concerns: To what extent are developers supported in identifying and capturing concrete concerns?
Comprehension
  Isolation: To what extent can developers isolate the code fragments corresponding to a given specified concern?
  Overlap: To what extent are developers supported in identifying whether a code fragment corresponds to more than one specified concern?
  System: To what extent are developers supported in understanding the system as an aggregate of concrete concerns and their relationships?
Evolution
  Code-driven changes: To what extent are developers supported, when making changes to the code, in updating the specified and concrete concerns?
  Concern-driven changes: To what extent are developers supported, when making changes to the specified or concrete concerns, in updating the code fragments?
  Removing scattering and tangling: To what extent are developers supported in removing scattering and tangling of concrete concerns?
Other uses
  Extracting concerns: To what extent are developers supported in selecting a specified concern and extracting its concrete concern for reuse in a different system?
  Build and configuration: To what extent are developers supported in using the specified concerns to drive the build and configuration process of the system?

Identification

Specified concerns
A concern mapping approach must support developers in identifying and capturing specified concerns. While in some approaches developers must identify specified concerns manually, others support developers in this process by, for instance, using some type of analysis to derive and automatically suggest candidate specified concerns.

Concrete concerns
A concern mapping approach must support developers in identifying and capturing concrete concerns. As with specified concerns, in some approaches developers must identify concrete concerns manually, while in others they are supported by, for instance, some type of analysis that derives and automatically suggests candidate concrete concerns.

Comprehension

Isolation
Isolation refers back to the origins of concern-oriented development as laid out by Dijkstra [37] and Parnas [94]. The goal is for developers to be able to reason about a concern in isolation while they consider how to implement or change it. A particular challenge arises when a concrete concern consists of multiple interrelated code fragments: how should these be structured and presented so that developers can easily understand and modify them as needed?

Overlap
When a concern is tangled with another concern, it becomes important for a developer to be informed of this fact, as they can no longer reason about these concerns in isolation. At a minimum, a concern mapping approach should let them know when this happens. A better approach, though, informs them of the extent of the overlap and perhaps even of the impact of ongoing changes on it (e.g., whether the overlap is increasing or decreasing, or whether certain changes to one concrete concern break the implementation of another).

System
Beyond reasoning about individual concrete concerns or small sets of them, it is frequently also important for a developer to be able to examine the system as a whole, or at least a significant part of it, as an aggregate of concrete concerns and their relationships. How a concern mapping approach supports the developer in this matter can greatly influence their ability to, for instance, judge the quality of the overall design or determine whether certain desired architectural styles or design constraints are preserved.

Evolution

Code-driven changes
When developers create and update code fragments, they might well change those fragments in such a way that they no longer belong only to the concrete concern(s) to which they originally belonged. As a result of the changes, some part of the original code fragment may now be part of another concrete concern, or may even represent a new concrete concern altogether.
Conversely, removing or modifying code in a fragment that is part of multiple concrete concerns may well mean that the fragment is no longer part of some of these concrete concerns. At a minimum, a concern mapping approach should support developers in updating which fragments belong to which concrete concerns as they program. More advanced approaches can also support planning changes and gauging their impact before the changes are made. Changes to concrete concerns may, in turn, affect the corresponding specified concerns. New code may well require the introduction of new specified concerns, and removal of code may well render some specified concerns obsolete. The developer, thus, must be supported in updating both the set of concrete concerns and the set of specified concerns while programming.
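The minimal form of support described above, keeping fragment membership up to date as developers program, can be sketched for the simple case in which fragments are inclusive line ranges. In the hypothetical `apply_edit` function below, an edit replaces lines `start` to `end` of one file with `new_len` new lines: fragments after the edit are shifted, fragments removed entirely are dropped, fragments that overlap the edit are kept conservatively but flagged for the developer to re-examine, and concerns left with no fragments are reported as candidates for becoming obsolete. This is an illustrative assumption only; real approaches may track fragments very differently (for example, by program element rather than by line).

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Fragment:
    path: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive


def apply_edit(
    concerns: Dict[str, Set[Fragment]], path: str, start: int, end: int, new_len: int
) -> Tuple[Dict[str, Set[Fragment]], List[Tuple[str, Fragment]], List[str]]:
    """Update concrete concerns after lines start..end of `path` become `new_len` lines.

    Returns (updated concerns, fragments to review, possibly obsolete concerns).
    """
    delta = new_len - (end - start + 1)
    updated: Dict[str, Set[Fragment]] = {}
    to_review: List[Tuple[str, Fragment]] = []
    for name, fragments in concerns.items():
        kept: Set[Fragment] = set()
        for f in fragments:
            if f.path != path or f.end < start:
                kept.add(f)  # another file, or entirely before the edit
            elif f.start > end:
                kept.add(Fragment(f.path, f.start + delta, f.end + delta))  # shifted
            elif start <= f.start and f.end <= end and new_len == 0:
                continue  # the whole fragment was deleted
            else:
                # Overlaps the edit: keep a range that includes any new lines,
                # since membership of the changed code is now uncertain.
                new_start = min(f.start, start)
                new_end = f.end + delta if f.end > end else start + new_len - 1
                adjusted = Fragment(f.path, new_start, new_end)
                kept.add(adjusted)
                to_review.append((name, adjusted))
        updated[name] = kept
    obsolete = [name for name, frags in updated.items() if concerns[name] and not frags]
    return updated, to_review, obsolete


concerns = {
    "caching": {Fragment("store.py", 10, 14)},
    "retry": {Fragment("store.py", 11, 12)},
    "logging": {Fragment("store.py", 30, 32), Fragment("api.py", 5, 6)},
}
# Delete lines 11-12 of store.py, i.e. replace them with zero lines.
updated, review, obsolete = apply_edit(concerns, "store.py", 11, 12, 0)
print(review)    # [('caching', Fragment(path='store.py', start=10, end=12))]
print(obsolete)  # ['retry']
```

Flagging rather than silently reassigning overlapping fragments reflects the point above that, after a change, only the developer can reliably decide which concern(s) the modified code still belongs to.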

Concern-driven changes
Not all updates are code-driven. Certain approaches allow developers to update specified and/or concrete concerns directly. In such cases, the mapping among the concerns needs to evolve accordingly, and this may require reconfiguring the code fragments belonging to the concrete concerns. Again, approaches should support not just making the changes, but also planning for them.

Removing scattering and tangling
A special form of change, which is at the same time code-driven and concern-driven, aims to remove scattering and tangling. This goal has received considerable attention in the research community, and a variety of approaches center on just this capability [36, 47, 109], since removing scattering and tangling is important for improving program understanding [65]. To do this, a developer should be able to split the existing scattered and tangled concrete concerns, transform the shared code fragments by separating the code belonging to each concrete concern, and place the code of each concrete concern in a single file or module.

Other Uses

Extracting concerns
Some concern mapping approaches help developers extract a concrete concern for reuse in a different system; that is, they let developers copy the functionality embedded in the concrete concern and reuse it elsewhere. To do this, however, it is not always sufficient to extract just one concrete concern. Frequently, this concern depends on others, and the task becomes one of either removing the dependencies or identifying the minimal set of concrete concerns and associated code fragments that need to be extracted.

Build and configuration
A concern mapping approach may also support developers in using specified concerns to drive the build and configuration process of a system. With this capability, developers can build different variants of a system simply by selecting different sets of specified concerns to include and exclude.
Some approaches go even further by enabling the configuration of the system through properties of the specified concerns, which are used as input to, for example, a code generator or compiler.

Assessment Criteria

The ideal way to assess and compare different concern mapping approaches would be to directly and precisely measure their concern fit on a benchmark of sample software systems and abstract concerns. Providing such an assessment, however, is a very difficult undertaking, considering both the number of approaches that exist and the complexity of identifying a representative set of ideal concerns. Such a set must be based on real systems, possibly authored by very large teams and involving numerous specified and concrete concerns. Determining an unbiased, representative set is nearly impossible.

The approach we take in this survey is to rate each approach on expressiveness and usability. These two criteria are known to embody a fundamental tension between how powerful an approach is and how effectively and efficiently developers can use it. Rating approaches along them therefore allows us to uncover underlying trade-offs among approaches, gaps in the current coverage, and improvements that need to be made, both by individual approaches and by the field as a whole. We return to this assessment in the observations subsections of Sections 5 through 8, where we provide detailed ratings for each of the descriptive capabilities identified in Table 1 and each of the functional capabilities identified in Table 2.

4. Four Categories of Concern Mapping Approaches

Now that we have laid a foundation with the definitions presented in Section 2 and the comparison framework introduced in Section 3, we turn our attention to the primary content of this paper: the actual survey of concern mapping approaches. A great variety of approaches has emerged over the years, some of which are much more alike than others in terms of their conceptual underpinnings. For instance, aspect-oriented programming [65] can be seen as closely related to Composition Filters [14] and, similarly, Concern Mapper [102] to Software Plans [35], but aspect-oriented programming and Concern Mapper are clearly of a different ilk when it comes to how a developer works in terms of specified and concrete concerns. In aspect-oriented programming, a developer defines a concrete concern by specifying an aspect class and placing the code fragments inside this aspect class. In Concern Mapper, on the other hand, developers identify a specified concern by adding its name to the list of specified concerns, and its corresponding concrete concern by manually selecting its code fragments and relating them to the specified concern.

To organize the space of concern mapping approaches, we once more return to the separation of specified concerns from concrete concerns. Both play a role in concern-based development, and both should be accessible to a developer in order to successfully reason over and update a code base in terms of concerns.
How a developer arrives at a specified or concrete concern, however, differs among concern mapping approaches, particularly in whether the identification of specified and/or concrete concerns is user-driven. In some approaches, the user is clearly in charge. In others, identification is driven in some other way, with the support of a tool, convention, heuristic, or other facility of the approach. To understand what we mean by user-driven, consider FEAT [99], a concern mapping approach in which the developer provides a query that acts as the specified concern. In executing the query, FEAT produces a graph that acts as the concrete concern. This graph includes those code fragments that FEAT considers relevant to the specified concern. It is the developer who declares the specified concern (expressed as a query) and the supporting tool that does the work of identifying the associated concrete concern. Thereafter, the developer may iterate over the result and, in a sense, work with FEAT to refine the outcome, but the separation is clear: the developer defines the specified concern; the tool identifies the concrete concern. FEAT, then, is user-driven with respect to specified concerns and tool-driven with respect to concrete concerns. The different combinations of whether specified and concrete concerns are identified in a user-driven manner give rise to the four categories of concern mapping approaches we discuss in this paper.
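The division of labor in FEAT can be illustrated with a small sketch. It is loosely modeled on the description above and does not reproduce FEAT's actual query language or analysis: the call graph, the method names, and the `concern_graph` function are hypothetical. The developer supplies a query (the specified concern), and the tool returns every call edge that touches a matching method (the concrete concern).

```python
# Hypothetical program model: caller -> list of callees.
CALLS = {
    "LoginForm.submit": ["AuthService.login", "Logger.info"],
    "AuthService.login": ["AuthService.checkPassword", "Session.create"],
    "AuthService.checkPassword": ["Crypto.hash"],
    "ReportView.render": ["Logger.info"],
}


def concern_graph(query):
    """Return the call edges that touch at least one method matching `query`.

    `query` (a predicate on method names) stands in for the user-declared
    specified concern; the returned edges stand in for the tool-derived
    concrete concern.
    """
    edges = set()
    for caller, callees in CALLS.items():
        for callee in callees:
            if query(caller) or query(callee):
                edges.add((caller, callee))
    return edges


auth = concern_graph(lambda name: name.startswith("AuthService."))
for caller, callee in sorted(auth):
    print(caller, "->", callee)
```

Refining the query, or pruning edges from the result by hand, corresponds to the iterative refinement between developer and tool described above.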

Table 3. Characteristics of descriptive capabilities depending on how they are identified.
User-driven identification: explicit, defined, directly accessed, persistent.
Non-user-driven identification: implicit, derived, indirectly accessed, transient.
Before we introduce these four categories, we further differentiate user-driven and non-user-driven identification by examining several characteristics that are usually associated with each (Table 3). In user-driven identification, concerns must be explicitly provided by users, which means that users define which concerns are important and worth specifying. These concerns are typically made persistent for easy future reference, often using a name or other identifier to directly access the concern of interest. Non-user-driven identification exhibits the opposite characteristics. The concerns are implicit and indirectly accessed through a derivation process that involves a tool, heuristic, algorithm, or similar mechanism. They are usually not persistent; instead, they are derived anew each time they must be accessed.
Table 4. Four categories of concern mapping approaches, according to user- versus non-user-driven identification.
Specified concern user-driven, concrete concern user-driven: extensional concern modeling.
Specified concern user-driven, concrete concern non-user-driven: intensional concern modeling.
Specified concern non-user-driven, concrete concern user-driven: modularization of concerns.
Specified concern non-user-driven, concrete concern non-user-driven: concern mining.
Table 4 presents the four canonical categories of concern mapping approaches that result from applying the distinction between user-driven and non-user-driven identification to both specified and concrete concerns. The four categories, depicted in Figure 2, are the following, now framed more precisely than the informal descriptions we presented in Section 1:

Figure 2. Four primary categories of concern mapping approaches. The arrows and subtext indicate the main process an approach performs, from inputs (left side of each category) to specified and concrete concerns (right side of each category).
Modularization of concerns: approaches in which only the concrete concerns are identified in a user-driven manner. By deciding where in the code base to place code fragments, and which functionality to encapsulate in them, developers explicitly associate each fragment with a concrete concern. Specified concerns are not directly modeled, but exist implicitly, based on the assumption that each concrete concern addresses a single, implicit specified concern.¹ Examples in this category are aspect-oriented programming [65] and Composition Filters [14].
Extensional concern modeling: approaches in which both the specified concerns and the concrete concerns are identified in a user-driven manner. Developers maintain the overall set of specified concerns, as well as the corresponding concrete concerns, in explicit models. These approaches usually provide tool support to create and maintain the models, but the tools remain in support of the specified and concrete concerns that the user has decided upon. Examples in this category are ArchEvol [86] and Concern Mapper [102].
Intensional concern finding: approaches in which only the specified concerns are identified in a user-driven manner. Developers define a specified concern using, for instance, a seed or query. Concrete concerns are then identified opportunistically, by executing some type of analysis or heuristic that locates the relevant code fragments. Examples in this category are Aspect Browser [42] and Visual Separation of Concerns [28].
Concern mining: approaches in which neither the specified nor the concrete concerns are identified in a user-driven manner. Given a code base, these approaches produce a set of candidate specified and concrete concerns. Developers do not need to declare specified or concrete concerns; instead, the concerns are inferred by an algorithm that operates over the code. Examples in this category are aspect mining approaches [62] and topic location approaches [68, 78, 106].
¹ One could argue that the identification of specified concerns in these approaches is also user-driven. We observe, however, that the first step is implicit and driven by the approach: every concrete concern maps to exactly one specified concern. The description of the concerns is left to the user; it is as if an anonymous specified concern exists.
In the next four sections of the paper, we introduce and discuss representative approaches in each of these four categories in detail.
5. Modularization of Concerns
Since the beginning of programming, developers have attempted to partition their efforts by breaking up the code into smaller, more manageable units. Even the first high-level programming languages provided features to enable developers to do so. For instance, functional languages, from early Lisp to later languages such as Haskell, centered on using functions as units that express and separate different computations [57]. The desire to partition and structure code has always gone hand in hand with the question of how the code should be divided to make it easier to develop, understand, debug, and maintain over time.
Dijkstra [37] proposed the principle of separation of concerns, advocating that the division of a problem into parts should be guided by the objective of dealing with each concern in isolation from the rest. Parnas complemented this work with his ideas on modularization [94], which offer concrete suggestions for how one should go about separating concerns to reach the abstract goal laid out by Dijkstra. This work has had a great influence on the field and laid the foundations for modern ideas of modularization [56, 95, 107]. Approaches in our first category, modularization of concerns, build upon these ideas by providing language facilities for developers to explicitly separate concrete concerns and thereby implicitly separate the corresponding specified concerns.
Figure 3 presents a view of specified and concrete concerns that highlights the process of programming and abstracting using modularization of concerns. On the left side, it shows the state of the code at some moment in time, with two concrete concerns that correspond to two abstract concerns. By programming, the developer changes the concrete concerns. In the example, the developer added code fragments c and f to concrete concerns 1 and 2, respectively, and added a new concrete concern 3 with several new code fragments.

Note that a code fragment can be part of only a single concrete concern; that is, concrete concerns do not overlap. The concrete concerns in the example could, for instance, be classes, with the code fragments corresponding to methods. Or they could be aspects, with the code fragments corresponding to pointcut and advice declarations. In either case, where the code fragments are placed within the available programmatic constructs determines their concrete concern: moving a method from one class to another changes the concrete concern to which it belongs. The approach therefore relies heavily on the developer: rather than placing code fragments wherever they first seem to fit, the developer must examine the existing modularization carefully and refactor as needed, so that each abstract concern can be reasoned about separately, not just now but also in the future.
Figure 3. Modularization of concerns.
Central to modularization of concerns approaches are the programming languages that developers use. A vast amount of research has focused on building better languages, such that a range of different concerns can be appropriately separated. We review representative advances and examples below.
5.1. Object-oriented Programming
Languages such as Java [60], Smalltalk [59], and Eiffel [79] provide facilities to define objects, classes, and inheritance, which are at the core of object-oriented programming [123]. These facilities support two concepts that are central to object-oriented programming: encapsulation and information hiding. Encapsulation refers to the facilities a language provides to shield a code fragment from being accessed by other parts of the code. The better a programming language supports encapsulation, the easier it should be for a developer to keep all code fragments of a single concrete concern inside a single file or module.
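As a rough illustration of encapsulation, the hypothetical `TemperatureLog` class below keeps all code for one concrete concern, storing temperature readings, in a single class and hides how the readings are represented. Python enforces this only by convention, so the sketch shows the idea rather than a language-enforced mechanism such as Java's `private`.

```python
class TemperatureLog:
    """All code for storing temperature readings lives in this one class.

    How readings are stored (here, as Celsius floats in a list) is internal:
    clients use only the public methods, so the representation could change
    without edits elsewhere in the program.
    """

    def __init__(self):
        # Leading underscore marks the attribute as internal; Python does not
        # enforce this, unlike `private` in Java.
        self._celsius = []

    def record_fahrenheit(self, value):
        self._celsius.append((value - 32) * 5 / 9)

    def average_celsius(self):
        if not self._celsius:
            raise ValueError("no readings recorded")
        return sum(self._celsius) / len(self._celsius)


log = TemperatureLog()
log.record_fahrenheit(212)
log.record_fahrenheit(32)
print(log.average_celsius())  # 50.0
```

Switching the internal list to, say, Fahrenheit storage would touch only this class, which is the property encapsulation is meant to provide.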

Information hiding refers to a particular strategy that Parnas first identified [94] as a good way to modularize the most relevant concrete concerns. Identifying which secrets (i.e., information) each concrete concern should hide from other concrete concerns promotes modularity. Of course, this assumes that these secrets are good approximations of the abstract concerns that govern the system, which may or may not be true, depending on the system and the concerns.
Much research has examined how the facilities of object-oriented languages support encapsulation and information hiding [33, 53]. Canning et al. [25] discuss the need for explicit interfaces between objects, and how these serve as specifications for modularizing a system. Other authors reflect on issues such as the impact of inheritance on encapsulation [113] and the need for more explicit interfaces between classes and subclasses [114]. Finally, work such as JML [70] provides extensions to object-oriented languages that allow for behavioral specification of interfaces. These extensions enable detailed specification of how interfaces are to be accessed and used.
Building upon the features of object-oriented programming languages, many authors have also defined strategies and heuristics for better separating concrete concerns. Development methods, specification techniques, and criteria for decomposition (such as functional decomposition, domain-oriented decomposition, and event partitioning) have all been topics of wide interest [31, 124]. Catalogues of reusable knowledge, such as design patterns [41], provide canonical solutions to common problems in modularizing with object-oriented languages.
Metrics such as coupling and cohesion [15, 116], and approaches to evaluate alternative modularizations, such as those based on Design Structure Matrices [6, 119], enable developers to better understand the benefits, trade-offs, and impact of their choices.
Nevertheless, object-oriented languages have some limitations regarding their support for concern-oriented development. For instance, changes that involve abstract concerns that were not initially identified and reified as concrete concerns are usually complex to make. While some might view this as a failure of the design process, caused by an incorrect or suboptimal modularization from the perspective of the abstract concerns governing the system, others see it as indicative of the limitations of pure object-orientation [2, 58] and as reflective of the nature of software, which evolves in ways that we cannot anticipate.
A particularly influential formulation of the deeper issues of object-oriented programming is that of Tarr et al. [121], who summarize them as the tyranny of the dominant decomposition. They phrase the discussion in terms of dimensions, which refer to distinct hierarchies of decomposition. Object-oriented languages provide only a limited set of dimensions for decomposing a system into concrete concerns; usually, in fact, only one, as defined by the class hierarchy. Once a set of concrete concerns has been identified, the resulting hierarchy forms what is called the dominant decomposition. Identifying and understanding other abstract concerns that do not fit this dominant decomposition is typically very difficult. The tyranny lies in the fact that this is unavoidable: any other decomposition would equally have left some other concerns unmodularized, and thus much more difficult to identify and understand. In other words, it is impossible to isolate the code for all concrete concerns in distinct modules at the same time, and scattering and tangling of concrete concerns will inevitably occur. Concerns whose code is scattered and tangled in this way are typically called crosscutting concerns [65].
5.2. Aspect-oriented Programming
Inspired by early work such as Flavors [26], the aspect community was among the first to address the issue of crosscutting concerns, upon recognizing that, among other examples, logging and debugging functionality typically could not be cleanly separated from the rest of the code. In the classic paper by Kiczales et al. [65], a language solution was proposed to explicitly represent these kinds of concerns in aspects. The language solution introduces a second dimension of decomposition over the primary dimension of objects in object-oriented programming (base classes). Aspects are composed with base classes through aspect interception: aspect code is woven in at the places where a pointcut states it should be inserted, so that it executes when execution reaches those points.
The right-hand side of Figure 4 presents a simplified example of the core concepts underlying aspects, based on a traditional object-oriented program on the left-hand side. In this example, all three classes invoke method m4() declared in class C4, with some common code surrounding the invocations. This common code is factored out by the aspect on the right-hand side. In the figure, the pointcut p1() intercepts the execution of methods with signatures m1() and m2(), but not m3(), given that the pointcut specification on the right-hand side only matches m1() and m2(). The aspect also defines advice, the code to execute when a join point selected by the pointcut is reached. Advice can be composed so that it executes before or after the execution of the methods selected by a pointcut. In the example, the code inside the advice is executed before the code in the classes.
Figure 4. A crosscutting concern in objects (LHS) and aspects (RHS).
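The mechanics of Figure 4 can be approximated in Python. The sketch below is hypothetical: `weave_before` and `p1` are illustrative names, the class and method names follow the figure's description, and, unlike AspectJ, which weaves during compilation or class loading, it patches the classes at runtime. The pointcut is a predicate over method names, and the advice runs before each selected method.

```python
import functools

trace = []  # records what runs, standing in for the "common code"


class C4:
    def m4(self):
        trace.append("m4")


class C1:
    def m1(self):
        C4().m4()


class C2:
    def m2(self):
        C4().m4()


class C3:
    def m3(self):
        C4().m4()


def _with_before_advice(original, name, advice):
    """Return a wrapper that runs `advice` and then the original method."""
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        advice(name)  # before advice
        return original(self, *args, **kwargs)
    return wrapper


def weave_before(classes, pointcut, advice):
    """Wrap every method whose name satisfies `pointcut` with before advice."""
    for cls in classes:
        for name, member in list(vars(cls).items()):
            if callable(member) and pointcut(name):
                setattr(cls, name, _with_before_advice(member, name, advice))


def p1(name):
    """Extensional pointcut, like p1() in Figure 4: names m1 and m2 explicitly."""
    return name in {"m1", "m2"}


weave_before([C1, C2, C3], p1, lambda name: trace.append("before " + name))

C1().m1()
C2().m2()
C3().m3()
print(trace)  # ['before m1', 'm4', 'before m2', 'm4', 'm4']
```

Applied to all four classes, a pattern-based predicate such as `lambda name: fnmatch.fnmatch(name, "m*")` (using the standard `fnmatch` module) would select m1() through m4() without naming them, which is the intensional style of pointcut described next.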

With this mechanism, aspects allow for the separation of some, but not all, types of crosscutting concerns. Specifically, an aspect groups code that modifies all the base classes in the same manner. This is commonly referred to as a homogeneous crosscutting concern [32]. It is worth noting that pointcuts establish explicit relationships between the concrete concern represented by the aspect and the concrete concerns represented by the classes in the base code. The language for defining pointcuts permits the expression of such relationships in both an extensional manner (e.g., an explicit reference to a specific method or set of methods, as in the example of Figure 4) and an intensional manner (e.g., a reference to an implicit set of methods through a pattern; the pattern m*(), for instance, would have intercepted methods in all the base classes on the left side). A benefit of intensional expressions is that they allow developers to implement new methods in the base code that are automatically intercepted by the aspect, without modifying the code inside the aspect, as long as the new methods match the pattern.
Building on the original ideas, subsequent work identified and aimed to address several challenges. For instance, the problem of obliviousness was recognized: a developer examining the code of a class can be unaware of whether, and how, an aspect might intercept its execution, making comprehension of program behavior more difficult [29]. Enhancements such as aspect-aware interfaces [66] help reduce the impact of this problem. As another example, questions have arisen as to which types of concrete concerns are best separated using aspects [82].
Some authors observe that traditional examples, such as logging, tracing, and debugging, are not sufficiently representative, and that further research is required to truly assess how well aspects can deal with a wide variety of crosscutting concerns [115].
5.3. Mixins
Mixin layers [112] are representative of a category of approaches that spawned from the initial work on mixins by Bracha and Cook [19]. These approaches recognize that some crosscutting concerns differ in nature from those that are clear candidates to be represented with aspects, and call this class of crosscutting concerns collaborations. A collaboration refers to a set of classes that communicate with one another to implement a semantically coherent piece of functionality [4]. Mixins enable developers to separate out such collaborations into their own concrete concern, as encapsulated code fragments that add behavior to a set of base classes.
Consider the code fragments in Figure 5. Classes ClassA and ClassB are declared inside class baseclasses. A mixin adds behavior either by replacing methods of the original class altogether or by adding new methods, through dynamic inheritance. The mixin CollaborationOne, on the right-hand side of Figure 5, defines a pair of nested classes that dynamically extend ClassA and ClassB. Each nested class defines extensions to the behavior of its base class, just as with traditional inheritance. Unlike traditional inheritance, which adds the behavior of one subclass to one superclass at a time, mixins allow the behavior of a set of subclasses to be added to a set of superclasses at once. In this case, class ClassACollabOne extends class ClassA by adding the new field newfield1, overriding method m1(), and adding method m12(), while class ClassBCollabOne extends class ClassB by adding the new field newfield2 and the new method m22().
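Python has no mixin-layer construct, but the idea can be approximated with parameterized inheritance: a function receives a layer of base classes and returns a namespace whose nested classes refine them together. This is a hypothetical sketch modeled on the description of Figure 5 (`BaseClasses` and `collaboration_one` are illustrative names), not the notation of [112].

```python
class BaseClasses:
    """Namespace holding the base classes, mirroring `baseclasses` in Figure 5."""

    class ClassA:
        def m1(self):
            return "A.m1"

    class ClassB:
        def m2(self):
            return "B.m2"


def collaboration_one(layer):
    """A mixin layer: refines ClassA and ClassB of `layer` as one unit."""

    class CollaborationOne:
        class ClassACollabOne(layer.ClassA):
            def __init__(self):
                super().__init__()
                self.newfield1 = 0

            def m1(self):  # overrides the inherited m1()
                return "CollabOne(" + super().m1() + ")"

            def m12(self):  # new method
                return "A.m12"

        class ClassBCollabOne(layer.ClassB):
            def __init__(self):
                super().__init__()
                self.newfield2 = 0

            def m22(self):  # new method
                return "B.m22"

    return CollaborationOne


Composed = collaboration_one(BaseClasses)
a = Composed.ClassACollabOne()
print(a.m1(), a.m12(), a.newfield1)   # CollabOne(A.m1) A.m12 0
print(BaseClasses.ClassA().m1())      # A.m1
```

Because the refined classes are ordinary subclasses, the original ClassA and ClassB remain instantiable on their own alongside ClassACollabOne and ClassBCollabOne.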

Figure 5. A simplified example of mixins.
A useful way to describe the unique nature of mixins is to compare them with objects and aspects. Collaborations cannot be explicitly defined using objects, because they necessarily involve code fragments that must be placed in scattered locations. Collaborations also cannot be properly represented in aspects, because they are not homogeneous in nature. A mixin encapsulates, as one coherent extension, code that adds behavior to a set of classes, even though that code must be spread out and must extend each class in a different way.
Mixins provide some advantages over aspects. First, the mechanism to compose mixins with objects is straightforward, with entire methods being replaced or added, which makes it easy to understand. Second, mixins can be referred to explicitly during execution, independently of the base classes. This means that both the base classes and the classes with the modified behavior represented by the mixin can be instantiated separately at runtime.
Mixins are not the only approach that allows for capturing concrete concerns that are collaborations. Other approaches that support this (sometimes under the name role-based design) include Delegation Layers [90], Object Teams [54], and Adaptive Programming [80]. These alternatives all focus on increasing the expressiveness of concrete concerns by increasing the descriptive power of their structure: in some cases enabling developers to deal better with classes that are simultaneously part of multiple collaborations [90], in other cases enabling explicit description of relationships among collaborations [54], and in yet other cases allowing for the definition of more specialized types of code fragments with specific purposes [80].

An important research direction that evolved from mixins is feature-oriented programming [10, 11, 97]. Features are similar to mixins in that they add behavior to base classes. Feature-oriented programming, however, differs in two primary ways: (1) it prescribes a specific decomposition strategy, based on step-wise refinement [12], to identify concrete concerns, and (2) it defines a mathematical model for the relationships among features. These two characteristics make it particularly suitable to the domain of product lines, where guaranteeing the correctness of the program resulting from a composition of features matters.
5.4. Composition Filters
Composition Filters [14] is another approach that supports the expression of crosscutting concerns. The approach provides a semantically rich mechanism for defining filters (code fragments that intercept incoming and outgoing messages corresponding to events occurring in the base classes) and for composing the behavior of several classes and filters into composition filters. Compose* [103] is one of several languages supporting this approach. An example in Compose* is shown in Figure 6, in the form of a code fragment that defines the ObserverPattern composition filter. The composition filter can be composed with a base class to add new behavior and to modify the execution of the base class using filters. The composition filter consists of two parts: a filtermodule, which adds filter code fragments that modify the behavior of the base code, and a superimposition, which defines the characteristics of the base code that will be composed with this composition filter. Filtermodules (lines 3-11) have a purpose similar to advice in aspect-oriented programming. They define internal fields and attributes (here, subject), input filters (here, atdet and notif), and output filters (not included in this example).
Superimpositions specify selectors, which have a purpose similar to pointcuts in aspect-oriented programming. A selector is an expression that is evaluated to find the classes that will be composed. In the figure, the selector in line 15 specifies that any classes named Shape are to be composed with this composition filter. The result of composing the ObserverPattern composition filter with the base class is presented at the bottom of Figure 6. The new Shape class, on the right-hand side of the figure, includes two new methods and two modified methods. The atdet:dispatch filter is used to generate the attach() and detach() methods, which delegate their execution to the Subject class. The notif:after filter is used to modify the setx() and sety() methods in such a way that, after their execution, the notify() method in the Subject class is executed.
Filters provide rich semantics for defining the interaction between a composition filter and base classes, because input and output filters are defined in terms of filter types. Filter types specify semantics through which the developer can express which messages will be accepted or rejected, based on whether they match a given regular expression or a particular condition (such as an error condition). As a result, the same mechanism can be used to express language concepts as diverse as multiple views, inheritance, and synchronization [14]. Moreover, filter types allow developers to reason about the semantics of the composed behavior better than with general-purpose programming languages (e.g., an after or before advice in aspects is semantically less rich than a specific filter used to handle error conditions).
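The ObserverPattern example can be re-created, in simplified form, as a chain of message filters. The sketch is hypothetical and does not follow Compose* syntax or semantics: `DispatchFilter`, `AfterFilter`, and `FilteredObject` are illustrative names, and it assumes that filters are evaluated in order, with each filter deciding whether the rest of the chain runs. The dispatch filter plays the role of atdet:dispatch by sending attach and detach messages to a Subject, and the after filter plays the role of notif:after by calling notify() once setx or sety has executed.

```python
class Subject:
    def __init__(self):
        self.observers = []

    def attach(self, observer):
        self.observers.append(observer)

    def detach(self, observer):
        self.observers.remove(observer)

    def notify(self, source):
        for observer in self.observers:
            observer(source)


class Shape:
    def __init__(self):
        self.x = 0
        self.y = 0

    def setx(self, value):
        self.x = value

    def sety(self, value):
        self.y = value


class DispatchFilter:
    """If the message matches, send it to `target` and stop; otherwise pass it on."""

    def __init__(self, selectors, target):
        self.selectors = selectors
        self.target = target

    def handle(self, message, args, proceed):
        if message in self.selectors:
            return getattr(self.target, message)(*args)
        return proceed()


class AfterFilter:
    """Let the message continue, then run `action` if the message matched."""

    def __init__(self, selectors, action):
        self.selectors = selectors
        self.action = action

    def handle(self, message, args, proceed):
        result = proceed()
        if message in self.selectors:
            self.action()
        return result


class FilteredObject:
    """Passes each message through the input filters, in order, before the base object."""

    def __init__(self, base, filters):
        self.base = base
        self.filters = filters

    def send(self, message, *args):
        def run(index):
            if index == len(self.filters):
                return getattr(self.base, message)(*args)
            return self.filters[index].handle(message, args, lambda: run(index + 1))
        return run(0)


shape, subject = Shape(), Subject()
observed = FilteredObject(shape, [
    DispatchFilter({"attach", "detach"}, subject),                 # cf. atdet:dispatch
    AfterFilter({"setx", "sety"}, lambda: subject.notify(shape)),  # cf. notif:after
])
seen = []
observed.send("attach", lambda s: seen.append((s.x, s.y)))
observed.send("setx", 3)
observed.send("sety", 4)
print(seen)  # [(3, 0), (3, 4)]
```

The composed object aggregates the behavior of Shape and Subject, while neither base class contains any observer code.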

Figure 6. Composition filters example, from [103].
Although some mechanisms of composition filters are similar to those of aspects, the nature of the concerns that can be expressed with this approach is quite different. Rather than just adding behavior to a single class, composition filters allow for the representation of concrete concerns that aggregate the behavior of several classes. In the example of Figure 6, the resulting composition aggregates the behavior of the Shape class and that of the Subject class. Another key difference is that the approach provides better support for composing multiple concrete concerns, not just a single aspect with base code or a single mixin with a set of classes. Understanding multiple aspects with pointcuts that intercept the same method can be cumbersome, because aspects provide limited features for explicitly defining interactions among aspects [73]. In mixins, composing multiple mixins relies on the mechanics of simple inheritance (overriding or adding methods to the existing code). Filters, on the other hand, are executed sequentially, and the semantics of a filter type defines whether, and how, the next filter is applied. Additionally, composition filters can aggregate not just base classes but also other composed classes, whereas an aspect can only intercept base objects (and not other aspects).
5.5. Explicit Programming
Explicit Programming [24] is an approach that also supports developers in understanding and evolving crosscutting concerns. It takes a rather different approach, enabling developers to attach modifiers to declarations of other programming language constructs, such as class, method, and attribute declarations. Figure 7 and Figure 8 show an example of how this works.
Figure 7. Attribute and method declarations with explicit programming modifiers.
Each modifier defines a series of transformations that operate directly over the code, prior to the final compilation. The field and method declarations in Figure 7 include the modifiers property<> and invariant<>, with the modifier property<> presented in detail in Figure 8. This modifier defines a series of transformations as methods inside the class, including, in this case, addaccessor() and addmutator(), which create new methods for accessing and modifying the value of the field, respectively. The satisfies<> declaration, directly before the transformation specification, indicates the type of code fragment to which the particular transformation applies (in this case, attributes only). Transformations are expressed in a meta-programming language that can receive many different types of parameters and that can directly manipulate the base code. For instance, the transformation in Figure 8 modifies the target class by creating a new method that returns the value of a field.
Figure 8. Simplified modifier class in explicit programming.
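A Python class decorator offers a loose analogy to the property<> modifier. The sketch is hypothetical (`property_modifier` and `Box` are illustrative names) and differs from explicit programming in two ways: the decorator runs when the class is defined rather than in a separate transformation step before compilation, and the fields are listed on the class rather than marked individually. Like addaccessor() and addmutator(), it generates an accessor and a mutator for each field.

```python
def property_modifier(*field_names):
    """Hypothetical stand-in for a property<> modifier.

    For each named field it adds an accessor get_<field>() and a mutator
    set_<field>(value) to the decorated class, loosely mirroring the
    addaccessor() and addmutator() transformations.
    """
    def transform(cls):
        for field in field_names:
            # Default arguments bind the current field name to each method.
            def getter(self, _field=field):
                return getattr(self, "_" + _field)

            def setter(self, value, _field=field):
                setattr(self, "_" + _field, value)

            setattr(cls, "get_" + field, getter)
            setattr(cls, "set_" + field, setter)
        return cls
    return transform


@property_modifier("width", "height")
class Box:
    def __init__(self, width, height):
        self._width = width
        self._height = height


box = Box(2, 3)
box.set_width(5)
print(box.get_width(), box.get_height())  # 5 3
```

The generated methods end up in the class body itself, which is a small-scale instance of the kind of direct modification of base code that the transformations perform.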

Explicit programming differs from all of the previous approaches in this section in several ways. The meta-programming language of transformations provides a much more powerful mechanism for expressing how crosscutting concerns modify the base code. Sihman and Katz [111] call these invasive modifications, as they can directly alter the base code. Invasive modification enables explicit programming to support heterogeneous crosscutting concerns [32], which go beyond the collaborations supported by mixins. For example, inserting pre- and post-condition assertions or exception-handling code is cumbersome, if not impossible, with mixins, since they extend code through a tightly cohesive but relatively coarse-grained mechanism. In this sense, explicit programming combines the fine-grained nature of aspect pointcuts with the ability to make complex modifications to the base code.
5.6. Hyperspaces
Hyperspaces [88, 89, 121] is an approach that relies on a new modeling paradigm, called multidimensional separation of concerns (MDSOC), to group concerns in different dimensions. Recall from Section 5.1 that a dimension is a distinct hierarchy of decomposition. Hyperspaces provide explicit support for simultaneously representing multiple distinct dimensions in hyperslices. A hyperslice specifies a set of conventional code fragments, for instance a class hierarchy, which can be composed with other hyperslices to form a hypermodule.
Figure 9 presents an example of a set of hyperslices for an expression evaluation system from Tarr et al. [121]. Each hyperslice is a distinct class hierarchy, and there is clear overlap between the classes in each. The Expression class is the topmost code fragment in the hierarchy, and most of its subclasses are present in each dimension. However, there are some differences.
For instance, neither of the two hyperslices at the bottom of the figure includes the BinaryOperator class, and the methods they define in each class are different.
Figure 9. A set of hyperslices for an expression evaluation system, from [121].
Each hyperslice represents the system from a particular perspective; for example, the syntax check slice defines the decomposition of the system from the perspective of the syntax checking concern. As a result, each of the classes in this hyperslice defines a check() method. Hyperslices can be composed into hypermodules, which define integration relationships that guide the process of combining hyperslices, including how to resolve any conflicts that arise among them. An example integration relationship is mergebyname, which simply puts the methods of classes with the same name together in the merged result. Integration relationships are based on the composition rules of subject-oriented programming [53, 87].
All of the approaches presented earlier in this section beyond plain object orientation (aspects, mixins, composition filters, and explicit programming) are asymmetrical, combining base code with code fragments of a different type that are specified in another dimension, be they aspects, mixins, filters, or modifiers. Unlike these approaches, hyperspaces is symmetrical [49], since it employs a single kind of composable code fragment, the hyperslice. Support for symmetrical composition increases the potential for reuse [49]. Hyperspaces also differ from other approaches in that the composition of concrete concerns is specified entirely separately from the code, whereas in other approaches the concrete concerns themselves specify their relationships to other concrete concerns (e.g., method calls among classes, pointcuts in aspects, filters inside composition filters). Changing the integration relationship, for instance, from mergebyname to override (which is similar to class extension), does not require any changes within the hyperslices themselves.
5.7. Observations
Many other approaches to modularization of concerns have been proposed, each providing some variant of the approaches we have presented here.
In parallel, researchers have identified heuristics for partitioning a system into concrete concerns, along with metrics and methods for evaluating the resulting concern fit in terms of modularity (e.g., Lopes and Bajracharya [72] used Design Structure Matrices to evaluate the modularity of an aspect-oriented program, Hannemann and Kiczales [48] compared the modularity properties of design pattern implementations coded using aspects, and Tonella and Ceccato [122] migrated some interfaces to aspects and reported on the positive impacts). Together, these approaches can be used to design a system with high levels of modularity from the start, and to maintain this modularity as the system evolves. Underlying all of these approaches remains a strong focus on the programming language and its facilities for capturing different kinds of concrete concerns and addressing problematic issues with (then) existing approaches. For instance, expressive pointcuts [91] provide additional mechanisms for developers to define more precisely how an aspect can intercept base code, aspectual components [71] allow for the definition of required and provided interfaces in aspects, translucid contracts [5] allow for specification of how advice from several aspects can interact when they intercept the same parts of the base code, and XPIs [117, 118] provide mechanisms to define contracts that richly express the interaction between aspects and base code. In this continued focus on expressiveness lies both the strength and the weakness of modularization of concerns approaches. They are strong in their support for concrete concerns and in precisely specifying all sorts of information about them. Their inherent weakness, however, lies in the absence of support for specified concerns. Requiring the user to maintain a mental model of which concrete concerns represent which abstract concerns is problematic, especially at the system level. This is evidenced, for instance, by the studies of Bowman et al. [18], which highlight the discrepancies between the actual system, with its dependencies among concrete concerns, and the view the developers had of it. The lack of a separate model for specified concerns, thus, hampers the power of the approach.

A second observation pertains to the tradeoff between expressiveness and usability. Setting aside the absence of an explicit model for specified concerns, the increase in expressive power for describing concrete concerns has come at the cost of decreased usability. First, developers need to learn how to effectively use what might seem to be unnatural (i.e., non-object-oriented) ways of structuring and relating code fragments. For instance, learning how to use the integration relationships of hyperspaces requires developers to think very differently about how to partition the system so that the various compositional mechanisms can be fully exploited. Likewise, using composition filters requires a different way of thinking from using mixins. Second, the reasoning process is usually more complex when developers decide where to place the code they are currently programming (i.e., whether to create a new method or a new advice block, and in which class or aspect). Not just the initial choice of approach matters: using the approach consistently, so as to maintain its advantages over time, is equally important and equally challenging in the more expressive approaches. Of course, that is not to say that these latter approaches should be avoided. We merely highlight that, to use them effectively, a developer needs time and experience with the approach to become fluent in it. A third and final observation about modularization of concerns approaches is that they primarily support planned abstract concerns.
They fall short when developers need to deal with emergent abstract concerns. As the tyranny of the dominant decomposition implies, most, if not all, relevant concrete concerns need to be identified up front for the resulting modularization to be desirable in the long run, extensive refactoring notwithstanding. Such refactoring is typically a very time-consuming and error-prone activity once it goes beyond simple refactorings to refactorings meant to truly maintain a flexible system architecture.

6. Extensional Concern Modeling

Modularization of concerns has proven to be an effective approach to support developers in defining concrete concerns and in mentally mapping them to abstract concerns. However, as we highlighted previously, these approaches mainly support developers in dealing with planned abstract concerns; when it comes to emergent abstract concerns, modularization of concerns has some limitations. This is because, for some of these emergent concerns, code for the corresponding concrete concern already exists in the system. Even if it were possible to manually extract such concerns into separate modules at a later time, this process might be too complex or time-consuming an activity to perform consistently. As a result, not all concrete concerns can realistically be encapsulated at the same time using only modularization of concerns.

Approaches in our second category, extensional concern modeling, overcome these limitations by providing developers with explicit concern models that directly capture a set of specified concerns and map them onto their respective concrete concerns. Both specified and concrete concerns are identified in a user-driven manner. Figure 10 typifies this approach. On the left side, it shows the state of the explicit concern models at some moment in time: two specified concerns have been defined, with their corresponding concrete concerns and their code fragments. Keeping the models in sync is a user-driven activity. Developers make changes to the code, then must update the specified concerns as needed (including adding new concerns or removing old ones), and finally update the code fragments belonging to each concrete concern. In the example, they added code fragments c and e to concrete concerns 1 and 2, respectively, and added a new concrete concern 3 with links to the existing code fragments a and d, as well as to the new code fragment h.

Figure 10. Extensional concern modeling.

The primary benefit of these approaches is that the code that implements a given specified concern can be found immediately. The second important benefit is that the association of code fragments with concrete concerns does not depend on the current boundaries of programmatic constructs. This means that a code fragment can be associated with multiple concrete concerns at the same time; in the example, code fragments a and d are part of several concrete concerns. It also means that code fragments can be of arbitrary granularity, ranging from just a few characters to, for instance, sets of files or entire packages. Finally, the approach is bidirectional: since both specified and concrete concerns are explicitly modeled, developers can navigate from a specified concern to its concrete concern, and vice versa.
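To make the bidirectional mapping concrete, the following minimal Python sketch models specified concerns as names mapped to sets of code-fragment identifiers, with a reverse index for navigating from a fragment back to its concerns. The class name ConcernModel, its methods, and the initial contents of concerns 1 and 2 are hypothetical; only the update step (adding c and e, and creating concern 3 over a, d, and h) follows the example above.

```python
from collections import defaultdict


class ConcernModel:
    """Hypothetical extensional concern model: concern name <-> code fragment ids."""

    def __init__(self) -> None:
        self.fragments = defaultdict(set)  # specified concern -> its concrete fragments
        self.concerns = defaultdict(set)   # code fragment -> concerns it belongs to

    def link(self, concern: str, fragment: str) -> None:
        # A fragment may belong to any number of concerns, independent of
        # class or method boundaries.
        self.fragments[concern].add(fragment)
        self.concerns[fragment].add(concern)

    def unlink(self, concern: str, fragment: str) -> None:
        self.fragments[concern].discard(fragment)
        self.concerns[fragment].discard(concern)

    def overlapping(self) -> set:
        """Fragments that are part of more than one concrete concern."""
        return {f for f, cs in self.concerns.items() if len(cs) > 1}


model = ConcernModel()
# Made-up initial state for concerns 1 and 2.
for concern, frags in {"1": "ab", "2": "df"}.items():
    for f in frags:
        model.link(concern, f)

# User-driven update after a code change, as in the example above.
model.link("1", "c")
model.link("2", "e")
for f in "adh":
    model.link("3", f)

print(sorted(model.fragments["3"]))  # ['a', 'd', 'h']  (specified -> concrete)
print(sorted(model.concerns["a"]))   # ['1', '3']       (concrete -> specified)
```

Keeping both indexes in one class is a design choice for the sketch: it makes navigation in either direction a single lookup, at the cost of updating two structures on every link or unlink.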

These benefits, of course, come at the cost of maintaining the specified concern model and the corresponding concrete concern model. Doing this in a fully manual fashion is nearly impossible, which is why the approaches we discuss next provide varying levels of semi-automated support.

6.1. Software Reflexion Models

Software reflexion models [83, 84] is an approach that allows developers to define a high-level model of the code. This high-level model is made up of named specified concerns and their relationships. The developer can define the code fragments that are part of each corresponding concrete concern. The tool then analyzes the code to compare the relationships defined between specified concerns in the high-level model with the dependencies between the code fragments that are part of the concrete concerns. The main purpose of the approach is to support developers in understanding the system, especially the degree to which the code adheres to or diverges from the high-level model.

Figure 11. High-level model (left) and reflexion model (right), from [84].

Figure 11 presents, on the left-hand side, an example of a high-level model defined by a developer for a virtual memory system. The model is made up of named specified concerns and their relationships; in the figure, the specified concern Pager is related to the specified concern FileSystem. Using pattern expressions (Figure 12), the developer can define the code fragments that make up the concrete concern corresponding to each specified concern. In this case, the Pager concrete concern is made up of all files matching the pattern .*pager.*. The tool performs an analysis of dependencies according to method calls and class hierarchies with a lightweight source model extraction tool [85]. It then compares the model of specified concerns with the actual dependencies between concrete concerns to produce the reflexion model (Figure 11, right). The reflexion model shows the level of congruence between the relationships that the developer defined between specified concerns and the actual dependencies between concrete concerns. Solid arrows indicate convergence between the models, while dashed and dotted arrows represent divergence and absence, respectively. In the figure, for instance, the dotted arrow between Pager and FileSystem indicates that this relationship is defined between the specified concerns but absent between the concrete concerns.

Figure 12. Mapping of code to entities in the high-level model, from [84].

The developer can iterate on the process of defining the specified and concrete concerns, and calculating the differences among relationships, to adjust and refine the reflexion model. The developer can also perform additional queries to find the specific code fragments that define the dependencies corresponding to a particular instance of convergence or divergence in the reflexion model, or to identify code fragments that should be, but currently are not, included in the concrete concerns. Software reflexion models was one of the first approaches to recognize that a model specifying meaningful groupings of code fragments and their relationships can be of great use to developers in understanding and maintaining their software from a high-level perspective. By highlighting differences between the relationships among specified concerns and the relationships among concrete concerns, the tool helps developers understand whether the code is diverging from the desired model. This information could be used, for instance, to guard against architectural erosion [96].
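The comparison step can be sketched as follows. This is a minimal Python illustration assuming a simplified source model (a set of caller/callee file pairs) and regular-expression mappings in the spirit of the pattern expressions of Figure 12. The file names, the VirtualAddress concern, and the choice to ignore dependencies inside a single concern are our own assumptions, not details of the original tool.

```python
import re
from itertools import product


def lift(fragment, mapping):
    """Specified concerns whose pattern fully matches a source-model entity."""
    return {concern for pattern, concern in mapping if re.fullmatch(pattern, fragment)}


def reflexion(high_level, source_deps, mapping):
    """Classify edges as convergences, divergences, or absences."""
    lifted = set()
    for caller, callee in source_deps:
        for a, b in product(lift(caller, mapping), lift(callee, mapping)):
            if a != b:  # assumption: dependencies within one concern are ignored
                lifted.add((a, b))
    convergences = high_level & lifted  # expected and present
    divergences = lifted - high_level   # present but not expected
    absences = high_level - lifted      # expected but not present
    return convergences, divergences, absences


# Hypothetical mapping, high-level model, and extracted dependencies.
mapping = [(r".*pager.*", "Pager"), (r"fs_.*", "FileSystem"), (r"addr.*", "VirtualAddress")]
high_level = {("Pager", "FileSystem"), ("VirtualAddress", "Pager")}
source_deps = {("addr.c", "pager.c"), ("fs_read.c", "pager.c")}

conv, div, absent = reflexion(high_level, source_deps, mapping)
print(conv)    # {('VirtualAddress', 'Pager')}
print(div)     # {('FileSystem', 'Pager')}
print(absent)  # {('Pager', 'FileSystem')}
```

Because the three results are plain set differences, refining the mapping and rerunning the comparison is cheap, which fits the iterative refinement loop described above.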
At the same time, we note that software reflexion models are not trivial to use, and require a fairly significant effort to set up an initial model of specified concerns and the mappings of concrete concerns to code fragments.

6.2. Design Pattern Rationale Graphs

While software reflexion models address relatively coarse-grained specified concerns, Design Pattern Rationale Graphs (DPRG) [7] address more fine-grained concerns. Specifically, the approach aims to make design patterns evident by supporting the documentation and maintenance of their corresponding code. With DPRG, developers can make explicit that certain code fragments make up a concrete concern that corresponds to the realization of a design pattern, the specified concern. For instance, developers can indicate that a certain class plays the role of the subject in the observer pattern. The approach provides semi-automated support to generate a model representing a design pattern, including those of the Gang of Four [41]. A design pattern rationale graph is made up of three levels: pattern, source, and link. The pattern level represents the specified concerns, the source level represents the concrete concerns, and the link level defines the mapping between them. Figure 13 presents an example of part of a graph for the observer pattern. The pattern level is constructed using lexical analysis of the pattern description to identify relevant elements. For instance, the description of the observer pattern is used to generate the model on the left-hand side of Figure 13, with grey boxes being the elements that can be mapped onto code. White boxes represent the richer context of the pattern, and are included to help the developer interpret the overall pattern rationale graph. The source level consists of code fragments such as packages, classes, and methods, as well as their relationships derived from code-level dependencies. In the figure, the AbstractFigure class is inside package standard and extends class PolyLineFigure. The link level specifies associations between elements at the pattern level and elements at the source level. The figure shows that the subject design element is associated with the AbstractFigure class.

Figure 13. A design pattern rationale graph for an observer pattern, from [7].

The approach is backed by a series of tools. Specifically, the pattern level is created in a semi-automated manner by taking in the textual description, extracting from it design elements of relevance, and letting the user update the resulting model. The elements of the source level are inferred automatically using a static code analysis tool. Based upon a user-provided seed link (in the example, the association of the AbstractFigure class with the subject element), the tool infers new links between elements at the pattern and source levels, and iterates on them while allowing the developer to manually confirm, remove, and adjust inferred links. Compared to software reflexion models, it is easier to create a useful set of specified concerns, but it is still not a trivial task. It requires developers to be experienced in using the patterns, and the text describing the pattern should be compact and consistent for it to be useful in generating an initial pattern-level specification. The other limitation, of course, is that the approach only supports patterns, making it quite narrow and low level in scope.

6.3. ConcernMapper

ConcernMapper [102] is an approach for fully extensional concern mapping. It distinguishes itself from software reflexion models and design pattern rationale graphs in its use of a very lightweight model for specified concerns. The lightweight model allows developers to define arbitrary abstract concerns and include any arbitrary code fragments in their corresponding concrete concern. While this allows any abstract concern to be captured and managed, the trade-off is that it is difficult to provide meaningful tool support to build and maintain the models. In ConcernMapper, developers define a model of specified concerns that consists of just a set of names. Code fragments do not need to be related in any way, whether structurally, lexically, or otherwise, to be part of a concrete concern. To map a specified concern to its concrete concern, developers manually mark relevant code fragments. Figure 14 shows a specified concern called Autosave Feature whose corresponding concrete concern includes several classes and methods, including the Buffer class, its autosavefile field, and its autosave() method.

Figure 14. A concern in ConcernMapper, from [102].

ConcernMapper allows several specified concerns to be presented at the same time, and the association of code fragments with concrete concerns can be changed simply by dragging them in the view from one concern to another. It also supports filtering the results in the search view by concern, which hides any search results (i.e., code fragments) that are not part of the concrete concern.
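As a rough illustration of how little structure such a model carries, the following Python sketch keeps each specified concern as a bare name with a set of marked fragments, supports moving a fragment from one concern to another (as the drag operation does), and filters search results by concern. All names are hypothetical and do not reflect ConcernMapper's actual API; fragment identifiers such as Buffer.autosave() are plain strings chosen for illustration.

```python
class LightweightConcernModel:
    """Hypothetical names-only concern model in the spirit of ConcernMapper."""

    def __init__(self) -> None:
        self.concerns: dict[str, set[str]] = {}

    def add_concern(self, name: str) -> None:
        self.concerns.setdefault(name, set())

    def mark(self, name: str, fragment: str) -> None:
        # Fragments need not be related structurally or lexically.
        self.concerns[name].add(fragment)

    def move(self, fragment: str, source: str, target: str) -> None:
        # Equivalent of dragging a fragment from one concern to another.
        self.concerns[source].discard(fragment)
        self.concerns[target].add(fragment)

    def filter_results(self, results: list[str], name: str) -> list[str]:
        # Hide search results that are not part of the chosen concern.
        return [r for r in results if r in self.concerns[name]]


model = LightweightConcernModel()
model.add_concern("Autosave Feature")
model.add_concern("Printing")
for fragment in ("Buffer", "Buffer.autosavefile", "Buffer.autosave()", "Buffer.print()"):
    model.mark("Autosave Feature", fragment)
model.move("Buffer.print()", "Autosave Feature", "Printing")

hits = ["Buffer.autosave()", "View.paint()", "Buffer.print()"]
print(model.filter_results(hits, "Autosave Feature"))  # ['Buffer.autosave()']
```

Note that nothing in the model relates one concern to another, which mirrors the limitation discussed next: relationships among specified concerns cannot be expressed.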
To provide some support for populating the model, ConcernMapper enables the developer to provide a seed code fragment that serves as input to a variety of possible analyses (such as tracing dependencies or looking for shared terminology in variable naming) to identify candidate fragments. New kinds of analyses can be plugged in as needed. We note that the approach, to a certain extent, blurs the difference between specified and concrete concerns. The code fragments that collectively make up a concrete concern are shown directly below the name of the specified concern. On the one hand, this simplifies the approach; on the other hand, it lacks facilities to enable developers to define explicit relationships among specified concerns, or to identify where in the code some concrete concerns overlap. Additionally, it suffers from scalability problems, in that a tree view is not optimal when many concerns and code fragments are present. This last issue is explored in more detail by SourceMiner [27], a tool that extends ConcernMapper with the aim of supporting visualization of concerns at a much larger scale. In particular, it provides views based on Polymetric [69] and Treemap layouts [13], which are known to scale much better.

6.4. Software Plans

Software plans [93] is an editor-based approach to modeling concerns. A plan is a partial view of the code that includes only the code fragments relevant to one or more specified concerns. In particular, the approach relies on the editor to highlight or hide code fragments that are part of the concrete concerns corresponding to the specified concerns of interest. Unlike some of the previous approaches in this section, which focus on showing the different code fragments that are part of a single concrete concern, software plans is designed to clearly identify overlapping concrete concerns at a fine-grained level. Figure 15 shows an example of code annotated with a software plan. The columns on the left side of the code represent individual specified concerns, with each column having a distinct name (not shown). A mark is present in a column if the code fragment implements the corresponding specified concern.
For example, the declarations of integers in the first lines of code are related to three different specified concerns, while the following code fragment is related to just one. As with ConcernMapper, we note that the distinction between specified and concrete concerns is blurred: a single representation of concerns serves both roles. To update the model, developers can select a set of lines of code and add it to one or more of the concerns. They can also define a software plan, a grouping of related specified concerns. Once a software plan is defined, they can select it to show only the concrete concerns for the plan, with the code fragments for all other concerns hidden.
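The effect of selecting a plan can be sketched as a filter over concern-annotated lines. In this minimal Python example, each line carries the set of specified concerns it implements, the gutter prints one column per concern, and lines outside the selected plan are hidden. The concern names and code lines are made up, and treating lines that belong to no concern as hidden is our assumption.

```python
def render_plan(lines, concerns, plan):
    """Show only lines that implement a concern in the plan, with a column gutter."""
    shown = []
    for text, tags in lines:
        if tags & plan:  # hide lines that belong to no concern of the plan
            gutter = "".join("x" if c in tags else "." for c in concerns)
            shown.append(f"{gutter} {text}")
    return shown


concerns = ["sum", "count", "average"]
lines = [
    ("int total = 0;", {"sum", "average"}),
    ("int n = 0;", {"count", "average"}),
    ("total += v;", {"sum", "average"}),
    ("n += 1;", {"count", "average"}),
    ("log(v);", set()),
]

for row in render_plan(lines, concerns, plan={"count"}):
    print(row)
# .xx int n = 0;
# .xx n += 1;
```

Because each line keeps its full concern set, overlap stays visible in the gutter even when the plan selects only one concern.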

Figure 15. Example code in software plans, from [35].

The implementation of software plans [34] differs from other extensional concern modeling approaches in that it relies on changing the program storage model. That is, the approach abandons the classical module-to-file equivalence by storing all code fragments separately and composing the code shown to developers on the fly. This enables a fine-grained mapping of concerns, since individual lines of code can themselves be distinct code fragments. The advantage of software plans is that it presents all relevant code together. At the same time, the code that is presented is now out of context, in that it does not consist of complete classes or methods, which may make it more difficult to understand. Furthermore, the visualization does not show dependencies across concerns (other than overlap), although it would be interesting to explore adapting the visualization technique of Code Bubbles [20] to alleviate that issue.

6.5. ArchEvol

ArchEvol [86] is an extensional concern modeling approach that, similarly to software plans, operates in the editor and uses columns to highlight which lines of code belong to which concern. However, it provides much more functionality than just highlighting the concerns by coloring lines of code, particularly: (1) an additional visualization at the architecture level, (2) automated heuristics that update the concrete concerns with each change, and (3) an evolution view that highlights how concerns become more or less scattered over time. Figure 16 presents the code-level view of ArchEvol, which allows developers to define and manipulate specified concerns in a separate model (1 in the figure). From this model, developers can select the current set of concerns that they want to be active (2). These active concerns are highlighted in the code gutter (3). The key innovation in ArchEvol is its heuristics, which infer how the concrete concerns should evolve when the developer modifies the code. For instance, code inserted adjacent to a concrete concern's code fragment extends that fragment and automatically becomes part of that concrete concern. As another example, when a developer deletes some lines of code, if the now-adjacent code fragments are part of the same concrete concern, they are merged into one larger code fragment. As a result of these and other heuristics, the effort to maintain the concrete concerns is reduced, with as much as 75% of the necessary updates to concrete concerns being automatically derived in one of the assessments of the approach [86].

Figure 16. Code level view of ArchEvol, from [86].

Figure 17 presents the architectural view of ArchEvol, which shows the components of the system and their corresponding specified concerns. Each component indicates, using small colored boxes on the inside, each of the specified concerns whose corresponding concrete concerns have code fragments located inside the component. Each colored box is only partially colored, with the proportion representing the percentage of the concrete concern that is present in the component. As a result, scattering (the spread of one color over multiple components) and tangling (multiple colored boxes inside a component) are easily recognized.
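The two heuristics described above can be approximated as interval updates. The following Python sketch assumes that each concrete concern is stored as a sorted list of inclusive, 1-based line ranges; it is our own simplification rather than ArchEvol's implementation, and the function names are hypothetical. Insertion inside or immediately next to a range extends that range, and deletion merges ranges of the same concern that become adjacent.

```python
def merge(ranges):
    """Normalize ranges so that touching or overlapping ones are combined."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def insert_lines(ranges, at, count):
    """Insert `count` new lines before line `at` and update one concern's ranges."""
    updated = []
    for start, end in ranges:
        if end < at - 1:      # ends before, and not adjacent to, the insertion
            updated.append((start, end))
        elif start > at:      # starts after the insertion point: shift down
            updated.append((start + count, end + count))
        else:                 # inside or adjacent: absorb the new lines
            updated.append((start, end + count))
    return merge(updated)


def delete_lines(ranges, first, last):
    """Delete lines first..last (inclusive) and update one concern's ranges."""
    removed = last - first + 1
    updated = []
    for start, end in ranges:
        if start < first:     # surviving part before the deleted block
            updated.append((start, min(end, first - 1)))
        if end > last:        # surviving part after it, shifted up
            updated.append((max(start, last + 1) - removed, end - removed))
    return merge(updated)  # now-adjacent ranges of this concern are merged


print(insert_lines([(5, 8)], 9, 2))          # [(5, 10)]
print(delete_lines([(1, 3), (7, 9)], 4, 6))  # [(1, 6)]
```

Applying these functions to each active concern separately means that a line inserted between fragments of two different concerns joins both, which is only one of several possible policies.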

Figure 17. High-level view of ArchEvol, from [86].

6.6. Role-based Refactoring

Role-based refactoring [47] is an approach that allows developers to use a highly expressive definition of specified concerns to drive the refactoring of crosscutting concerns. Role-based refactoring allows developers to abstractly define a series of transformations to the code fragments belonging to a crosscutting concern in order to reduce its tangling and scattering. To achieve this, the approach relies on generic refactoring descriptions, called role-based crosscutting concern refactoring descriptions (top of Figure 18), to represent a specified concern as a series of role elements with associated refactoring instructions. Role elements define the abstract structure of the concern in terms of the types of code fragments involved. For example, a refactoring description for an observer crosscutting concern can define that some class must play the role of the subject, and that it must contain methods that realize the role of updating the subject. The refactoring instructions (top right-hand side of Figure 18) define transformations of the role elements, such as creating, moving, replacing, or deleting code fragments. Developers apply a refactoring description by associating it with a particular crosscutting concern, defining how the role elements correspond to some specific set of code fragments (e.g., mapping the subject role to a class and the update role to some of its methods). The tool plans the execution of refactoring steps, and uses impact analysis to allow developers to make specific choices for issues that arise during the refactoring process. For instance, if a method that references a private field is moved to an aspect, the tool can suggest changing the visibility of the aspect, changing the visibility of the field, or adding getter/setter methods for the field, to avoid compilation problems.
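A minimal Python sketch of the binding step may help: a description lists role elements and instructions phrased in terms of roles, and binding the roles to concrete code fragments yields an ordered list of concrete steps. The class names, the textual step format, the ObserverAspect target, and the simple private-field check are all hypothetical simplifications, not the tool's actual representation or impact analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    action: str  # e.g. "create", "move", "replace", "delete"
    role: str    # role element the instruction applies to
    target: str  # destination of the transformation


@dataclass
class RefactoringDescription:
    roles: list[str]
    instructions: list[Instruction] = field(default_factory=list)

    def plan(self, binding: dict[str, list[str]]) -> list[str]:
        """Turn role-level instructions into concrete steps for one concern."""
        missing = [r for r in self.roles if r not in binding]
        if missing:
            raise ValueError(f"unbound roles: {missing}")
        return [
            f"{ins.action} {fragment} -> {ins.target}"
            for ins in self.instructions
            for fragment in binding[ins.role]
        ]


def move_options(method: str, private_fields: list[str]) -> list[str]:
    """Choices offered when a moved method references private fields."""
    if not private_fields:
        return []
    return [
        f"change visibility of the aspect receiving {method}",
        *(f"change visibility of field {f}" for f in private_fields),
        *(f"add getter/setter for field {f}" for f in private_fields),
    ]


observer = RefactoringDescription(
    roles=["subject", "update"],
    instructions=[Instruction("move", "update", "ObserverAspect")],
)
binding = {"subject": ["Figure"], "update": ["Figure.changed()"]}
print(observer.plan(binding))  # ['move Figure.changed() -> ObserverAspect']
```

The same description object can be re-planned with a different binding, which is the kind of reuse that makes the up-front effort of writing a description worthwhile.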

Figure 18. Role-based concern refactoring, from [47].

Unlike the previous approaches in this section, the primary objective in defining specified concerns in role-based refactoring is not to build a map to the corresponding concrete concern. Rather, its purpose is to modify some code fragments to construct a new concrete concern or to better separate the existing concrete concerns. To do this, it relies on developers defining a generic strategy for realizing a potentially quite complex refactoring process, which ideally can be applied multiple times to correct similar problems elsewhere in the code. As such, role-based refactoring is a generic technique. However, the effort involved in defining the specified concern and the refactoring instructions might only be worthwhile if it can indeed be reused in the future.

6.7. Arcum

Arcum [109] is an approach that, similarly to role-based refactoring, supports developers in refactoring crosscutting concerns based on an explicit model of the specified concern. Arcum allows the code fragments belonging to a crosscutting concern to behave like a module from the perspective of checkability and substitutability, but not from the traditional perspective of encapsulation. This means that the refactoring process of Arcum does not result in the extraction of a crosscutting concern into its own separate file or module; instead, Arcum is unique in allowing a single concrete concern to have multiple alternative implementations that can be substituted for one another, as long as they match the specified pre- and post-conditions. Figure 19 presents an example of the refactoring of a crosscutting concern with Arcum. In this example, the developer wants to refactor the use of the field alttext in the original program (left-hand side, bottom of the figure) to use a static map instead (right-hand side, bottom).
To do this, they must first define a generic behavior interface (top of the figure), which in this case specifies a generic behavior of setting an attribute value, attrset( ), with various pre- and post-conditions (not shown). The generic behavior, then, is detailed with two alternative implementations (called options, which are defined in a metaspecification language): the original, using field assignment, on the left side, and the new one, using the static map, on the right. Based on the behavior interface and the options, Arcum identifies how to change a code base in which an attribute is stored and accessed using a static field into one in which the attribute is stored in a map, as indicated by the arrows in the figure.

Figure 19. Behavior interface and options in Arcum, from [110].

The key benefit is that the transformation is generic and automated. Arcum even allows developers to revert to the original version of the concrete concern, or to replace it with a third alternative altogether. As such, the approach allows for replacement of the crosscutting concern even though it is not strictly encapsulated, which represents an interesting alternative to other approaches for refactoring crosscutting concerns. At the same time, the approach and its capabilities are quite complex to internalize, and specifying behavior interfaces and their options is quite involved.

6.8. CME

CME [50, 52] is an approach that supports what its authors call the extraction/composition cycle of concerns. This cycle refers to the process of identifying specified and concrete concerns, refactoring and extracting crosscutting concerns, and composing the resulting concrete concerns in one of many different ways. The objective of CME is to provide a rich concern modeling environment that is also generic and extensible. In this environment, concerns are the first-class elements that drive the whole development process.

The model for specified and concrete concerns in CME is based on Cosmos [120]. This model is much richer and more expressive than any of the others presented in this section, allowing for arbitrary modeling of both specified and concrete concerns, as well as of rich relationships among them. Figure 20 presents an example view of a model that represents some specified concerns (e.g., the feature Naming), relationships (e.g., nodes under class Entity), and concrete concerns with their code fragments (e.g., class org.eclipse.cme.util.parsedtypename). (The view only highlights some key elements and does not capture the full expressiveness of the underlying model.) The approach allows the code fragments that make up a concrete concern to be identified either extensionally or using queries over the code. (These queries are similar to those of intensional concern finding approaches, a topic we explore in more detail in Section 7.) In the figure, the Naming specified concern corresponds to a concrete concern whose code fragments are identified using a query that finds methods matching a specific pattern.

Figure 20. Concern Explorer view in CME, from [50].

CME stands apart in its extensibility. The approach provides a series of extension points through which more elaborate support for the extraction and composition of concerns can be provided. These extension points can be used to recognize, for instance, the kinds of crosscutting concerns identified in Section 5, by adding custom components to support their specification and identification. Moreover, to then refactor these concerns out and keep them separate, developers can plug in additional refactoring algorithms, and to then compose actual systems out of these independent specified concerns, the default builder can be replaced with more specialized builders that deal with their selection and configuration.

6.9. Identifying, Assigning, and Quantifying Crosscutting Concerns

Eaddy et al. [38] define a methodology to identify specified concerns and concrete concerns, and a set of metrics to evaluate their degree of scattering and tangling. The approach stands apart in providing guidelines for identifying specified concerns and assigning code fragments to the concrete concerns through qualitative heuristics, which are based on determining what they call a component-code removal dependency. The process is divided into two phases: concern identification and concern assignment. The first phase includes the identification and hierarchical organization of specified concerns. Figure 21 presents a conceptual view of the specified concerns (source model on the left of the figure). Concerns are organized in a hierarchy, and are identified by the developer with the help of guidelines. These guidelines include, for instance, defining concerns objectively, as opposed to subjectively, and definitively, meaning that their specification allows developers to reason precisely about the abstract concern.

Figure 21. Source model (concerns), target model (code), and their relationships, from [38].

The second phase, concern assignment, involves the association of code fragments with concrete concerns (target model on the right in the figure). The approach provides a series of guidelines for this phase as well.
These guidelines are based on a heuristic they call component-code removal dependency: a code fragment is part of a concrete concern (i.e., it is related to a specified concern in the source model) if removing the specified concern from the system implies removing or modifying that code fragment. Once the manual identification of concrete concerns has been performed, the approach provides the developer with two metrics that quantify the level of scattering and tangling of the concrete concerns: the degree of scattering (based on how many lines of a concrete concern are contained in each component) and the degree of focus (based on how many lines of a component are part of each concrete concern). The main advantage of the approach is that it provides explicit guidance as to how to identify specified and concrete concerns. However, the guidelines are quite generic and, although the authors claim that mapping to fine-grained code fragments is possible, the guidelines presented seem more appropriate for coarser code fragments.
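The two metrics can be sketched from per-component line counts. The description above only says which line counts they are based on; the normalized-variance aggregation used below is our reading of the definitions in [38] and should be checked against that paper before being relied upon. The input format (a dictionary from (concern, component) pairs to line counts) and the concern and component names are hypothetical.

```python
def _normalized_spread(shares):
    """n/(n-1) * sum((share - 1/n)^2): 1 when concentrated in one bucket, 0 when even."""
    n = len(shares)
    return n / (n - 1) * sum((s - 1 / n) ** 2 for s in shares)


def degree_of_scattering(concern, components, lines):
    """0 if the concern lives in one component, 1 if spread evenly over all."""
    counts = [lines.get((concern, t), 0) for t in components]
    total = sum(counts)
    if total == 0 or len(components) < 2:
        return 0.0
    return 1 - _normalized_spread([c / total for c in counts])


def degree_of_focus(component, concerns, lines):
    """1 if the component serves a single concern, 0 if split evenly."""
    counts = [lines.get((s, component), 0) for s in concerns]
    total = sum(counts)  # assumption: only lines assigned to some concern count
    if total == 0:
        return 0.0
    if len(concerns) < 2:
        return 1.0
    return _normalized_spread([c / total for c in counts])


# Hypothetical assignment of lines to concerns within components.
lines = {
    ("Autosave", "Buffer"): 30, ("Printing", "Buffer"): 30,
    ("Autosave", "View"): 10,
}
components, concerns = ["Buffer", "View"], ["Autosave", "Printing"]
print(degree_of_scattering("Autosave", components, lines))  # 0.75
print(degree_of_focus("Buffer", concerns, lines))            # 0.0
print(degree_of_focus("View", concerns, lines))              # 1.0
```

In this toy assignment, Autosave is fairly scattered because a quarter of its lines sit outside Buffer, and Buffer has zero focus because it is split evenly between two concerns.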

6.10. Observations

The key advance made by extensional concern modeling approaches is that they have an explicit model for specified concerns. All of the approaches have such a model, which can be separately created, examined, and updated. The power of the model varies, with some approaches just having a set of names (ConcernMapper), but others expanding the model with relationships (software reflexion and CME) or with more elements to represent richer semantics (design pattern rationale graphs). As a result, the range of support that is given to developers using this model also varies considerably, from approaches that simply allow finding the concrete concerns for some given specified concerns, such as software plans, to approaches that allow refactoring concrete concerns, such as role-based refactoring and Arcum. Given that a separate model exists, the relationship from this specified model to its concrete model becomes of interest. Concrete concerns no longer exist in isolation, but in the context of their specified concern. Although the mapping is one-to-one (i.e., one specified concern to one concrete concern), how the specified concern is identified and how code fragments in the concrete concern are found are of crucial importance, and differ significantly among the approaches. While some approaches, such as Software Reflexion Models and ArchEvol, provide only manual support for this purpose, others go further by, for instance, relying on an elaborate model that is semi-automatically mapped to code, as in Design Pattern Rationale Graphs, or by providing heuristics to identify specified concerns, as in the approach of Eaddy et al. With this rise in the capabilities and expressiveness of the concern models, of course, comes a tradeoff in usability. First, there is the issue of defining the appropriate size and scope for the specified and concrete concern models.
It might be relatively easy to identify a few specified concerns with their corresponding coarse-grained concrete concerns, but this may not be very helpful to the next developer examining the concern. To get the full power of the approach, and a much stronger concern fit, it is necessary to identify the concrete concerns as precisely as possible. This is typically much more labor-intensive, especially given the relative absence of automated support for evolving the concerns when the code is modified. ArchEvol is an important exception in this regard, and more work is needed in this realm.

7. Intensional concern finding

Extensional concern modeling approaches present advantages regarding their flexibility in accommodating emergent concrete concerns. However, current approaches typically require significant effort from developers to explicitly identify the specified and concrete concerns, with no guarantee that the specified concerns identified will remain relevant in the future. Intensional concern finding approaches address these shortcomings by providing automated support for identifying a concrete concern given an intensional expression of a specified concern. An intensional expression defines a specified concern in terms of a predicate that can be evaluated by performing some type of analysis over the code. This expression can be a query, a pattern, or even just a heuristic. As a result, the effort required from developers to express interest in a specified concern, and to identify its corresponding concrete concern, is usually decreased. Intensional concern finding approaches provide support for finding concrete concerns as illustrated in Figure 22. At any moment in time, and with no prior effort required to identify concrete concerns, a developer can define one or more specified concerns (left-hand side of the figure). Executing the corresponding analysis for each of the specified concerns produces a series of concrete concerns (right-hand side of the figure). Developers can then easily find the code fragments that implement the specified concerns. Figure 22. Intensional concern finding. One of the first projects that conceptually supported the idea of intensional concern finding is Information Transparency [43]. This work argues that an approach can support modularity with an alternative to traditional information hiding and encapsulation by relying on the concept of similarity. This concept refers to identifying code fragments belonging to a concrete concern based on their having similar characteristics, such as textual similarity, structural resemblance, or the common use of some specific variable, object, or resource. Information transparency exposes interdependence among code fragments that might not be physically grouped in a module, and allows developers to understand and reason about them in isolation from the rest of the code. Below, we give an overview of the diversity of intensional concern finding approaches that build on the base ideas of information transparency.
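The idea that a specified concern is a predicate evaluated over the code can be sketched in a few lines of Python. In this minimal sketch, code fragments are plain (file, line, text) records, and the only kind of predicate shown is textual similarity via regular expressions; the Fragment type, the helper names, and the sample code are hypothetical. Combining two predicates also shows how a single fragment can belong to more than one concrete concern.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Fragment:
    """A hypothetical unit of code: one line of one file."""
    file: str
    line: int
    text: str


# An intensional specified concern: any predicate that can be evaluated on a fragment.
Predicate = Callable[[Fragment], bool]


def find_concrete_concern(code: Iterable[Fragment], concern: Predicate) -> List[Fragment]:
    """Evaluate the predicate over the code base; no manual tagging is required."""
    return [f for f in code if concern(f)]


def matches_text(pattern: str) -> Predicate:
    """Textual similarity: fragments whose text matches a regular expression."""
    regex = re.compile(pattern)
    return lambda f: regex.search(f.text) is not None


def both(a: Predicate, b: Predicate) -> Predicate:
    """Fragments that belong to two concrete concerns at once."""
    return lambda f: a(f) and b(f)


code = [
    Fragment("Server.java", 12, 'log.info("request received");'),
    Fragment("Server.java", 13, "lock.acquire();"),
    Fragment("Cache.java", 40, "log.debug(lock.owner());"),
]
logging_concern = matches_text(r"\blog\b")
locking_concern = matches_text(r"\block\b")
print([f.line for f in find_concrete_concern(code, logging_concern)])  # [12, 40]
print([f.line for f in find_concrete_concern(code, both(logging_concern, locking_concern))])  # [40]
```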

7.1. AspectBrowser

AspectBrowser [42, 44] is an approach that enables scalable visualization and manipulation of a crosscutting concern. A crosscutting concern (see footnote 4) in AspectBrowser is the set of all lines of code, possibly scattered over several files, that contain text matching a user-defined regular expression, which acts as the specified concern. AspectBrowser visually overlays crosscutting concerns over a traditional modular view of the system, treating both as first-class entities. Figure 23 presents a view of AspectBrowser. Modularized concerns are represented spatially with boxes in a view inspired by Seesoft [40]. Crosscutting concerns are represented by highlighting lines of code with a different color for each concern. In the figure, two crosscutting concerns, format and cursor, are highlighted in blue and green respectively, with red lines representing code where both overlap (lines that contain both the text format and the text cursor). AspectBrowser can hide modularized concerns that do not have code related to any of the crosscutting concerns, and allows developers to zoom in and out of the view. The editor can be opened from the view to access specific code fragments related to any concrete concern. Figure 23. The AspectBrowser tool, from [44]. AspectBrowser is a tool that provides the base functionality of intensional concern finding. It supports finding lines of code related to a specified concern using textual pattern matching, creates a visualization of the concrete concern (i.e., modularized concerns represented as boxes, and crosscutting concerns as colored lines), and allows developers to directly manipulate and navigate the code fragments for both modularized and crosscutting concerns. However, representing a specified concern using textual patterns might not be intuitive to developers, and expressing complex abstract concerns in this manner might not be a trivial task, or even possible in some cases.

Footnote 4: What we refer to here as crosscutting concerns are called aspects by the authors of AspectBrowser (thus its name), but we use the more generic term since the term aspect is heavily associated with crosscutting concerns encapsulated using a particular programming language, which are different from the ones in this approach.

7.2. FEAT

Compared to AspectBrowser, FEAT [99, 100] provides two key advantages: (1) the approach does not rely solely on regular expressions, and (2) it allows for the incremental construction of the concrete concern. In this task-oriented approach, developers progressively explore the code and explicitly decide which code fragments are relevant to the current task. The success of identifying a concrete concern, then, is less dependent on developers being disciplined in their coding practices, which makes FEAT able to provide good support for opportunistic or late identification of concerns. Using FEAT, a developer starts by defining a seed element, for instance by selecting a class that seems relevant to the current task. They can then run queries such as get superclass or fan-out (get all outbound relationships from a certain method). Developers can select, from the query results, the code fragments that are relevant to the current task. They can iterate on this process by executing more queries and selecting other code fragments of importance. The result of this process is a concern graph, which constitutes an unlabeled specified concern and defines a set of code fragments, which constitutes the concrete concern. Figure 24 presents an example of a concern graph and its presentation in the FEAT tool. The concern graph shows a specified concern that corresponds to some code fragments inside class ExceptionBlock and all the code from class TryBlock.
It specifies that, from the ExceptionBlock class, the code fragments of relevance are its field aelements, part of method getexceptions() (which specific part is not visible in this view), the method addelements(), and the relationship between the method getexceptions() and the field, which has also been annotated to indicate that the method getexceptions() reads the field aelements. Figure 24. FEAT tool (left) and corresponding Concern Graph (right), from [99]. Another distinctive characteristic of FEAT is that the relationships between the code fragments that make up a concrete concern can be explicitly captured and labeled, making interpretation easier. Developers can identify many different types of code-level dependencies, such as those corresponding to method invocation, field access, object declaration, and class hierarchy. They can mark some of the dependencies, reifying them as relationships in the concern graph and annotating them with informative labels. On the other hand, the approach also has some drawbacks. First, developers can only access a single concern graph at a time, making it impossible to understand concerns in relation to one another. More significantly, however, the approach assumes that, to justify the cost of building a concern graph for a particular activity, the graph must be relevant to other tasks in the future, which may or may not be true. This highlights the issue that tasks are an appropriate representation of concerns only to a certain extent.

7.3. Visual Separation of Concerns

Visual Separation of Concerns [28] is an approach that finds concrete concerns in a manner similar to AspectBrowser, but differs in that the code fragments belonging to a concrete concern are presented sequentially, as if they were contained inside a single module (much like the approach of software plans discussed in Section 6.4). To achieve this, the approach uses a different program storage model, one in which code is not stored in files but in smaller, fine-grained units called fragments. Developers manipulate and modify the code in terms of a virtual source file, the representation of a concrete concern as a collection of code fragments grouped in a single view. A virtual source file, thus, represents an opportunistic grouping of a set of code fragments according to a specified concern of interest. Figure 25 presents an example view of the virtual source file editor. Developers create a virtual source file dynamically by selecting code fragments using regular expressions. In the example, the developer has included methods that implement the interface IMarker. Code fragments that match this selection are included in the virtual source file and separated with horizontal bars in the editor on the left. The outline view on the right shows an overview of the virtual source file and all its code fragments.
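The virtual source file idea can be illustrated with a small Python sketch. Here the fine-grained store is assumed to be a mapping from hypothetical fragment identifiers to source text; a virtual source file is built by selecting the fragments whose text matches a regular expression and presenting them one after another, separated by a line of dashes standing in for the editor's horizontal bars. The store layout, identifiers, and separator are illustrative assumptions, not the storage model of [28].

```python
import re
from typing import Dict, List

# Stands in for the horizontal bars that separate fragments in the editor.
SEPARATOR = "-" * 40


def select_fragments(store: Dict[str, str], pattern: str) -> List[str]:
    """Return the ids of fragments whose text matches the selection pattern."""
    regex = re.compile(pattern)
    return [fid for fid, text in store.items() if regex.search(text)]


def render_virtual_file(store: Dict[str, str], fragment_ids: List[str]) -> str:
    """Show the selected fragments one after another, as if in a single module."""
    return f"\n{SEPARATOR}\n".join(store[fid] for fid in fragment_ids)


def edit_fragment(store: Dict[str, str], fid: str, new_text: str) -> None:
    """Edits made through the virtual view are written back to the fragment store."""
    store[fid] = new_text


# Hypothetical fine-grained store: fragment id -> source text.
store = {
    "TextMarker.getLine": "int getLine() { return line; } // IMarker",
    "TextMarker.paint": "void paint(Canvas c) { c.draw(this); }",
    "RangeMarker.getLine": "int getLine() { return start; } // IMarker",
}
virtual_file = select_fragments(store, r"//\s*IMarker")
print(virtual_file)  # ['TextMarker.getLine', 'RangeMarker.getLine']
print(render_virtual_file(store, virtual_file))
```

Because the view only holds fragment identifiers, an edit made through it changes the underlying fragment, and every other virtual file that includes that fragment sees the change.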


Figure 25. The Virtual Source File editor, from [28]. The approach is similar to AspectBrowser in its use of syntactic and lexical code analysis to identify a concrete concern. However, it goes one step further, presenting a view that shows the matching code fragments as if they were part of a single encapsulated module. The approach also provides a more expressive mechanism than AspectBrowser for identifying code fragments. Finally, the approach is implemented with the support of a software configuration management system, which treats fragments and virtual source files as first-class entities. The authors argue that this is beneficial because information regarding historical changes could be used to support a visualization of the evolution of the concrete concerns.

7.4. Software Reconnaissance

Software reconnaissance [125] uses dynamic analysis of test cases to identify the concrete concern corresponding to a specified concern. The approach, implemented in the TraceGraph tool [74], allows developers to describe a specified concern of interest by selecting a set of test cases that exhibit an abstract concern (i.e., whose execution is indicative of the existence of the abstract concern in the system) and a set of test cases that do not exhibit it. The tool then compares the execution traces of the test cases to identify those code fragments that were executed exclusively by test cases that exhibit the abstract concern. Figure 26 presents an example view of TraceGraph. Vertical lines represent time intervals in the execution of the test cases, and each row represents a component in the system. A dark rectangle indicates that code inside the component was executed during that time period. In this way, developers can visually tell that the rows with more dark rectangles represent the components most related to the concrete concern.
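The comparison at the heart of software reconnaissance reduces to set operations over execution traces. The Python sketch below assumes each trace is simply the set of components executed by one test case; the component names and test data are hypothetical. It also computes an illustrative per-component score, the fraction of exhibiting test cases that executed the component, as one possible way to obtain a graded rather than binary association; that score is our own choice and not necessarily what TraceGraph displays.

```python
from typing import Dict, List, Set

# Hypothetical trace format: the set of components executed by one test case.
Trace = Set[str]


def exclusive_components(exhibiting: List[Trace], not_exhibiting: List[Trace]) -> Set[str]:
    """Components run by some test that exhibits the concern and by no test that does not."""
    ran_with = set().union(*exhibiting)
    ran_without = set().union(*not_exhibiting)
    return ran_with - ran_without


def graded_relevance(exhibiting: List[Trace], components: Set[str]) -> Dict[str, float]:
    """Illustrative score: fraction of exhibiting tests that executed each component."""
    if not exhibiting:
        return {c: 0.0 for c in sorted(components)}
    return {c: sum(c in t for t in exhibiting) / len(exhibiting) for c in sorted(components)}


with_feature = [{"Main", "Parser", "Printer"}, {"Main", "Parser", "Printer", "Spooler"}]
without_feature = [{"Main", "Parser"}]
candidates = exclusive_components(with_feature, without_feature)
print(sorted(candidates))                          # ['Printer', 'Spooler']
print(graded_relevance(with_feature, candidates))  # {'Printer': 1.0, 'Spooler': 0.5}
```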

Figure 26. A view of TraceGraph, from [74]. Unlike in any of the previous approaches, in software reconnaissance the association of a code fragment with the concrete concern is not binary, but forms a spectrum. Some components are distinctly not related to the concrete concern (their row is completely empty), others only slightly, yet others more significantly, and some even completely, as evidenced by a row full of dark rectangles. The strength of the approach lies in not requiring developers to define a query or pattern over the source code, relying instead on the actual execution of the system with test cases to identify the concrete concern. Test cases, in focusing on functionality, can be closer to the nature of the concerns of interest than, for instance, a regular expression over the code, since the latter remains syntax-oriented. However, the approach relies on the availability of a sufficient number of test cases to distinguish different abstract concerns, and in particular requires each test case to be narrow enough to be precisely associated with a limited amount of code.

7.5. Mylyn

Mylyn is a tool similar to FEAT in assisting the developer with a task model to capture specified and concrete concerns. Unlike FEAT, which requires the developer to explicitly build this model with seeds and queries, Mylyn automatically constructs a model by monitoring the developer's interaction with the IDE. In particular, it tracks which files the developer opens, navigates, and modifies, from which it determines the relevance of each part of each file to the current task. Figure 27 presents a view of Mylyn; each side shows the environment for one of two tasks on which a developer is working concurrently. The tool highlights the code fragments that make up the current concrete concern in several different ways. First, the package explorer only shows a fraction of all the packages and classes in the project. Second, the tool highlights the most relevant classes and methods by showing them in bold lettering. For the task on the right side, for instance, the run() method as well as a few other code fragments are highlighted. Using Mylyn, then, the developer can easily switch between tasks and be presented with just the code fragments related to the task at hand. Figure 27. Views of Mylar (predecessor to Mylyn) for two different tasks, from [64]. In Mylyn, once again, the task serves as a proxy for the specified concern, enabling the automation of keeping track of which code fragments form the concrete concern. Similar to Software Reconnaissance, Mylyn also supports a non-binary association of code fragments with concrete concerns. The numeric relevance of a certain code fragment in the task context model increases and decreases over time as a developer interacts with the code. However, the threshold for determining code fragments of relevance, and the specific value that indicates the relevance of a code fragment to a concrete concern, are not visible to developers. Clearly, Mylyn provides advantages in reducing the effort to define the specified concern and identify its corresponding concrete concern. The trade-off, however, is that the usefulness of a particular task context model is limited to the duration of a task. After the developer finishes the task, its task model disappears. Reusing it, even if a future task is highly similar, is not currently possible.

7.6. Observations

The approaches presented here are representative of the diversity of intensional concern finding approaches. While all of the approaches seek to automate the identification of relevant code fragments, they employ a wide spectrum of analyses with a broad range of resulting capabilities. On one end of the spectrum are simple pattern matchers, which are fast and easy to interpret, but not very expressive with respect to the full range of abstract concerns a developer may have.
On the other end are dynamic analysis and IDE monitoring techniques, which are much more precise, but depend on the abstract concern being accurately approximated by heuristics reflecting human behavior. Regardless of the approach, the key benefit is the reduction in effort to find the concrete concern, but the concession is that no single approach works for all types of concerns, since each heuristic can only approximate certain types of concerns. Yet, for some emergent concerns, intensional concern finding may well be the only option available to the developer when it comes to constructing a concrete concern relatively rapidly. Another important observation is that intensional concern finding approaches can complement traditional modularization of concerns approaches. Neither excludes the other, making their combination appealing to flexibly address both planned and emergent concerns. Intensional concern finding also meshes nicely with extensional concern modeling approaches, as intensional queries and the analysis results can seed the concern models. This is the case, for example, with CME in Section 6.8, which can use intensional queries to initially find and then maintain the concrete concerns. Finally, we observe that, to date, intensional concern finding approaches are still in relatively early phases of research compared to the decades of research on modularization of concerns. As a result, the full range of heuristics and analyses has certainly not been explored. There are many opportunities in this respect, particularly in combining some of the existing analyses, or in using machine learning and large-scale data mining techniques to learn more about the true nature of concrete concerns and the actual ways in which developers find and work with them. The Mylyn archive of all tasks performed, given its current adoption rate [39], provides a rich, unexplored resource in this regard.

8. Concern Mining

Intensional concern finding drastically reduces the effort required to identify concrete concerns by relying on the user-driven identification of specified concerns as input to an analysis process that results in a set of concrete concerns. Going one step further, several more recent approaches allow developers to find both specified and concrete concerns in an automated manner, without the need to define either the specified or the concrete concerns in advance. Approaches in this last category, concern mining, build on the insight that it may be possible to identify some specified and/or concrete concerns automatically by deriving them from the code. Figure 28 highlights how concern mining approaches work. On the left side, it shows the state of the code at some moment in time at which no specified or concrete concerns have been defined. By using an algorithm that operates over the code, a concern mining approach groups the code into three specified concerns and their corresponding concrete concerns.

Figure 28. Concern mining. The approaches we present in this section differ in the result of the algorithm, particularly the type of specified and/or concrete concern produced as output. Some approaches produce a visual representation of concrete concerns, others a list of aspect candidates, and yet others a list of specified concerns. Below, we present representative approaches of these three types: concern visualization in Sections 8.1 and 8.2, aspect mining in Sections 8.3 through 8.5, and specified concern mining from Section 8.6 onward.

8.1. Active Models

Active Models [30] is a concern visualization approach that focuses on supporting developers by visually representing relationships between aspects and base code. By analyzing relationships between aspects and classes, it produces an active model, a visualization that is similar to a UML class diagram. The model is active in the sense that its scope can be automatically expanded and reduced to introduce or remove code fragments of relevance. The approach makes many relationships between concrete concerns visible and explicit. The overarching goal is to allow developers to better understand and search through an aspect-oriented code base. To build the active model, the developer selects an existing aspect as a seed, upon which the tool performs a syntactic code analysis to identify all other relevant concrete concerns and their relationships to the aspect. Figure 29 presents an example of an active model from [30], which shows how the Billing aspect is related to the base classes. In particular, it shows that the drop() method is intercepted, and that the only invocation of drop(), which triggers the execution of the aspect, is from inside the class Call.

Additionally, it shows inter-type declarations (i.e., declarations of fields or methods in the aspect that are added to a base class). In this particular case, the payer field and the callrate() method are added to the Connection class (notice the arrow icon next to these declarations). Finally, the visualization represents dynamic inheritance, corresponding to declare parents statements in aspects. In the figure, the LongDistance and Local classes are shown as extending the Connection class, which is inferred from these types of declarations in the Billing aspect. Figure 29. Active Model for aspect Billing, from [30]. It is important to point out some of the key differences from the intensional concern finding approaches of Section 7. While at first glance Active Models might seem similar in that it involves a seed and an analysis, we highlight that the seed is minimal, and that the purpose is not to find the code fragments belonging to a concern, but rather to represent how a concrete concern fits with the other concrete concerns. The goal is more toward creating a system-level understanding of the existing concrete concerns. As a further difference, the expansion and reduction of the scope of the diagram is performed in a non-user-driven manner, which means that the tool automatically decides which elements to include or remove from the view, unlike the user-driven exploration of query-based approaches such as FEAT.

8.2. SoQueT

SoQueT [77] is another concern visualization approach, with a focus on automatically finding specific types of crosscutting concerns characterized by their implementation in OO languages [75]. SoQueT performs an analysis of the code that results in the identification of a list of crosscutting concerns that match (i.e., are instances of) crosscutting concern sorts.
A crosscutting concern sort (sort, for brevity) is a generic description of a type of functionality, identified by its intent and its common implementation structure in object-oriented code. The method consistent behavior sort, for instance, finds sets of methods that consistently invoke the same action as part of their execution. Other examples of semantically expressive sorts are code units that consistently check the same condition, common layers of redirection, superimposition of roles, and similar call-chains.
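As a rough illustration of the method consistent behavior sort, the Python sketch below inverts a call graph and reports every method invoked by at least a given number of distinct callers, the kind of repeated action a developer might then consider moving into an aspect. The call-graph format, the threshold, and the method names are hypothetical, and SoQueT's own queries are richer than this simple fan-in count.

```python
from collections import defaultdict
from typing import Dict, List, Set


def consistent_behavior(calls: Dict[str, Set[str]], min_callers: int = 3) -> Dict[str, List[str]]:
    """Map each method invoked by at least min_callers distinct methods to its callers."""
    callers_of: Dict[str, Set[str]] = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers_of[callee].add(caller)
    return {
        callee: sorted(callers)
        for callee, callers in sorted(callers_of.items())
        if len(callers) >= min_callers
    }


# Hypothetical call graph: method -> methods it invokes.
calls = {
    "Figure.moveBy": {"View.refresh"},
    "Figure.resize": {"View.refresh", "Log.write"},
    "Figure.setColor": {"View.refresh"},
    "Figure.draw": {"Graphics.fill"},
}
print(consistent_behavior(calls))
# {'View.refresh': ['Figure.moveBy', 'Figure.resize', 'Figure.setColor']}
```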

Figure 30 presents the SoQueT tool; the bottom panel shows how a developer uses the method consistent behavior sort. In this case, the developer has selected one of the results found, namely the consistent invocation of the method DrawingView.checkDamage(). The view for this sort (see Figure 30, bottom of the IDE) shows the set of corresponding code fragments. This view enables navigation between the code fragments and allows for expansion using queries on specific nodes (e.g., filtering for other sorts that include a particular code fragment). Unlike with Active Models, the views resulting from this process are stored in what the authors call crosscutting concern documentation. Developers can access this documentation to maintain the code for the concern, or to extract it into an aspect through refactoring, using an associated tool called SAIR [76]. Figure 30. Concerns in SoQueT, from [77]. The use of semantic analysis of code represents a departure from purely syntax-based approaches like Active Models. This is advantageous because more complex crosscutting concerns cannot be identified syntactically. SoQueT recognizes this problem and focuses on common symptoms of crosscutting concerns.

8.3. PDG-based clone detection

Shepherd et al. [108] introduce an aspect mining approach that uses code clone analysis to produce a list of aspect candidates. An aspect candidate is a crosscutting concern that is currently scattered and tangled and that can be found and refactored according to some heuristics and analysis. Once these crosscutting concerns are identified and refactored into aspects, they are no longer scattered and tangled and should therefore be easier to understand and maintain.

Shepherd et al.'s technique assumes that the common manifestation of crosscutting concerns is the duplication of code. In particular, the approach uses an analysis of program dependence graphs (thus, PDG-based) to identify code clones. A PDG represents the statements of a method as nodes and the control and data dependencies among them as edges. Figure 31 presents an example of two PDGs for code fragments that the approach identified as code clones. The structure of the code and the dependencies are relatively similar, but there are some differences in the individual statements. The PDG analysis is, however, capable of identifying that these two code fragments are clones and could potentially be refactored into an aspect. Figure 31. PDGs of two code clones, from [108]. The approach receives a code base as input and runs an analysis that searches for code clones across the AST. The approach incorporates several filtering and pruning techniques to reduce the number of matching PDGs. Its final output is a list of aspect candidates in the form of sets of classes that contain code clones. The developer must then manually go through the list of candidates and decide which ones to refactor, by hand, into aspects. Using clone detection to identify aspect candidates relies on a relatively straightforward heuristic to identify crosscutting concerns. Nevertheless, the PDG-based clone detection approach is an important step forward, particularly since identifying crosscutting concerns that can potentially be encapsulated in a separate module can be very helpful with respect to the modularization of concerns approaches of Section 5. An important factor in this regard is that the developer no longer needs to search for possible crosscutting concerns by hand, but is automatically provided with candidates that they can then examine.

8.4. DynAMiT

Unlike PDG-based clone detection, which relies only on static code analysis, DynAMiT [21] relies on dynamic analysis.
In particular, it analyzes event traces generated from a program's execution. To identify different code fragments that exhibit recurring method call patterns, DynAMiT finds different types of execution relationships between methods. The inside-first relationship, for instance, indicates that a method is the first method executed inside some other method. Figure 32 presents an example of a program trace in which method issessionactive() has an inside-first relationship with several isenabled() methods, which are declared inside various classes. The icon beside the issessionactive() methods indicates the type of execution relationship, namely inside-first. DynAMiT identifies this execution trace as an aspect candidate because the pattern was found multiple times. Figure 32. Example traces for an aspect candidate produced by DynAMiT, from [21]. The use of dynamic code analysis is an interesting alternative to static analysis techniques such as clone detection. It has the potential to find different crosscutting concerns or, failing that, at least those that are prevalent in a program's execution and therefore perhaps more important to refactor. The quality of DynAMiT's findings, however, depends on the availability of executable use cases that can expose the presence of the crosscutting concerns. Ideally, the use cases exercise the broad range of the system's features.

8.5. HAM

HAM [22] leverages the code version history to find aspect candidates. HAM is based on the analysis of transactions (i.e., commits of a set of changes to several files) in the versioning system. Specifically, it analyzes the insertions of method invocations in these transactions. It finds aspect candidates by filtering all transactions using heuristics that, the authors argue, are indicators of crosscutting concerns. For instance, it looks for transactions in which many scattered invocations of methods are inserted, and in which those method invocations are not commonly inserted in other transactions. It also aggregates transactions that occur close in time and are made by the same developer. The tool produces a list of aspect candidates in the form of a list of methods whose invocation is indicative of the crosscutting concern. HAM is a departure from approaches that analyze a single version of the code.
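The kind of filtering HAM performs over version history can be approximated as follows. The Python sketch assumes each transaction is a hypothetical record of its author, timestamp, and the method invocations it inserted (one entry per insertion site). It first merges transactions made by the same author within a time window, then reports methods whose invocation was added at many sites in one merged transaction but rarely in others. The window, the thresholds, and the record format are illustrative assumptions, not the heuristics or parameters of [22].

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    """Hypothetical commit record: author, timestamp, and inserted call sites."""
    author: str
    time: int  # seconds
    inserted_calls: List[str] = field(default_factory=list)  # one entry per call site


def merge_nearby(txs: List[Transaction], window: int) -> List[Transaction]:
    """Aggregate transactions by the same author that occur close in time."""
    merged: List[Transaction] = []
    for tx in sorted(txs, key=lambda t: (t.author, t.time)):
        last = merged[-1] if merged else None
        if last and last.author == tx.author and tx.time - last.time <= window:
            last.inserted_calls.extend(tx.inserted_calls)
            last.time = tx.time  # the window slides with the latest commit
        else:
            merged.append(Transaction(tx.author, tx.time, list(tx.inserted_calls)))
    return merged


def aspect_candidates(txs: List[Transaction], window: int = 300,
                      min_sites: int = 3, max_other_txs: int = 1) -> List[str]:
    """Methods whose calls were added at many sites at once but rarely elsewhere."""
    per_tx = [Counter(t.inserted_calls) for t in merge_nearby(txs, window)]
    candidates = set()
    for i, counts in enumerate(per_tx):
        for method, sites in counts.items():
            elsewhere = sum(1 for j, other in enumerate(per_tx) if j != i and method in other)
            if sites >= min_sites and elsewhere <= max_other_txs:
                candidates.add(method)
    return sorted(candidates)


history = [
    Transaction("ana", 0, ["Lock.acquire", "Lock.acquire"]),
    Transaction("ana", 60, ["Lock.acquire", "Log.info"]),
    Transaction("bob", 1000, ["Log.info"]),
    Transaction("bob", 5000, ["Log.info", "Util.trim"]),
]
print(aspect_candidates(history))  # ['Lock.acquire']
```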
By including information mined from several versions of the code, HAM is able to identify changes that are likely logically related and thus correspond to specified concerns that might be relevant to developers. In relying on developers' actions instead of the code base, it is somewhat analogous to FEAT and Mylyn, and it can find aspect candidates that code-based aspect mining approaches would probably not identify. HAM, however, is still limited to presenting a list of aspect candidates; it does not support the user in analyzing the candidates or in actually refactoring those they deem worth refactoring.

Semantic clustering

Semantic clustering [68] is an approach to identify specified concerns and their corresponding concrete concerns by finding clusters of topics called semantic clusters. A semantic cluster represents a specified concern whose corresponding concrete concern contains a set of code fragments that use similar vocabulary concepts. The vocabulary concepts (words or terms) present in a code fragment are identified using Latent Semantic Indexing, a generic technique for finding similarities in vocabulary use; the results are then clustered accordingly. Unlike the previous approaches in this section, which support finding concrete concerns, semantic clustering focuses on finding abstract concerns.

Figure 33 presents a visualization of semantic clusters for a sample system, from [68]. Each gray box on the left side of the figure represents a semantic cluster. Each colored box on the right, with its list of words, represents a set of related vocabulary concepts that are most present in the corresponding concrete concern, and thus related to a particular semantic cluster. The size of a gray box conveys the amount of code that makes up the corresponding concrete concern and, thus, the relevance of the semantic cluster.

Figure 33. Visualization of semantic clusters, from [68]

The semantic clustering approach additionally provides support for visually representing the mapping of semantic clusters to the concrete concern and its code fragments (see Figure 34). Each box represents a package, and each file is represented by a colored box. The color associates each file (a code fragment) with its corresponding semantic cluster (e.g., the red boxes in Figure 34 represent code fragments that correspond to the topmost specified concern of Figure 33).
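To make the Latent Semantic Indexing step concrete, the following Python sketch (standard library only) represents each code fragment by the terms in its identifiers, computes a rank-k LSI embedding, and groups fragments whose embeddings point in similar directions. It rests on assumptions that are not taken from [68]: raw term counts instead of a weighting scheme such as tf-idf, power iteration on the document Gram matrix in place of a library SVD routine, and a simple single-linkage threshold standing in for the clustering step. The function names are hypothetical.

```python
import math
import random
import re
from collections import Counter


def terms(text):
    """Split identifiers such as 'readFileBuffer' or 'file_buffer' into lowercase terms."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]


def lsi_vectors(docs, k=2, iterations=200, seed=0):
    """Rank-k LSI coordinates (rows of V_k * Sigma_k) of each document.

    The right singular vectors V of the term-document matrix A are the
    eigenvectors of the Gram matrix A^T A, whose eigenvalues are the squared
    singular values; they are found here by power iteration with deflation.
    """
    counts = [Counter(terms(d)) for d in docs]
    n = len(docs)
    gram = [[sum(c_i[t] * c_j[t] for t in c_i) for c_j in counts] for c_i in counts]
    rng = random.Random(seed)
    vecs, vals = [], []
    for _ in range(min(k, n)):
        v = [rng.random() for _ in range(n)]
        for _ in range(iterations):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in vecs:  # deflation: stay orthogonal to earlier eigenvectors
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:  # the rank of A is smaller than k
                v = [0.0] * n
                break
            v = [a / norm for a in w]
        eigenvalue = sum(v[i] * gram[i][j] * v[j] for i in range(n) for j in range(n))
        vecs.append(v)
        vals.append(max(eigenvalue, 0.0))
    return [[v[i] * math.sqrt(s) for v, s in zip(vecs, vals)] for i in range(n)]


def cosine(a, b):
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def semantic_clusters(docs, k=2, threshold=0.8):
    """Group documents whose LSI vectors have cosine similarity >= threshold."""
    vecs = lsi_vectors(docs, k)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(i)] = find(j)  # single linkage: merge the two groups
    groups = {}
    for i, d in enumerate(docs):
        groups.setdefault(find(i), []).append(d)
    return sorted(groups.values())


docs = ["readFileBuffer", "fileBufferWriter", "renderWindowPixel", "windowPixelColor"]
print(semantic_clusters(docs))
```

In this toy input the file-handling identifiers and the rendering identifiers share no terms, so they end up in two separate clusters. In a real system, each document would be a class or method body including its comments, and the terms that dominate each cluster would play the role of the word lists shown in Figure 33.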

Figure 34. Semantic clusters mapped to packages and classes, from [68]

The authors of the approach argue that the visualizations of semantic clusters help developers get a first impression of the relevant topics in an unfamiliar system. In our terminology, the approach identifies potential abstract concerns, specified as lists of terms. This is an important step forward for concern-oriented development, as it is one of the first approaches to attempt this. At the same time, and unsurprisingly given the newness of the approach, the authors found that most topics refer to implementation-specific concerns. Other types of abstract concerns, such as those related to the semantics of the system's domain, were less evident.

TopicXP

TopicXP [106] is an approach to identify a series of relevant topics in a system using a type of analysis called Latent Dirichlet Allocation (LDA) [17]. LDA is an improvement over Latent Semantic Indexing that has shown particular value in identifying latent topics in large corpora of text documents [78]. As adapted to code in TopicXP, the approach takes as input the text in the declarations of code fragments as well as in in-line comments, with predefined weights for different types of code fragments (e.g., text in class declarations is considered more relevant than text in an attribute declaration). As output, TopicXP produces lists of the keywords that co-occur most frequently, which developers can then manually label as topics. Already useful for finding potential abstract concerns (the topics it finds), the tool extends the basic analysis to also find the corresponding concrete concerns. To do so, it repeats the LDA analysis at the package and class level, which reveals which of the topics found at the system level are most frequent in each package or class. Figure 35 shows the result, combined with a static analysis of the dependencies among packages and classes.
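The LDA step can be sketched with collapsed Gibbs sampling, a standard inference method for LDA; the description above does not say which inference method TopicXP uses. The sketch also approximates TopicXP's predefined weights by repeating each term according to the kind of declaration it came from. Both that weighting mechanism and the `KIND_WEIGHTS` values are hypothetical, as are the function names. Each document in the sketch stands for the weighted terms of one class.

```python
import random
from collections import defaultdict

# Hypothetical weights: terms from class declarations count more than attribute terms.
KIND_WEIGHTS = {"class": 3, "method": 2, "attribute": 1, "comment": 1}


def weighted_tokens(fragments):
    """Turn (kind, terms) pairs into a token list, repeating terms by their kind's weight."""
    tokens = []
    for kind, words in fragments:
        for w in words:
            tokens.extend([w] * KIND_WEIGHTS[kind])
    return tokens


def lda_gibbs(documents, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in documents for w in doc})
    doc_topic = [[0] * num_topics for _ in documents]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    for d, doc in enumerate(documents):  # random initial topic for every token
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(documents):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this token's current assignment from all counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                z = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def top_keywords(topic_word, n=3):
    """Most frequent words per topic: the lists a developer would label by hand."""
    return [[w for w in sorted(tw, key=lambda w: (-tw[w], w)) if tw[w] > 0][:n]
            for tw in topic_word]


classes = [
    weighted_tokens([("class", ["session", "manager"]), ("method", ["login", "session"]),
                     ("attribute", ["timeout"])]),
    weighted_tokens([("class", ["pixel", "renderer"]), ("method", ["draw", "pixel"]),
                     ("comment", ["color"])]),
]
print(top_keywords(lda_gibbs(classes, num_topics=2)))
```

Repeating tokens is only one simple way to make some declarations count more; it inflates term counts rather than changing the model. The per-topic keyword lists play the role of TopicXP's output, which developers then label manually.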
Each box represents a topic and shows its terms together with a list of the packages or classes most strongly associated with it. Arrows indicate relationships between topics, derived from the dependencies identified between the packages and classes associated with each topic. The tool allows users to zoom into a topic and visually identify the packages and classes that are more or less related to it. The tool additionally assesses the cohesion of packages and classes, which it treats as inversely related to the number of topics found in them: the more topics, the lower the cohesion. Users can navigate to code fragments from this view, define queries to filter the view, and configure the parameters of the LDA analysis, for instance the number of keywords per topic and the probability thresholds for including or discarding keywords in a topic.

Figure 35. TopicXP topic dependency view, from [106]

TopicXP is still very new and, just like semantic clustering, shows promise in identifying potential abstract concerns of which a developer should be aware. At the same time, the approach supports only the basic functionality of identifying a set of topics and mapping those topics to code. Developers cannot iterate on the analysis to further refine and improve the set of topics found.

Source Code Summarization

Source code summarization [45, 46], like the topic location approaches of the previous two sections, aims at finding the abstract concerns that govern a system. However, instead of providing the user with just a list of topics and terms, the approach creates a human-readable description of the abstract concerns underlying a set of code fragments. It automatically generates a text description, which the authors call an extractive summary, that characterizes the most relevant aspects of the semantics and structure of a code fragment. Conceptually, the approach finds the most relevant terms in the code fragment, ignoring common words and programming keywords, and adds information based on the structure of the code (e.g., giving greater relevance to terms that appear in method declarations). The authors present some initial evidence of the usefulness of summaries produced with different techniques for identifying the relevant terms. These techniques include lead summaries (which use only the leading terms of a document), Vector Space Models (VSM) [105], and Latent Semantic Indexing.
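Two of the term-selection techniques mentioned above, lead summaries and a vector space model, can be sketched in Python as follows. The stop-word and keyword lists, the smoothed tf-idf variant used as the VSM weighting, the assumption that a fragment's first line is its method declaration, and the `decl_boost` factor are all illustrative assumptions, not the choices made in [45, 46]; the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative lists of ignored words; not the lists used in [45, 46].
STOPWORDS = {"the", "a", "an", "of", "to", "and", "get", "set"}
KEYWORDS = {"public", "private", "static", "void", "int", "return", "new", "class", "if", "for"}
IGNORED = STOPWORDS | KEYWORDS


def split_terms(code):
    """Split identifiers into lowercase terms, dropping common words and keywords."""
    words = (w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", code))
    return [w for w in words if w not in IGNORED]


def lead_summary(fragment, n=3):
    """Lead summary: the first n distinct terms in the fragment."""
    return list(dict.fromkeys(split_terms(fragment)))[:n]


def vsm_summary(fragment, corpus, n=3, decl_boost=2.0):
    """Rank terms by tf-idf, boosting terms from the declaration (assumed first line)."""
    tf = Counter(split_terms(fragment))
    lines = fragment.splitlines()
    decl_terms = set(split_terms(lines[0])) if lines else set()
    doc_sets = [set(split_terms(doc)) for doc in corpus]

    def score(term):
        df = sum(term in doc for doc in doc_sets)
        idf = math.log((1 + len(doc_sets)) / (1 + df)) + 1  # smoothed idf
        return tf[term] * idf * (decl_boost if term in decl_terms else 1.0)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(tf, key=lambda t: (-score(t), t))[:n]


save_order = """public void saveOrder(Order order) {
    orderRepository.persist(order);
    auditLog.record(order);
}"""
count_items = """public int countItems(List items) {
    return items.size();
}"""
print(lead_summary(save_order))
print(vsm_summary(save_order, [save_order, count_items]))
```

On the `saveOrder` fragment, the lead summary simply echoes the first distinct terms (`save`, `order`, `repository`), whereas the VSM variant ranks `order` first because it is frequent and also appears in the declaration. Turning the selected terms into fluent text is a further step that this sketch does not attempt.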
The source code summarization approach perhaps comes closest to discovering abstract concerns and giving a human-readable description of them. As such, it breaks important new ground and might well
