Using AOP to build complex data centric component frameworks

Tom Mahieu, Bart Vanhaute, Karel De Vlaminck, Gerda Janssens, Wouter Joosen
Katholieke Universiteit Leuven, Computer Science Dept. - Distrinet
Celestijnenlaan 200A, B-3001 Heverlee
{tomma, bartvh, kdv, gerda, wouter}@cs.kuleuven.ac.be

ABSTRACT

This paper describes our experiences with building a component framework for a family of complex data processing applications. The framework architecture consists of a number of processing components operating on a central data structure. Building such a framework raises two important issues. The first concerns the complexity of the central data structure. The second arises when correct component compositions need to be made.

The complex nature of the natural language processing application domain results in a core data structure consisting of information common to the entire domain, plus a number of additional data types for each branch in the application framework. Due to the complexity and diversity of this information, these data types can easily cut across each other. We solved this problem by creating an aspect for each data type that may or may not cut across the core data structure.

Secondly, the component composition issue revealed the problem of hidden dependencies between components. These hidden dependencies complicate the creation of compositions that are correct with respect to component execution order. The dependencies boil down to the availability of component-specific data. In our approach, we make the data availability requirement explicit by defining a number of dependency relations between processing components and the aspects implementing the branch-specific data types. These dependencies specify the types of data a component needs in order to execute and the types of data that result from its execution. With this information we can create correctly ordered component compositions and validate their order at composition time.
In our component framework, aspects thus serve a dual role: on the one hand they solve the problem of crosscutting data structures; on the other hand they serve as entities to validate component compositions.

1 Introduction

Data processing applications usually revolve around a complex data structure that is processed and/or modified by a number of small subtasks. One way to model these applications is the pipes-and-filters architectural pattern [1]. Each filter accepts a data structure, processes the data, stores its results in the data structure, and passes it on to the next filter. In a framework environment, we would implement these filters as small but well-defined tasks in order to obtain a set of highly reusable filters. Building an application would then boil down to making a semantically meaningful composition of filters.

In a data processing framework, the data structure is an important issue. It needs to contain all the information relevant to the problem domain, and it needs to be very flexible. Filters depend heavily on this data structure, and every change made to the central data structure affects all the filters.
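The following minimal Java sketch illustrates this style. The names `TextDocument`, `Filter`, and `Pipeline` are hypothetical, and the document is reduced to a map of named slots; this is an illustration, not our framework's API. Each filter reads from the shared document, stores its results back into it, and the pipeline passes the same instance on to the next filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Runs filters in sequence, passing the same document instance to each one.
class Pipeline {
    private final List<Filter> filters = new ArrayList<>();

    Pipeline add(Filter filter) {
        filters.add(filter);
        return this;
    }

    TextDocument run(TextDocument doc) {
        for (Filter filter : filters) {
            filter.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        TextDocument doc = new TextDocument();
        doc.put("text", "The cat sleeps.");
        new Pipeline()
            // Tokenizer: splits the text into words and stores them in the document.
            .add(d -> d.put("words", Arrays.asList(((String) d.get("text")).split("\\s+"))))
            // Word counter: silently relies on an earlier filter having stored "words".
            .add(d -> d.put("wordCount", ((List<?>) d.get("words")).size()))
            .run(doc);
        System.out.println(doc.get("wordCount")); // prints 3
    }
}

// A filter processes the shared document and stores its results back into it.
interface Filter {
    void process(TextDocument doc);
}

// Hypothetical stand-in for the central data structure: named slots of data.
class TextDocument {
    private final Map<String, Object> slots = new HashMap<>();

    Object get(String key) { return slots.get(key); }

    void put(String key, Object value) { slots.put(key, value); }
}
```

The word counter only works because the tokenizer ran first. If the two filters were swapped, `d.get("words")` would return `null` and the counter would throw a `NullPointerException` at runtime, even though nothing in the `Filter` interface reveals this dependency.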

When making filter compositions, one soon notices that there are dependencies between filters: filters need information, stored in the data structure, that is computed by other filters. However, a filter does not usually list these dependencies explicitly. The absence of this information makes it more difficult and more error-prone to create compositions that are valid with respect to filter execution order. One could make an erroneous composition of filters and only become aware of it at runtime, because the application crashes or produces erroneous results. Although there are examples of pipeline architectures (see [3]) that check filter composability at compile time through type checking, the data passed along their pipes is fairly simple. The data processing applications we target can have very complex data structures, which makes simple type checking infeasible as a composition validator.

This paper describes our approach to solving this hidden component dependency problem. From now on we will talk about components rather than filters, because in our case study, a component framework for text processing systems, the complexity of the domain resulted in processing subtasks that are highly configurable frameworks in their own right. Section 2 elaborates on the text processing system we built; we use this case study as a running example throughout the paper. Section 3 clarifies the goals we wish to accomplish. In Section 4 we explain how we approached these goals using aspect-oriented programming. Finally, Section 5 discusses our approach.

2 Case study

Our text processing system is built around a centralized, tree-based data structure. This core data structure reflects the structure of the text. The possible structural units include, among others, chapters, tables, and bulleted lists. The smallest structural unit is a paragraph, consisting of a number of sentences with formatting information. The tree is designed to hold several variants of the same text.
These variants can be several translations of the same text, or the same text annotated with application-specific data. Besides the structured text content, the tree also contains data structures that are specific to particular branches of the text processing application domain. We will refer to these sub-data structures as component-specific data types, because they are only used by certain processing components.

The processing components in our framework are a number of visitors [2] that operate on the tree data structure. The visitor approach allows us to vary our text processing behavior according to the structure of the text. For example, in a translation system, we might want to use a different translation strategy for the items in a bulleted list than for a normal paragraph. Every processing component is responsible for a small task in one branch of the large text processing domain. Naturally, a component will most likely use a subset of the component-specific data types of the branch it belongs to. Component-specific data types are usually chosen according to the needs of the processing components. In the case of a translation system, the components can be a dictionary lookup system, a chart parser, transfer and generation components, etc. The data types these components use are the text structure, morphological information on the words in the text, locale information, grammar parse trees, and language transfer trees.

3 Goal

To have more control over the semantic correctness of component compositions, we need to make the hidden dependencies between components explicit. These dependencies are mainly data-oriented: components use data computed by other components, so the latter must be executed before the former. This introduces a partial order relation on component execution. The order relation allows us to check whether a list of selected components is in the correct order for execution.
It even allows us to derive a possible execution order, though not necessarily a unique or meaningful one, from a set of unordered components.
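To make the partial order concrete, here is a minimal Java sketch (Java 16+, for records) under stated assumptions: each hypothetical `Component` record carries the set of data types it needs and the set it produces (the requires- and provides-dependencies of Section 4.2), and the data type `text` is assumed to be always present in the core data structure. `OrderBuilder.build` is a hypothetical helper that derives one possible order from an unordered list using Kahn's topological sort.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OrderBuilder {
    // Assumed: data types that the core data structure always holds.
    static final Set<String> CORE = Set.of("text");

    // Derives one valid execution order with Kahn's topological sort, or throws if none exists.
    static List<Component> build(List<Component> components) {
        Map<Component, Set<Component>> successors = new LinkedHashMap<>();
        Map<Component, Integer> inDegree = new LinkedHashMap<>();
        for (Component c : components) {
            successors.put(c, new LinkedHashSet<>());
            inDegree.put(c, 0);
        }
        // Conservatively, every provider of a data type is placed before each of its consumers.
        for (Component consumer : components) {
            for (String type : consumer.requires()) {
                if (CORE.contains(type)) continue;
                boolean provided = false;
                for (Component producer : components) {
                    if (producer != consumer && producer.provides().contains(type)) {
                        provided = true;
                        if (successors.get(producer).add(consumer)) {
                            inDegree.merge(consumer, 1, Integer::sum);
                        }
                    }
                }
                if (!provided) {
                    throw new IllegalStateException(consumer.name() + ": no component provides " + type);
                }
            }
        }
        // Repeatedly schedule components whose producers have all been scheduled.
        Deque<Component> ready = new ArrayDeque<>();
        for (Component c : components) {
            if (inDegree.get(c) == 0) ready.add(c);
        }
        List<Component> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            Component next = ready.poll();
            order.add(next);
            for (Component consumer : successors.get(next)) {
                if (inDegree.merge(consumer, -1, Integer::sum) == 0) ready.add(consumer);
            }
        }
        if (order.size() != components.size()) {
            throw new IllegalStateException("cyclic data dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        Component grammar = new Component("GrammarChecker", Set.of("text", "morphology"), Set.of());
        Component lookup = new Component("WordLookup", Set.of("text"), Set.of("morphology"));
        for (Component c : build(List.of(grammar, lookup))) {
            System.out.println(c.name()); // prints WordLookup, then GrammarChecker
        }
    }
}

// Hypothetical component description: the data types it needs and the ones it produces.
record Component(String name, Set<String> requires, Set<String> provides) {}
```

Because several orders can satisfy the same partial order, the sketch breaks ties first-in, first-out, starting from the input order; a different tie-breaking rule would yield a different but equally valid order.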

The data-oriented nature of the component dependencies is closely related to the component-specific data types mentioned in Section 2. Due to the wide range of possible applications in the text processing domain, it is very likely that not all data structures will align perfectly with the tree structure we use as a core data model. Component-specific data types can thus cut across the core data structure, and we want a mechanism to manage these crosscutting data types.

4 Approach

4.1 Crosscutting data types

In a large application domain such as text processing, the core data structure can contain a large number of application-specific data types. These component-specific data types do not necessarily align well with a core data structure such as the one described in Section 2. For example, the data structures used by a text categorization system may affect all structural nodes if the system wishes to use structural information. There is no clean way to add all this structure-related information to the central data structure without affecting the information and functionality in the classes that encapsulate text structure information.

We defined a number of aspects [4], each of which encapsulates a component-specific data type. When a data type is needed, its corresponding aspect is woven through the core data structure. Since these aspects define new data structures, they consist mainly of introduction statements: new data members and data accessor methods (get/set) are introduced into the central data structure classes. How many classes of the central data structure are involved, and how many methods in these classes are affected, depends on how much the data type cuts across the core data structure. For example, a translation system needs a grammar tree, which is a grammatical representation of a sentence.
The aspect implementing the grammar tree cuts across the classes that define the sentence data structure, whereas a morphological information aspect only affects the word-related data structures. The data types we defined usually belong to a specific branch of the application domain. As a result, our aspects affect only the central data structure and do not interfere with one another. In the translation system we developed, we did not encounter aspects that depend on other aspects. However, we do not conclude from this case study that dependencies between aspects will never occur.

4.2 Execution order

Processing components have two types of relations with the data types in the data structure. A requires-dependency denotes the need for a specific type of data: without that data, the processing component cannot function. With a provides-dependency, a component states that the results of its processing are of the given type, and that these results can be used by other components. These dependencies define a partial order relation between components.

For example, to do a grammar check on a sentence in our data structure, we need morphological information on all the words in the text. This word lookup is implemented in a separate component due to the potential complexity of the task: all possible declensions of every word need to be recognized, as well as numbers, Roman numerals, dates, ISBN numbers, etc. For the grammar-checking component to work, the word lookup component must be executed first, to retrieve morphological data and store this information in the central data structure.

When given an ordered list of components, we can check the correctness of the order by checking the data dependencies of all components in the list.
The list fails the correctness check when a component with a requires-dependency for a given data type occurs without being preceded by a component with a corresponding provides-dependency.
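A minimal Java sketch of this check (Java 16+, for records), assuming a hypothetical `Component` record that lists a component's required and provided data types, and assuming the core data types (here `text`) are always available. It walks the list once, accumulating provided types, and reports the first component whose requirement has not yet been provided.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class OrderChecker {
    // Walks the ordered list once; returns the first violation found, or empty if the order is valid.
    static Optional<String> check(List<Component> ordered, Set<String> core) {
        Set<String> available = new HashSet<>(core); // core data is always present (assumed)
        for (Component c : ordered) {
            for (String type : c.requires()) {
                if (!available.contains(type)) {
                    return Optional.of(c.name() + " requires " + type
                            + ", but no preceding component provides it");
                }
            }
            available.addAll(c.provides()); // results become visible to later components
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Set<String> core = Set.of("text");
        Component lookup = new Component("WordLookup", Set.of("text"), Set.of("morphology"));
        Component grammar = new Component("GrammarChecker", Set.of("text", "morphology"), Set.of());
        System.out.println(check(List.of(lookup, grammar), core)); // prints Optional.empty
        System.out.println(check(List.of(grammar, lookup), core)); // reports GrammarChecker
    }
}

// Hypothetical component description: the data types it needs and the ones it produces.
record Component(String name, Set<String> requires, Set<String> provides) {}
```

When the check succeeds, the final `available` set minus the core types is exactly the set of data types provided by the selected components, which is the information Section 5.2 relies on to weave only the aspects an application needs.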

4.3 Aspects as composition tool

To obtain a correct component execution order, we check the availability of data types. Since aspects correspond to data types, this amounts to checking the availability of aspects. Consequently, to build an application from an unordered list of components in practice, we use the aspect dependencies of the involved components to build or check the component execution order. We have made the component dependencies explicit by including an XML description for each component. Each XML description lists a number of aspects for each of the two dependency types. When checking the order, we need the XML descriptions of all components at composition time.

5 Discussion

5.1 The use of aspects

We used aspects as a way to represent a number of data types that each address a different concern within the same application domain. Obviously, other technologies that take separation of concerns to a higher level could be used. Data types could, for instance, be represented using a subject (subject-oriented programming [5]) or a hyperslice (multi-dimensional separation of concerns with the hyperspace approach [6]). One could also decide to implement the component-specific data structure handling on a meta level instead of using aspects. The meta level would offer a generic execution model for component-specific data type handling. Meta-level programming, however, can get very complex and is very difficult to reason about.

It would also have been possible to solve this problem using mixin inheritance. Every data structure would be implemented in a number of classes that would be unified into the classes of the central data structure when needed. However, for crosscutting component-specific data types, this could result in a large number of classes and code duplication, which we wanted to avoid at all costs because of maintenance issues. With the aspect-oriented programming approach, each component-specific data structure is located in a single aspect.
Using aspects as a technology to solve data representation issues is not conventional. One could claim that the design of the data structure is bad, and using aspects to add new data to a data structure may be considered an irresponsible breach of encapsulation. However, when the data structures to be added do not align well with the core data structure, aspects are the most flexible and controlled way to add new data types. This is exactly what a component builder wants in an application framework environment: flexibility in a controlled way.

Aspects that address crosscutting behavior are conceptually not much different from our data type aspects: both address crosscutting issues. Without aspects or a similar technology, there is no way to add crosscutting functionality or data without affecting the entire system. In the case of our data aspects, this would mean changing or adding things to the core data structure, potentially breaking a large number of processing components.

We also believe that the use of aspects allows components to be created in a more isolated environment. Instead of having to cope with a large, monolithic data structure when creating a component, one can create a component against the core data structure and the data types it needs. Without the aspect-oriented programming approach, we would have to cope with a large monolithic data structure with a number of APIs scattered over all the classes that make up the data structure.

5.2 Performance

An additional benefit of this aspect approach is that it minimizes the memory usage of the central data structure. While checking or building a component execution order, we can deduce from the component selection which data types will be needed in the central data structure. Apart from the core data structure, these are the only data types the resulting application needs, and thus the only ones that will be woven through the core data structure.

Besides space performance, the aspect approach also improves execution performance compared to a meta-programming solution, in which all component-specific data type handling would involve code execution on the meta level. In a data processing application, where data may be consulted frequently, this can mean a serious performance penalty.

5.3 Composition restrictions

Although the aspect dependencies help us make correct compositions, it is still possible to make bad or meaningless ones. For instance, suppose you have a component that implements an entire translation system (a very bad software design, but justified for the sake of the example) and a component that prints out a number of the text variants that reside in the data structure. If you want to build an application that translates a text from English to Dutch and then prints the translation, the only data type that needs to be available to the printing component is text. Text is the core of the text processing framework, so that type of data will always be available. It is never possible to calculate an unambiguous component execution order for components that share no data type dependencies other than the default ones. In our example, we could calculate a correct execution order if we could make suppositions about the contents of the data structure. However, dependencies currently only allow us to make suppositions about the type of the data, not about the content of these data types; data types have no semantic meaning. There is no way to express that the printing component can execute only when a Dutch variant of the text is available.

6 References

[1] F. Buschmann, R. Meunier, H. Rohnert, P. Sommerlad, and M. Stal. A System of Patterns. John Wiley and Sons. ISBN 0471958697.
[2] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading, Massachusetts, first edition, 1995.
[3] E. J. Posnak, R. G. Lavender, and H. M. Vin. An adaptive framework for developing multimedia software components. Communications of the ACM, 40(10):43-47, October 1997.
[4] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C. V. Lopes, J.-M. Loingtier, and J. Irwin. "Aspect-Oriented Programming." In Proceedings of the European Conference on Object-Oriented Programming (ECOOP), Finland. Springer-Verlag LNCS 1241, June 1997.
[5] W. Harrison and H. Ossher. "Subject-Oriented Programming (A Critique of Pure Objects)." In Proceedings of the Conference on Object-Oriented Programming: Systems, Languages, and Applications (OOPSLA), September 1993.
[6] H. Ossher and P. Tarr. "Multi-Dimensional Separation of Concerns and the Hyperspace Approach." In Proceedings of the Symposium on Software Architectures and Component Technology: The State of the Art in Software Development. Kluwer, 2000. (To appear.)