Correctness Algorithms


Ben-Gurion University of the Negev
Faculty of Natural Sciences
Department of Computer Science

Metric-Driven Approach to Benchmarking Model Correctness Algorithms

by Victor Makarenkov
Supervised by: Prof. Mira Balaban

Thesis submitted in partial fulfillment of the requirements for the M.Sc. degree
July 2011

Abstract

This thesis presents a metric-based method for automatic benchmark creation. It provides patterns of model-based metrics, together with an implemented method for translating these patterns into Alloy and automatically creating benchmark models. The research was motivated by a study of finite satisfiability of class diagrams that extended the FiniteSat algorithm to support the qualifier constraint. This extension of the FiniteSat algorithm was also developed within this thesis. While studying the practical occurrence and relevance of correctness problems in class diagrams, we ran into the problem of manually creating class diagrams for experiments, and this problem motivated the rest of the thesis. Evaluating algorithms that operate on correctness problems of class diagrams requires a sample of problems, and creating such a sample for a benchmark is not a simple task. Since every algorithm needs different models for its evaluation, for example to find its strengths and weaknesses, an automatic way of creating the models is needed. The major contributions of this thesis are: (1) analytical evaluation of model metrics; (2) classification of metrics into patterns; (3) development of language patterns for the description of every metric pattern; (4) an algorithm for translating each metric pattern into Alloy.

Acknowledgments

I am deeply grateful to Professor Mira Balaban for guiding me through the research topic. Professor Balaban taught me a lot about engineering processes in the field of software engineering, and provided a good deal of motivation and advice on how research in computer science should be done. Professor Balaban was endlessly patient with me, and in the end gave me the opportunity to learn not only research topics in software engineering, but also much that lies beyond strictly scientific subjects. I would like to thank a very good friend of mine, Azzam Maraee. During both the most exciting and the most difficult periods of the last years, he was always supportive, giving me endless professional and personal help. He contributed an enormous amount of ideas and insights to this thesis. Lastly, and most importantly, I wish to thank my parents Andrey and Yelena, and my wife Nataly, for their love, their constant support and their belief in my success.

Table of Contents 1 Introduction I Finite Satisfiability of Class Diagrams 2 Background Class Diagrams: Syntax and Semantics Correctness of Class Diagrams Inconsistency and lack of finite satisfiability Detection and Identification of Finite Satisfiability FiniteSat Algorithm FiniteSat Algorithm: Extension to Qualifier Qualifier Explained Discussion on Qualifier Semantics FiniteSat Extension for Qualifier Constraint Correctness and Complexity of FiniteSat Practical Occurrence of Finite Satisfiability in Class Diagrams Class Diagram's Metrics Experiment Experiment Experiment II Metrics 5 Metrics Weyuker's Characterization of Metrics Properties

5.2 Object Oriented Metrics Bunge's definition of object complexity Chidamber and Kemerer Metrics and Evaluation Metrics for Models Background Weyuker's Properties Adaptation for Models Metrics Metrics Evaluation Related Work on Model Metrics Benchmarking Metric-Driven Benchmark Creation Metrics as means for algorithm evaluation Brute-Force Benchmark Creation Without Abstraction Benchmark Creation via Model Checking Introduction to Alloy Generating Models from Meta-Model Metrics with Alloy Automation: A Language For Metrics Values Definition Metrics classification Conclusions and Future Work A Reasoning Infrastructure Implementation A.1 Implementation of FiniteSat Algorithm A.2 Implementation in Detail A.3 Structural Architecture and Conclusions

List of Figures 1.1 A Class Diagram with a Finite Satisfiability Problem UML Class Diagram Binary multiplicity constraint Binary Association Unsatisfiability due to multiplicity constraint conflict The digraph representation of a binary association Unconstrained Hierarchy Structures Visual Qualifier Notation A binary association example Modeling a Unix file system Modeling an array with qualifier A general qualifier constraint A general qualifier constraint Example of multiplicities bounds under different semantic interpretations. (a) Shows unsuitable situation for universal interpretation. (b) Shows the unconstrained model with zero lower bound A TV-Network with broadcast schedule example A reduced qualifier constraint A class diagram reduction The Γ mapping of a CD instance to a CD instance Γ Mapping of instances The Γ Mapping Γ Mapping on different A objects Γ Mapping on the same A object Γ Mapping on the same A object and same B object

4.1 Appearing FiniteSat on 50 Classes Diagram Appearing FiniteSat on 100 Classes Diagram A class diagram Class Diagram: CD Class Diagram: CD Class Diagram: CD CD CD Class Diagram: CD Class Diagram: CD Class Diagram: CD Class Diagram: CD Merging of Class Diagram CD 1 and CD An object oriented metamodel Extension to the UML 2.0 metamodel. This UML package diagram shows the definition of the CK metrics as a separate package, with a dependency on classes from the UML metamodel NOC Metric Definition. This OCL code defines the NOC metrics from the CK metrics suite, and is part of a larger definition of the whole CK metric suite which we have implemented using dmml Class Diagram Meta-Model A partial meta-model of UML in Alloy Analyzer Instance finding Instance of the specified metamodel Tree view of Instance of the specified metamodel Example of UML meta-model with two elements: X and Y Example of Alloy-written meta-model with two elements: X and Y A Generated instance of the meta-model Example of UML meta-model with two elements: X and Y Example of meta-model with two elements: X, Y with 1:2 ratio between them Example of partial UML meta-model in Alloy A Generated model where each class has exactly one sub class Example of meta-model with three elements: X, Y and Z Example of meta-model with three elements: X, Y and Z A Generated instance of the meta-model

6.16 A Generated instance of the meta-model A.1 Reasoning Tool Internal Structure A.2 The structural architecture of reasoning tool A.3 Class diagram of our tool's static structure

List of Tables 2.1 UML Diagrams The Scope of The FiniteSat Algorithm Existence Experiment Results Scalability Experiment Results

Chapter 1 Introduction

The Unified Modeling Language (UML) is nowadays the industry-standard modeling framework. It includes multiple visual modeling languages, referred to as UML models. Traditionally, UML models have been used for the analysis and design of complex systems, and they are now being integrated into most serious Integrated Development Environments (IDEs), such as the open source Eclipse IDE [34] and the most common IDEs from proprietary vendors. Their relevance has increased with the advent of the Model-Driven Development (MDD) approach, in which analysis and design models play an essential role in the process of software development. Recently, with the emergence of web-enabled agent technology, UML models are also used for ontology representation, and for the construction and extraction of ontologies [40, 8, 28]. In view of their wide popularity, it is highly important that UML models provide reliable support for the designed systems, and be subject to stringent quality assurance and quality control criteria [88]. Indeed, an extensive research effort is devoted to the formalization of UML models, the specification of their semantics, and the development of reasoning and correctness checking methods [15, 74]. Moreover, with the prevalence of the Model Driven Engineering approach, it is expected that all information in a design model will be effective in its successive models.

Modeling problems usually arise when models are scaled up to model large, distributed applications. A model may originate from different sources, and a large number of designers can be involved in the modeling process. Designers are highly prone to making mistakes, and combining information from different sources gives rise to potential conflicts [17, 44, 23]. Lange et al. [53] show that defects often remain undetected, even if the model is read attentively by practitioners. It is highly important that models are tested for correctness, and that problems are detected as early as possible in the software design process. Nevertheless, current CASE tools do not support reasoning about UML models, and they allow the construction of erroneous ones. Furthermore, implementation languages still do not enforce design-level constraints. Hence, there is an urgent need for reasoning methods for detecting analysis and design problems. The number of algorithms dealing with models is constantly growing. They address: general error recognition; detection of the reason for an error; transformation and improvement of models; classification of models into patterns (see [12]); and development of special data structures for models and support for model querying. Due to the increasing use of models and the importance ascribed to them, it is crucial to develop means for examining the complexity of models from different points of view, and for comparing models. As with customary methods for evaluating and comparing software, which measure complexity, implementation size and probability of bugs, metrics for models are being developed, along with techniques for evaluating the suitability of metrics. Unfortunately, there is no mechanism for evaluating a metrics suite for models. This thesis contributes to the development of methods for evaluating metrics for models, similarly to what Weyuker [89] proposed for sequential programs.

In order to examine and compare algorithms, their implementations, their scalability, and the problems they are meant to solve, benchmarks are needed. For databases and software there are real benchmarks: large software systems or synthetic creations that serve as agreed benchmarking problems. However, for large models in general, and especially for those that make extensive use of modeling constraints, there are no large applications. The main question addressed in this work is how to create benchmarks properly. The approach presented uses metrics for examining the complexity of models, and algorithms for benchmark creation. Examples include creating class diagrams with complex class hierarchy structures, such as deep inheritance trees, and creating class diagrams with different ratios between the numbers of the different constraints imposed on the class diagram. Using metrics for creating benchmarks is not straightforward. The key to independent and fast creation, without using any special application, is abstraction. A meta-model is needed for defining the metrics. A model checker is then used for creating instances of the meta-model. Its input is a specification of correctness conditions together with the model; if the conditions do not hold, a counterexample is generated. This method is used in this thesis for benchmark generation, specifically with the Alloy model checker. The problem of finite satisfiability has been addressed and studied in the context of various kinds of conceptual schemata [24, 42, 45, 56, 86] and in the context of description logics [22].

The problem was recently studied in the context of class diagrams [61, 9, 64], showing an application of finite satisfiability recognition and detection when a UML class diagram involves the combination of cardinality constraints, class hierarchy constraints, and generalization set constraints. A class diagram is finitely satisfiable (and thus correct) if it has a finite and non-empty instance. The example below shows a finite satisfiability problem.

Figure 1.1: A Class Diagram with a Finite Satisfiability Problem

Figure 1.1 presents a multiplicity constraint cycle that involves a compound class, Graduate, whose instances must be related to Academic instances. The number of student-advisor links in every diagram instance must be both $G \cdot 1$ and $A \cdot 2$, where $G$ and $A$ are the numbers of graduates and academics, respectively. Therefore, the extensions of Graduate and Academic must satisfy $G = 2A$, while the Graduate extension is a subset of the Academic extension, and therefore $G \le A$. These constraints can be satisfied only by empty or infinite extensions. Such problems are termed finite satisfiability problems. In order to check the relevance of the problem, the need for problem benchmarks arose. To check this method, we start by applying metric-driven benchmark creation to the finite satisfiability problem of class diagrams: a metric suite is proposed, and its correspondence to the evaluation of problems is examined, namely the relevance of the finite satisfiability problem and the scalability of the FiniteSat algorithm. This thesis started with an attempt to extend the scope of the FiniteSat algorithm introduced by Balaban and Maraee [11] to handle the qualifier constraint [65], and to implement and examine the existing algorithm for scalability, and the problem for relevance [59]. During

the implementation it appeared that the direct approach to the problem, without any abstraction, forces a change of the implementation after every change in the metrics suite. The search for a useful abstraction for a metric suite, one able to express the metrics of a model, led to the use of a model checker. The major contributions of this thesis are: (1) analytical evaluation of model metrics; (2) classification of metrics into patterns; (3) development of language patterns for the description of every metric pattern; (4) an algorithm for translating each metric pattern into Alloy. The thesis is organized as follows. In the first part, Chapter 2 presents the notion of finite satisfiability and summarizes relevant methods for the detection and identification of finite satisfiability problems in class diagrams. The chapter ends by presenting the FiniteSat algorithm, introduced by Balaban and Maraee [11], which plays a central role in this work. In Chapter 3 we present a polynomial-time algorithm that extends the FiniteSat algorithm to handle the qualifier constraint. Chapter 4 presents the initial exploration of the practical occurrence of finite satisfiability problems in class diagrams. In the second part, Chapter 5 presents the relevant background on model metrics, while Chapter 6 deals with the definition of a metric language that can also be used for automatic metric-driven model specification and for the generation of benchmark problem sets. We demonstrate this automatic generation with an Alloy [46] implementation. Finally, Chapter 7 concludes this work and outlines directions for future research. See Appendix A for details of the implementation of the algorithms described in this work, in particular the implementation used for the experiments reported in Chapter 4.

Part I Finite Satisfiability of Class Diagrams

Chapter 2 Background

The Unified Modeling Language (UML) is now the standard graphical modeling language developed and adopted by the Object Management Group for specifying, visualizing, constructing, and documenting the artifacts of software systems, as well as for business modeling and other non-software systems [72]. UML simplifies the complex process of software design by raising the level of abstraction throughout the analysis and design process. The relevance of UML models has increased with the advent of the Model Driven Development (MDD) approach, in which analysis and design models play an essential role in the process of software development. Recently, with the emergence of web-enabled agent technology, UML models are also used for ontology representation, construction and extraction. A central assumption that underlay the development of UML was the idea that it is not possible to describe a complex system with a single model only. A "rich" description of a system must include a number of highly detailed models. UML consists of twelve diagrams, referred to as UML models. Table 2.1 summarizes the UML diagrams and the modeling view of software solutions represented by them (extracted from [88]).

Table 2.1: UML Diagrams

UML diagram | Represents
Use case | functionality from the user's viewpoint
Activity | the flow within a Use case or the system
Class | classes, entities, business domain, database
Interaction overview | interactions at a general high level
Interaction overview (listed twice in the source), Communication | interactions between objects
Object | objects and their links
State machine | the run-time life cycle of an object
Composite structure | component or object behavior at run-time
Component | executables, linkable libraries, etc.
Deployment | hardware nodes and processors
Package | subsystems, organizational units
Timing | time concept during object interactions

2.1 Class Diagrams: Syntax and Semantics

Among the twelve visual UML models, class diagrams are probably the most important and the best understood. UML class diagrams are used to specify, visualize, and document the static view of a system. They also serve as a basis for generating implementation artifacts such as code skeletons and database schemata, as a means for knowledge representation such as specifying ontologies, and for defining meta-models of other programming, modeling, and specification languages. The class diagram model originates in the conceptual models of the 1980s, such as Entity-Relationship (ER) diagrams [25], their enhanced versions (EER), Object-Role Modeling (ORM) diagrams [41], and frame-structured modeling in artificial intelligence [70]. The UML class diagram model includes elements from all these models. A class diagram is a structural abstraction of a real-world phenomenon. The model consists of basic elements, descriptors and constraints. The basic elements are classes and associations, the descriptors are class and association attributes, and the constraints are restrictions imposed on these elements. The constraints are (1) multiplicity constraints on associations (also termed cardinality constraints), with or without qualifiers; (2) association

class constraints; (3) class and association hierarchy constraints; (4) generalization set constraints; (5) association constraints; (6) aggregation constraints; (7) multiplicity constraints on attributes. The syntax and informal semantics are described in Rumbaugh et al. [77] and in OMG-UML [72]. As opposed to computer programs, the class diagrams of a system are partial. That is, the absence of an attribute from a class in a class diagram does not mean that the attribute does not exist in that class. Figure 2.1 is an example of a class diagram, which partially specifies a university system. It captures the hierarchy of people within the university and their relationship to the university courses. Classes are represented by rectangles. Associations are represented by lines between the rectangles. Qualifiers are represented by small rectangles attached to the ends of an association. An n-ary association is an association among three or more classes, and is shown as a large diamond with a line from the diamond to each participant class. Multiplicity constraints are marked on the ends of the association line. Association classes are marked by a dashed line connecting a class rectangle with an association line. Class hierarchy constraints are marked by empty arrow heads. Association hierarchies are marked by a dashed arrow labeled "subset" between association lines. An aggregation is a special form of binary association, shown by a hollow diamond adornment on the end of the association line that connects to the aggregate class. If the aggregation is a composition, the diamond is filled. The standard set-theoretic semantics of class diagrams associates a class diagram with class diagram instances, in which classes have extensions that are sets of objects that share structure and operations, and associations have extensions that are relationships among the extensions of their end classes. We denote class diagrams as CD, class symbols as C, association symbols as R, role symbols as rn, and instance symbols as I. The extension of a symbol T of CD in an instance I is denoted $T^I$. Henceforth, we shorten expressions like "instance of an extension of C" to "instance of C", and "instance of an extension of R" to "instance of R". For example, in Figure 2.1, the Academic class represents a set of academic people in a university, the binary

association between FacultyMember and Course denotes a set of pairs of FacultyMembers and Courses in which the FacultyMember plays the role of a teacher. The ternary association between FacultyMember, Course and Student denotes a set of 3-tuples of values, one from each of the respective classes.

Figure 2.1: UML Class Diagram

Constraints are used to restrict the otherwise unrestricted extensions of the class diagram elements. Constraints provide an essential means of knowledge engineering, since they extend the expressiveness of diagrams. Class and association constraints restrict the set and relationship extensions of classes and associations, respectively. Attribute constraints restrict attribute values in terms of types and multiplicity. A legal instance of a class diagram is an instance that satisfies all constraints.

The semantics of class diagram constraints:

1. Binary cardinality constraints on binary associations: A binary cardinality constraint (also termed "multiplicity constraint") is symbolically denoted:

$R(rn_1 : C_1[min_1, max_1],\ rn_2 : C_2[min_2, max_2])$ (2.1)

The multiplicity constraint $[min_1, max_1]$ that is visually written on the $rn_1$ end of the association line is actually a participation constraint on instances of $C_2$. It states that an instance of $C_2$ can be related via $R$ to $n$ instances of $C_1$, where $n$ lies in the interval $[min_1, max_1]$. For example, according to Figure 2.1, an Academic must advise at least two Graduates (as indicated by the 2..* multiplicity constraint). Formally: in every instance $I$, for every $e_1 \in C_1^I$, $min_2 \le |\{e_2 \mid (e_1, e_2) \in R^I\}| \le max_2$.

Figure 2.2: Binary multiplicity constraint

2. N-ary association multiplicity constraints: A multiplicity constraint on an n-ary association $R$ among the classes $C_1, \ldots, C_n$ with the roles $rn_1, \ldots, rn_n$, respectively, is symbolically denoted by the following relationship construct:

$R(rn_1 : C_1[m_1, n_1], \ldots, rn_n : C_n[m_n, n_n])$ (2.2)

The multiplicity constraint $[m_i, n_i]$ on an n-ary association end (role) represents the possible number of values (objects) of $C_i$ when the values at the other $n-1$ ends are fixed. Consider the ternary association S.F.C in Figure 2.1. A Student will not take the same Course from more than one FacultyMember, but a Student may take more than one Course from a single FacultyMember, and a FacultyMember may teach more than one

Course. The cardinality constraint defined on a binary association is clear and applies to all class instances.

3. Qualifier attribute constraints: These are optional for the end roles of a binary association. A qualifier constraint distinguishes the set of objects at the far end of the association based on the qualifier value, and is symbolically denoted:

$R(rn_1 : SourceClass\{(q_1, T_1) \ldots (q_n, T_n)\}[min_1, max_1],\ rn_2 : TargetClass[min_2, max_2])$

A qualifier is used within a qualified association to relate a qualified object to a target object using a qualifier value that is taken from the qualifier domain. The multiplicity on the target side restricts the number of target objects that can be related to a qualified object. In Figure 2.1, the binary association between FacultyMember and Course is qualified by the qualifier Semester, whose domain is the Semesters enumeration. This says that a qualified FacultyMember (a FacultyMember-Semester pair) can teach at most one Course (the target class) in the specified Semester (the qualifier). In Chapter 3 we discuss the two possible formal interpretations of the qualifier constraint, and the one that seems to be the preferred UML interpretation (UML semantics is only verbally specified).

4. Association classes restrict their objects to be uniquely identified by pairs of the connected association. In Figure 2.1, every Enrollment object is identified by a unique course-student pair in 1:1 correspondence (no two enrollments are identified by the same pair).

5. Aggregation constraint: reflects whole-part relationships between a class, the assembly, and its parts, the component classes. For example, the class University in Figure 2.1 is the assembly, and the classes MathFaculty and CompFaculty are the component classes. The aggregation relationship is transitive and asymmetric across all aggregation links. The asymmetry of aggregation requires that a part of an assembly cannot aggregate one of its aggregators (the aggregation relation is

acyclic). Composition is a restricted form of aggregation that describes physical containment and various notions of ownership.

6. Association constraints: It is also possible to define explicit constraints between associations. A {xor} constraint is imposed on two or more associations that have a common end class (the base class). An instance of the base class may participate in at most one association in the constraint. A multiplicity constraint on a xored association a applies only if the base class participates in a [10]. An association hierarchy constraint means inclusion of the association classes. Whether it also means inclusion of the associations of the association classes is not clarified in the UML2 specification [77]. In any case, the multiplicity constraints on the associations are inherited.

7. Class hierarchy constraints: specify subset relations between classes. In Figure 2.1, in every instance the extensions of the FacultyMember and Graduate classes are subsets of the Academic extension.

8. Generalization set constraints: Class hierarchy constraints can be grouped into a Generalization Set (GS for short), as shown in Figure 2.1. For example, GraduateCourse, UnderGraduateCourse and Course form a Generalization Set. In that case, more constraints can be defined on the group. There are two orthogonal dimensions for defining such constraints: (1) disjointness and (2) completeness. Below are the four constraints that can label a generalization set: (a) complete - An instance of the superclass is an instance of at least one subclass. (b) incomplete - There might be instances of the superclass that are not instances of any subclass. (c) disjoint - Subclass extensions are mutually exclusive.

(d) overlapping - Subclass extensions may overlap. The GS constraints can be combined to form one of the following valid combinations: {complete, disjoint}, {incomplete, disjoint}, {complete, overlapping}, {incomplete, overlapping}. For example, the constraint {overlapping, complete} on the generalization set Course indicates that a course may be both a graduate and an undergraduate course (overlapping), and that every course is either graduate or undergraduate (complete).

Def 2.1. An instance $I$ of a class diagram CD consists of a domain $D$ and an extension function $\cdot^I$ that assigns extensions to symbols. For a class symbol $C$, $C^I$ (a shorthand for $I(C)$) is a subset of $D$, and for an association symbol $a$, $a^I$ is a subset of $D \times D$.

Def 2.2. A legal instance of a class diagram is a finite instance where the class and association extensions satisfy all constraints in the diagram.

Correctness of a class diagram involves the notions of consistency and satisfiability, which are discussed in [15, 24, 56, 86]. We further elaborate this terminology, and suggest additional notions, in order to facilitate a more accurate definition of correctness.

2.2 Correctness of Class Diagrams

Class diagrams are models written by people, and therefore usually suffer from modeling problems like inconsistency, redundancy and abstraction errors. Inexperienced designers tend to create erroneous models, but even experienced ones cannot anticipate the implications of a change on an overall model. Indeed, Lange et al. showed in [53] that model defects often remain undetected, even if experienced practitioners check the model attentively. These problems are aggravated when a model originates from different sources, as frequently happens when web services are integrated. Combined sources might overlap, and the integration might yield redundant or inconsistent models [17, 43, 23]. It is clear that such problems are best solved at the level of models rather than during the implementation.
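Definitions 2.1 and 2.2 can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not part of the thesis tooling: the dictionary layout of an instance, the function names, the association name `advises`, and the "exactly one advisor per Graduate" multiplicity are all hypothetical choices made for the example. Class extensions are Python sets, an association extension is a set of pairs, and the multiplicity check follows the formal condition given for binary associations in Section 2.1.

```python
def satisfies_multiplicity(source_ext, relation, lo, hi=None):
    """Check lo <= |{e2 : (e1, e2) in relation}| <= hi for every e1 in source_ext.

    hi=None stands for the unbounded upper multiplicity '*'.
    """
    for e1 in source_ext:
        linked = {e2 for (a, e2) in relation if a == e1}
        if len(linked) < lo or (hi is not None and len(linked) > hi):
            return False
    return True


def is_legal(instance):
    """Check a hypothetical instance: two class extensions and one association."""
    academic = instance["Academic"]
    graduate = instance["Graduate"]
    advises = instance["advises"]  # pairs (academic, graduate)
    inverse = {(g, a) for (a, g) in advises}
    return (
        graduate <= academic  # class hierarchy: Graduate is a subset of Academic
        and all(a in academic and g in graduate for (a, g) in advises)
        and satisfies_multiplicity(academic, advises, 2)  # each Academic advises 2..*
        and satisfies_multiplicity(graduate, inverse, 1, 1)  # assumed: one advisor each
    )


candidate = {
    "Academic": {"ann", "bob", "eve"},
    "Graduate": {"bob", "eve"},
    "advises": {("ann", "bob"), ("ann", "eve")},
}
print(is_legal(candidate))  # False: bob and eve are Academics but advise nobody
```

The candidate fails because every Graduate is also an Academic and therefore must itself advise at least two Graduates. Repairing it by adding more Graduates only creates more Academics with the same obligation, which is the intuition behind the finite satisfiability problem.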

Thus, the need to provide coherent models is evident. In particular, it is essential to have tools that can validate the quality and correctness of models. Furthermore, models can be improved, based on given design criteria. The same holds for meta-models, as they underlie the modeling of concrete systems. In order to achieve the goal of improving model quality, a diversity of reasoning capabilities is required.

Inconsistency and lack of finite satisfiability

Design quality refers to (1) erroneous models that cannot be populated in an acceptable way, and (2) low-quality models that can be improved according to some design criteria. Reasoning helps in detecting erroneous models, finding the source of errors and possibly suggesting repairs. It is used for revealing redundant situations, and for testing whether design criteria are met. Correctness of class diagrams involves two problems: inconsistency and finite satisfiability. Quality involves redundancy, design improvement and possibly other problems. Inconsistency arises when the constraints imposed on a class diagram are contradictory, meaning that there is no legal instance that is not empty. Lack of finite satisfiability is caused by multiplicity constraints that can be satisfied only by empty or infinite class extensions (i.e., instantiations). Redundancy appears when constraints seem to allow values or links that cannot be realized (are inconsistent). Quality improvement deals with changing the models following various criteria such as design patterns or reuse enhancements.

Detection and Identification of Finite Satisfiability

Class diagram reasoning methods can be classified into concrete reasoning methods that directly solve specific problems [56, 42] and translation-based methods that provide reasoning by mapping UML models into a formal reasoning framework [15, 6, 54] (see footnote 1). Concrete methods tend to apply to error detection and to revealing redundancy, while translation-based methods deal with general query answering for a variety of modeling needs.

Footnote 1: A UML class diagram is translated into a formula or expression in some other language, and the translation is proved to be correct. The notion of correctness varies between studies. The formal notion requires a proof of equivalence, i.e., a proof that the translation preserves all and only the implications of the original class diagram.

Concrete Methods for Reasoning about Emptiness of Class Diagrams

Kaneiwa and Satoh [49] study the problem of full consistency in a subset of UML class diagrams that includes classes with typed attributes and multiplicity constraints on the attributes, unconstrained associations and constrained generalization sets. They identify three factors for inconsistency in such diagrams: (1) combination of generalization with disjointness; (2) attribute overwriting in multiple hierarchies; and (3) combination of completeness and disjointness constraints in generalization sets. Based on these factors, they provide tractable algorithms for deciding full consistency in the restricted class diagram model.

Concrete Methods for Reasoning about Finiteness of Class Diagrams

Reasoning on finiteness of entity relationship and class diagrams has attracted much attention. The problem was independently identified by Lenzerini and Nobili [56] and by Thalheim [86], with reference to entity relationship diagrams. Later the methods were extended to various fragments of UML class diagrams. The problem is to detect diagrams that are not strongly satisfiable, identify the cause, and suggest a repair. There are two main approaches: (1) the linear programming approach, and (2) the graph-based approach. The first approach reduces the all-class finiteness problem to the problem of finding a solution to a system of linear inequalities. The second approach detects infinity-causing cycles in the diagram, and possibly suggests repair transformations. All methods apply only to fragments of UML class diagrams. Detection of infinity in unrestricted UML class diagrams is still an open issue.

The Linear Programming Approach

The fundamental method of Lenzerini and Nobili [56] is defined for an entity relationship diagram that includes Entity types (Classes), n-ary Relationship types (Associations), and Cardinality Constraints². The method consists of a transformation of the cardinality constraints into a set of linear inequalities whose variables stand for the sizes (cardinalities) of the entity and relationship types in a possible instance. A relationship $R(rn_1 : C_1[min_2, max_2],\ rn_2 : C_2[min_1, max_1])$ (Figure 2.3) yields the following inequalities:

For $min_2 \neq 0$: $r \geq min_2 \cdot c_1$.
For $max_2 \neq *$: $r \leq max_2 \cdot c_1$.
For $min_1 \neq 0$: $r \geq min_1 \cdot c_2$.
For $max_1 \neq *$: $r \leq max_1 \cdot c_2$.

where $r$, $c_1$, $c_2$ are variables that stand for the sizes of the respective entity or relationship types. In addition, for every entity or association symbol $T$, insert the inequality $T > 0$.

Figure 2.3: Binary Association

The size of the inequality system is polynomial in the size of the diagram. The main result is that the entity relationship diagram is fully finitely satisfiable if and only if the inequality system has a solution. Since linear programming is solvable in polynomial time in the size of the problem encoding, full finite satisfiability for this fragment of class diagrams can be decided in polynomial time.

² Lenzerini and Nobili (1990) use the membership semantics for cardinality constraints (consult Balaban and Shoval in [??] for semantics of cardinality constraints). For non-binary relationships, this is not the standard semantics of cardinality constraints, neither in the entity relationship model nor in the class diagram model.
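As a rough illustration of the construction above, the following Python sketch generates the inequalities for one binary association and tests candidate sizes by brute force. The function names, the tuple encoding and the bounded search are assumptions made for this example only; the method itself relies on linear programming, which the sketch does not implement. The sketch also assumes that the last two inequalities bound r by multiples of the size of C2. Both sample associations are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# (left, op, coeff, right) encodes the inequality: left op coeff * right
Inequality = Tuple[str, str, int, str]


def association_inequalities(r: str, c1: str, c2: str,
                             min1: int, max1: Optional[int],
                             min2: int, max2: Optional[int]) -> List[Inequality]:
    """Inequalities for R(rn1: C1[min2, max2], rn2: C2[min1, max1]).

    None stands for an unbounded maximum ('*').
    """
    result: List[Inequality] = []
    if min2 != 0:
        result.append((r, ">=", min2, c1))
    if max2 is not None:
        result.append((r, "<=", max2, c1))
    if min1 != 0:
        result.append((r, ">=", min1, c2))
    if max1 is not None:
        result.append((r, "<=", max1, c2))
    return result


def holds(ineq: Inequality, sizes: Dict[str, int]) -> bool:
    """Evaluate one inequality under a concrete size assignment."""
    left, op, coeff, right = ineq
    if op == ">=":
        return sizes[left] >= coeff * sizes[right]
    return sizes[left] <= coeff * sizes[right]


def has_small_solution(ineqs: Sequence[Inequality],
                       variables: Sequence[str], bound: int = 30) -> bool:
    """Search for strictly positive integer sizes up to `bound` (T > 0 for every T)."""
    for values in product(range(1, bound + 1), repeat=len(variables)):
        sizes = dict(zip(variables, values))
        if all(holds(i, sizes) for i in ineqs):
            return True
    return False


# Hypothetical recursive association d on a single class c:
# exactly one link required at one end, at least two at the other.
conflicting = association_inequalities("d", "c", "c",
                                       min1=2, max1=None, min2=1, max2=1)
# Hypothetical association between two distinct classes a and b.
one_to_many = association_inequalities("r", "a", "b",
                                       min1=1, max1=1, min2=0, max2=None)
```

A failed bounded search is only evidence, not a proof, that no solution exists; a linear programming solver would be needed for a real decision procedure.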

Example 2.1. Consider Figure 2.4, in which each course should have a single successor and at least two predecessors. Applying the Lenzerini and Nobili method to this example yields the unsolvable inequality system below:

1. The Variables: $c$ for Course and $d$ for Dependency
2. The System Inequalities:
(a) The Dependency Association Inequalities:
1. $d \geq 2c$
2,3. $d = c$ ($d \leq c$ and $d \geq c$).
(b) 4,5. $d, c > 0$

Figure 2.4: Unsatisfiability due to multiplicity constraint conflict

Calvanese and Lenzerini, in [24], extend the inequalities based method of [56] to apply to schemata with class hierarchy constraints. The expansion is based on the assumption that class extensions may overlap. They provide a two stage algorithm in which the finite satisfiability problem of a class diagram with ISA constraints is reduced to the finite satisfiability problem of a class diagram without ISA constraints. Then, similarly to [56], they check satisfiability of the new class diagram by deriving a special system of linear inequalities (different from that of [56]). The class diagram transformation process of [24] is fairly complex, and might introduce, in the worst case, a number of new classes and associations that is exponential in the input diagram size. The method was further simplified in [20], where class overlapping is restricted to class hierarchy alone. The simplification of [20] reduces the overall number of new classes and associations, but the worst case is still exponential.

Lenzerini and Nobili [56] were the first to suggest a method for identifying the cause of strong unsatisfiability in restricted entity relationship diagrams. Their solution is not constructive, as they do not provide a method for computing critical cycles. A first step towards finding critical cycles appears in [84]. Dullea and Song [31] and Dullea et al. [32] characterize infinity causing structures (termed structural invalidity) of recursive binary and ternary relationship types in entity relationship diagrams. The analysis suggests a set of structure based decision rules for identifying structural invalidity in entity relationship diagrams.

2.3 FiniteSat Algorithm

Correctness of a class diagram involves consistency and finite satisfiability [24, 56, 86, 15, 62]. A class is consistent if it has a non-empty extension in some legal instance; a class diagram is consistent if all of its classes are consistent; a class is finitely satisfiable if it has a non-empty extension in some legal finite instance; a class diagram is finitely satisfiable if all of its classes are finitely satisfiable³. It can be shown that a consistent class diagram has a legal instance in which all class extensions are non-empty, and a finitely satisfiable diagram has a legal instance in which all class extensions are non-empty and finite [61]. Class diagrams CD, CD' are equivalent, denoted $CD \equiv CD'$, if they have the same legal instances.

Complexity: Berardi et al., in [15], showed that deciding consistency of UML class diagrams is EXPTIME-complete. Artale et al. [7] refine these results by considering fragments of class diagrams. They show that for ER diagrams that include, besides cardinality, class hierarchy and disjoint constraints, deciding consistency is in NLogSpace. Addition of complete constraints raises the complexity to NP, and with the addition of association hierarchies the problem is already EXPTIME-complete.

³ Lenzerini and Nobili [56] used the notion of strong satisfiability for this term.

Recently, it was shown [58, 78] that finite satisfiability of the description logic ALCQI is EXPTIME-complete, which implies that finite satisfiability of class diagrams (under some minor restrictions) is also EXPTIME-complete.

There are two main approaches for reasoning about finite satisfiability of class diagrams: the linear inequalities approach and the graph based approach. The first approach reduces the finite satisfiability problem to the problem of finding a solution to a system of linear inequalities. The second approach detects infinity causing cycles in the diagram, and possibly suggests repair transformations. All methods apply only to fragments of UML class diagrams. Deciding finite satisfiability in unrestricted class diagrams is still an open issue. Below, we briefly summarize the results in both approaches on which our research is based.

The fundamental work in the linear inequalities approach is that of [56, 85]. It applies to Entity-Relationship (ER) diagrams with Entity Types (Classes), Binary Relationships⁴ (Associations), and Multiplicity Constraints. Calvanese and Lenzerini, in [24], extend the inequalities based method of [56] to apply to diagrams with class hierarchy constraints, but the size of the resulting system of inequalities is exponential in the size of the class diagram. The simplification of [20] reduces the overall number of new class and association variables, but the worst case is still exponential.

A method for identifying the cause of non-finite satisfiability was first suggested in [56]. The method is based on the construction of a directed graph (digraph) whose nodes stand for classes and associations, and whose edges connect association nodes with their end class nodes. The edges are weighted by the multiplicity constraints, as shown in Figure 2.5. The weight of a path is the product of the weights of its edges. The directed graph is the means for detecting the causes of non-finite satisfiability of a class diagram. Cycles whose weight is less than 1 are termed critical cycles. They indicate non-finite satisfiability. Moreover, a critical cycle singles out a non-finitely satisfiable set of multiplicity constraints. Similar approaches are introduced in [86, 42, 44].

⁴ They also allow n-ary relationships, but with non-standard (membership) semantics for cardinality constraints.
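The notion of a critical cycle can be illustrated with a small Python sketch. It assumes that the edge weights have already been derived from the multiplicity constraints (the derivation is given in Figure 2.5 and is not reproduced here), so the weights below are purely hypothetical inputs. The function name `critical_cycles` and the adjacency-list encoding are likewise assumptions for this example. The search enumerates simple cycles exhaustively, so it is only suitable for small graphs.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# node -> list of (successor, edge weight)
Graph = Dict[str, List[Tuple[str, Fraction]]]


def critical_cycles(graph: Graph) -> List[List[str]]:
    """Return the simple cycles whose weight (product of edge weights) is below 1."""
    found: List[List[str]] = []

    def extend(start: str, node: str, path: List[str], weight: Fraction) -> None:
        for succ, w in graph.get(node, []):
            if succ == start:
                # The cycle closes at its smallest node, so it is reported once.
                if weight * w < 1:
                    found.append(path + [start])
            elif succ > start and succ not in path:
                extend(start, succ, path + [succ], weight * w)

    for start in sorted(graph):
        extend(start, start, [start], Fraction(1))
    return found


# Hypothetical weighted digraph with two cycles: A -> r -> A (weight 1/2)
# and B -> s -> B (weight 2).
example: Graph = {
    "A": [("r", Fraction(1, 2))],
    "r": [("A", Fraction(1))],
    "B": [("s", Fraction(2))],
    "s": [("B", Fraction(1))],
}
```

Exact fractions are used instead of floating point numbers so that the comparison of a cycle weight with 1 is not affected by rounding.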

Figure 2.5: The digraph representation of a binary association

The FiniteSat algorithm presented by Maraee and Balaban [11]:

Algorithm 2.1. The FiniteSat Algorithm
Input: A class diagram CD with binary multiplicity constraints, class hierarchy constraints, GS constraints.
Output: A linear inequality system $\Psi_{CD}$
Method: Insert a variable for every class and association in CD.
1. For every multiplicity constraint, insert inequalities according to the Lenzerini and Nobili method (see chapter 2).
2. For every class hierarchy constraint $B \prec A$, B being the sub-class with variable $b$, and A being the super-class with variable $a$, add the inequality $a \geq b$.
3. For every GS constraint $GS(C, C_1, \ldots, C_n; Const)$, C being the super-class with variable $c$, the $C_i$'s being the subclasses with variables $c_i$, and Const being the GS constraint, add $n$ class hierarchy inequalities $c \geq c_i$, $i = 1, \ldots, n$, and the following inequalities:
Const = disjoint: $c \geq \sum_{j=1}^{n} c_j$
Const = complete: $c \leq \sum_{j=1}^{n} c_j$
Const = incomplete: $\forall j \in [1, n].\ c > c_j$
Const = overlapping: Without inequality
Const = disjoint, incomplete: $c > \sum_{j=1}^{n} c_j$
Const = disjoint, complete: $c = \sum_{j=1}^{n} c_j$
Const = overlapping, complete: $c < \sum_{j=1}^{n} c_j$
Const = overlapping, incomplete: $\forall j \in [1, n].\ c > c_j$

Proving the correctness of the FiniteSat algorithm requires analysis of the structure of class hierarchies. For that purpose, we consider the graph of class hierarchy constraints alone, in which nodes represent classes and directed edges represent ISA constraints, directed from super-classes to their subclasses (association lines are removed). We consider two versions of such graphs: directed and undirected. Three class hierarchy structures are analyzed:

1. Tree class hierarchy: The directed graph of the class hierarchy forms a tree, as in the corresponding figure.
2. Acyclic class hierarchies: The undirected graph of the class hierarchy is acyclic. In Figure 2.6-a, the directed class hierarchy is not a tree, as F is a sub class of both C and D, but the undirected class hierarchy graph is acyclic (a tree).
3. Cyclic class hierarchies: The undirected graph of the class hierarchy is cyclic. Multiple inheritance is unrestricted, as the undirected induced graph can be cyclic. In Figure 2.6-b, class F has two ISA paths to its super-class A. The ISA path A, B, F, C, A forms an undirected ISA cycle.

Figure 2.6: Unconstrained Hierarchy Structures

The correctness of Algorithm FiniteSat is proved via a reduction of the finite satisfiability of a class diagram CD to the finite satisfiability of a class diagram CD', that does not

include class hierarchies, and therefore, the [56] method applies to it. CD' is created as follows: Initialize CD' by CD. Replace all class hierarchy constraints with new regular binary associations (termed henceforth ISA associations) between the super-class and the subclasses. The multiplicity constraints on these associations are a 1..1 participation constraint for the subclass (written on the super class end in the diagram) and a 0..1 participation constraint for the super class. Figure 1.1-b shows the reduced class diagram of Figure 1.1-a.

Lemma 2.1. Finite satisfiability of CD is reducible to the finite satisfiability of CD'.

Proof. (Sketched) The reduction is defined by bi-directional translations between non-empty finite legal instances $I$ and $I'$ of CD and CD', respectively. The translations rely on a mapping $T$ (and its inverse $T^{-1}$) from $I'$ to $I$, which collapses a structure of ISA-linked objects in $I'$ into a single object in $I$. The intuition is that CD' splits a single instance object of CD into its components in its ancestor classes. A crucial property of the $T$ translation is that ISA-linked objects in $I'$ should not include two objects from the same class. This property, termed the Single Class property, ensures that the $T$ mapping maps an instance $I'$ of CD' to an instance $I$ of CD. The main problem is showing that the mapping preserves multiplicity constraints (otherwise, while collapsed into a single object in $I$, the links of two objects are combined into links of a single one). Full proof in [61].

The reduction is proved by considering the three forms of class hierarchy graphs. For trees and for acyclic hierarchies, the single class property holds for every instance. For cyclic class hierarchies, it is shown that if a diagram is finitely satisfiable, then it has an instance that satisfies the single class property.

Claim 2.2 (FiniteSat correctness without GS constraints). A class diagram with binary multiplicity constraints and class hierarchy constraints is finitely satisfiable if and only if the inequality system constructed by Algorithm FiniteSat is solvable.

Proof. (Sketched) Given a class diagram CD, construct a class diagram CD' as above, to which the inequalities method of [56] is applied. Based on Lemma 2.1, CD is finitely satisfiable if and only if the inequality system of [56] for CD' is solvable. It is not hard to show that this inequality system is equivalent to the inequality system constructed by FiniteSat.

The results of this claim can be extended to class diagrams with GS constraints and an acyclic class hierarchy structure, or a cyclic structure in which class hierarchy cycles do not include disjoint or complete constraints. The scope of the FiniteSat algorithm is defined in the following claim:

Claim 2.3 (Partial correctness: GS constraints, cyclic hierarchy). A class diagram with binary multiplicity constraints, class hierarchy constraints, and GS constraints, in which class hierarchy cycles include disjoint or complete constraints, is not finitely satisfiable if the inequality system constructed by Algorithm FiniteSat is not solvable.

Proof. In cyclic class hierarchies, the disjoint or complete GS constraints might have an implicit global effect on other generalization sets in a cycle. Therefore, if the inequality system does not have a solution, the corresponding diagram does not have a legal finite non-empty instance, but a solution for the inequalities might miss the implicit constraints.

Claim 2.4 (Complexity of the FiniteSat algorithm). The time needed by FiniteSat to construct the inequalities, and their number, are $O(n)$, where $n$ is the number of constraints in the class diagram.

Proof. Every constraint contributes a constant number of inequalities.

Table 2.2 summarizes the results of the above claims.

Graph Structure   With/Without GS constraints           FiniteSat correctness
Acyclic           Without                               correct
Acyclic           With                                  correct
Cyclic            Without                               correct
Cyclic            No disjoint or complete in cycles     correct
Cyclic            Disjoint or complete in cycles        sound for unsatisfiability

Table 2.2: The Scope of The FiniteSat Algorithm
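Step 3 of Algorithm 2.1 can be sketched in Python as follows. The function name `gs_inequalities` and the string encoding of inequalities are assumptions made for illustration. The relational operators follow the cardinality reading of each constraint (for example, disjoint subclasses cannot together outnumber their super-class, and a complete cover cannot be smaller than it); they are this sketch's reading rather than a definitive statement of the algorithm.

```python
from typing import List


def gs_inequalities(c: str, subs: List[str], const: str) -> List[str]:
    """Inequalities for GS(C, C1, ..., Cn; Const), rendered as plain strings.

    `c` is the variable of the super-class, `subs` the variables of the
    subclasses, and `const` a comma-separated GS constraint such as
    "disjoint, complete".
    """
    total = " + ".join(subs)
    # The n class hierarchy inequalities c >= c_i are always added.
    result = [f"{c} >= {s}" for s in subs]
    kinds = {k.strip() for k in const.split(",")}

    if kinds == {"disjoint"}:
        result.append(f"{c} >= {total}")
    elif kinds == {"complete"}:
        result.append(f"{c} <= {total}")
    elif kinds in ({"incomplete"}, {"overlapping", "incomplete"}):
        result += [f"{c} > {s}" for s in subs]
    elif kinds == {"disjoint", "incomplete"}:
        result.append(f"{c} > {total}")
    elif kinds == {"disjoint", "complete"}:
        result.append(f"{c} = {total}")
    elif kinds == {"overlapping", "complete"}:
        result.append(f"{c} < {total}")
    elif kinds != {"overlapping"}:
        # Plain "overlapping" adds no inequality; anything else is unknown.
        raise ValueError(f"unknown GS constraint: {const}")
    return result
```

For instance, `gs_inequalities("c", ["c1", "c2"], "disjoint, complete")` produces the two hierarchy inequalities followed by the equality between `c` and the sum of the subclass variables.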

Chapter 3
FiniteSat Algorithm: Extension to Qualifier

The qualifier constraint is not syntactic sugar. Rather, it significantly enriches the modeling capabilities by strengthening multiplicity constraints, providing elegant refinements of associations, and supporting the design of lookup structures in software. A qualifier constraint accompanies a multiplicity constraint, and thus can play a central role in causing finite satisfiability problems in class diagrams. This chapter deals with the finite satisfiability problem when a qualifier constraint is imposed on an association.

A qualifier constraint on a binary association is, generally speaking, a slot for an attribute or list of attributes, in which the values of the attributes select a unique related object or a set of related objects from the entire set of objects related to an object by the association [77, 72, 55]. The qualifier rectangle is part of the association line, not part of the class.

Figure 3.1: Visual Qualifier Notation

The qualifier is attached to the class that it qualifies - that is, an object of the qualified class, together with

a value of the qualifier, selects a set of target class objects on the other end of the association. As said before, there may be one or more such attributes. The qualifier provides essential detail, the omission of which would modify the inherent character of the relationship. It is possible for both ends of a binary association to have qualifiers, but this is rare and, as far as can be seen in the literature, not used in practice.

3.1 Qualifier Explained

A binary association maps objects between classes. Sometimes it is desirable to partition the objects of an end class, and refine the multiplicity constraints. We now demonstrate this with a series of examples. Each example details how an object can be selected out of the set, and why regular multiplicity constraints do not suffice.

1. Example 1. Consider the binary association between the class Directory and the class File in Figure 3.2. The intuitive meaning of this simple class diagram is that every directory is associated with many (possibly zero) files and each file is associated with many (possibly zero) directories. This simple modeling situation can be sharpened by adding a qualifier, to reflect additional constraints.

Figure 3.2: A binary association example.

Consider a Unix file system, in which each directory consists of elements (files, directories or links) identified by their names. Cardinality constraints cannot capture the key role of the name. But adding a qualifier can, as shown in Figure 3.3. The example shows how adding a qualifier tightens the multiplicity in the forward direction over the association.

Figure 3.3: Modeling a Unix file system.

Recall the symbolic notation from chapter 2: $R(rn_1 : SourceClass\{(q_1, T_1) \ldots (q_n, T_n)\}[min_1, max_1],\ rn_2 : TargetClass[min_2, max_2])$. The source class in this example is Directory, the target class is File, $(q_1, T_1)$ is (filename, Name), and the multiplicity constraints are 0..* and 0..1 respectively.

2. Example 2. Suppose we would like to model some lookup data structure. In the general case we can just create a simple binary association between two classes, say Array and Object, with a many-to-many multiplicity constraint between them. However, we can be more precise by adding a qualifier to specify non-trivial constraints. The model in Figure 3.4 shows an array with exactly one object related to every index in the array.

Figure 3.4: Modeling an array with qualifier.

As the above examples show, when a natural index exists, it is beneficial to use a qualifier.

3.2 Discussion on Qualifier Semantics

Today there is no single, universally agreed formal semantics of UML, and much recent research has targeted this issue [47, 83]. The OMG specification [72] provides only verbal semantics, sometimes accompanied by OCL at the meta model level. In particular, the qualifier must have formal semantics if automated tools are to be developed.
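The Unix file system example of Section 3.1 can be read as a lookup structure keyed by a source object and a qualifier value. The following Python sketch is one hypothetical encoding: the class name `FileSystem`, its methods, and the use of strings as object identifiers are all assumptions of this example. It enforces only the 0..1 multiplicity at the File end; the 0..* multiplicity at the Directory end imposes no restriction.

```python
from typing import Dict, Optional, Tuple


class FileSystem:
    """Qualified association Directory[filename] -> File with target multiplicity 0..1."""

    def __init__(self) -> None:
        # (directory, filename) -> file identifier
        self._entries: Dict[Tuple[str, str], str] = {}

    def link(self, directory: str, filename: str, file_id: str) -> None:
        """Link a file under a name; a second file under the same name would violate 0..1."""
        key = (directory, filename)
        if key in self._entries:
            raise ValueError(f"{filename!r} already names a file in {directory!r}")
        self._entries[key] = file_id

    def lookup(self, directory: str, filename: str) -> Optional[str]:
        """Return the unique file selected by (directory, filename), or None."""
        return self._entries.get((directory, filename))


fs = FileSystem()
fs.link("/home", "notes.txt", "file-1")
fs.link("/tmp", "copy.txt", "file-1")  # the same file may appear in many directories
```

A plain many-to-many association would need a set of files per directory; the qualifier turns the association into a keyed lookup.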

The general visual form of a qualifier constraint is presented in Figure 3.5, and symbolically denoted $R(rn_1 : A\{(q_1, T_1) \ldots (q_n, T_n)\}[min_1, max_1],\ rn_2 : B[min_2, max_2])$, such that A stands for SourceClass and B stands for TargetClass.

Figure 3.5: A general qualifier constraint

In order to formulate its semantics, consider the Bank example in Figure 3.6. The account number serves as a qualifier that uniquely identifies an account. Without the qualifier, the multiplicity constraint on the Account side is 0..* (actually, no multiplicity constraint).

Figure 3.6: A general qualifier constraint

A qualifier constraint might have several attributes, each associated with a value domain. In that case, the combined value domain of the qualifier constraint is the Cartesian product of the attributes' value domains. The semantics of a qualifier constraint is combined with its associated multiplicity constraint. The general idea is that the combined values of the qualifier attributes impose a partition on the set of target class instances that are linked to a source class instance. That is, for an instance $s$ of the source class, a combined value of the attributes identifies a set of target class instances that are linked to $s$. The multiplicity constraint at the target class end imposes bounds on the size of the partition classes.

There are two different interpretations for qualifier semantics:
1. Universal semantics: The qualified multiplicity constraint applies to every qualified

object of the source class.
2. Existential semantics: The qualified multiplicity constraint applies only to source class objects that already participate in the association.

In this work we adopt the universal semantics. Given a qualifier constraint Q, $R(rn_1 : A\{(q_1, T_1) \ldots (q_n, T_n)\}[min_1, max_1],\ rn_2 : B[min_2, max_2])$, its semantics is defined as follows: For a legal instance $I$, $Q^I$ is a function that maps every instance of $A^I$ and a combined value of $T_{q_1}, \ldots, T_{q_n}$ to a set of $B^I$ instances:

$Q^I : A^I \times T_{q_1} \times \ldots \times T_{q_n} \rightarrow 2^{B^I}$

The mapping $Q^I$ satisfies the following constraints:

1. The set of $B^I$ instances to which an $A^I$ instance and a combined domain value are mapped is restricted by the multiplicity constraint of $r$ on the B end:
$\forall a \in A^I, t_1 \in T_{q_1}, \ldots, t_n \in T_{q_n} : min_2 \leq |Q^I(a, t_1, \ldots, t_n)| \leq max_2$

2. $Q^I$ is a partition of $r^I$:
(a) The set of $B^I$ instances to which an $A^I$ instance $a$ and a combined domain value are mapped is included in the set $r/a^I$ of $B^I$ instances that are $r^I$-linked to $a$:
$\forall a \in A^I, t_1 \in T_{q_1}, \ldots, t_n \in T_{q_n} : Q^I(a, t_1, \ldots, t_n) \subseteq r/a^I$
(b) For a given $A^I$ instance, different combined domain values are mapped to disjoint sets of $B^I$ instances:
$\forall a \in A^I, t_1 \in T_{q_1}, \ldots, t_n \in T_{q_n}, t'_1 \in T_{q_1}, \ldots, t'_n \in T_{q_n}$ with $(t_1, \ldots, t_n) \neq (t'_1, \ldots, t'_n)$ : $Q^I(a, t_1, \ldots, t_n) \cap Q^I(a, t'_1, \ldots, t'_n) = \emptyset$

Having seen this formal definition of the qualifier constraint, there is one more sensitive matter (2.b in the above definition) that should be clarified - the multiplicity constraints on the qualified side of the association. After the qualifier is added to the

association, as mentioned earlier, a combined domain value appears. However, the semantics of the multiplicity constraints does not change. That is, the $min_1..max_1$ constraints are imposed on the class alone, and not on the combined value. Formally, the following condition should hold in a legal instance: for every $b \in B^I$ there are $a_1, \ldots, a_k$ such that $b \in Q^I(a_i, t_1, \ldots, t_n)$, where $i \neq j$ implies $a_i \neq a_j$, with $min_1 \leq k \leq max_1$. Although this detail may seem somewhat redundant now, and partially an implication of the qualifier semantics above, it will be very useful in upcoming sections of this work.

In cases where the number of possible qualifier values is infinite, the lower bound of the target class should not be strictly larger than zero. We elaborate on this through the example in Figure 3.7, which shows part of a class diagram modeling a health care system.

Figure 3.7: Example of multiplicity bounds under different semantic interpretations. (a) Shows an unsuitable situation for the universal interpretation. (b) Shows the unconstrained model with zero lower bound.

Figure 3.7 (a) shows a situation where the lower multiplicity bound on the number of patients for a doctor in a day is 5. Under the universal semantics, this means that for each and every day, at least five and at most ten patients are accepted. The issue with this situation is: what about dates in the past, say 100 years ago? What about dates in the future? Since every value of type Day is a possible qualifier value, each such day must have at least 5 patients. The situation is resolved in Figure 3.7 (b), where the lower bound on the association serves is less restrictive. Here the lower bound is zero. A Doctor instance no longer has to have at least five links with Patient instances. However, this case is less expressive, and some additional explanation, either verbal or formal (like OCL), may be needed.

The existential semantics interpretation holds that the multiplicity constraints apply only when the combined qualifier value exists. Put otherwise, in the previous example of Figure 3.7 (a) we could say: given a day on which a doctor actually worked (and since such a date exists in our system), he accepted at least 5 and at most 10 patients. Such an interpretation makes it possible to use qualifiers whose attributes take values from different domains, without specifying where these values actually come from.

3.3 FiniteSat Extension for Qualifier Constraint

In previous chapters we discussed the importance of constraints in general and qualifier constraints in particular. Qualifier constraints significantly enrich the modeling capabilities by strengthening multiplicity constraints. However, greater modeling capability brings more finite satisfiability problems. Finite satisfiability problems in the presence of a qualifier constraint arise due to cycles of conflicting multiplicity constraints that include qualifier constraints with finite attribute domains.

Figure 3.8: A TV-Network with broadcast schedule example

Figure 3.8 shows a situation where infinity results from a qualifier constraint. The qualifier constraint implies that in every legal instance, the number of broadcast schedules, $b$, is $7 \cdot t$, where $t$ is the number of TV networks. This is since every one of the 7 possible values of the Weekday

enumeration, together with an instance of TV-Network, has a link to a BroadcastSchedule instance. But, at the same time, $t = b$ by the multiplicity constraints on the network-schedule association. The only solution is that both classes are either empty or infinite.

In this chapter we present a method for detecting finite satisfiability problems in class diagrams that include binary multiplicity constraints, class hierarchy constraints, GS constraints and qualifier constraints. The method builds on the FiniteSat algorithm of Maraee and Balaban [62, 61], which reduces the finite satisfiability of a UML class diagram with the above constraints to the solvability of a system of linear inequalities, and tests for the existence of a solution. The algorithm is based on the FiniteSat algorithm presented in chapter 2. Extending FiniteSat to account for qualifier constraints involves the addition of step (4):

Algorithm 3.1. The FiniteSat Algorithm
Input: A class diagram CD with binary multiplicity constraints, class hierarchy constraints, GS constraints and Qualifier constraints.
Output: A linear inequality system $\Psi_{CD}$
Method: Insert a variable for every class and association in CD.
1. For every multiplicity constraint, insert inequalities according to the Lenzerini and Nobili method (see chapter 2).
2. For every class hierarchy constraint $B \prec A$, B being the sub-class with variable $b$, and A being the super-class with variable $a$, add the inequality $a \geq b$.
3. For every GS constraint $GS(C, C_1, \ldots, C_n; Const)$, C being the super-class with variable $c$, the $C_i$'s being the subclasses with variables $c_i$, and Const being the GS constraint, add $n$ class hierarchy inequalities $c \geq c_i$, $i = 1, \ldots, n$, and the following inequalities:

Const = disjoint: $c \geq \sum_{j=1}^{n} c_j$
Const = complete: $c \leq \sum_{j=1}^{n} c_j$
Const = incomplete: $\forall j \in [1, n].\ c > c_j$
Const = overlapping: Without inequality
Const = disjoint, incomplete: $c > \sum_{j=1}^{n} c_j$
Const = disjoint, complete: $c = \sum_{j=1}^{n} c_j$
Const = overlapping, complete: $c < \sum_{j=1}^{n} c_j$
Const = overlapping, incomplete: $\forall j \in [1, n].\ c > c_j$
4. For every qualifier constraint Q, given as $R(rn_1 : A\{(q_1, T_1) \ldots (q_n, T_n)\}[min_1, max_1],\ rn_2 : B[min_2, max_2])$ (as described in Figure 3.5):
(a) If the combined value domain of Q is not finite, ignore the qualifier constraint, and handle the association according to the Lenzerini and Nobili method [56].
(b) Otherwise, extend the inequality system with the following inequalities, where $t_{q_i}$ denotes the number of values in the domain $T_{q_i}$:
$min_1 \cdot b \leq r \leq max_1 \cdot b$
$min_2 \cdot a \cdot t_{q_1} \cdots t_{q_n} \leq r \leq max_2 \cdot a \cdot t_{q_1} \cdots t_{q_n}$

Correctness and Complexity of FiniteSat

Next we give an extension of the original algorithm proof, which can be found in [11]. The correctness of Algorithm FiniteSat is proved in two steps:
1. First, finite satisfiability of a class diagram CD is reduced to the finite satisfiability of a class diagram CD', that does not include class hierarchies and qualifiers.
2. Second, finite satisfiability of CD' is reduced to solvability of the inequality system produced by FiniteSat.
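Step 4 of Algorithm 3.1 can be illustrated with the TV-Network example of Figure 3.8. The Python sketch below is a minimal illustration under several assumptions: the function names and tuple encoding are invented for this example, the multiplicities 1..1 on both ends of the qualified association are assumed (the figure itself is not reproduced in the text), the second association is assumed to be 1..1 on both ends so that it forces t = b, and solvability is tested by a bounded brute-force search rather than by linear programming.

```python
from itertools import product
from math import prod
from typing import List, Optional, Sequence, Tuple

# (left, op, coeff, right) encodes the inequality: left op coeff * right
Inequality = Tuple[str, str, int, str]


def qualifier_inequalities(r: str, a: str, b: str,
                           min1: int, max1: Optional[int],
                           min2: int, max2: Optional[int],
                           domain_sizes: Sequence[Optional[int]]
                           ) -> Optional[List[Inequality]]:
    """Step 4: inequalities for a qualified association; None means '*' or infinite."""
    if any(size is None for size in domain_sizes):
        return None  # step 4(a): treat as an ordinary association instead
    factor = prod(domain_sizes)  # t_q1 * ... * t_qn
    result: List[Inequality] = [(r, ">=", min1, b), (r, ">=", min2 * factor, a)]
    if max1 is not None:
        result.append((r, "<=", max1, b))
    if max2 is not None:
        result.append((r, "<=", max2 * factor, a))
    return result


def solvable(ineqs: Sequence[Inequality], variables: Sequence[str],
             bound: int = 10) -> bool:
    """Bounded search for strictly positive integer sizes."""
    for values in product(range(1, bound + 1), repeat=len(variables)):
        size = dict(zip(variables, values))
        if all(size[l] >= k * size[x] if op == ">=" else size[l] <= k * size[x]
               for l, op, k, x in ineqs):
            return True
    return False


# Qualified association r: TV-Network (t) qualified by Weekday (7 values),
# target BroadcastSchedule (b); all multiplicities assumed to be 1..1.
schedule = qualifier_inequalities("r", "t", "b", 1, 1, 1, 1, [7])
# Assumed one-to-one association s between the same classes, forcing t = b.
one_to_one = [("s", ">=", 1, "t"), ("s", "<=", 1, "t"),
              ("s", ">=", 1, "b"), ("s", "<=", 1, "b")]
```

Taken alone, the qualified association admits the sizes t = 1, b = 7, r = 7; adding the one-to-one association leaves no positive solution within the search bound, which matches the conflict described for Figure 3.8.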

The reductions depend on the structures of class hierarchies, and the presence of GS constraints. The first step reduction does not hold for class hierarchy structures whose graphs include cycles with disjoint or complete constraints¹. Qualifier constraints with a non-finite combined value domain are removed (recall that their minimum multiplicity constraint is 0). A qualifier constraint as in Figure 3.5 is replaced by two associations and a new class, as in Figure 3.9. The instances of the new class Q stand for all combinations of an A instance and a value in the combined value domain of Q.

Figure 3.9: A reduced qualifier constraint

Reduction of Finite Satisfiability to a Class Diagram without Class Hierarchy Constraints and without Qualifiers

Translation of CD to CD':
1. Initialize CD' by CD.
2. Replace every GS constraint $GS(C, C_1, \ldots, C_n; constraint)$ by $n$ class hierarchy constraints $C_1 \prec C, \ldots, C_n \prec C$.
3. Replace all class hierarchy constraints with new regular binary associations (termed henceforth ISA associations) between the super-class and the subclasses. The multiplicity constraints on these associations are a 1..1 participation constraint for the subclass (written on the super class end in the diagram) and a 0..1 participation constraint for the super class. Figure 3.10-b shows the reduced class diagram of Figure 3.10-a.

¹ For details of the analysis of the structure of class diagrams see [11]

4. For a GS constraint $GS(C, C_1, \ldots, C_n; const)$ in CD, if $ISA_1, \ldots, ISA_n$ are the associations in CD' that replace its $n$ class hierarchy constraints (entry (2) above), insert in CD' a GS constraint const' on these ISA associations, as follows:
(a) const = disjoint: const' = every object $e$ of C may participate in exactly one link of the associations $ISA_1, \ldots, ISA_n$ (a xor-constraint on the ISA associations).
(b) const = complete: const' = every object of C participates in an ISA association link.
(c) const = incomplete: const' = there exists an object of C that does not participate in any ISA association link.
(d) const = overlapping: const' = there exists an object of C that participates in at least two ISA association links.
If const is a pair constraint, insert in CD' a constraint that combines the constraints of its components.
5. Replace every qualifier constraint Q, given as $R(rn_1 : A\{(q_1, T_1) \ldots (q_n, T_n)\}[min_1, max_1],\ rn_2 : B[min_2, max_2])$², with a new class Q and associations $r_q$ and $r$, with multiplicity constraints 1 to $t_{q_1} \cdots t_{q_n}$ and $min_1..max_1$ to $min_2..max_2$ respectively, as shown in Figure 3.9.

Note: The FiniteSat algorithm adopts the strict interpretation for the overlapping and incomplete constraints, which requires the existence of at least one instance with overlapping and incomplete covering, respectively. The constraint const' in CD' reflects the strict semantics.

² as described in Figure
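Translation steps 3 and 5 above can be sketched as a small Python data model. The `Association` record, the function names, and the naming scheme for the generated associations are hypothetical choices for this example; only the multiplicities follow the translation as described, and all qualifier domains are assumed to be finite.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Optional, Sequence, Tuple

Mult = Tuple[int, Optional[int]]  # (min, max); None stands for '*'


@dataclass(frozen=True)
class Association:
    name: str
    source: str
    source_mult: Mult  # multiplicity written at the source end
    target: str
    target_mult: Mult  # multiplicity written at the target end


def isa_association(sub: str, sup: str) -> Association:
    """Step 3: replace the hierarchy constraint 'sub is a sup' by an ISA association.

    1..1 is written on the super class end, 0..1 on the subclass end.
    """
    return Association(f"ISA_{sub}", sup, (1, 1), sub, (0, 1))


def reduce_qualifier(name: str, a: str, b: str, mult1: Mult, mult2: Mult,
                     domain_sizes: Sequence[int],
                     q_class: str = "Q") -> Tuple[str, List[Association]]:
    """Step 5: replace a qualified association by a new class and two associations."""
    combos = prod(domain_sizes)  # t_q1 * ... * t_qn
    # Every A object is linked to exactly one Q object per combined qualifier value.
    r_q = Association(f"{name}_q", a, (1, 1), q_class, (combos, combos))
    # The original multiplicities now sit between Q and B.
    r = Association(name, q_class, mult1, b, mult2)
    return q_class, [r_q, r]


q, assocs = reduce_qualifier("r", "TVNetwork", "BroadcastSchedule",
                             (1, 1), (1, 1), [7])
```

After this rewriting, the diagram contains only classes and plain binary associations, so the multiplicity inequalities of step 1 apply to every association uniformly.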

Figure 3.10: A class diagram reduction

Mapping instances between CD and CD':

1. $\Gamma$: Mapping an instance $I$ of CD to an instance $I'$ of CD':
(a) $I'$ has the semantic domain of $I$, the same class extensions and association extensions for all associations in CD, and additional class and association extensions for the associations added due to qualifier constraints³.
(b) For every class hierarchy constraint $D \prec C$ in CD (including class hierarchies that are implied from GS constraints), and $e \in D^I$: If the corresponding ISA association is $ISA_D$, relate $e$ in $I'$ by $ISA_D$, i.e., $(e, e) \in ISA_D^{I'}$.
(c) For every qualifier Q, every instance $e$ of an associated object with a qualifier value is related by $\Gamma(e)$ to two new objects $a_i$ and $q_i$ with a link between them. The target object $b_i$ is linked to $q_i$. That is, $(a_i, q_i) \in r_q^{I'}$ and $(q_i, b_i) \in r^{I'}$. We visualize this process in the figure.

2. $\Gamma'$: Mapping an instance $I'$ of CD' to an instance $I$ of CD:
(a) Collapse ISA linked objects: For every structure of ISA linked objects $o_1, \ldots, o_n$ in $I'$, insert a single new object $o$ into all classes of the objects in the structure: $o = \Gamma'(o_i)$, for $i = 1, \ldots, n$. Figure 3.11 demonstrates the $\Gamma'$ mapping. The intuition is that CD' splits a single instance object of CD into its components in its ancestor classes.

³ As defined in the translation phase

(b) Populate the classes of I with the rest of the I' objects that are not ISA related: for every $o \in C^{I'}$ such that o is not ISA linked, $Γ'(o) = o \in C^{I}$.
(c) Preserve association extensions: for every regular (not $R_q$) association a in CD', and every link $(o_1, o_2) \in a^{I'}$, insert the link $(Γ'(o_1), Γ'(o_2))$ into $a^{I}$. The $R_q$ links are not mapped (intuitively, they shrink into the combined qualifier value). The r links are mapped into links between the target object and the combined domain value with the qualified object, that is, between $Γ'(\hat{e}_1)$ and $Γ'(e_2)$, like in Figure (here $\hat{e}_1$ is the qualifier combined value in I').

Figure 3.11: The Γ' mapping of a CD' instance to a CD instance

Figure 3.12: Γ Mapping of instances

The goal now is to show that the above mappings preserve legal instances. That is, if I is a legal instance of CD, then I' is a legal instance of CD', and vice versa. While this is immediate for the Γ mapping, it is not always true for Γ'.

I. Preserving finite satisfiability from CD to CD': the Γ translation

Claim 3.1. If CD is finitely satisfiable, then CD' is also finitely satisfiable.

Proof. Let I be a non-empty finite legal instance of CD, and denote I' = Γ(I). All multiplicity constraints on regular associations and on ISA constraints are satisfied. It remains to show that I' satisfies the corresponding GS constraints. For a GS constraint $GS\{C, C_1, \ldots, C_n; Const\}$ in CD:
1. Const = disjoint: The extensions of $C_1, \ldots, C_n$ in I are pairwise disjoint. Therefore, for an object $e \in C_i^{I}$, $(e, e) \in ISA_i^{I'}$ and for each $j \neq i$, $(e, e) \notin ISA_j^{I'}$. Hence the xor-constraint is satisfied.
2. Const = complete: An object $e \in C^{I}$ is also an object of at least one subclass $C_i^{I}$. Therefore, $(e, e) \in ISA_i^{I'}$.
3. Const = incomplete: There exists an object $e \in C^{I}$ which does not belong to any subclass of C. Therefore, it does not participate in any ISA link in I'.
4. Const = overlapping: There are two classes from $C_1, \ldots, C_n$ that overlap in I. If $e \in C_i^{I} \cap C_j^{I}$, then $(e, e) \in ISA_i^{I'} \cap ISA_j^{I'}$.
The proofs for pair constraints are obtained by combining the proofs of the single constraints.

For every instance e of an associated object with a qualifier, Γ(e) yields two new objects $a_1$ and $q_1$ with a link between them. The target object $b_1$ is linked to $q_1$. We visualize this process in the figure.

Auxiliary claim. I' is a legal, fully finite instance of CD'. That is:
1. I' is finite and has non-empty instances for all classes.

Figure 3.13: The Γ Mapping

2. I' satisfies the multiplicity constraints of CD'.

Proof.
1. I' is instantiated using the mapping Γ applied to I, which is a legal, fully finite instance of CD. In each step a constant number of new objects is instantiated, and I is finite, so I' is finite as well. All objects of I are mapped to their corresponding objects in I', thus I' is also fully instantiated.
2. Note that the only constraints that are newly added in CD' are the multiplicity constraints added by the reduction on the newly created class that represents the qualifier combined value. These constraints are not violated, since the multiplicity is set to the size of the domain that the combined qualifier value belongs to, and since this domain is finite, no more and no fewer new objects can be instantiated in I' by Γ out of a legal instance I of CD.

II. Preserving finite satisfiability from CD' to CD: the Γ' translation

The Γ' translation might map a legal instance I' of CD' into an illegal instance I of CD. Therefore, it is necessary to characterize legal instances of CD' whose Γ' translation

yields a legal CD instance. The single class property defined below (for details see [61, 11]) guarantees that Γ'(I') is a legal CD instance, for a legal CD' instance I'.

Definition [Single class property]: An instance I' of CD' has the single class property if no structure of ISA-linked objects includes two objects from the same class.

In [11] Balaban and Maraee characterize the above property and explore its existence in class hierarchy structures, providing the following results:

Claim 3.2. If a non-empty, finite legal instance I' of CD' satisfies the single class property, then Γ'(I') is a non-empty, finite legal instance of CD.
Proof. See [11].

Claim 3.3 (Single class property existence).
1. If CD has a tree or acyclic class hierarchy structure, then every legal instance of CD' satisfies the single class property.
2. If the class hierarchy structure of CD does not include cycles with a disjoint or a complete constraint, then a finitely satisfiable CD' has a non-empty finite legal instance I' that satisfies the single class property. Moreover, I' can be efficiently constructed from any non-empty finite legal instance.
Proof. See [11].

Next we give the claim justifying qualifier-mapped objects.

Claim 3.4. I is a legal finite instance of CD. That is:
1. I is finite and has non-empty instances for all classes.
2. I satisfies the qualifier constraints of CD (all the other constraints remain unchanged).
Proof.
1. I is finite since it is constructed from I', which is finite. All classes in I are non-empty since all classes in I' are non-empty, and the only objects that appear in I' and do not appear in I are those of qualifier domain values.

2. We now show that I satisfies all qualifier constraint specifications.
(a) Multiplicity constraints on the target end. The multiplicity constraints, namely the $min_2..max_2$ bounds, are satisfied in I'. Since the link of association α simply changes its left end from the Q object to the combined qualifier end, the associated multiplicity constraints in I are not violated.
(b) Partition constraints.
i. Correct mapping. It is obvious from T that the set of $B^{I}$ instances to which an $A^{I}$ instance a and a combined domain value are mapped is included in the set of instances that a is linked to.
ii. Disjointness of target sets. Consider two different objects $q_1$ and $q_2$ in I' that are linked to the same object a of the A class (the qualified class) in I'. In case the objects $q_1$ and $q_2$ are linked to different instances of B, the disjointness is not violated in I, by T. The confusing situation arises when they are linked to the same object of B, say $b_1$. In this case we show that there still exists another legal instance of CD' without this situation. It is constructed by splitting the object $b_1$ into two distinct objects $b_{11}$ and $b_{12}$ that are linked to $q_1$ and $q_2$ respectively. In order to satisfy all constraints imposed on class B in CD', we copy all the remaining links of $b_1$ in I' to both $b_{11}$ and $b_{12}$. If such a situation still exists with another instance of Q, we repeat this process.

We further give a visual elaboration of the above proof. In a case where two different objects $q_1$ and $q_2$ in I' are each linked to a different object of A, the disjointness is not violated in I, by T. For demonstration consider the figure. In a case where more than one Q object is linked to an A object, we get the situation shown in the figure. Now recall the most complicated and confusing case, where two objects $q_1$ and $q_2$ are linked to the same

object of A and the same object of B. The splitting of the B object into two new objects is demonstrated in the figure.

Figure 3.14: Γ Mapping on different A objects.

Figure 3.15: Γ Mapping on the same A object.

Based on Claims 3.1, 3.2, 3.3 and 3.4 we get the main result for reducing finite satisfiability between class diagrams:

Theorem 3.5 (Reduction of finite satisfiability between class diagrams). Let $CD_{M,CH,GS,Q}$ denote class diagrams with multiplicity constraints, class hierarchy, GS and qualifier constraints, and let $CD_{M,GS}$ denote class diagrams as in the CD' construction, with multiplicity constraints and GS constraints on ISA associations.

Figure 3.16: Γ Mapping on the same A object and same B object.

1. Finite satisfiability in $CD_{M,CH,GS,Q}$ is reducible to finite satisfiability in $CD_{M,GS}$, for class diagrams in $CD_{M,CH,GS,Q}$ without class hierarchy cycles that include a disjoint or a complete constraint.
2. A finitely satisfiable class diagram in $CD_{M,CH,GS,Q}$ can be effectively translated into a finitely satisfiable class diagram in $CD_{M,GS}$.

Corollary 1. If a class diagram CD in $CD_{M,CH,GS,Q}$ is translated into a non-finitely satisfiable class diagram in $CD_{M,GS}$, then CD is also non-finitely satisfiable.

Reduction of Finite Satisfiability of $CD_{M,GS}$ to solvability of the inequality system produced by FiniteSat

Claim 3.6. If CD is finitely satisfiable, then $\Psi_{CD}$ is solvable.
Proof. There are four kinds of inequalities, introduced for multiplicity, class hierarchy, GS constraints and qualifiers.
1. Multiplicity constraint inequalities: satisfied, as shown in [56].
2. Class hierarchy inequalities: satisfied, as shown in [11].

3. GS constraint inequalities: satisfied, as shown in [11].
4. Qualifier constraint inequalities: as shown earlier in the description of the translation of CD into CD', the qualifier disappears, and a new class Q is added with multiplicity constraints. The solution for the inequalities of these new multiplicity constraints is the same as for the qualifier inequalities added by the FiniteSat algorithm. Therefore, CD' has only multiplicity constraints, which represent the qualifier constraint from CD. Multiplicity constraint inequalities are satisfied, as shown in [56].

Claim 3.7. If $\Psi_{CD}$ is solvable, then CD is finitely satisfiable.
Proof. $\Psi_{CD}$ contains only the inequalities described in Claim 3.6, and the proof for these kinds of inequalities has been shown in [56, 11].

These claims prove the second step of FiniteSat correctness:

Theorem 3.8 (Reduction of finite satisfiability of $CD_{M,GS}$ to inequality solvability). Finite satisfiability in $CD_{M,GS}$ is reducible to solvability of the inequality system produced by FiniteSat.

Putting together the results of the two-step proof (Theorems 3.5 and 3.8), we obtain the correctness theorem for FiniteSat:

Theorem 3.9 (FiniteSat correctness: reduction of finite satisfiability of $CD_{M,CH,GS,Q}$ to inequality solvability).
1. Finite satisfiability in $CD_{M,CH,GS,Q}$ is reducible to solvability of linear inequalities, for class diagrams in $CD_{M,CH,GS,Q}$ without class hierarchy cycles that include a disjoint or a complete constraint. The reduction is given by the FiniteSat algorithm.
2. A finitely satisfiable class diagram in $CD_{M,CH,GS,Q}$ can be effectively translated by FiniteSat into a solvable linear inequality system.

Corollary 2. If the application of FiniteSat to a class diagram CD in $CD_{M,CH,GS,Q}$ returns an unsolvable inequality system, then CD is non-finitely satisfiable.

Claim 3.10 (FiniteSat Complexity). The time to construct the inequalities by FiniteSat, and the number of inequalities, are both O(n), where n is the number of constraints in the class diagram.
Proof. Every constraint contributes a constant number of inequalities.
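The linear bound of Claim 3.10 can be illustrated with a minimal Python sketch. It is not the implementation of appendix A. It assumes the usual Lenzerini-Nobili style encoding, in which an association r(rn1 : A[min1, max1], rn2 : B[min2, max2]) contributes the inequalities r >= min2*a, r <= max2*a, r >= min1*b and r <= max1*b over the size variables r, a and b. The names `Association`, `association_inequalities` and `build_system` are hypothetical, and the positivity constraints on the variables are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Association:
    """Binary association r(rn1: A[min1, max1], rn2: B[min2, max2])."""
    name: str
    a: str                # class at the first end
    b: str                # class at the second end
    min1: int
    max1: Optional[int]   # None stands for an unbounded upper limit (*)
    min2: int
    max2: Optional[int]


def association_inequalities(r: Association) -> List[str]:
    """Return at most four inequalities for one association (assumed encoding)."""
    out = [f"{r.name} >= {r.min2}*{r.a}", f"{r.name} >= {r.min1}*{r.b}"]
    if r.max2 is not None:
        out.append(f"{r.name} <= {r.max2}*{r.a}")
    if r.max1 is not None:
        out.append(f"{r.name} <= {r.max1}*{r.b}")
    return out


def build_system(associations: List[Association]) -> List[str]:
    """One pass over the constraints, a constant number of inequalities each."""
    system: List[str] = []
    for r in associations:
        system.extend(association_inequalities(r))
    return system


# Two associations between the same classes with conflicting multiplicities.
example = [
    Association("r1", "A", "B", 1, 1, 2, 2),
    Association("r2", "A", "B", 1, 1, 1, 1),
]
system = build_system(example)
for line in system:
    print(line)
```

Since each association yields at most four inequalities, the size of the system grows linearly with the number of constraints. Deciding solvability would additionally require a linear programming solver, which this sketch does not include.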

Chapter 4
Practical Occurrence of Finite Satisfiability in Class Diagrams

In this chapter we introduce a series of experiments based on the implementation described in appendix A. We also propose a series of class diagram metrics that are relevant to the finite satisfiability property of a class diagram. While considering these metrics, we programmatically generate large class diagrams and run controlled experiments to demonstrate the relevance of this research and to set up a basis for benchmarking.

4.1 Class Diagram's Metrics

Many different kinds of software metrics have been developed during the last years. Among them are not only the primitive LOC (lines of code) metric, which was very descriptive several decades ago but is not sufficient today, but also the object oriented programming metrics studied by Chidamber and Kemerer in the beginning of the 1990s [26], along with recent work describing metrics for UML models directly [52, 60]. In this work we are interested in metrics that both possibly cause finite satisfiability problems and describe the complexity of a class diagram. The following metrics are the most useful in describing class diagram size

and structural complexity:
1. NCM - Number of the Classes in a Model
2. NASM - Number of the Associations in a Model
3. NSUBC - Number of Subclasses of a Class
4. NSUPC - Number of Superclasses of a Class
5. DIT - Depth of Inheritance Tree

We also introduce three more metrics below. These metrics are of interest due to their straightforward impact on the finite satisfiability of a model.
1. CYCM - Number of cyclic inheritance structures in a model
2. NGS - Number of generalization set constraints in a model
3. NCM/NASM - Classes to associations ratio in a model

The last three metrics are less conventional properties of class diagrams, but seem to be very relevant for testing their effect on the finite satisfiability problem, as shown by Maraee and Balaban in [61, 62]. We contribute to this knowledge by showing the existence of finite satisfiability problems through synthetic experiments. There are also other metrics defining class diagram complexity. Some of them are derived from the UML meta model [72]. Others derive from experiments on the cognitive complexity of class diagrams in [60] and from applying mathematical techniques, like principal component analysis. These metrics are also relevant to the structural complexity of a class diagram, but are less relevant to the finite satisfiability problem (they deal with properties and constraints at a different resolution of detail, not affecting finite satisfiability directly), and thus are omitted from our experiments. In the following sections we describe a series of experiments on large class diagrams. By large, we mean class diagrams with hundreds of classes and multiple constraints imposed on them

out of the metrics above. The class diagrams used for the experiments are automatically generated in a symbolic representation, to be tested on our implementation described in the previous chapter. Each experiment is used to find out some property, and the results are presented over class diagrams of different sizes and multiple repetitions.

4.2 Experiment 1

The problem of a model's finite satisfiability has gained more attention in the past years within the MDA research community. However, to the best of our knowledge, it has not been clearly shown that the problem does exist among models. In the following series of controlled experiments we argue for the existence of such a problem. The experiment presented in this section does not provide a formal proof of the existence of a finite satisfiability problem. Rather, it deals with the lack of real-world benchmarks or problem sets by a best-effort approximation. The steps taken in the experiment are:
1. Generate a class diagram with NCM classes and NASM associations
2. Check the class diagram for finite satisfiability
3. Make k repetitions to achieve more precise results

Let us elaborate the three steps of the experiment above.
1. Class diagram generation. The best way to describe this step is to compare a class diagram to the random graph model first studied by Erdős and Rényi [33]. In this model, undirected edges are placed at random between a fixed number n of vertices to create a graph in which each of the $\frac{1}{2}n(n-1)$ possible edges is independently present with some probability p. The only exception is that we allow duplicate edges. In view of this, we take classes for vertices and associations for edges. The probability p varies together with the different values given to NCM and NASM in every generation

of a class diagram. The output of this step is a file with a symbolic representation of a model in USE [37] format. The process of class diagram generation in detail:
(a) Create NCM classes.
(b) Do NASM times:
i. Choose two random classes. (This is why duplicate edges appear: we allow several different associations between two classes.)
ii. Create an association between the chosen classes.
iii. Impose random multiplicity constraints on both sides of the association created.
2. Checking for finite satisfiability. After the class diagram has been generated, it is tested for finite satisfiability. This is done via our implementation of the FiniteSat [65] algorithm described in appendix A. The input is the symbolic representation of the generated class diagram, and the output is a boolean answer on the existence of a finite satisfiability problem.
3. Repeating over and over. The idea of the experiment is to show the existence of the finite satisfiability problem in large class diagrams. In order to show this phenomenon in a convincing way, we repeat the generation and the test multiple times. Collecting the results of these experiments enables us to show quite precise results upon a representative sample.

We now present the results of the experiment described above. Table 4.1 demonstrates the results of the experiment for the existence of finite satisfiability problems in huge class diagrams. By huge we mean hundreds of classes and associations between them. For every such experiment, 100 and 1000 repetitions have been made in order to demonstrate high probability results. The first column of the table is the number of classes generated in the diagram. The second column is the number of associations. The last two columns state the fraction

of the generated diagrams in which the finite satisfiability problem was present.

Table 4.1: Existence Experiment Results
NCM | NASM | FS in 100 repetitions | FS in 1000 repetitions
 |  | % | 12.1%
 |  | % | 35.1%
 |  | % | 87.1%
 |  | % | 25.9%
 |  | % | 89.2%
 |  | % | 27%
 |  | % | 96.4%
 |  | % | 6.8%
 |  | % | 27.4%
 |  | % | 99.5%

It is worth noting that the generated class diagrams were very constrained. Multiplicity constraints were randomly selected in the range [1..10]. There is another interesting issue to consider in the results presented in table 4.1, specifically the NCM/NASM ratio. Is there any point in considering class diagrams where the number of associations is an order of magnitude smaller than the number of classes? There is. It seems that in many real-world class diagrams a major part of the associations between classes are unconstrained. Put otherwise, they have a many-to-many [0..*] multiplicity constraint, which is actually no constraint. Since a well-studied reason for finite satisfiability problems is a cycle of conflicting multiplicity constraints [61, 65, 11], such associations are not of interest to us. Notice that the associations generated in our class diagrams are all constrained. Therefore, class diagrams where NCM/NASM is any fixed number greater than one can easily describe a real-world model.

The next experiment we intend to perform on the existence problem adds more constraints and parameters to the class diagram being generated. The parameters Number of Subclasses (NSUBC), Number of Superclasses (NSUPC), Depth of Inheritance Tree (DIT), Cyclic Inheritance Structures (CYCM) and Number of Generalization Sets (NGS) will be added to class diagrams. NSUBC and NSUPC increase the complexity of the diagram in a straightforward way, while DIT, CYCM and NGS clearly cause more finite satisfiability

problems that need special algorithmic treatment, as has been shown in [24, 62, 65, 61, 64]. It might be interesting to check the occurrence of finite satisfiability problems through coverage of the problem patterns shown in [12].

Table 4.2: Scalability Experiment Results
NCM | NASM | Running Time in milliseconds

4.3 Experiment 2

A very important question we deal with in this section is: is our algorithm scalable and appropriate for large models? The use of linear programming methods and a Java implementation makes this question non-trivial. In the following results we show the running time of our algorithm on large generated models. Every row in table 4.2 shows the average running time over 100 repetitions. The experiment was performed on a machine with a dual core 1.3 GHz CPU and 2 GB of RAM, running the Windows Vista operating system, with the algorithm implementation (available at modeling/) described in appendix A. Note that these results demonstrate very high performance. Consider the last row, which says that a class diagram with 1000 classes and 500 constrained associations can be tested for finite satisfiability in 1.5 minutes. All the other results show running times far below one minute. This raises an interesting question about the need for incremental techniques for finite satisfiability testing. It seems the only reason for

doing so is to keep the modeler from being annoyed by waiting a minute for the CASE tool's answer after clicking on the FiniteSat test button.

4.4 Experiment 3

In this section we describe an experiment whose aim is to determine when the problem of finite satisfiability arises. In order to do so, we performed a set of consecutive experiments of 100 repetitions each, raising the number of constrained associations. The number of classes in the generated class diagram was constant. The number of associations varied from one to 50. Figure 4.1 plots the results we obtained. We see that with 50 classes and 50 associations there is a 0.83 probability (normal distribution considered, which might be different in real-world applications) of having a finite satisfiability problem in a randomly generated class diagram.

Figure 4.1: Appearing FiniteSat on 50 Classes Diagram

Figure 4.2 presents the results of the analogous experiment. This time the class diagram size was enlarged. We started with 100 classes, and added up to 100 associations. Like in the previous case, we ran the experiment 100 times for each case, and in each case we generated a random class diagram from scratch. We see that the probability of encountering a finite satisfiability problem rises together with the number of associations. This is intuitively clear - the more constrained the model is, the more contradictions appear.
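The generate-and-test loop shared by the experiments of this chapter can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Java implementation of appendix A: the two classes of an association are assumed to be distinct, each multiplicity is assumed to be drawn as a pair lower <= upper from the range [1..10], and the finite satisfiability test is passed in as a hypothetical callable `has_problem`, standing in for FiniteSat. All function names are illustrative.

```python
import random
from typing import Callable, List, Tuple

Multiplicity = Tuple[int, int]                            # (lower, upper)
Association = Tuple[str, str, Multiplicity, Multiplicity]


def random_multiplicity(rng: random.Random, low: int = 1, high: int = 10) -> Multiplicity:
    """Draw a bounded multiplicity lower..upper within [low..high] (assumed scheme)."""
    lower = rng.randint(low, high)
    upper = rng.randint(lower, high)
    return (lower, upper)


def generate_class_diagram(ncm: int, nasm: int, rng: random.Random):
    """Create NCM classes and NASM constrained associations between random classes."""
    classes = [f"C{i}" for i in range(ncm)]
    associations: List[Association] = []
    for _ in range(nasm):
        # Two random classes; the same pair may be chosen again,
        # which gives the duplicate edges mentioned in the text.
        a, b = rng.sample(classes, 2)
        associations.append((a, b, random_multiplicity(rng), random_multiplicity(rng)))
    return classes, associations


def problem_fraction(
    ncm: int,
    nasm: int,
    repetitions: int,
    has_problem: Callable[[List[str], List[Association]], bool],
    seed: int = 0,
) -> float:
    """Fraction of freshly generated diagrams that the checker reports as problematic."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        classes, associations = generate_class_diagram(ncm, nasm, rng)
        if has_problem(classes, associations):
            hits += 1
    return hits / repetitions
```

With a real checker plugged in as `has_problem`, calling `problem_fraction(50, nasm, 100, has_problem)` for `nasm` from 1 to 50 would follow the setup described for Experiment 3.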

Figure 4.2: Appearing FiniteSat on 100 Classes Diagram

Part II
Metrics

During the experiments described in chapter 4, which explored the occurrence and relevance of finite satisfiability, several problems arose. The main problem was the lack of flexibility in changing metrics. Consequently, a method independent of the choice of metrics was developed. The method automates benchmark creation given a metric suite. The method is based on:
- Abstraction of metrics specification: a general pattern for a metrics language.
- Using a model checker for benchmark creation:
  - Using a meta model, that is, the abstract syntax of the model.
  - A metric suite for the meta model.
  - Creating a benchmark which is a meta model instance fitting the metric suite.

In this part the following topics are discussed:
1. What is a metric?
2. What are the rules for determining the credibility of a metric suite?
3. What do model benchmarks look like?
4. How can metrics be used to generate benchmarks for modeling problems?

Chapter 5
Metrics

Metrics are needed for controlling processes, since only measurable processes can be controlled. Decision making improves when it is grounded on numbers that describe what is being controlled, rather than on intuitive feelings and observations. Genero et al. [36] note that in a marketplace of highly competitive products, delivering quality software is not only an advantage, but a necessary factor for software companies to be successful. It is widely accepted in software engineering that the quality of a software system should be assured from the initial phases of its life cycle. The class diagram in particular is an artifact often available in the early stages of software development, and thus serves as a natural subject for metric characterization of external attributes, such as coupling, complexity, etc., separately from its behavior.

Definition: Metric. We define a metric µ to be a non-negative function from a specific domain of models to the set of real numbers R. Depending on the specific need, metrics can be required to satisfy additional properties.
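As a small illustration of this definition, the Python sketch below treats a metric as a function from a model to a non-negative real number, using the NCM and NASM metrics of chapter 4 as examples. The `ClassDiagram` structure and the `Metric` type alias are hypothetical simplifications introduced only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ClassDiagram:
    """A deliberately minimal model: class names and binary associations."""
    classes: List[str] = field(default_factory=list)
    associations: List[Tuple[str, str]] = field(default_factory=list)


# A metric maps a model to a non-negative real number.
Metric = Callable[[ClassDiagram], float]


def ncm(cd: ClassDiagram) -> float:
    """NCM: number of classes in the model."""
    return float(len(cd.classes))


def nasm(cd: ClassDiagram) -> float:
    """NASM: number of associations in the model."""
    return float(len(cd.associations))


suite: List[Metric] = [ncm, nasm]
cd = ClassDiagram(classes=["A", "B", "C"], associations=[("A", "B"), ("A", "C")])
values = [metric(cd) for metric in suite]
```

Both example metrics are counts, so non-negativity holds by construction. A metric defined by a formula, such as a ratio, would need its domain restricted so that the value stays defined.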

5.1 Weyuker's Characterization of Metrics Properties

Weyuker [89] proposed a set of properties for the evaluation of software metrics. The properties were successfully used by Chidamber and Kemerer in their widely cited work [27] on object oriented design metrics. Today, Weyuker's properties are still relevant, and different interpretations are examined in [71]. Weyuker's original properties examined metrics of classical sequential programs consisting of program statements. Examples of such programs are programs written in Pascal, C or Fortran. According to Weyuker [89], one way to think of a program is as an object made up of smaller programs. From this point of view, the basic operation in constructing programs is composition, which is the concatenation of two program bodies. $P; Q$ is the program body formed by appending the program body Q immediately after the last statement of P. Weyuker uses $|P|$ to denote the complexity of P with respect to some hypothetical measure, where $|P|$ is a non-negative number. Below is a list of properties for evaluating metrics for such programs:
1. Property 1: $(\exists P)(\exists Q)(|P| \neq |Q|)$. This is a property of any general metric. Surely, a metric which rates all programs equally is not really a metric.
2. Property 2: $(\exists P)(\exists Q)(P \equiv Q,\ |P| \neq |Q|)$, where $P \equiv Q$ means that P and Q compute the same function. This property concerns syntactic complexity metrics. That is, the complexity of the program is being measured, not the function computed by the program.
3. Property 3: Let c be a non-negative number. Then there are only finitely many programs of complexity c. This property is needed to strengthen the first property.
4. Property 4: $(\forall P)(\forall Q)(|P| \leq |P; Q|,\ |Q| \leq |P; Q|)$. This property states that the components of a program are no more complex than the program itself (Weyuker terms this property monotonicity).
5. Property 5: This property addresses the question of whether or not the concatenation of a given

program body with other program bodies should always affect the complexity of the resultant program body in a uniform way.
(a) $(\exists P)(\exists Q)(\exists R)(|P| = |Q|,\ |P; R| \neq |Q; R|)$
(b) $(\exists P)(\exists Q)(\exists R)(|P| = |Q|,\ |R; P| \neq |R; Q|)$
6. Property 6: There are program bodies P and Q such that Q is formed by permuting the order of the statements of P, and $|P| \neq |Q|$. This property asserts that program complexity should be responsive to the order of the statements, and hence to the potential interaction among statements.
7. Property 7: If P and Q are almost identical (Q is a syntactical transformation of P), then $|P| = |Q|$. This property examines the question of what kind of syntactic modifications should leave the complexity of a program unchanged.
8. Property 8: $(\exists P)(\exists Q)(|P| + |Q| < |P; Q|)$. This property examines the question of how the complexity of a program body relates to the sum of the complexities of its components: for some programs, the composition is more complex than the sum of its parts.

The properties above are interesting not only because they can help in choosing a metrics suite, but also because they are useful for examining the strengths and weaknesses of metrics, and thus for comparing existing metrics. Chidamber and Kemerer used Weyuker's properties [27] to evaluate metrics for object oriented design, and later in this work the metrics proposed for models are also evaluated with Weyuker's properties.

5.2 Object Oriented Metrics

Bunge's definition of object complexity

The use of class properties as complexity measures was inspired by Bunge's definition of ontologies [18, 19]. Like any substantial individual, a class possesses a finite number of properties. The properties do not exist on their own, but are attached to individuals. On the other hand, substantial individuals are not simply bundles of properties. A substantial individual and its properties collectively constitute an object. An object can be represented as $X = \langle x, p(x) \rangle$, where x is the substantial individual and p(x) is the finite collection of its properties. x can be considered to be the token or the name by which the object is represented in a system. Based on this representation, Bunge defines the similarity of two objects X and Y to be $\sigma(X, Y) = p(x) \cap p(y)$, following the general principle of defining similarity in terms of sets. The complexity of an individual is defined as the numerosity of its composition, implying that a complex individual has a large number of properties: the complexity of $\langle x, p(x) \rangle$ is $|p(x)|$, the cardinality of p(x).

Chidamber and Kemerer Metrics and Evaluation

In 1994 Chidamber and Kemerer (C&K) introduced a metrics suite [27] for object oriented design. The C&K suite consists of six metrics that measure the complexity of a single class. For a class C, C&K define $p(C) = \{M_C\} \cup \{I_C\}$, where $\{M_C\}$ is the set of methods and $\{I_C\}$ is the set of instance variables (a.k.a. data members) of the class C. Following Bunge's definition above, a binary operation + on two classes is defined. For two classes $X = \langle x, p(x) \rangle$ and $Y = \langle y, p(y) \rangle$, $X + Y$ is defined as $\langle z, p(z) \rangle$, where z is the token with which $X + Y$ is represented and p(z) is given by $p(z) = p(x) \cup p(y)$. The metrics evaluate a single class in a way that measures different aspects of object oriented design. The metrics are theoretically grounded on the following terms (when closely examined they are reminiscent of Fowler's bad smells [35], especially the Divergent Change bad smell):
1. Complexity. Numerosity of composition. The cardinality of the properties, that is, the cardinality of the different sets of methods and instance variables.


2. Scope of properties. Reflects design decisions (DIT, NOC): how classes are arranged in a hierarchy, and how their methods and instance variables affect the system. How far does the influence of a property extend?
3. Coupling and Cohesion. Two terms that are used to characterize OO design.
(a) Coupling. Two objects are coupled if and only if at least one of them acts upon the other.
(b) Cohesion. Following the set theoretic definition of similarity, cohesion can be defined as the similarity between two methods, where the similarity set is the set of instance variables common to the two methods.

The metrics are evaluated with six of Weyuker's evaluation properties. The metrics C&K propose are:
1. Weighted Methods Per Class (WMC). Consider a class $C_1$ with methods $M_1, \ldots, M_n$ that are defined in the class. Let $c_1, \ldots, c_n$ be the complexities of the methods (complexity is deliberately not defined more specifically here, in order to allow the most general application of this metric). Then $WMC = \sum_{i=1}^{n} c_i$. If all method complexities are considered to be unity, then $WMC = n$, the number of methods; for example, $WMC(A) = 3$ in figure 5.1.

Figure 5.1: A class diagram

71 Chapter 5: Metrics 2. Depth of Inheritance Tree (DIT). Depth of inheritance of the class is the DIT metric for the class. In cases involving multiple inheritance, the DIT will be the maximum length from the node to the root of the tree. For example, DIT (B) = 1, DIT (A) = DIT (C) = 0 on gure Number Of Children (NOC). The number of immediate subclasses subordinated to a class in a class hierarchy. For example, NOC(A) = 1 on gure Coupling Between Object Classes (CBO). CBO for a class is a count of the number of other classes to which it is coupled. Two classes are coupled when methods declared in one class use methods or instance variables dened by other class. For example, CBO(C) = 1 on gure 5.1, since class C is coupled only to class A. 5. Response For a Class (RFC). RF C = RS where RS is the response set for the class. The response set of a class is a set of methods that can potentially be executed in response to a message received by an object of that class. It should be noted the membership to response set is dened only up to the rst level of nesting of method calls. The set also specically includes methods called outside of the class. For example, on gure 5.1 if the methods Foo, Goo and Boo are recursive methods that call only themselves, we get RF C(A) = 3. To obtain an accurate RF C value a code or sequence diagrams must be analyzed. 6. Lack of Cohesion in Methods (LCOM). The LCOM is a count of the number of method pairs whose similarity is 0 minus the count of method pairs whose similarity is not zero. Like with RF C metric, in order to compute LCOM the code of the class must be analyzed. Assume that Foo, Goo and Boo methods of class A on gure 5.1 do not use common data members, than we get LCOM(A) = 3. C&K chose six properties out of Weyuker's properties list to evaluate the metrics they propose. The properties chosen were properties number 1,3,4,5,6,9 6. 
Footnote 6: They numbered Weyuker's original properties 8.a and 8.b as different properties.

The properties were

changed (footnote 7) to be appropriate to classes as the objects being measured. The remaining three properties, which C&K did not use, are not suited to classes but only to sequential programs. In order to evaluate their metrics, C&K defined what a class is and what the binary operation + on two classes means. Thus, they defined a combination of two classes, based on the definition of class properties. The use of class properties as complexity measures was inspired by Bunge's definition of ontologies.

5.3 Metrics for Models

Background

For many specific goals, many different metrics for models have been proposed over the last decade. A clear goal is specified when a metric is proposed. Many authors proposed metrics for class diagrams in particular; a survey of such metrics was reported by Marcela et al. in [36]. In general, metrics can be divided into several categories. For example, there are size or structure metrics [59], which can be used to measure size and internal constraints in a class diagram. Metrics defined by different authors [36] are used for the following goals:

1. Measure design complexity in relation to its impact on external quality attributes such as maintainability, reusability, etc. For example, three of the six metrics that Chidamber and Kemerer propose [27]: WMC, NOC, DIT.

2. Measure different internal properties such as coupling. For example, Li and Henry [57] propose the DAC metric, which is the number of attributes in a class that have another class as their type.

Footnote 7: transformed to classes, rather than sequential program bodies.

3. Measure object-oriented mechanisms such as inheritance or information hiding. Abreu and Melo [1] propose the method hiding factor and the attribute inheritance factor, as part of the MOOD metric suite, as such measures.

4. Class diagram complexity. See the work by Manso et al. [60] for a description of cognitive experiments examining which metrics contribute to class diagram complexity.

The metrics above are used to measure existing class diagrams at various stages of their development. Many research results demonstrate automatic metric-extraction tools.

Weyuker's Properties Adaptation for Models Metrics

Class Diagram is a visual language that consists of classes, associations and features. These elements are further restricted by constraints. Following Bunge's definition of an object's complexity, and inspired by the example of Chidamber and Kemerer [27], we define a class diagram $CD = \{C_{CD}\} \cup \{P_{CD}\} \cup \{CON_{CD}\}$, where $\{C_{CD}\}$ is the set of classes, $\{P_{CD}\}$ is the set of properties (this set includes the associations in a class diagram) and $\{CON_{CD}\}$ is the set of constraints imposed on the class diagram elements.

Consistent Renaming: We define a consistent renaming of a class diagram to be a renaming of elements that takes effect in every place where the renamed element is mentioned. For example, in figure 5.1 a consistent renaming could be renaming class A to X, making the attribute att of class C change its type from A to X accordingly.

Ordering: Ordering of a class diagram means rearranging the elements of the class diagram, that is, imposing constraints on different classes or properties, or imposing an association on classes other than those it was originally imposed on. No elements are added or removed in the ordering. Intuitively, the elements only change their place in the diagram.

Binary operation +: Combination of class diagrams: Bunge provides an ontology as a basis for defining the combination of class diagrams.
Combination of two (or more) class diagrams results in another class diagram whose elements are the union of the elements of the component class diagrams. Let $CD_1 = \{C_{CD_1}\} \cup \{P_{CD_1}\} \cup \{CON_{CD_1}\}$ and $CD_2 = \{C_{CD_2}\} \cup \{P_{CD_2}\} \cup \{CON_{CD_2}\}$ be two class diagrams. Then $CD_1 + CD_2$ is defined as:

$CD_3 = \{C_{CD_1}\} \cup \{C_{CD_2}\} \cup \{P_{CD_1}\} \cup \{P_{CD_2}\} \cup \{CON_{CD_1}\} \cup \{CON_{CD_2}\}$

The combination is more than just a union of the property sets. The combination is recursive: for example, when two classes have the same name (footnote 8), they are merged too (footnote 9).

Semantic equivalence: Two class diagrams are semantically equivalent if they have exactly the same instances. Intuitively, two distinct class diagrams can be semantically equivalent but different (footnote 10) due to transitive class hierarchy constraints, which are specified visually.

Although many researchers propose different ways of combining software models in general, and UML class diagrams in particular [30, 68, 76], the above definition of combination is chosen for metrics evaluation, since it relies neither on user participation nor on versioning systems.

Merging Invariants: If the two class diagrams being merged are disjoint, we envision no problems in the merging operation. This is not the case when merging is applied to two overlapping class diagrams. During the combination of overlapping class diagrams, different conflicts may arise (different multiplicity constraints on the same association, etc.), depending on the overlapping part.

General syntactic conflicts. If such conflicts arise, we assume that they are solved by a pre-defined strategy, in a way that does not remove elements of the resulting class diagram. Rather, it reorganizes the elements of the diagram, without affecting the sizes of the class diagram's property sets. We introduce several invariants that must hold during the class diagram combination process.

1. Example 1: Multiplicity Constraint Conflict. If there is a non-trivial multiplicity constraint on the same association end in both class diagrams, which gives rise to a conflict, a non-trivial multiplicity constraint remains in the resulting diagram, regardless of the way the conflict was solved. Consider figure 5.2, figure 5.3 and figure 5.4, where $CD_3 = CD_1 + CD_2$, as an example.

Figure 5.2: Class Diagram: CD1
Figure 5.3: Class Diagram: CD2

The multiplicity constraint on the association end stays non-trivial; that is, the two conflicting constraints in $CD_1$ and $CD_2$ are resolved by creating a new non-trivial constraint, rather than by setting it to "any" and thus unconstraining the association end.

Figure 5.4: Class Diagram: CD3

2. Example 2: Class Hierarchy Constraint Conflict. Suppose there are two classes A and B, such that A is a subclass of B in $CD_1$ and B is a subclass of A in $CD_2$, as shown in figure 5.5 and figure 5.6. Without loss of generality, if $CD_3 = CD_1 + CD_2$, A will remain a subclass of B in $CD_3$. Class B, however, will subclass some other class, say C, in $CD_3$, as shown in figure 5.7. Which class exactly is chosen to be the class C is decided by a predefined strategy. To argue that such a class C can always be found, consider the Object class in modern object-oriented languages such as Java or C#, where each and every class is a subclass of Object.

Footnote 8: Two classes having the same name in a merging operation represent a conflict that must be solved with a well-defined strategy.
Footnote 9: As Chidamber and Kemerer define in [27].
Footnote 10: Syntactically different.
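The combination operation defined above can be sketched as a plain set union. This is a minimal sketch, not the thesis implementation: the ClassDiagram type and the element names are hypothetical, elements are identified by name so that same-named classes merge automatically, and the pre-defined conflict resolution strategy assumed in the text is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClassDiagram:
    """A class diagram as three sets: classes, properties and constraints."""

    classes: frozenset = field(default_factory=frozenset)
    properties: frozenset = field(default_factory=frozenset)
    constraints: frozenset = field(default_factory=frozenset)

    def __add__(self, other):
        # CD1 + CD2: union of each component set. Elements with equal
        # names collapse into one, which models merging same-named classes.
        # Conflict resolution (e.g. clashing multiplicities) is NOT modeled.
        return ClassDiagram(
            self.classes | other.classes,
            self.properties | other.properties,
            self.constraints | other.constraints,
        )


# Two overlapping diagrams sharing class "B" (all names are invented).
cd1 = ClassDiagram(
    frozenset({"A", "B"}),
    frozenset({"worksFor"}),
    frozenset({("worksFor", "B", "1..3")}),
)
cd2 = ClassDiagram(
    frozenset({"B", "C"}),
    frozenset({"owns"}),
    frozenset({("owns", "C", "2..4")}),
)
cd3 = cd1 + cd2
```

Here cd3 contains three classes rather than four, because the shared class B is merged; both multiplicity constraints survive, in line with the invariant that merging does not remove elements.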

Figure 5.5: CD1
Figure 5.6: CD2
Figure 5.7: Class Diagram: CD3

The following are Weyuker's properties and their intuitive meaning in terms of models, rather than sequential programs:

1. Property 1 - Non-Coarseness. A metric µ satisfies non-coarseness if there exist two distinct models $M_1 \neq M_2$ such that $\mu(M_1) \neq \mu(M_2)$. This property characterizes syntactic sensitivity. It implies that not every model can have the same value for a metric; otherwise the metric has no value as a measurement. For example, $NCM / NCM$, where NCM is the number of classes in a model, is not a useful metric since it is always equal to 1 for every class diagram model.

2. Property 2 - Finiteness. A metric µ satisfies finiteness if there is a finite number of models M with the same value of the metric µ.

3. Property 3 - Non-Uniqueness. A metric µ satisfies non-uniqueness if there exist two distinct models $M_1 \neq M_2$ such that $\mu(M_1) = \mu(M_2)$. That means that a metric measures semantic properties: two different models can have the same metric value. For example, two different class diagrams can obviously have different numbers of associations or constraints, thus being syntactically different, and still have the same number of classes, that is, an equal NCM value.

4. Property 4 - Syntactic Sensitivity. A metric µ satisfies syntactic sensitivity if there exist two different ($M_1 \neq M_2$) but semantically equal ($M_1 \equiv M_2$) models, which describe equal systems, such that $\mu(M_1) \neq \mu(M_2)$. Such a metric distinguishes syntactic differences that have no semantic effect.

5. Property 5 - Monotonicity. A metric µ satisfies monotonicity if for every two models $M_1$ and $M_2$, $\mu(M_1) \leq \mu(M_1 + M_2)$ and $\mu(M_2) \leq \mu(M_1 + M_2)$, where $M_1 + M_2$ denotes the combination of $M_1$ and $M_2$. Combination of models cannot decrease the metric value.

6. Property 6 - Combination Sensitivity. A metric µ satisfies combination sensitivity if there exist $M_1$, $M_2$ and $M_3$ such that $\mu(M_1) = \mu(M_2)$ but $\mu(M_1 + M_3) \neq \mu(M_2 + M_3)$. This property measures sensitivity to model combination: the interaction between $M_1$ and $M_3$ can be different from the interaction between $M_2$ and $M_3$, resulting in different metric values for $M_1 + M_3$ and $M_2 + M_3$. For example, see the analytical evaluation of the NCM metric later in this chapter.

7. Property 7 - Ordering Sensitivity. A metric µ satisfies ordering sensitivity if there exist two models $M_1$, $M_2$ such that $M_2$ is obtained by ordering (footnote 11) of $M_1$'s elements, and $\mu(M_1) \neq \mu(M_2)$. The metric should be sensitive to permutation of the inner elements of a model.

8. Property 8 - Consistent Renaming. A metric µ satisfies consistent renaming if for every two models $M_1$, $M_2$ such that $M_2$ is obtained by consistent renaming of $M_1$'s elements, $\mu(M_1) = \mu(M_2)$. The metric should be indifferent to consistent renaming of elements inside a model. That is, the metric's value should not change with a consistent renaming of an element inside a model.

9. Property 9 - Interaction Increases Complexity. A metric µ satisfies the interaction increases complexity property if there exist two models $M_1$ and $M_2$ such that $\mu(M_1) + \mu(M_2) < \mu(M_1 + M_2)$. The intuitive meaning is that the metric should reflect the combination of models. The principle behind this property is that when two models are combined, the interaction between the models can increase the complexity metric value.

What are metric properties for? According to Weyuker [89], the properties of syntactic complexity serve as a basis for metrics evaluation, and should help to clarify the strengths and weaknesses of metrics. The properties should allow us to formally compare complexity models.

Metrics Evaluation

In this section we evaluate some of the metrics proposed for UML class diagrams [52, 60, 36] and those that were used for synthetic generation in [59].

Metric 1: Number of Classes in a Model (NCM). NCM is defined to be the number of classes in a class diagram [52]; that is, given CD as defined earlier, $NCM = |\{C_{CD}\}|$.

Theoretical basis: NCM relates directly to Bunge's definition of the complexity of a thing, since classes are properties of a class diagram and complexity is determined by the cardinality of its set of properties.

Analytical evaluation of NCM

1. Obviously there exist two distinct class diagrams $CD_1$ and $CD_2$ such that $\mu(CD_1) \neq \mu(CD_2)$; therefore property 1 is satisfied.

2. Property 2 is not satisfied: another association can always be added to any given class diagram. This means there is an infinite number of class diagrams having the same NCM value.

3. There exist two distinct class diagrams $CD_1$ and $CD_2$ such that $\mu(CD_1) = \mu(CD_2)$; therefore property 3 is satisfied.

4. The same application domain can be modeled by two designers in two different ways, using different considerations about generalization for instance, creating different classes in a class diagram. Therefore, property 4 is satisfied.

5. $\mu(CD_1 + CD_2) = \mu(CD_1) + \mu(CD_2) - \sigma$, where $\sigma$ is the number of common classes between $CD_1$ and $CD_2$. Clearly, the maximum value of $\sigma$ is $\min(\mu(CD_1), \mu(CD_2))$. It follows that $\mu(CD_1) \leq \mu(CD_1 + CD_2)$ and $\mu(CD_2) \leq \mu(CD_1 + CD_2)$, thereby satisfying property 5.

6. Let $CD_1$ and $CD_2$ be two class diagrams such that $\{C_{CD_1}\} \cap \{C_{CD_2}\} = \emptyset$. Let $CD_3$ be a class diagram such that $\{C_{CD_1}\} \cap \{C_{CD_3}\} = \emptyset$. From the definition of class diagram merging above, it follows that $\mu(CD_1 + CD_3) \neq \mu(CD_2 + CD_3)$. Therefore property 6 is satisfied.

7. Property 7 is not satisfied. Given any class diagram CD, no matter how the class elements are ordered, their number stays fixed. Put otherwise, moving an association, feature or constraint from one class to another does not change the number of classes in a diagram.

8. Renaming (footnote 12) the classes inside a diagram trivially does not change the number of classes. Therefore property 8 is satisfied.

9. From the analysis of property 5 above, we get that property 9 does not hold. Roughly speaking, the number of classes cannot grow due to the merging of two class diagrams.

Footnote 11: The notion of model ordering must be defined in order to evaluate metrics with respect to this property.
Footnote 12: We assume that every class appears exactly once in a given class diagram.
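The analysis of properties 5 and 9 for NCM can be spot-checked mechanically. The sketch below assumes that a model is reduced to its set of class names and that combination is set union; the function names and the sample are hypothetical. A check over a finite sample can only refute a universal property such as monotonicity, or exhibit a witness for an existential one such as property 9, so it complements the analytical argument rather than replacing it.

```python
from itertools import product


def ncm(model):
    """Number of Classes in a Model: size of the class set."""
    return len(model)


def combine(m1, m2):
    """Combination of two models, reduced here to union of class sets."""
    return m1 | m2


def monotone_on(metric, models):
    """Property 5 on a finite sample: combining never lowers the metric."""
    return all(
        metric(a) <= metric(combine(a, b)) and metric(b) <= metric(combine(a, b))
        for a, b in product(models, repeat=2)
    )


def interaction_witness(metric, models):
    """Property 9: return a pair with mu(M1) + mu(M2) < mu(M1 + M2), or None."""
    for a, b in product(models, repeat=2):
        if metric(a) + metric(b) < metric(combine(a, b)):
            return a, b
    return None


# A small invented sample of class sets, some overlapping and some disjoint.
SAMPLE = [
    frozenset({"A"}),
    frozenset({"A", "B"}),
    frozenset({"B", "C"}),
    frozenset({"C", "D", "E"}),
]
```

On this sample monotone_on(ncm, SAMPLE) holds and interaction_witness(ncm, SAMPLE) finds no pair, which agrees with the conclusion that NCM satisfies property 5 but not property 9.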

Metric 2: Number of non-trivial Multiplicity constraints (NoTM). NoTM is defined to be the number of non-trivial (different from 0 and *) multiplicity constraints at the ends of binary associations [59].

Theoretical basis: NoTM relates directly to Bunge's definition of the complexity of a thing, since constraints in general, and multiplicity constraints in particular, are properties of a class diagram, while complexity is determined by the cardinality of its set of properties. It was shown [59] that the NoTM value has a major impact on the correctness of class diagrams.

Analytical evaluation of NoTM

1. Obviously there exist two distinct class diagrams $CD_1$ and $CD_2$ such that $\mu(CD_1) \neq \mu(CD_2)$; therefore property 1 is satisfied.

2. Property 2 is not satisfied: another unconstrained association can always be added to any given class diagram. This means there is an infinite number of class diagrams having the same NoTM value.

3. There exist two distinct class diagrams $CD_1$ and $CD_2$ such that $\mu(CD_1) = \mu(CD_2)$; therefore property 3 is satisfied.

4. Two equal systems can be modeled in two different ways by different designers, using different considerations about generalization for instance, creating different classes in a class diagram and imposing a different number of constraints on the associations among these classes. Therefore, property 4 is satisfied.

5. $\mu(CD_1 + CD_2) = \mu(CD_1) + \mu(CD_2) - \sigma$, where $\sigma$ is the number of common multiplicity constraints between $CD_1$ and $CD_2$. Clearly, the maximum value of $\sigma$ is $\min(\mu(CD_1), \mu(CD_2))$. It follows that $\mu(CD_1) \leq \mu(CD_1 + CD_2)$ and $\mu(CD_2) \leq \mu(CD_1 + CD_2)$, thereby satisfying property 5. We assume that when a conflict between multiplicity constraints on the same end of a binary association occurs, it is solved, but a constraint stays. See [30] for an example of solving such a conflict.

6. Let $CD_1$ and $CD_2$ be two class diagrams such that $\{CON_{CD_1}\} \cap \{CON_{CD_2}\} = \emptyset$. Let $CD_3$ be a class diagram such that $\{CON_{CD_1}\} \cap \{CON_{CD_3}\} = \emptyset$. From the definition of class diagram merging above, it follows that $\mu(CD_1 + CD_3) \neq \mu(CD_2 + CD_3)$. Therefore property 6 is satisfied.

7. Property 7 is not satisfied. Given any class diagram CD, no matter how its elements are ordered, the NoTM value remains the same. Put otherwise, moving an association, feature or constraint from one class to another does not change the number of multiplicity constraints in a diagram.

8. Property 8 is obviously satisfied.

9. From the analysis of property 5 above, we get that property 9 does not hold. Roughly speaking, the number of multiplicity constraints cannot grow due to the merging of two class diagrams.

Metric 3: Number of cycles formed by constrained associations and classes (NCYC).

Theoretical basis: Cycles formed by constrained associations and classes cause finite satisfiability problems in a class diagram when a conflict between multiplicity constraints is present [61, 9].

Analytical evaluation of NCYC

1. Obviously there exist two distinct class diagrams $CD_1$ and $CD_2$ such that $\mu(CD_1) \neq \mu(CD_2)$; therefore property 1 is satisfied.

2. Property 2 is not satisfied. Given a class diagram $CD$, another class diagram $CD'$ can always be found, by adding a single dummy class to $CD$, such that $\mu(CD) = \mu(CD')$.

3. There exist two distinct class diagrams $CD_1$ and $CD_2$ such that $\mu(CD_1) = \mu(CD_2)$; therefore property 3 is satisfied.

4. Two equal systems can be modeled in two different ways by different designers, using different considerations about generalization for instance, creating different classes in a class diagram and imposing a different number of constraints on the associations among these classes. Therefore, property 4 is satisfied.

5. $\mu(CD_1 + CD_2) = \mu(CD_1) + \mu(CD_2) - \sigma$, where $\sigma$ is the number of common cycles between $CD_1$ and $CD_2$. Clearly, the maximum value of $\sigma$ is $\min(\mu(CD_1), \mu(CD_2))$. It follows that $\mu(CD_1) \leq \mu(CD_1 + CD_2)$ and $\mu(CD_2) \leq \mu(CD_1 + CD_2)$, thereby satisfying property 5.

6. Property 6 is satisfied. Proof by example. Consider the class diagrams in figures 5.8, 5.9 and 5.10.

Figure 5.8: Class Diagram: CD1
Figure 5.9: Class Diagram: CD2

Analyzing the NCYC metric, we get that $\mu(CD_1) = \mu(CD_2) = 1$. However, $\mu(CD_1 + CD_3) = 4$ and $\mu(CD_2 + CD_3) = 2$; see figure 5.11.

Figure 5.10: Class Diagram: CD3
Figure 5.11: Merging of Class Diagrams CD1 and CD3

7. Property 7 is satisfied. The ordering of classes and of the associations among them affects the number of cycles.

8. Property 8 is obviously satisfied. Consistent renaming of classes does not change the number of cycles inside a class diagram. However, changing the names inside one diagram can change the metric value.

9. Property 9 is satisfied. Proof by example. Consider the example from the analysis of property 6.

It seems that in order for a metric suite to satisfy all of Weyuker's properties, the metrics should reflect both the size and the structure of the class diagram. Consider test coverage of software systems. Many practices, such as intensive unit testing, are appropriate only for big and relatively complex systems, and are not useful for a simple sequential routine without any if statement in it.

Related Work on Model Metrics

Kim and Boldyreff [52] proposed a list of software metrics that can be applied to UML models. The metrics proposed in their work are based on the metamodel scheme snapshot given in [52]. Among the metrics proposed for a model:

NCM - Number of classes in a model
NIM - Number of inheritance relations in the model
NPM - Number of packages in a model
NASM - Number of associations in a model

Metrics proposed for a class include:

NASC - Number of associations linked to a class
NSUPC - Number of superclasses for a class

They also introduce several metrics for the use case level, but these are omitted in this work for simplicity and clarity.

Mens and Lanza [69] suggest expressing and defining metrics using a language-independent metamodel based on graphs. In their work, a type graph is used to specify the object-oriented meta-model. A small core of three generic metrics is proposed:

1. NodeCount (NC)
2. EdgeCount (EC)
3. PathLength (PL)

Mens and Lanza combine these generic metrics with an object-oriented metamodel to express typical object-oriented metrics in terms of the generic ones. For example, considering the metamodel in figure 5.12 we get:

1. NC(s, system, class) = number of classes in the system s
2. EC(c, class, inheritance, single) = number of children of class c
3. PL(c, class, inheritance, maximal) = depth of class c in the inheritance tree
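To give a feel for how generic graph metrics yield object-oriented ones, here is a simplified sketch. It is not Mens and Lanza's formalism: the typed graph encoding, the function signatures and the example data are all assumptions made for illustration, inheritance edges are assumed to run from parent to child, and the graph is assumed to be acyclic.

```python
# Hypothetical typed graph: node name -> node type.
NODES = {"s": "system", "A": "class", "B": "class", "C": "class"}

# Hypothetical typed edges: (source, target, edge type).
# Inheritance edges run from parent to child (an assumption of this sketch).
EDGES = [
    ("s", "A", "contains"),
    ("s", "B", "contains"),
    ("s", "C", "contains"),
    ("A", "B", "inheritance"),
]


def node_count(nodes, node_type):
    """Generic NodeCount: number of nodes of the given type."""
    return sum(1 for t in nodes.values() if t == node_type)


def edge_count(edges, node, edge_type):
    """Generic EdgeCount: outgoing edges of the given type from node."""
    return sum(1 for src, _, t in edges if src == node and t == edge_type)


def path_length(edges, node, edge_type):
    """Generic PathLength: longest chain of typed edges ending at node."""
    preds = [src for src, dst, t in edges if dst == node and t == edge_type]
    if not preds:
        return 0
    return 1 + max(path_length(edges, p, edge_type) for p in preds)
```

Under these assumptions node_count(NODES, "class") plays the role of NCM, edge_count(EDGES, "A", "inheritance") the role of NOC(A), and path_length(EDGES, "B", "inheritance") the role of DIT(B).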

Figure 5.12: An object oriented metamodel

The main advantages of their approach are that it can be automated very easily, as demonstrated in [69], and that the metrics can be combined with each other to obtain higher-order metrics, including ratio and summation.

Baroni et al. [14, 13] proposed a formal definition of object-oriented metrics, using OCL and the UML metamodel. Their approach involves modifying the metamodel by creating the metrics as additional operations in the metamodel and expressing them as OCL conditions. The result is FLAME [2], a library of metric definitions formulated as OCL expressions over the UML 1.3 metamodel.

McQuilan extends the work of Baroni by decoupling the metrics definition from the UML metamodel, and generalizes the approach to any metamodel and any set of metrics [66]. As an example, a formal definition of the CK metrics using OCL over the UML 2.0 metamodel is developed (see figure 5.13) and demonstrated in a prototype tool called DMML (Defining Metrics at the Meta Level). For example, consider the NOC metric, as defined by McQuilan et al.

Figure 5.13: Extension to the UML 2.0 metamodel. This UML package diagram shows the definition of the CK metrics as a separate package, with a dependency on classes from the UML metamodel.

The full list of metrics of McQuilan's approach can be found in [66]. While the last approach is similar to that of Baroni et al., it differs in a number of key areas. The approach of McQuilan et al. can be generalized at the metamodel level, for example, to apply to other UML diagrams. Their metric calculation procedure is highly extensible, allowing different versions to be implemented and compared.

Figure 5.14: NOC Metric Definition. This OCL code defines the NOC metric from the CK metrics suite, and is part of a larger definition of the whole CK metric suite which we have implemented using dmml.

Chapter 6 Benchmarking

6.1 Metric-Driven Benchmark Creation

Metrics as means for algorithm evaluation

A benchmark is the act of running a computer program, a set of programs, or other operations, in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. Although benchmarking used to be associated with assessing performance characteristics of computer hardware, for example the floating point operation performance of a CPU, there are circumstances in which the technique is also applicable to software. Software benchmarks are run, for example, against compilers or, most commonly, against database management systems. Another type of test program, namely test suites or validation suites, is intended to assess the correctness of software. Benchmarks provide a method of comparing the performance of various subsystems across different chip/system architectures.

Metrics have properties which can characterize an algorithm's quantitative or qualitative performance, such as run-time or problem scope. There is a constant call from the research community for more rigor in experimentation and empirical validation of research results [87, 80]. In this work we advocate using metrics as a primary means for benchmark creation. But what does the selection of metrics depend on? Two obvious candidates for affecting the choice of metrics are the problem, and the algorithm solving the problem. We do not see either of these two candidates as the exclusive reason for selecting the right metric for benchmarking. Consider the following examples:

1. Example: Graph traversal problem. In this example we consider the problem of traversing a connected graph G = (V, E). If we want to compare different techniques like BFS or DFS (see [38] for details), the metrics that would be useful for comparing running time would probably be:

Number of vertices in a graph.
Number of edges in a graph.
Average degree of a vertex in a graph.

These three metrics characterize a graph by size in the first two metrics, and by structure in the last metric. Running traversal algorithms like DFS and BFS on graphs with different metric values will give us a clue for comparing their performance. However, the average edge weight metric probably has nothing to do with comparing graph traversal algorithms, since they are not influenced by edge weights when the traversal is run. Algorithms solving the shortest path and maximal flow problems, on the other hand, do depend on this metric, as a weight and as a capacity respectively.

2. FiniteSat Algorithm. Creating a benchmark for FiniteSat would probably use the following metrics as the main points of interest:

Comparing scalability of algorithms:
Number of classes in a class diagram.
Number of non-trivial multiplicity constraints in a class diagram.

Comparing complex problem instances of structure:

Number of cycles in a class diagram.
Ratio between the number of classes and the number of associations.
Different ratio metrics describing the structure of the class diagram.

The relevance of finite satisfiability in a class diagram clearly depends on these metrics. However, the metric Number of trivial multiplicity constraints has no impact on the finite satisfiability of a class diagram.

Brute-Force Benchmark Creation Without Abstraction

The relevance of finite satisfiability was studied earlier in this work, in chapter 4. The problems created for the experiments were class diagrams. The class diagrams were created by repeatedly running a sequential program, consisting mainly of for statements, which generated text files conforming to the format of USE [37]; these were then checked for finite satisfiability with the software presented in appendix A. The generation code was very inflexible, since it depended on the concrete metrics used, and adding each and every metric to the generation process required a lot of work. The work needed for an additional metric included not only the code for generating the elements measured by the metric itself, but also hard-coded adaptations in the code of all other metrics, ensuring that there is no collision in either metrics or syntax. The process can be compared to parsing a text without grammar tools, by directly developing and coding the parser program.

6.2 Benchmark Creation via Model Checking

Introduction to Alloy

Jackson introduced Alloy [46] as a little language for describing structural properties. It can be used today as a model finder, based on a SAT solver. The Alloy Analyzer works by translating the model specified in the Alloy language into a boolean expression, which is analyzed by SAT solvers embedded within the Alloy Analyzer. A user-specified scope on the model elements bounds the domain, making it possible to create finite boolean formulas for evaluation by SAT solvers. The Alloy Analyzer offers two analysis methods: the first is simulation and the second is assertion checking [4]. In this section we will see how to adopt Alloy for generating models along specified metric values. In short, using Alloy, a model can be built by using:

Signatures. Signatures are used to model classes of objects, that is, sets.
Predicates. Predicates give us a way to find model instances: we write a predicate and then make Alloy produce instances that satisfy this predicate. Asking Alloy to find instances is similar to finding a model of a given schema.
Facts. Facts are used to impose constraints on the model. Facts are global, and always apply. Put otherwise, every instance of a model must satisfy the facts.
Functions. A function is an expression that returns a result.
Assertions. These are assumptions about the model for which the analyzer can be asked to find counter-examples.

After the model is built, its assertions can be verified, with an attempt to find a counterexample. Alloy performs an exhaustive search in a limited space, eliminating the possibility of missing an instance within that space. The scope of the instance search must be specified.

Recently, Shah et al. demonstrated a way to analyze UML class models [79] by transforming them into Alloy models, with subsequent transformation of the Alloy-produced instances back to UML object diagrams. Zito and Dingel [90] used Alloy to model package merging in UML. They used Alloy for formalizing and analyzing different versions of package merge. Bordbar and Anastasakis [16] used the Alloy Analyzer as a tool for verification of a newly introduced model, called Abstract Description of Interaction, for modeling Web Applications.

This report transforms an abstract model of a web application into the Alloy language, and analyzes it with the Alloy Analyzer [3] to demonstrate unwanted behavior of the application. Simons and Fernandez [82] used Alloy to model-check visual design notations. In their work, they encoded the abstract syntax of the Discovery method [81] into an Alloy model, subsequently checking it by running a trivial predicate for exactly one model instance, thereby validating the consistency of the model.

Generating Models from Meta-Model Metrics with Alloy

In this section the idea of using a model checker for instantiating a meta-model is implemented. That is, benchmarking according to given metric values, where the choice of metrics is influenced by algorithmic considerations and examined with Weyuker's properties [89] relative to algorithmic equivalence.

As noted by McQuilan and Power [66], defining metrics is a meta-modeling activity. Hence, the first step in generating (in other words, finding) models is to define a meta-model in Alloy. Second, the Alloy Analyzer should be asked to find a model with given values for the specified metrics (thus specifying the scope of the search). Actually, it is impossible to find an instance of a model without specifying a scope and thereby giving values to metrics. It is also possible to specify exactly how many different instances of a meta-model should be found, generating an appropriate task sample for benchmarking.

Consider the subset of the class diagram meta-model in figure 6.1. The metamodel presents the class, association, and constraint elements. Suppose now that we want to generate instances of this metamodel. Figure 6.2 shows the metamodel from figure 6.1 encoded in the Alloy Analyzer, with additional constraints to enforce UML syntax (footnote 1).

Figure 6.1: Class Diagram Meta-Model

The meta-model can be produced in two ways. The first, straightforward way, like the one in figure 6.2, is to hand-write it from scratch in the Alloy Analyzer tool. In this case we must write all parts of our meta-model, with the needed constraints as Alloy facts, and then find instances. The second, optimized way uses the UML2Alloy [5] tool. UML2Alloy supports the transformation of UML class diagrams accompanied by OCL constraints. On the one hand, only part of the Class Diagram and OCL languages is supported by the automatic transformation with the tool. Therefore, for complicated model finding, the first way of writing the meta-model may be adopted. On the other hand, since OCL is widespread, it may be very effective to use class diagrams with OCL for known and checked models of interest.

Figure 6.3 defines an empty predicate named NCM, whose satisfaction is trivial. When applying the run command to the NCM predicate we get the result shown in figure 6.4. It should be noted here that the Alloy Analyzer provides several useful output formats for the found instances. This is extremely useful in the automation context. The output can be in the following formats:

1. Visual (as in figure 6.4)

Footnote 1: For example: a class can't be a superclass of itself.

2. Tree view, browsable (as in figure 6.5)
3. Textual:
(a) XML
(b) DOT format

Figure 6.2: A partial meta-model of UML in Alloy Analyzer.
Figure 6.3: Instance finding.

With the model defined above and the predicate NCM defined, we can generate a model along specified metric values. The following metrics are supported:

1. Number of classes
2. Number of associations
3. Number of constraints. A constraint can be defined in a generic way, depending on what elements it is imposed on, for example a multiplicity constraint on an association.

In the next section we classify the metrics into groups, and show how the metrics from the example above, as well as more complex metric types, can be supported for model generation.
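The three supported metrics map naturally onto the scope of an Alloy run command. The sketch below builds such a command as text from a dictionary of metric values. It is only an illustration: the signature names Class, Association and Constraint are assumptions, since the metamodel of figure 6.2 is not reproduced here, and the helper run_command is hypothetical rather than part of the Alloy Analyzer.

```python
def run_command(predicate, scopes, exact=True):
    """Build the text of an Alloy 'run' command bounding each signature.

    predicate -- name of the (empty) predicate to run, e.g. "NCM"
    scopes    -- mapping from signature name to the desired metric value
    exact     -- True for exact bounds, False for upper bounds
    """
    if not scopes:
        raise ValueError("at least one signature scope is required")
    prefix = "exactly " if exact else ""
    bounds = ", ".join(f"{prefix}{count} {sig}" for sig, count in scopes.items())
    return f"run {predicate} for {bounds}"


# Metric values for one benchmark model; signature names are assumed.
METRIC_VALUES = {"Class": 5, "Association": 3, "Constraint": 2}
COMMAND = run_command("NCM", METRIC_VALUES)
```

For these values COMMAND reads "run NCM for exactly 5 Class, exactly 3 Association, exactly 2 Constraint"; passing exact=False drops the exactly keyword, which turns each value into an upper bound.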

Figure 6.4: Instance of the specified metamodel

Figure 6.5: Tree view of an instance of the specified metamodel

6.2.2.1 A General Process for Metric Specification for Model Generation

1. Specify the meta-model in Alloy Analyzer, according to the guidance presented earlier.
2. Select a metric suite.
3. Specify an empty Generate predicate.
4. Use the run command, specifying the number of instances to generate and exact or upper bounds for every element chosen earlier.

6.3 Automation: A Language For Metrics Values Definition

In the rest of this chapter we set up a basis for a functional language for metrics values definition. The metrics are classified into patterns and a general form for each pattern is developed. Consider the following example of using such a language to create a benchmark for FiniteSat. The language consists of:

Number-Of-Classes(number) command, where number is the metric value.
Number-Of-Associations(number) command.
Number-Of-Non-Trivial-Multiplicity-Constraints(number) command.

This way, to repeat the study of the occurrence and relevance of finite satisfiability, we could use the above commands, which reflect metric values, to create the problem sample.

It is important to note why the Alloy language is needed in the first place. Why not just use OCL instead? There is a reason beyond the fact that Alloy Analyzer is a model checker while OCL is only a specification language with several supporting applications. As shown in this chapter, there is a need for variables that range over model elements, which is impossible in OCL because it has no metadata. The only variables available in OCL range over instances, which is not sufficient.

Metrics classification

Since we believe that Alloy is needed, and we know that it is possible to generate models along provided metric values, it would be useful to have a framework for model generation using metrics with Alloy. To that end, we attempt to classify the metrics into exclusive groups, with specific guidance for generating models through the metrics in each group.
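As a rough illustration of how the three commands above could be translated into a run command, the following Python sketch parses command lines and emits the text of an Alloy run command with exact scopes. The mapping from command names to signature names (Class, Association, MultiplicityConstraint) and the function names are hypothetical; they depend on how the meta-model is actually written in Alloy.

```python
import re

# Hypothetical mapping from metric commands to meta-model signature names.
SIGNATURE_FOR = {
    "Number-Of-Classes": "Class",
    "Number-Of-Associations": "Association",
    "Number-Of-Non-Trivial-Multiplicity-Constraints": "MultiplicityConstraint",
}

# A command looks like Name(number), e.g. Number-Of-Classes(10).
COMMAND = re.compile(r"^\s*([A-Za-z-]+)\((\d+)\)\s*$")


def parse_commands(lines):
    """Turn metric commands into a {signature: exact count} dictionary."""
    scopes = {}
    for line in lines:
        match = COMMAND.match(line)
        if match is None or match.group(1) not in SIGNATURE_FOR:
            raise ValueError(f"unrecognized metric command: {line!r}")
        scopes[SIGNATURE_FOR[match.group(1)]] = int(match.group(2))
    return scopes


def to_run_command(lines, predicate="Generate"):
    """Emit the text of a run command with one exact scope per metric."""
    scopes = parse_commands(lines)
    if not scopes:
        return f"run {predicate}"
    parts = [f"exactly {n} {sig}" for sig, n in scopes.items()]
    return f"run {predicate} for " + ", ".join(parts)


if __name__ == "__main__":
    print(to_run_command(["Number-Of-Classes(10)", "Number-Of-Associations(3)"]))
```

Replacing "exactly n" with a plain "n" in the emitted scope would request an upper bound instead of an exact count.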

Pattern 1. Regular Size Metrics.

Regular size metrics are metrics where the number of instances of a specific class is explicitly specified. Well-known examples are Number of Classes in a Model, introduced by Kim and Boldyreff [52] for measuring a class diagram, and Weighted Methods per Class, introduced earlier by Chidamber and Kemerer [27] for measuring a single class. Metrics of this kind share a common pattern:

NUMBER OF ? OBJECTS = X

where the symbol ? stands for the name of the class and X stands for the numeric value of the metric. For regular size metrics, we must encode the needed meta-model in Alloy Analyzer and specify the exact number (or an upper bound) of meta-model elements to generate, as well as the number of instances to generate in the overall process.

Figure 6.6: Example of UML meta-model with two elements: X and Y

Figure 6.7: Example of Alloy-written meta-model with two elements: X and Y

Consider the meta-model in figure 6.6. The same meta-model written in Alloy is specified in figure 6.7. The meta-model contains two class elements, X and Y, such that each instance of Y is associated with exactly one instance of X via the association field. By specifying the Generate predicate and applying the run command, we ask Alloy Analyzer to generate one instance of the meta-model with one instance of X and two instances of Y.

Figure 6.8: A generated instance of the meta-model

The result of this generation is shown in figure 6.8. Following the general process of generating models by metrics, we summarize the generation of instances along regular size metrics in the following steps:

1. Specify the meta-model in Alloy Analyzer.
2. Select the metric suite. That is, select the classes whose number of instances you want to specify in the generated model.
3. Specify an empty Generate predicate.
4. Use the run command, specifying the number of instances to generate and exact or upper bounds for every element chosen earlier.

Pattern 2. Ratio Metrics.

Ratio metrics specify a fraction or factor, often in the form of a percentage, for a sub-group inside a whole, usually bigger, group. For example, consider the ratio metric NASM/NCM introduced in [59], and the method hiding factor proposed by Abreu and Melo in [1]. These metrics set the ratio between the number of associations and the number of classes in a model, and between the number of private access methods and the number of public methods within a class, respectively. Metrics of this kind share a common pattern:

RATIO OF A/B = X

where A and B are regular size metrics and X stands for the ratio metric value.

Figure 6.9: Example of UML meta-model with two elements: X and Y

Figure 6.10: Example of meta-model with two elements: X, Y with 1:2 ratio between them

Consider the meta-model in figure 6.9. The same meta-model written in Alloy is specified in figure 6.10. Suppose we want to set the metric value X/Y = 1/2, that is, the number of Y instances is double the number of X instances. To set this value, we use the fact keyword of Alloy.

We now summarize the framework of generating models along metrics of this kind in the following steps:

1. Specify the meta-model in Alloy Analyzer.
2. Create an appropriate fact statement, as shown in figure 6.10, to specify the ratio value.
3. Specify an empty Generate predicate.
4. Use the run command, specifying the number of instances to generate and exact or upper bounds for every element chosen earlier.

The concept of ratio metrics is orthogonal to size metrics, and the two can be applied together to generate more accurate models. Note that when using Alloy, the values for regular size metrics must also be provided for generating the model. Generating models along ratio metrics is somewhat more complicated, because metric values cannot be chosen independently of one another. A conflict between size metrics and a ratio metric can occur if the size metric values are explicit and no model with the specified ratio can be found. For example, asking for a model with exactly 10 classes, exactly 3 associations and NASM/NCM = 1/2 is impossible, since 3/10 ≠ 1/2.

Pattern 3. Quantified Metamodel Association Restriction.

A good way to understand this type of metric is by example. Consider the NOC metric proposed by Chidamber and Kemerer [27]. The NOC metric specifies the number of children of a class. That is, we want to quantify the size of the subclass relationship for a class, as in figure 6.1. Metrics of this kind share a common pattern:

NUMBER OF A FOR A B = X

where the first A is the type of the association we restrict (the second "A" is the English article), B is the context class whose association is being restricted, and X is the metric value. For the example above we get NUMBER OF CHILDREN FOR A CLASS = X. We continue with the example from the previous section: the meta-model of figure 6.1, written in Alloy, is given in figure 6.11.

Figure 6.11: Example of partial UML meta-model in Alloy

Applying the run command as shown, we get the generated model shown in figure 6.12. The generated model has exactly 10 classes, where every class has exactly one sub-class. That is, we explicitly specified the NOC value for all classes in the model.

Figure 6.12: A generated model where each class has exactly one sub class.

We summarize producing a model along metrics of Pattern 3 in the following steps:

1. Specify the meta-model in Alloy Analyzer.
2. Specify a fact statement, limiting the size of the elements inside an association.
3. Specify an empty Generate predicate.
4. Use the run command, specifying the number of instances to generate and exact or upper bounds for every element chosen earlier.

The metrics that fall into the Pattern 3 group are well known and frequently cited by researchers. However, in the form presented above the pattern may be too rigid, since we probably do not always want to specify the metric for all the elements marked A above. This observation leads to a refinement of this group of metrics.

A Refinement for Pattern 3.

Figure 6.13 (in UML) and figure 6.14 (in Alloy) show a meta-model with Z as a subset of X. Using this technique we can impose size constraints on a set inside an association. For example, if we want to generate a model with a known number of X instances that are each associated with exactly three elements of Y, we define a subset of X (Z in figure 6.14) and explicitly generate the wanted number of Z elements. To be more precise, and to obtain the exact number of Z-elements (that is, elements with three associated Y elements), we can state by a fact that each of the X \ Z-elements has a number of associated Y elements that is different from three. Generating instances along the above meta-model, we get the following results. Figure 6.15 shows the first instance, with two Z-elements.
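The difference between constraining only the Z-elements and also constraining the X \ Z-elements can be illustrated on a concrete instance. The Python sketch below is not part of the Alloy specification; it takes a hypothetical instance, given as a mapping from each X element to its associated Y elements, and reports which elements violate the intended metric. The function name, the data layout and the element names are assumptions made for illustration.

```python
def violations(field, z, target=3, exact=False):
    """Check a generated instance against the Pattern 3 refinement.

    field  maps each X element to the set of Y elements associated with it.
    z      is the subset Z of X whose elements must have `target` associated
           Y elements.
    exact  additionally requires every element of X outside Z to have a
           number of associated Y elements different from `target`.
    """
    problems = []
    for x in sorted(set(z) - set(field)):
        problems.append(f"{x} is in Z but not in X")
    for x in sorted(field):
        n = len(field[x])
        if x in z and n != target:
            problems.append(f"{x} is in Z but has {n} associated elements")
        if exact and x not in z and n == target:
            problems.append(f"{x} is outside Z but has {target} associated elements")
    return problems


if __name__ == "__main__":
    instance = {
        "X0": {"Y0", "Y1", "Y2"},
        "X1": {"Y3", "Y4", "Y5"},  # has three Y elements but is not in Z
        "X2": {"Y6"},
    }
    print(violations(instance, {"X0"}))
    print(violations(instance, {"X0"}, exact=True))
```

Without the extra fact, the instance above is accepted even though two X elements have three associated Y elements while only one Z element was requested; with exact=True the second element is reported.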

Figure 6.13: Example of meta-model with three elements: X, Y and Z

Figure 6.14: Example of meta-model with three elements: X, Y and Z

Figure 6.15: A generated instance of the meta-model

Figure 6.16: A generated instance of the meta-model

However, as noted above, more than one instance of the meta-model can be generated. Alloy Analyzer searches the entire specified search space, and if needed, another instance can be obtained. For example, figure 6.16 shows another instance of the same meta-model as in figure 6.15, with the same metric values. We now summarize the framework of generating models along metrics of this kind in the following steps:

1. Specify the meta-model in Alloy Analyzer.
2. Specify a subset element for the desired element to impose a limit on, as shown in figure 6.14.
3. Specify a fact statement, limiting the size of the elements inside an association.
4. Optionally, for even greater refinement, explicitly constrain the excluding group with fact{all element : X-Z | #element.field != 3}, obtaining the exact number of Z-elements.
5. Specify an empty Generate predicate.
6. Use the run command, specifying the number of instances to generate and exact or upper bounds for every element chosen earlier.

Pattern 3. Refinement 2.

Observe that the first refinement only promises that the wanted number of objects will be generated. However, this is only an infimum for that number: other X elements may also happen to have three associated Y elements. It is possible to impose a strict limit on the number of the Z-objects referred to earlier by constraining the excluding group:

fact{all element : X-Z | #element.field != 3}

Pattern 4. General Structure Metrics.

General structure metrics are a complicated type of metric. Such metrics are often not easily extracted, and may need a non-trivial algorithm in order to extract them from a given model. Generating models that have a specific structure defined along structure metrics is not a trivial task. Examples of general structure metrics are:

Number of Cycles in a Model (NCYC) [59].
Path Length between Classes.
Average, Longest, Lowest Depth of Inheritance Tree (DIT) [27].

Note that, in light of the method presented in this report, general structure metrics alone are not enough to generate a model. Structure metrics should be interleaved with other metric types in general and with size metrics in particular. We now summarize the framework of generating models along metrics of this kind in the following steps:

1. Specify the meta-model in Alloy Analyzer.
2. Create appropriate fact statements, which characterize the metric values.
3. Specify an empty Generate predicate.
4. Use the run command, specifying the number of instances to generate and exact or upper bounds for every element chosen earlier in the meta-model specification step.
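To make the remark about non-trivial extraction concrete, the following Python sketch computes the Depth of Inheritance Tree for every class of a model and summarizes it as average, longest and lowest values. It assumes the class hierarchy is available as a mapping from each class to its direct superclasses; that representation and the function names are illustrative assumptions, not part of the method described above.

```python
def depth_of_inheritance(parents):
    """Return {class: DIT}, where DIT is the longest path to a root class.

    parents maps each class name to a list of its direct superclasses.
    Classes that appear only as superclasses are treated as roots.
    """
    depth = {}
    visiting = set()

    def visit(cls):
        if cls in depth:
            return depth[cls]
        if cls in visiting:
            # A class must not be (transitively) its own superclass.
            raise ValueError(f"inheritance cycle through {cls}")
        visiting.add(cls)
        result = max((visit(p) + 1 for p in parents.get(cls, [])), default=0)
        visiting.discard(cls)
        depth[cls] = result
        return result

    for cls in parents:
        visit(cls)
    return depth


def dit_summary(parents):
    """Average, longest and lowest DIT over all classes of the model."""
    depths = list(depth_of_inheritance(parents).values())
    if not depths:
        raise ValueError("the model has no classes")
    return {
        "average": sum(depths) / len(depths),
        "longest": max(depths),
        "lowest": min(depths),
    }


if __name__ == "__main__":
    hierarchy = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}
    print(depth_of_inheritance(hierarchy))
    print(dit_summary(hierarchy))
```

Even this simple metric needs a graph traversal with cycle detection, which is why expressing it as a generation constraint is harder than for the size and ratio patterns.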

Chapter 7 Conclusions and Future Work

In this thesis, we developed a method for automatic metric-driven benchmark creation for model correctness algorithms. We also extended methods for detecting and addressing problems of finite satisfiability in UML class diagrams in a way that is simple and efficient and that provides the foundations for expanding UML CASE tools to address these finite satisfiability problems. Furthermore, this thesis demonstrated the applicability and scalability of the developed methods by developing a platform that supports them. A basis for further benchmarking is set with a series of experiments. Finally, a basis for a functional metric-driven language for benchmark specification and creation is set up.

This research can be expanded in a straightforward way in the following directions:

1. Developing a fully functional language for metrics description, based on metric patterns. As a result, a proprietary language for model-level metrics might be developed.
2. Developing an engine for automatically translating (compiling) the language into the input of a model checker, such as Alloy.
3. Developing metrics that describe hierarchical data, with possibly unbounded size and recursive definition, for example graphs, cycles and lists.
4. During this thesis, we noticed that it is very difficult to describe complex hierarchical data structures with Alloy. In the future it is important to examine other model checkers, which may have a more expressive language, or to build a specific target language based on Alloy.

Further research directions include:

1. Developing metrics for other types of models, for example metrics for sequence diagrams.
2. Developing metrics for the interaction and composition of different model types.
3. Exploring the connection between metrics and complexity, and studying model complexity as motivated by metrics.

Appendix A Reasoning Infrastructure Implementation

A.1 Implementation of FiniteSat Algorithm

Today, there is a quite solid base of theoretical knowledge about reasoning tasks on UML class diagrams, studied in [63, 61, 62, 56, 50, 9, 42, 22, 23, 24] and many others. However, only a very small part of this research was tested in practice and implemented. Most of the implemented methods were partial, at prototype level, or used different and tricky programming techniques than those studied in theory [21, 61, 48]. In this work we present an implementation of the methods described in previous chapters. The implemented methods rely heavily on the USE [37] software and make use of linear programming methods, particularly the well-known Simplex algorithm (for further reading see [67, 29]).

The main goal of this implementation was to create an infrastructure for reasoning algorithms; put otherwise, a piece of software that is carefully designed and implemented for its current needs and future extensions. To achieve this goal several steps were taken. These steps include using the Java language and the Eclipse platform [34], and using the ANTLR [75] compiler-compiler to allow future features and a flexible OO design. These choices were made because of their popularity and effective implementation experience. The secondary goal was to establish a basis for future benchmarking of such reasoning algorithms and to check the scalability of our methods.

A.2 Implementation in Detail

In this section we present our implementation in detail. The tool we implemented supports the following constraints imposed on class diagrams that cause finite satisfiability problems:

1. Multiplicity constraints
2. Class hierarchy constraints interaction with associations
3. Constrained generalization sets
4. Qualified associations

Following the ideas of [61], our tool makes 3 distinct steps:

1. Input Processing (Complete Model Creation)
2. Inequalities System Creation
3. Solving the Inequalities System

We will now discuss in detail each of the 3 steps mentioned above.

1. Input Processing (Complete Model Creation).

Most (and maybe all) of the CASE tools available today to the software analyst or designer save UML models in a symbolic representation via the XMI format. XMI is a special form of XML in which all data about the UML model, as well as its visual representation on the screen, is saved. Unfortunately, different CASE tools (for example Poseidon, ArgoUML, Rational Rose or Visual Paradigm) use different proprietary formats and conventions of XMI, so a decision has to be made. To overcome this variety of formats, we have chosen to use the USE [37] format to store the symbolic representation of UML models. We justify this choice by the partially ready preprocessing implemented in [37] and by its convenient and readable textual notation, in addition to its visualization options.

Figure A.1: Reasoning Tool Internal Structure

We summarize the extended grammar of USE for our tool by the following grammar in EBNF [39] style:

    UMLModel     ::= Name ModelBody
    Name         ::= IDENT
    ModelBody    ::= (Enumeration)* (Class)* (Association)* (GS)*
    Enumeration  ::= "enum" Name "{" Name+ "}"
    Class        ::= ("abstract")? "class" Name "attributes" (Attribute)+
    Association  ::= ("association" | "composition" | "aggregation") Name "between" AssociationEnd AssociationEnd
    AssociationEnd ::= Name Multiplicity "role" Name Qualifier
    Qualifier    ::= "qualifier" "attributes" (Attribute)+
    Attribute    ::= IDENT ":" Type
    GS           ::= "gs" GsName GsType Super Name+
    GsName       ::= ("name" IDENT)?
    GsType       ::= ("type" ("overlapping" | "complete" | "disjoint" | "incomplete" | "overlappingcomplete" | "overlappingincomplete" | "disjointcomplete" | "disjointincomplete"))?
    Super        ::= ("super" IDENT)
    SubList      ::= ("subclasses" Name+)
    IDENT        ::= (a..z | A..Z) (a..z | A..Z | 0..9)*
    Multiplicity ::= "[" DIGIT* (".." DIGIT*)? "]"

In this way a UML model can be specified manually or generated automatically using this simple, intuitive and well-defined grammar. Appropriate changes are made to the org.tzi.use.uml.mm package of [37]. Once the input is read, a model representation in the form of Java objects is created. The model forms naturally into an object structure following the UML meta-model rules [72]. The main addition to the initial grammar is the constraints mentioned above, which can cause finite satisfiability problems if they interact or conflict.

2. Inequalities System Creation

The next step in our tool is the creation of linear inequalities. The inequalities are created to reflect the structure of the class diagram and all the constraints imposed on it. The inequalities are not created all together, but separately by their type:

(a) Inequalities to reflect classes in the model
(b) Inequalities to reflect class hierarchy
(c) Inequalities to reflect associations with multiplicity constraints
(d) Inequalities to reflect qualified associations
(e) Inequalities to reflect generalization set constraints

The final output is an instance of an optimization problem that can be solved by a linear programming tool via the simplex method.
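The following Python sketch illustrates the idea behind inequalities of type (c) on a small scale. It assumes one common encoding, which is not spelled out in the text above: for an association with r links between classes with a and b instances, min_b * a <= r <= max_b * a and min_a * b <= r <= max_a * b, with every class non-empty. Instead of the simplex method used by the tool, the sketch simply enumerates small class populations, so a result of None only means that no population was found within the bound. All names are hypothetical.

```python
from itertools import product


def association_ok(a, b, mult_a, mult_b):
    """Is there a link count r consistent with both multiplicities?

    a, b     numbers of instances of the two associated classes
    mult_a   (min, max) number of A objects linked to each B object
    mult_b   (min, max) number of B objects linked to each A object
    A max of None stands for an unbounded multiplicity ("*").
    """
    min_a, max_a = mult_a
    min_b, max_b = mult_b
    lower = max(min_b * a, min_a * b)
    uppers = [m * n for m, n in ((max_b, a), (max_a, b)) if m is not None]
    return not uppers or lower <= min(uppers)


def find_population(classes, associations, bound=20):
    """Search for positive class sizes that satisfy every association.

    associations is a list of (class_a, class_b, mult_a, mult_b) tuples.
    Returns a {class: size} dictionary, or None if nothing is found
    with every class size in 1..bound.
    """
    for counts in product(range(1, bound + 1), repeat=len(classes)):
        size = dict(zip(classes, counts))
        if all(association_ok(size[ca], size[cb], ma, mb)
               for ca, cb, ma, mb in associations):
            return size
    return None


if __name__ == "__main__":
    # Each A has exactly two B objects, each B has exactly one A object.
    r1 = ("A", "B", (1, 1), (2, 2))
    # A second association pairing A and B objects one to one.
    r2 = ("A", "B", (1, 1), (1, 1))
    print(find_population(["A", "B"], [r1]))
    print(find_population(["A", "B"], [r1, r2]))
```

The first call finds a population with twice as many B objects as A objects. The second finds none, because the two associations require b = 2a and a = b at the same time, which has no solution with non-empty classes.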

3. Solving the Inequalities System

To solve the linear programming problem we use an open source Java tool for operations research [73]. The inequalities system constructed above, in the form of a linear programming problem instance, is checked for the presence of a solution subject to constraints. These constraints are exactly the linear inequalities system created. As mentioned above, we use the Simplex algorithm introduced by Dantzig [29] to solve the linear programming problem. Although from a theoretical point of view the worst-case running time of this algorithm is known to be exponential, in contrast to Karmarkar's algorithm [51] with its polynomial running time, it was shown to have remarkable performance and ease of implementation in practice.

A.3 Structural Architecture and Conclusions

The implementation we presented in this chapter can serve as an infrastructure for reasoning software due to its flexibility. To the best of our knowledge, it is the first scalable implementation handling the necessary amount of constraints and properties of class diagrams. We summarize the structure of the implemented tool below. Figure A.2 presents the structural architecture of our tool.

Figure A.2: The structural architecture of reasoning tool

The flow shown in figure A.2 zooms in on the steps within our tool. The symbolic representation is scanned and an AST model is created. Later, it is transformed into a Java object form that represents the UML meta-model structure via the Model-to-Model transformation part. The inequalities system is then constructed and a search for a solution is started. If a solution exists we report finite satisfiability; otherwise we report that the model is not finitely satisfiable. Figure A.3 presents a class diagram of the static structure of the implemented tool.

Figure A.3: Class diagram of our tool's static structure

We present it in this work in order to stimulate further extension of the tool, and to demonstrate the flexibility of its design. For example, in order to add another class diagram constraint and so extend the FiniteSat algorithm, one needs only to implement another specialization of the InequalitiesCreator class and invoke it. No other change is needed. Another point is that tests and experiments can easily be made thanks to the current structure. For example, it is possible to check the tool's performance under different sets of constraints by simply not invoking the corresponding inequalities creator.
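The extension mechanism described above can be sketched as follows. The tool itself is written in Java; this Python sketch only mirrors the design, and apart from the class name InequalitiesCreator everything in it (the create method, the model dictionary, the two specializations and the string form of the inequalities) is a hypothetical illustration rather than the tool's actual interface.

```python
from abc import ABC, abstractmethod


class InequalitiesCreator(ABC):
    """One specialization per kind of constraint in the class diagram."""

    @abstractmethod
    def create(self, model):
        """Return the inequalities this creator contributes for the model."""


class ClassInequalities(InequalitiesCreator):
    """Every class must have at least one instance."""

    def create(self, model):
        return [f"{cls} > 0" for cls in model["classes"]]


class HierarchyInequalities(InequalitiesCreator):
    """A subclass cannot have more instances than its superclass."""

    def create(self, model):
        return [f"{sub} <= {sup}" for sub, sup in model["hierarchy"]]


def build_system(model, creators):
    """Collect the inequalities of the creators that are invoked."""
    system = []
    for creator in creators:
        system.extend(creator.create(model))
    return system


if __name__ == "__main__":
    model = {"classes": ["A", "B"], "hierarchy": [("B", "A")]}
    print(build_system(model, [ClassInequalities(), HierarchyInequalities()]))
    # Leaving a creator out runs the experiment without that constraint type.
    print(build_system(model, [ClassInequalities()]))
```

Supporting a new constraint then amounts to adding one more subclass and passing it to build_system, while omitting a creator from the list corresponds to the experiments with reduced constraint sets mentioned above.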

