INFORMS 4th Conference on Information Systems and Technology. Generalizations as Data and Behavior Abstractions

Dale L. Lunsford
The University of Southern Mississippi, College of Business Administration, Department of Management and MIS, Box 5077, Hattiesburg, MS 39406
(601) 266-4920, lunsford@cba.usm.edu

Waleed A. Muhanna
The Ohio State University, College of Business, Department of Accounting and MIS, 420 Fisher Hall, 2100 Neil Avenue, Columbus, OH 43210
(614) 292-3808, wmuhanna@cob.ohio-state.edu

April 1, 1999

Abstract

Generalization, a key information system modeling abstraction, defines a subset relationship between elements of two or more classes. The relationship is normally implemented using a mechanism called inheritance whereby a sub-class acquires (shares) properties from its immediate super-class, and, by induction, from all of its antecedent super-classes. This paper discusses two dimensions of the generalization abstraction and identifies three specific subtypes of generalizations. We examine the different treatment of generalizations in object-oriented analysis methods and structured analysis methods and hypothesize that those differences impact the ability of the systems analyst to fully model an application domain. The results of one study, in which we employed generalizations as a measure of the semantic richness of an application domain, are summarized.

Introduction

During information systems development, systems analysts produce models of an organization's business processes (behavior) and the data captured, stored, and processed by the organization (structure). Several conceptual modeling approaches have been proposed to manage the complexity of this process. Conceptual modeling generally involves the use of three distinct types of abstractions to organize knowledge in the application domain: classification, aggregation, and generalization [1,3,7,14,15]. A classification abstraction is used for defining one entity instance or object as a member of a class of real-world objects characterized by common properties. An aggregation abstraction defines a new class from a set of (other) classes that represent its component parts. A generalization abstraction defines a subset relationship between elements of two or more classes.
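The three abstraction types can be illustrated with a minimal Python sketch. The class names and attributes below (`Customer`, `BusinessCustomer`, `Order`, `OrderLine`, `tax_id`) are hypothetical and chosen only for illustration; they do not come from any of the cited methods.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:
    """Classification: each customer instance is a member of this class."""
    name: str


@dataclass
class BusinessCustomer(Customer):
    """Generalization: a sub-class that inherits `name` from Customer
    and adds an attribute unique to this subset of customers."""
    tax_id: str


@dataclass
class OrderLine:
    """A component part of an order."""
    product: str
    quantity: int


@dataclass
class Order:
    """Aggregation: an order is defined from its component parts."""
    customer: Customer
    lines: List[OrderLine] = field(default_factory=list)


# Hypothetical instances
acme = BusinessCustomer(name="Acme Supply", tax_id="00-0000000")
order = Order(customer=acme, lines=[OrderLine("widget", 3)])
print(isinstance(acme, Customer))  # a business customer is also a customer
```

In this sketch the subset relationship appears as `BusinessCustomer` being usable wherever a `Customer` is expected.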
A class of objects is partitioned into two or more subsets where objects in each subset possess some characteristics or behaviors that differ from objects in the other subsets. The super-class depicts the characteristics and behaviors shared by all objects. Each sub-class represents the characteristics and behaviors unique to a particular subset of the objects. Generalizations provide a mechanism called inheritance whereby a sub-class acquires (shares) properties from its immediate super-class, and, by induction, from all of its antecedent super-classes.

Both object-oriented and structured analysis methods provide constructs in support of these three abstraction types [1,3,5,14-16]. As described later, however, while structured analysis methods and object-oriented analysis methods treat classifications and aggregations in similar ways, they differ in their treatment of generalizations. Because of differences in how the structured analysis methods and the object-oriented methods provide for modeling generalizations, there can be significant differences in how or whether the systems analyst represents generalizations.

Despite the importance of the generalization abstraction, there has been surprisingly little empirical research examining the implications of the use of constructs supporting it in systems analysis models. One study that did examine the impact of generalizations was Hardgrave and

3. Generalizations in Structured and Object-oriented Analysis

Structured Analysis Methods

Structured analysis methods use the Entity Relationship Diagram to depict the structure features of an application domain. Chen's original specification of the Entity Relationship Diagram permitted the specification of relationships; however, Chen did not specifically address the types of relationships that might arise [2]. Smith and Smith [16] introduced the notion of aggregations and generalizations as concepts that may be depicted on the Entity Relationship Diagram. Generalizations have often been depicted using the relationship diamond and a special label like "is a" to signify the generalization [8]. Some Entity Relationship Diagram notations (e.g., [5], [13], and [17]) use a special symbol to represent generalizations. If a sub-class has attributes unique to that sub-class, the analyst depicts the sub-class on the Entity Relationship Diagram; as a result, there is a mechanism to represent structure-based generalizations. Structured analysis methods generally employ a structure modeling process that eliminates classes that do not have any attributes [17]. The Entity Relationship Diagram also does not contain any construct for depicting behaviors associated with classes; as a result, if a sub-class exists as a result of a behavior-based generalization, the sub-class does not appear on the Entity Relationship Diagram.

Structured analysis employs several different models to depict behaviors. The basic set of models used includes Decomposition Diagrams, Data Flow Diagrams, and logic models including Decision Tables, Decision Trees, and Structured English (e.g., [5], [9], and [17]). The Decomposition Diagram depicts the decomposition of an application domain into its major business functions.
The Data Flow Diagram depicts the processes necessary to implement the business functions, the data flows between agents outside the system and the system, data flows among the processes that comprise the system, and the data stores that retain data of importance. The data stores map to classes on the Entity Relationship Diagram. A process on a high-level Data Flow Diagram may be decomposed into a more detailed Data Flow Diagram that depicts the internal workings of the process. Any process that is not decomposed into a more detailed Data Flow Diagram is described using a logic model.

Structured analysis methods do not provide any formal mechanism to detect and represent a behavior-based generalization. As a result, the burden of detecting the behavioral component of a generalization and structuring the Data Flow Diagrams and logic models to take advantage of this behavior-based generalization falls on the systems analyst. Unfortunately, it is easy for a systems analyst to create Data Flow Diagrams that do not concisely depict differences in the treatment of the various sub-classes. As a simple example, say a business processes orders from two types of customers, businesses and individuals. The systems analyst may create a process on a high-level Data Flow Diagram called "Process Business Customer Order" and a second process called "Process Individual Customer Order". In this case, the systems analyst is likely to duplicate a number of processing actions on lower-level Data Flow Diagrams. The systems analyst may even end up producing a number of logic models that are very similar, with one version for business customers and another version for individual customers. If the systems analyst realizes early on that there are only minor differences between the processing of orders from businesses and individuals, the systems analyst may label the high-level process "Process Order" and then depict the behavioral differences between business and individual customers on lower-level Data Flow Diagrams. The models produced in this case provide a more concise representation of the application domain since behavioral elements are not unnecessarily duplicated. Unfortunately, the likelihood of the systems analyst recognizing the presence of a behavior-based generalization is dependent on a number of factors, including the experience of the systems analyst with behavior modeling and the application domain. Also, when and how the systems analyst obtains information about behaviors during discovery may influence the likelihood that the systems analyst will recognize a behavior-based generalization.

Object-oriented Analysis Methods

Object-oriented analysis methods use the Object Diagram (also called an Object Model, Class Diagram, or Class Model) to depict both structure and behavior aspects of an application domain. Common Object Diagram notations include modeling constructs to represent classes of objects, attributes associated with classes, relationships among classes, and the cardinality of each relationship. In addition, the systems analyst assigns behavioral elements of an application domain to the classes on the Object Diagram [1,3,8,15]. As a result, the Object Diagram provides a complete model of the structure elements and identifies the behavior elements that comprise the behavior of an application domain. The systems analyst then prepares more detailed models to depict the behavior of an application domain. Depending on the object-oriented method employed, for each service the analyst may prepare a service chart [3], an event trace diagram [15], a scenario [14], an interaction diagram [1,7], or some other model.
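As a minimal sketch of how assigning behavior to classes can avoid the duplication described in the order-processing example, the following Python fragment places the shared service on a super-class and the differing behavior on a sub-class. The class names, the `process_order` and `discount` services, and the 10% business discount are hypothetical illustrations, not drawn from any of the cited methods.

```python
class Customer:
    """Super-class: holds the service shared by all customers."""

    def __init__(self, name: str) -> None:
        self.name = name

    def process_order(self, amount: float) -> float:
        # Shared "Process Order" logic, written once for every sub-class.
        return round(amount - self.discount(amount), 2)

    def discount(self, amount: float) -> float:
        # Default behavior: no discount.
        return 0.0


class IndividualCustomer(Customer):
    """Sub-class with no unique attributes or services of its own."""


class BusinessCustomer(Customer):
    """Sub-class that exists only because its behavior differs."""

    def discount(self, amount: float) -> float:
        # Hypothetical rule: business customers receive 10% off.
        return amount * 0.10


for customer in (IndividualCustomer("Pat"), BusinessCustomer("Acme")):
    print(type(customer).__name__, customer.process_order(100.0))
```

Neither sub-class adds an attribute, so a structure model that eliminates attribute-less classes would drop both; they are distinguished only by behavior.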
With the Object Diagram, object-oriented analysis methods provide a powerful mechanism for representing the three types of relationships that may arise in an application domain. When constructing the Object Diagram using an object-oriented analysis method, the analyst associates the structure elements of a generalization with the appropriate sub-classes. The Object Diagram also provides a record of the behavior of an application domain. The analyst associates services performed by all sub-classes with the super-class on the Object Diagram. The analyst associates services related to an individual sub-class with that sub-class only. As a result, the Object Diagram clearly distinguishes between services common to all sub-classes and services specific to a sub-class. Also, because the Object Diagram depicts services associated with classes, a sub-class that exists as a result of a behavior-based generalization remains on the Object Diagram even though there are no attributes specifically associated with the sub-class.

Comparing Structured Analysis and Object-Oriented Analysis Methods

Both structured analysis and object-oriented analysis methods depict the structure-based elements of a generalization on their respective structure models. In both cases, relationships and attributes associated with all sub-classes appear as part of the super-class's specification. Relationships and attributes associated with a sub-class appear as part of the sub-class's specification. As a result, structure models associated with structured analysis and object-oriented analysis methods are informationally equivalent [10,12] in terms of their representation of structure-based generalizations. Both structured analysis and object-oriented analysis methods depict behavior-based generalizations on the appropriate behavior models. In addition,

Table I summarizes our findings. In terms of general recall, semantic richness emerged as a significant factor. The coefficient on the semantic richness term is significantly different from zero (p-value = 0.034) with a negative sign. This indicates that subjects given a semantically simple case had lower overall recall of application domain elements than did subjects given a semantically rich case. The semantic richness of the application domain also influenced recall of structure elements. As with overall recall, the coefficient on the semantic richness term is significantly different from zero (p-value = 0.040) with a negative sign. This indicates that subjects given a semantically rich case had better recall of structure elements. Finally, both the method term and the semantic richness term are weakly significant for the recall of behavior elements. The coefficient on the semantic richness term is weakly significant (p-value = 0.058) with a negative sign, indicating that subjects given a semantically rich case had better recall. The coefficient on the method term is weakly significant (p-value = 0.068) with a positive sign, indicating that subjects given a set of object-oriented analysis method models had better recall of behavioral elements than did subjects given the structured analysis models.
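As a rough consistency check on the semantic richness figures in Table I, the short sketch below recomputes each t statistic as the coefficient divided by its standard deviation. Treating the reported standard deviation as the standard error of the coefficient is an assumption made here, and because the table values are rounded, only approximate agreement should be expected.

```python
# Semantic richness values as reported in Table I:
# recall measure -> (coefficient, standard deviation, reported t)
semantic_richness = {
    "overall": (-0.077, 0.035, -2.17),
    "structure": (-0.089, 0.042, -2.11),
    "behavior": (-0.069, 0.036, -1.94),
}


def t_statistic(coef: float, std_dev: float) -> float:
    """t statistic under the assumption that std_dev is the standard error."""
    return coef / std_dev


for measure, (coef, std_dev, reported_t) in semantic_richness.items():
    computed = t_statistic(coef, std_dev)
    print(f"{measure}: computed t = {computed:.2f}, reported t = {reported_t:.2f}")
```

Small differences between the computed and reported values are attributable to rounding of the published coefficients and standard deviations.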

TABLE I
Summary of Results

              Method               Semantic Richness
Recall        Coef     p-value     Coef      Std dev   t       p-value
Overall       0.054    0.154       -0.077    0.035     -2.17   0.034*
Structure     0.032    0.470       -0.089    0.042     -2.11   0.040*
Behavior      0.070    0.068**     -0.069    0.036     -1.94   0.058**

* α < 0.05, ** α < 0.10.

The results of this study indicate two things. First, the use of generalizations to model an application domain positively influences the recall of elements in that domain. Drawing on Houston's discussion of concept formation [6], we believe that generalizations provide a pattern that the subjects use to organize information in memory by forming concepts that define what characteristics a sub-class possesses in this application domain. In this case, the concept enables subjects to correctly store and retrieve knowledge about similar structure elements in the application domain. For the recall of behavioral elements, generalizations provide patterns of behaviors that the subjects may use to form concepts regarding the types of processing activities that apply to the application domain. Once again, this enables the subjects to store and retrieve knowledge about the behaviors that apply in the application domain. Subjects given models for the application domain that did not incorporate generalizations had to capture, organize, store, and retrieve knowledge about each element without being able to exploit patterns.

A second finding from this study is that the subjects given models based on an object-oriented analysis method performed better on the recall of behavioral elements. There are two possible, related effects that may account for this. First, the behavioral elements are repeated on the Object Diagram and in the behavioral models used with object-oriented analysis methods, and the repetition may have aided recall.
Second, the Object Diagram depicts the specific behaviors associated with each sub-class as part of the specification for the sub-class and the shared behaviors as part of the specification for the super-class. This may help to solidify the behavioral differences between the sub-classes. As a result, the subjects given object-oriented analysis models may form a clearer concept of the behavior of each sub-class, resulting in better recall.

Conclusion

Previous research related to generalizations in the information systems literature has focused almost exclusively on structure-based generalizations. Behavior-based generalizations are also important since they both provide patterns of behavior and clearly distinguish different kinds of behaviors that an information system must ultimately support. In this paper, we have examined the treatment of generalizations by structured analysis and object-oriented analysis methods. Both structured analysis and object-oriented analysis methods support structure-based generalizations; however, object-oriented analysis methods provide more tangible representations of behavior-based generalizations. This is important since both types of generalizations may arise in application domains.

We conducted an experiment in which we manipulated the semantic richness of an application domain by including generalizations in the semantically rich application domain. This enabled us to assess the impact of generalizations on the recall of application domain elements. From this, we concluded that the presence of generalizations positively influences the recall of both structure elements and behavior elements. In addition, we found that recall of behavioral elements was greater with object-oriented models. This indicates that the object-oriented models provide a more powerful depiction of the behavior of an application domain. This effect may be a result of the repetition of services on the Object Diagram and the behavior models, or because the Object Diagram provides a more salient representation of behavioral differences between sub-classes.

Since the generalization is an important relationship type and there are few studies that have explored generalizations in information systems, there is a need for additional research in this area. We used a recall test to assess the ability of subjects to acquire and retain knowledge about an application domain by studying information system models. Although the recall test provides a good indication of how much the subject has acquired and can report after one session studying models, the recall of application domain elements was low in our experiment. Other testing methods, including recognition tests, may be more effective in raising the performance of subjects while still providing a good measure of the communication effectiveness of models. Also, we used an application domain that incorporated both structure-based and behavior-based generalizations. It would be informative to examine the role of structure-based generalizations and behavior-based generalizations in isolation to see if they have the same impact on memory for application domain elements.
This would also increase our ability to make broader statements about the role of generalizations in systems analysis.

BIBLIOGRAPHY

[1] Booch, G., Object-oriented Analysis and Design with Applications, 2nd edition, Benjamin/Cummings: Redwood City, CA, 1994.
[2] Chen, P., The Entity Relationship Model: Toward a Unified View of Data, ACM Transactions on Database Systems, 1(1), 9-36, 1976.
[3] Coad, P. and Yourdon, E., Object-Oriented Analysis, 2nd edition, Prentice Hall: Englewood Cliffs, NJ, 1991.
[4] Hardgrave, B. C. and Dalal, N. P., Comparing Object-Oriented and Extended Entity Relationship Data Models, Journal of Database Management, 6(3), 15-21, 1995.
[5] Hoffer, J. A., George, J. F., and Valacich, J. S., Modern Systems Analysis and Design, Benjamin/Cummings: Reading, MA, 1996.
[6] Houston, J. P., Fundamentals of Learning and Memory, Harcourt Brace Jovanovich, Inc.: Orlando, FL, 1991.
[7] Jacobson, I., Christerson, M., Jonsson, P., and Overgaard, G., Object-oriented Software Engineering: A Use Case Driven Approach, revised, Addison-Wesley: Harlow, England, 1992.
[8] Kemper, A. and Moerkotte, G., Object-Oriented Database Management: Applications in Engineering and Computer Science, Prentice Hall: Englewood Cliffs, NJ, 1994.
[9] Kendall, K. E. and Kendall, J. E., Systems Analysis and Design, 3rd edition, Prentice-Hall Publishing Company: Englewood Cliffs, NJ, 1995.
[10] Larkin, J. H. and Simon, H. A., Why a Diagram is (Sometimes) Worth Ten Thousand Words, Cognitive Science, 11, 65-99, 1987.
[11] Lunsford, D. L., An Experimental Comparison of Structured Analysis and Object-oriented Systems Analysis Methodologies, Doctoral Dissertation, UMI Company: Ann Arbor, MI, 1996.
[12] Lunsford, D. L. and Muhanna, W. A., A Theory-based Research Framework for Studying Systems Analysis Methods, Proceedings of the Third INFORMS Conference on Information Systems and Technology, Montreal, 131-145, 1998.
[13] McFadden, F. R. and Hoffer, J. A., Database Management, 3rd edition, The Benjamin/Cummings Publishing Company: Redwood City, CA, 1991.
[14] Norman, R. J., Object-Oriented Systems Analysis, Prentice-Hall, Inc.: Upper Saddle River, NJ, 1996.
[15] Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., and Lorensen, W., Object-Oriented Modeling and Design, Prentice-Hall Publishing Company: Englewood Cliffs, NJ, 1991.
[16] Smith, J. M. and Smith, D. C. P., Database Abstractions: Aggregation and Generalization, ACM Transactions on Database Systems, 2(2), 105-133, 1977.
[17] Whitten, J. L. and Bentley, L. D., Systems Analysis & Design Methods, 4th edition, Irwin: Homewood, IL, 1998.
[18] Zachman, J. A., A Framework for Information System Architecture, IBM Systems Journal, 26(3), 276-292, 1987.