On the efficiency of domain-based COTS product selection method

Information and Software Technology 44 (2002) 703 715 www.elsevier.com/locate/infsof On the efficiency of domain-based COTS product selection method Karl R.P.H. Leung a, Hareton K.N. Leung b, * a Compuware Software Testing Laboratory, Department of Computing and Mathematics, Hong Kong Institute of Vocational Education, Hong Kong, People s Republic of China b Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, People s Republic of China Received 8 January 2001; revised 28 June 2002; accepted 1 July 2002 Abstract Use of commercial-off-the-shelf (COTS) products is becoming a popular software development method. Current methods of selecting COTS products involve using the intuition of software developers or use a direct assessment (DA) of the products. The former approach is subjective, whereas the latter approach is expensive. This high cost is because the efficiency of the DA approach is inversely proportional to the product of the number of modules in the system to be developed and the total number of modules in the candidate COTS products. With the increase in the number of available COTS components, the time spent on choosing the appropriate COTS products could easily offset the advantages of using them. Neither of the selection approaches mentioned leads to quality results. Furthermore, inappropriately chosen COTS components may cause much greater damage to a development project than faults in software units that are developed in-house. A domain model is a generic model of the domain of an application system. It captures all of the features and characteristics of the domain. We have developed a new indirect selection approach, called the domain-based COTS product selection method, which makes use of domain models. We have successfully applied our selection method to the development of an on-line margin trading application. In this paper, we first analyze the efficiency of the domain-based COTS product selection method qualitatively. Then, we study the efficiency of the method by means of a formal approach and also through the case study of the on-line margin trading application. All of these results show that the domain-based COTS product selection method is more efficient than the DA methods. q 2002 Published by Elsevier Science B.V. Keywords: Commercial-off-the-shelf; Domain-based method; Selection of COTS; Selection efficiency 1. Introduction The use of commercial-off-the-shelf (COTS) products as units of large systems is becoming popular. Shrinking budgets, rapid advancement of COTS development and the increasing demands of large systems are all driving the adoption of the COTS development approach. A COTS component is defined as an independent unit that provides a set of related functions and is suitable for reuse [5]. COTS components are different from software components in terms of their completeness [2]. We call those systems that adopt COTS development as much as possible COTS-based systems (CBS). Compared with traditional software development, CBS development promises faster delivery with lower resource costs. The shift from custom development to CBS is not limited to new application development projects. Many maintenance projects also involve COTS products. * Corresponding author. Tel.: þ852-2766-7252; fax: þ852-2339-7082. E-mail addresses: cshleung@comp.polyu.edu.hk (H.K.N. Leung), kleung@computer.org (K.R.P.H. Leung). Rather than building the whole system from scratch, a new system can be assembled and constructed by using existing, market proven and vendor supported COTS products. For example, accounts receivables and inventory control COTS components can be purchased and integrated into an accounting system. The use of COTS products has been recognized as a trend and it has become an economic necessity [1,17]. Developing a CBS involves selecting the appropriate COTS products, building extensions to satisfy specific requirements, and then gluing the COTS products and the in-house developed units together. The success of CBS development depends heavily on the ability to select the appropriate COTS components. An inappropriate COTS product selection strategy can lead to adverse effects. It could result in a shortlist of COTS products that can hardly fulfill the required functionality, and it might also introduce overheads in system integration and maintenance phases. An effective and efficient COTS product selection process is essential to the delivery of the full potential of CBS 0950-5849/02/$ - see front matter q 2002 Published by Elsevier Science B.V. PII: S0950-5849(02)00118-0

704 K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 development. If the effort required to select the appropriate COTS product is too high, then it may offset the time saving in using the COTS development approach. We classify the current COTS product selection methods into two categories, namely, the intuition approach and the direct assessment (DA) approach. In the intuition approach, software developers select COTS products according to their experience and intuition. This approach is obviously subjective. Furthermore, some COTS products that are qualified candidates for an application may be omitted inadvertently. Most of the recently proposed COTS component selection methods belong to the DA approach, which selects COTS components directly from their source. These methods consider ALL of the descriptions of the COTS products and then try to make decisions on their suitability. For example, a recent study proposes the use of a matching process between the system requirements on one hand and the constraints imposed by the COTS products on the other [13]. These methods all require a searching phase followed by an evaluation phase that examines both the functional and non-functional aspects of the COTS products. These approaches are more objective than the intuition-based approaches. However, we have noticed that the vendor s information has not been well utilized. By making better use of the vendor s information, we can find room for reducing the time required for selecting COTS products. The domain model has been recognized as an effective tool in developing quality software. A domain model is a generic model of the domain in question, and it consists of entities that represent the intrinsic properties of the domain and it captures the relations between these intrinsic properties. Hence, a domain model captures all of the functionality and features of the domain. Furthermore, since the scope of the study is the domain in general, a domain model possesses all of the intrinsic properties, features, characteristics and functions of the domain. A method of developing correct software with reference to a domain model is reported in Ref. [8]. A domain can arise naturally, or it can be created artificially or invented abstractly [15]. Each domain consists of a collection of entities, together with the specified relationships among them. A domain model is developed using the process of domain analysis. Although there are different views on domain modeling [3,5,11,12], they all share the same goal: to facilitate quality software development by reusing the knowledge of the domain in question. Domain models can be classified into two major types, namely, application-specific domains and technical classifications. Application-specific domains are based on a type of real life application such as banking, air traffic control, etc. Technical classifications are based on those classifications that are commonly used by technical practitioners such as in real-time systems, reactive systems, etc. In the development of real-life systems, these two types of domain model are interwoven. When a software module is said to be able to serve some specific purposes in an application domain, this means that it is satisfying both the application s domain-specific requirements and it is using some techniques that belong to some technical classifications. To develop a successful application system, the principal consideration must be business first. Thus, by domain modeling, we mean the application-specific domains. We have developed an indirect approach to select COTS products. Our approach, instead of assessing all of the available COTS products, makes use of the specific domain model of the intended system. We call this method the domain-based COTS selection (DBCS) method [9] and its overview is described in Section 3. In this paper, we report the analysis of the efficiency of the DBCS method. We first analyze the efficiency of the DBCS method qualitatively, and then we study its efficiency using a formal approach. We also compare the efficiency of the DBCS method and DA methods using best-fit and first-fit searching strategies. These analyses show that the DBCS method is more efficient than DA methods for both search strategies. In Section 2, we provide a review of three DA methods. Section 3 gives an overview of the domain-based COTS product selection method. This overview is followed by a qualitative analysis of the efficiency of the DBCS method. We have successfully applied the DBCS method to the development of a dealing system for a local bank. A brief description of this case study is presented. In Section 4, we formally analyze and compare the efficiency of both the DBCS method and DA approaches using both first-fit and best-fit strategies. In Section 5, we present our conclusions on the results reported in this paper. 2. COTS selection methods We first give an overview of some DA methods that have been proposed for COTS product selection. They are called the off-the-shelf-option (OTSO) [7], COTS-based integrated system development (CISD) [16], and the infrastructure incremental development approach (IIDA) [4]. We then analyze these methods and point out the drawbacks in their efficiency. Kontio has proposed the OTSO selection approach [7]. The OTSO approach comprises three phases: searching, screening and evaluation. Based on the knowledge of requirements specification, design specification, project plan and organizational characteristics, a set of selection criteria is set up to select the COTS products. The searching phase attempts to identify all potential COTS candidates. It aims to search for all COTS candidates that cover most of the required functionality. This phase emphasizes breadth rather than depth. The search criteria are based on the functionality and the system constraints. The objective of the screening phase is to decide which

K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 705 COTS candidates should be selected for detailed evaluation. The screening criteria are similar to those of the searching phase. The less-qualified COTS candidates are eliminated in this stage. In the evaluation phase, COTS candidates undergo a detailed evaluation. Evaluation criteria include: functional requirements, correctness, business concerns, and software architecture. Apart from the evaluation criteria, the analytic hierarchy process is employed in the OTSO selection process to consolidate all of the evaluation information [14]. Tran and Liu have proposed the CISD model for COTS development [16]. The CISD model is not solely for COTS product selection. It is a model used to generalize the key engineering phases of selecting, evaluating and integrating COTS products for system development. The CISD model consists of three distinct phases: identification, evaluation and integration. The COTS product selection process lies within the identification and evaluation phases. The identification phase includes two sub-phases: product classification and product prioritization. In the product classification sub-phase, information on potential COTS products is collected based on the requirements of each (application) service domain. This phase is similar to the searching phase of the OTSO selection process. In the prioritization sub-phase, candidate COTS products are screened and prioritized. Two important criteria for prioritization are interoperability and the ability to fulfill multiple requirements of the service domains. The first criterion ensures that the selected COTS candidates can be integrated readily, leading to a reduction in the overall time and the effort required during system integration. The second criterion gives a higher rating to products that support multiple domains because these products can reduce the number of interconnected interfaces and architectural mismatches. In the evaluation phase, a detailed evaluation of the COTS products is performed. The CISD model examines three important areas of the COTS products: functionality, interoperability and architecture, and performance. Fox has proposed the IIDA for COTS-based technical infrastructure development [4]. This approach is a combination of the classical waterfall and spiral development models in order to accommodate the needs of CBS development. The COTS product selection process of the IIDA relies on two phases: analysis prototype and design prototype. In the analysis prototype phase, COTS candidates are selected from each COTS product family. A COTS product family is defined as a group of COTS products that perform similar functions and/or provide related services. Based on the general capabilities and basic functionality of the COTS candidates, qualified COTS products that fulfill the infrastructure requirement are identified from the COTS product family. However, the IIDA has not specified how the COTS product family is constructed or possible sources for that information. The purpose of the design prototype phase is to select and evaluate the best COTS products from the earlier phase. Basic evaluation criteria include functionality and performance. Although the three DA methods have some differences in their finer details, the typical steps of these methods that affect their efficiency are: 1. Inspect all of the modules of each available COTS product to check whether it has modules that satisfy some or all of the functional requirements of the CBS in question. 2. Check whether the COTS product also satisfies the nonfunctional requirements of the CBS. Non-functional requirements may include properties such as the interoperability of the modules of the COTS product with other systems. 3. Select the most appropriate COTS product that satisfies both the functional and non-functional requirements of the CBS. It is obvious that the efficiency of these exhaustive DA methods is inversely proportional to the product of the number of modules to be developed using COTS products, and the total number of modules in all of the available COTS products. Once CBS development becomes more popular, there will be more and more COTS products available. As a typical COTS product consists of many modules, the total number of modules in all of the available COTS products will become large. Consequently, the efficiency of the DA methods will decrease sharply with an increase in the number of COTS products. It is interesting that the proposed methods seldom mention the strategy for selecting a COTS product, even though the selection strategy affects the quality of the product selected. Depending on whether an application development needs the best available COTS product, COTS product selection can be performed using one of two strategies: Best-fit strategy: the selection process aims at identifying the best COTS product among all of the candidates. First-fit strategy: the selection process aims at identifying the first COTS product that satisfies all of the requirements. If no COTS product satisfies all of the requirements, then the best product from the available COTS products is selected. In general, the best-fit strategy will require analyzing all of the COTS candidates, whereas the first-fit strategy may require less effort as it will stop once the first COTS candidate that meets the requirements has been identified.

706 K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 Fig. 1. Many-to-many relation between COTS and CBS. However, the latter strategy may not identify the best COTS product. 3. Domain-based COTS product selection method In this section, we present an overview of the DBCS method. We assume that there are many COTS products available in the market and that the CBS is complex. We first qualitatively analyze the problem of COTS product selection. Then, we develop the DBCS method with reference to the analysis. Afterwards, we analyze the DBCS method qualitatively, and then we present a case study. 3.1. Relationships among COTS products, a CBS and a domain model We first review the different relations between COTS products, a CBS, and a domain model. Relations between COTS products and a CBS. Many of the COTS products in the market are generic. These products claim that they can be applied in application systems from different domains. On the other hand, a CBS development requires that we examine COTS products from multiple vendors. The selection of COTS components for a CBS is then a many-to-many matching process (Fig. 1). The efficiency is inversely proportional to the product of the number of modules to be developed using the COTS products and the total number of modules being considered in the COTS products. If the CBS is a complex system and there are many COTS products available on the market, the system developers will have to examine many COTS products, and hence they will exert a great effort in making the selection. Consequently, this many-to-many relation may offset the advantages of using COTS development. With the increase in the number of available COTS products and the many-to-many relation between COTS and CBS, it is difficult, without a good selection method, for system developers to examine all of the COTS products. Also, searching through a large number of available COTS products is error-prone. Ideally, the selection method should be automated to increase its efficiency and reduce errors. Fig. 2. Relation between a domain model and a CBS. Fig. 3. Relation between COTS and a domain model. Relation between a domain model and a CBS. A domain model is a generic model of a domain. There may be an infinite number of systems for a domain. Hence, the relation between a domain model and a CBS is one-to-many. However, in our study, we are interested in a method of developing a CBS for a domain. Hence, the relation between a domain model and a particular CBS is a simple one-to-one relation (Fig. 2). 1 This is the case because each module of the CBS should have corresponding modules in the domain model. Otherwise, the domain model is not an appropriate generic model of the domain. It is a simple relation because it is easy to identify the modules in the domain model that correspond to the CBS. Relation between COTS products and a domain model. In general, a domain model can be supported by more than one COTS product, or the modules of a domain model can be fitted by a number of modules, each from different COTS products. Consequently, the relation between COTS products and a domain model is many-to-one (Fig. 3). It should be noted that the COTS product vendors have all the information about the COTS products, including the functional and non-functional applicability of the product. Hence, it would be efficient and appropriate for the vendors to map their COTS products to the domains in which their COTS products apply. They could specify how the COTS products can be used in the applicable parts of a domain. This would not only help the software developers to select the right COTS products, but also help the vendors to market their products. 3.2. Selection based on the domain model From the above analysis of the relations between the class of COTS products, a domain model and a CBS, we notice that the relation between the class of COTS products and a domain model is a many-to-one relation and the relation between a domain model and a CBS is a one-to-one relation (Fig. 4). We can take advantages of the properties of these relations and design a new approach to COTS product selection. We use the domain model as an agent between COTS products and a CBS. The modules from the COTS products that are claimed by the vendor to be appropriate for specific modules of a domain model are first mapped to these modules by the vendors. Then, when selecting the COTS modules for a CBS, instead of selecting the COTS components directly, the corresponding modules in the domain model of a CBS are consulted first. Through the 1 The parallelogram represents a specific entity, which denotes a domain model and a CBS in this example. Rectangles represent a class of entities, such as multiple CBS and COTS products in Fig. 1.

K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 707 Fig. 4. Relation between COTS, a domain model and a CBS. mappings between a domain model and the COTS products, the corresponding COTS modules that are (claimed to be) appropriate for the CBS module are identified. A vetting procedure is then used to select the best COTS products for the application. As the mappings between the COTS products and the domain models have been provided in terms of information on how the products can be applied to a domain, selecting an applicable COTS product can be automated based on the mappings. However, the vetting procedure cannot be fully automated because some of the functional and non-functional criteria will require human judgment. To summarize, this method consists of two phases: set-up and selection. 1. Set-up phase. When vendors rollout their COTS products, besides making the COTS products available on the market, they also need to map their COTS modules to those modules of the domains that they find are applicable. These mappings are available to the application system developers and can be accessed electronically. 2. Selection phase (a) The corresponding modules in a domain model are identified for each of the modules of the CBS in question. (b) The COTS modules that claim to be applicable are Fig. 5. An example of using the DBCS method. (c) (d) identified by the mappings from the domain model to the COTS modules. It should be noticed that these modules are functionally applicable to the domain models. The non-functional properties of the identified COTS modules are assessed. With reference to all of the assessment results, the most appropriate COTS modules are selected. We illustrate this method with the example shown in Fig. 5. In the set-up phase, the vendor of COTS1 has identified that C11 is applicable to n1 of the domain model; the vendor of COTS2 has identified that C21 and C22 are applicable to n1 and n2, respectively; the vendor of COTS3 has identified that C31 is applicable to n2; and the vendor of COTS4 has identified that C41 and C42 are applicable to n2 and n3, respectively. Then, when the specification of the CBS is developed, it is easy for the developers to identify that S1 corresponds to n2 and S2 corresponds to n3 of the domain model. With reference to n2 and n3, COTS modules C22, C31, C41 and C42 are identified. The next step is to validate the nonfunctional requirements of these four modules. One module has to be chosen from C22, C31 and C41. Since C42 is the only applicable module for S2, it will be selected if it satisfies the non-functional requirements. We next make several observations on the efficiency of the DBCS method. In the set-up phase, modules of COTS products are first mapped to domain models by the vendors. Since the vendors have full information about the functionalities and features of the COTS products, it should be easy for them to decide how the modules are to be used in different domain modules. The mappings from the COTS modules to a domain model are reused each time a CBS from that domain is to be developed. Moreover, the mappings between the COTS modules and the domain modules help the vendors to market their products because they can demonstrate easily how their COTS products can be applied in an application domain. Conceptually, the relation between a domain model and all of the COTS products is one-to-many. As the mappings are developed from the COTS-product side to the domain model side, in terms of the development of the mapping, it is still a one-to-one relation. Consequently, the complexity of developing the mapping is reduced. Furthermore, these mappings are reused in every CBS development. The mappings between the COTS products and domain models can be managed by means of computing systems, and retrieving information about the COTS modules from the domain models can be done automatically. Consequently, much effort can be saved by this approach. In the selection phase, the first step is identifying the corresponding modules in the domain model for a CBS. This step is simple because a domain module has captured

708 K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 all of the features of the domain. The relation between a module of a CBS with its domain module is one-to-one. It should be easy for the developer of a CBS to identify the corresponding modules in the domain model. Although the process of selecting COTS products for these modules depends on the number of mappings, it is still a single simple step because information on the COTS products can be collected through the defined mappings. Furthermore, if the mappings are well managed, after the modules of the domain models are identified, the mappings between COTS products and domain modules can be retrieved by automatic systems. This would be an accurate and efficient method to identify those COTS products that satisfy the functional requirements of a CBS. A weighting method based on the degree of functional fitness can then be used to help select the most suitable COTS product. It is obvious that the DBCS method is an efficient method. This is because the one-to-many relation between a CBS and all of the COTS products is reduced to two relations, namely, a many-to-one relation between the COTS products and a domain model and a one-to-one relation between a domain model and a CBS. The former relation is a simple relation and is handled easily. Although the latter relation is complex to deal with, the vendors who possess all of the necessary information are able to easily solve this problem. 3.3. A case study We have applied the DBCS method to the development of an on-line margin trading system and obtained encouraging results [9]. We have successfully developed a prototype margin trading system for a large leading bank in Hong Kong with the use of COTS products. The margin trading system deals mainly with the margin trades between trading parties. It is one of the core activities in the banking and financial industry. Due to confidentiality requirements, we cannot release the functional details of this system. In applying the DBCS method, the first requirement is the availability of a domain model for the specific application. Since an industrial-scale margin trading domain model was not available for our research, our first step was to create such a domain model. This domain model was built by an experienced developer who has been working on a margin trading system in a leading bank for several years. He was familiar with the business, activities, operations and general system architecture of a margin trading system. The domain model was built using object-oriented technology and expressed in object modeling technique (OMT). We then mapped the modules of two COTS products to the modules of our bank margin trading domain model. Afterwards, we asked the software developers to identify the COTS modules by following the steps of the DBCS method using both best-fit and first-fit strategies on the two modules. 4. Efficiency of the COTS product selection methods In considering the efficiency of product selection, we are concerned with the processing performance of the method. The number of comparisons that are made and the effort required in using a method can be used for this measurement of efficiency. In this section, we analyze the efficiency of the domain-based COTS product selection method and the DA method. In these analyses, we first examine the best-fit strategy, then the first-fit strategy. The best-fit strategy selects the best COTS product among those available. The first-fit strategy selects the first product that fits all of the requirements. If there are no choices that meet all requirements, then the best COTS product is selected from the available choices. Let there be n modules in the CBS ðn $ 0Þ and x COTS products available ðx $ 0Þ: Let c i denote an individual COTS product. Let cm i be the set of modules of c i. Then, ðc i ; cm i Þ form a tuple that captures the COTS product identifier and its set of modules. Let L c ¼ ½ðc 1 ; cm 1 Þ; ðc 2 ; cm 2 Þ; ; ðc x ; cm x ÞŠ be the list which represents a set of x COTS products. Here we define some operators for our analysis. Let elemðlþ be a function that returns the set of unique elements from a list l. For example, elemð½1; 1; 2ŠÞ ¼ ½1; 2Š: Let cardðsþ be a function that returns the cardinality of the set S. For example, cardð{1; 2; 3}Þ ¼3: Let xtpði; tþ be a function that extracts the ith elements from a set of tuples, denoted by t, and let it return them in a set. If the ith element in a tuple does not exist, xtp returns an empty set. For example, xtpð2; {ða; bþ; ðc; d; eþ}þ ¼{b; d} and xtpð3; {ða; bþ; ðc; dþ}þ ¼{}: Let there be y COTS products among the x available COTS products that have some modules satisfying some of the functional requirements of the n CBS modules. Let each of these y COTS products be denoted by d i.letdc i be a module of d i,andletn i be a module of the CBS that is functionally satisfied by dc i.letdm i be a set of tuples ðdc i ; n i Þ: Then, a list L d ¼½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd y ; dm y ÞŠ can be constructed to keep the COTS candidates together with their set of module pairs that satisfy the functional requirements of the CBS. Since L d is selected from L c and cardðelemðl c ÞÞ ¼ x; cardðelemðl d ÞÞ ¼ y; 1. x $ y, 2. xtpð1; elemðl c ÞÞ $ xtpð1; elemðl d ÞÞ; 3. ;c i, cm i, dm i : ðc i ; cm i Þ [ elemðl c Þ ^ ðc i ; dm i Þ [ elemðl d Þ)cm i $ xtpð1; dm i Þ: Let the CBS have v non-functional criteria. Among the y COTS products, let there be z COTS products that satisfy the non-functional requirements of the CBS. Let each of these z COTS products be denoted by e i. Let ec i be a module of e i, which satisfies the non-functional requirements, and let n i be a module of the CBS, which is

K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 709 Fig. 6. CBS and related COTS. functionally satisfied by ec i.letem i be the set of tuples ðec i ; n i Þ: Unlike functional requirements, non-functional requirements can have various degrees of satisfaction. For example, the performance of COTS A may just meet the requirements and cost $A, while COTS B gives a better performance than A and costs less than $A. Then, we can say that the overall rating on the non-functional requirements for B is greater than A. This assignment of the overall rating depends on a wide variety of factors such as company policy, experience, costing and the specific CBS. Therefore, its evaluation requires human judgment and it cannot be automated. Let p i be the overall rating of the non-functional requirements assessment of the COTS product e i. Then, a list L e ¼½ðe 1 ; em 1 ; p 1 Þ; ðe 2 ; em 2 ; p 2 Þ; ; ðe z ; em z ; p z ÞŠ can be constructed to keep the COTS products together with their set of modules that satisfy both the functional and nonfunctional requirements of the CBS and their overall rating on the non-functional requirements. Since L e is constructed from L d and cardðelemðl e ÞÞ ¼ z; 1. y $ z, 2. xtpð1; elemðl d ÞÞ $ xtpð1; elemðl e ÞÞ; 3. ;e i, em i, p i : ðe i ; em i ; p i Þ [ elemðl e Þ)ðe i ; em i Þ [ elemðl d Þ: Fig. 6 shows the relationship between the three types of COTS products (x COTS, y COTS and z COTS) and the CBS. 4.1. Efficiency comparison for the best-fit strategy We first consider the DA method. When selecting COTS products using the DA method with the best-fit strategy, the following four steps are involved: 1. Collect all information on the COTS products from the sources. 2. Select those COTS products that have modules satisfying some or all of the functional requirements of the CBS in question. 3. Among the COTS products selected in step 2, select those COTS products that satisfy the non-functional requirements of the CBS. 4. Prioritize the COTS products selected in step 3. The first in the priority list is then the best COTS product for CBS development. The effort involved in selecting the best COTS product consists of four parts: searching, functional vetting, nonfunctional vetting, and sorting. Searching is the effort expended to find the information on the COTS products available from the sources. Searching can be done using different methods, such as searching the Internet and consulting catalogues of COTS products. Functional vetting and non-functional vetting involves selecting those COTS products that satisfy the functional and non-functional requirements, respectively. Sorting prioritizes the COTS products that satisfy both functional and non-functional

710 K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 requirements. There are different ways to do this sorting. We use the following prioritization strategy: 1. The COTS product with the most modules satisfying the functional requirement of the CBS has the highest priority. This is reasonable because the more modules that fit the requirement, the less development work is required. 2. If several COTS products qualify with the same number of modules, then their priority is based on the rating ( p i ) for the non-functional requirements. We assume that sufficient information on all x COTS products can be collected. Let t i be the average time required for searching for a COTS product from the available sources. The scale of t i can be of hours or days because searching is a manual step, which collects information on COTS products from records, catalogues or the Internet. The total time required for searching for information on the x COTS products and forming the sequence of COTS L c ¼ ½ðc 1 ; cm 1 Þ; ðc 2 ; cm 2 Þ; ; ðc x ; cm x ÞŠ is xt i. Let t x be the average time required for vetting the functional appropriateness of a module of a COTS product. This step is similar to carrying out a black box test, which involves setting the test plan and conducting the test. Hence, the scale of t x can be hours to days. The time required to check all of the functional appropriateness of the x COTS products and to form the list L d ¼½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd y ; dm y ÞŠ is nt x S x 1cardðcm i Þ; where n is the number of modules of the CBS to be developed using COTS products. Let t v be the average time required for assessing each of the v non-functional criteria for a COTS product. This step is also similar to carrying out a black box test of the nonfunctional requirements. Like t x, the scale of t v can also be from hours to days. Note that the vetting of non-functional requirements involves individual modules. In the list L d ¼ ½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd y ; dm y ÞŠ; the cardinality of the set dm i is the number of modules of the CBS in which the COTS product d i can provide modules that satisfy the CBS functional requirements. Hence, the total time required for vetting the non-functional criteria and forming the list L e ¼½ðe 1 ; em 1 ; p 1 Þ; ðe 2 ; em 2 ; p 2 Þ; ; ðe z ; em z ; p z ÞŠ is vt v S y 1 cardðdm iþ: The time required to sort L e, with reference to cardðem i Þ as the primary key and rating p i as the secondary key for selecting the most appropriate COTS product, is O(z log z ). Sorting is of order O(z log z ) because the lower bound for sorting q items is O(q log q ). The L e list is sorted with reference to the cardðem i Þ because it is the set of modules possessed by COTS product e i that satisfy both the functional and non-functional requirements. The product that fits most of the modules of the CBS is treated as the best one. If there is more than one COTS product with the same (highest) number of modules satisfying both the functional and non-functional requirements, then the COTS product with the highest rating ( p i ) is chosen. The total time required for selecting the best COTS product for a CBS using the DA method is t B d ¼ xt i þ nt x S x 1cardðcm i Þþvt v S y 1 cardðdm iþ þoðz log zþ: Next, we consider three boundary cases. Case (1). No COTS product fits the functional requirements of the CBS, i.e. y ¼ 0: In this case, steps 3 and 4 can be skipped. td B ¼ xt i þ nt x S x 1cardðcm i Þ: ð1aþ Case (2). None of the functionally appropriate COTS products satisfies the non-functional requirement, i.e. z ¼ 0: In this case, step 4 can be skipped. t B d ¼ xt i þ nt x S x 1cardðcm i Þþvt v S y 1 cardðdm iþ: ð1þ ð1bþ Case (3). All of the available COTS products satisfy both the functional and non-functional requirements, i.e. z ¼ y ¼ x; then cardðdm i Þ¼cardðcm i Þ In this case, t B d ¼ xt i þ nt x S x 1cardðcm i Þþvt v S x 1cardðcm i Þ þoðx log xþ ¼xt i þðnt x þ vt v ÞS x 1cardðcm i ÞþOðx log xþ: ð1cþ Next, we consider the DBCS method. When selecting a COTS product using DBCS with the best-fit strategy, the following steps are involved: 1. The modules that correspond to the CBS are identified from a domain model. 2. Identifiers of COTS modules that are mapped to the modules in the domain model from step 1 are found through the mappings between the domain model and the COTS modules. These COTS modules are claimed (by the vendor) to functionally satisfy the modules of the domain model. 3. Information is collected on those COTS products that have modules selected in step 2. 4. These COTS products are vetted against the nonfunctional requirements of the CBS. 5. The best COTS product is selected from the candidates that satisfy both functional and non-functional requirements. Let t n be the average time required for identifying a module from a domain model. As the system developers should be familiar with the system in question and its corresponding domain model, the scale of t n should be in minutes. Furthermore, as the relation between a CBS and its corresponding domain model is a one-to-one mapping, the searching time of n corresponding modules in the domain model for step 1 is nt n. The next step is to identify the COTS product that is claimed to satisfy the functional requirements. There is no need to retrieve information about the COTS products in

K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 711 Table 1 Selection time under the best-fit strategy DA, t B d DBCS, t B DB Case (1): no COTS product fits the functional requirements of the CBS Case (2): none of the functionally appropriate COTS products satisfies the non-functional requirements Case (3): all the available COTS products satisfy both the functional and non-functional requirements xt i þ nt x S x 1 cardðcm i Þ xt i þ nt x S x 1 cardðcm i Þþvt v S y 1 cardðdm iþ xt i þðnt x þ vt v ÞS x 1 cardðcm i ÞþOðx log xþ nt n nt n þ t b S y 1 cardðdm iþþyt r þ vt v S y 1 cardðdm iþ nt n þðt b þ vt v ÞS x 1 cardðdm i Þþxt r þ Oðx log xþ this step. Only the identifier of the COTS product is required. This step is done by consulting the mappings between the domain model and the sources of COTS products. Let the n modules in the domain model be denoted by ðn 1 ; n 2 ; ; n n Þ: Let the set of identifiers of the COTS modules that satisfy the functionality required of module n i be denoted by l i. Then, the list L n ¼ ½ðn 1 ; l 1 Þ; ðn 2 ; l 2 Þ; ; ðn n ; l n ÞŠ denotes the list of modules in the domain model together with the set of identifiers of the modules of the COTS products. In the example shown earlier in Fig. 5, there are three modules in the domain model and four COTS products available, and a total of six matching pairs of modules, so that L n ¼ ½ðn1; {c11; c21}þ; ðn2; {c22; c31; c41}þ; ðn3; {c42}þš: Let t b be the average time required for identifying the modules, which are claimed to be functionally appropriate for a module of the CBS through the mappings between the COTS products and the domain model. These mappings are provided by the COTS product vendors in advance and they can be stored in computer systems or obtained through the Internet. Hence, this step is similar to retrieving information from databases. The scale of t b should be in seconds or minutes. The total time required to form L n is t b S n 1cardðl i Þ: The next step is to construct the list L 0 d ¼ ½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd y 0; dm y 0ÞŠ from L n. This includes two sub-steps. The first sub-step is to construct L 00 d ¼ ½ðd 1 ; ldm 1 Þ; ðd 2 ; ldm 2 Þ; ; ðd y 0; ldm y 0ÞŠ from L n,where d i is the ith COTS product and ldm i is the set of identifiers of the modules of the ith COTS product that contains functionally appropriate modules for the CBS. The second sub-step is to retrieve the information content for the COTS modules to form the list Ld 0 in order to do the non-functional requirements check. In the example shown in Fig. 5, Ld 0 ¼ [(COTS1, {(C11, n1)}), (COTS2, {(C21, n1), (C22, n2)}), (COTS3, {(C31, n2)}), (COTS4, {(C41, n2), (C42, n3)})]. The time required to form Ld 0 from L n can be neglected because it is much smaller than t b : Thus, the total time for step 2 is approximately t b S n 1cardðl i Þ: Let t r be the average time required to retrieve information on a COTS product from the source and to form the tuple (d i, dm i ). As the information on COTS products is retrieved from known locations, they can be obtained using some assessment method. Hence, the scale of t r is in minutes. The time for retrieving information on y 0 COTS products for step 3isy 0 t r. Assuming that both the DBCS method and the DA method can obtain the same set of COTS products that satisfy the CBS functionally, then y 0 ¼ y: Steps 4 and 5 vet the non-functional requirements and choose the best COTS product. These steps are the same as those used in the DA method. Hence, the time required for these steps is the same as for the DA method. The total time required for selecting the best COTS product for a CBS using the domain-based method is t B DB ¼ nt n þ t b S n 1 cardðl i Þþyt r þ vt v S y 1 cardðdm iþ þoðz log zþ: It should be noticed that S n 1 cardðl i Þ¼S y 1 cardðdm iþ: Therefore, t B DB ¼ nt n þ t b S 3 1 cardðdm i Þþyt r þ vt v S y 1 cardðdm iþ þoðz log zþ: Next, we consider the three boundary cases presented earlier. For case (1), step 2 will fail and the remaining steps can be skipped. t B DB ¼ nt n : For case (2), step 5 can be skipped. t B DB ¼ nt n þ t b S y 1 cardðdm iþþyt r þ vt v S y 1 cardðdm iþ: For case (3), t B DB ¼ nt n þ t b S x 1 cardðdm i Þþxt r þ vt v S x 1 cardðdm i Þ þoðx log xþ ¼ nt n þðt b þ vt v ÞS x 1 cardðdm i Þ þxt r þ Oðx log xþ: ð2þ ð3þ ð3aþ ð3bþ ð3cþ Table 1 summarizes the three cases for both DA and DBCS.

712 K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 Next, we compare the performance of the DBCS method and DA approach. We would like to show that DBCS is more efficient than DA. That is, t B d $ t B DB: There are two important cases to consider. Case (1): there exist y COTS products that have modules that fit all of the functional requirements of the CBS, i.e. y. 0: Case (2): no COTS product fits the functional requirements of the CBS, i.e. y ¼ 0: We have shown that for cases (1) and (2) that t B d $ t B DB holds [10]. Thus, we can conclude that with the best-fit strategy, the DBCS method is more efficient than the DA approach. 4.2. Efficiency comparison for the first-fit strategy We first consider the DA method. In applying the first-fit strategy, ideally, once we find a COTS product that fits all of the functional and non-functional requirements, we can stop the selection process. However, such a COTS product may not exist. If no COTS product satisfies all of the functional and nonfunctional requirements, then the one with the most modules that satisfy both the functional and the nonfunctional requirements is chosen. The algorithm of this first-fit strategy using the DA method is as follows: Algorithm DA-FF 1. While there are some remaining COTS products from the available sources OR no appropriate COTS product is found 2. get information on one COTS product from the source 3. identify the modules which satisfy the functional requirements of the CBS 4. if all of the modules of the CBS are satisfied by some modules of the COTS product then 5. check the non-functional requirements of the COTS product 6. if it satisfies all of the non-functional requirements, then an appropriate COTS product is found 7. else 8. insert the COTS product into the working list L w ¼ ½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd u ; dm u ÞŠ 9. end While 10. if no appropriate COTS product is found and L w is not empty then 11. sort L w in descending order of card(dm i ) 12. While there are COTS products in the sorted list and no suitable COTS product is found 13. get the head of the sorted list 14. check the non-functional requirements of the COTS product 15. if it satisfies all of the non-functional requirements, then a suitable COTS product is found 16. end While. The time required by step 2 is t i. The time required by step 3 is nt x cardðcm i Þ; as there are n modules in the CBS. The time required by step 5 is nvt v, as we only need to check the n modules of the COTS products that satisfy the functional requirements of the CBS. The time required by step 11 is O(u log u ), where u is cardðl w Þ: The time required by step 14 is vt v cardðdm i Þ: The other steps require minimal time and can be ignored in the computation of the total effort. The time required to check the appropriateness of a COTS product denoted by c i is t F d ðc i Þ¼t i þ nt x cardðcm i Þþnvt v : There are two important cases to be considered as follows. Case (1). Multiple COTS products are found that satisfy both the functional and the non-functional requirements. Let there be w COTS products ðw. 0Þ; among the available sources, with modules that satisfy both the functional and the non-functional requirements of the CBS. The probability of selecting the first COTS product that is appropriate for the CBS is w=x; and the probability of selecting the second COTS product that is appropriate is w=ðx 2 1Þ; and so on. Let ami denote the average number of modules contained in a COTS product. The expected time t F d (i.e. the average time) required using the first-fit strategy is then t F d ðaverageþ ¼ðt i þ nt x ami þ nvt v Þðw=x þ w=ðx 2 1Þ þ w=ðx 2 2Þþ þ 1Þ: ð4þ ð4aþ Case (2). All COTS products satisfy the functional requirements but none satisfies the non-functional requirements. The worst case occurs when all x of the COTS products satisfy only some of the functional requirements, and none of them satisfy the non-functional requirements of the CBS. Then, L w ¼½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd x ; dm x ÞŠ: The time required to form the working list, L w,isxt i þ nt x S x 1 cardðcm i Þ: The time required to sort the working list is O(x log x ). The time required to check all of the non-functional requirements of the COTS products in L w is vt v S x 1 cardðdm i Þ: Therefore, in the worst case, t F d ðworstþ ¼xt i þ nt x S x 1 cardðcm i ÞþOðx log xþ þ vt v S x 1 cardðdm i Þ: ð4bþ Next, we consider the domain-based COTS product

K.R.P.H. Leung, H.K.N. Leung / Information and Software Technology 44 (2002) 703 715 713 Table 2 Selection time under the first-fit strategy DA, t F d DBCS, t F DB Case (1): multiple COTS products found satisfying both functional and non-functional requirements Case (2): all COTS products satisfy the functional requirements, but none satisfy the non-functional requirements t F d ðaverageþ ¼ðt i þ nt x ami þ nvt v Þ ðw=x þ w=ðx 2 1Þþw=ðx2Þþ þ 1Þ td F ðworstþ ¼xt i þ nt x S x 1 cardðcm i Þ þ Oðx log xþþvt v S x 1 cardðdm i Þ t F DBðaverageÞ ¼nt n þðnt c þ t r þ nvt v Þ ðw=x þ w=ðx 2 1Þþ þ 1Þ tdbðworstþ F ¼nt n þ nt c x þ Oðx log xþ þ xt r þ vt v S x 1 cardðdm i Þ selection method. The algorithm of the first-fit strategy of the DBCS method is as follows. Algorithm DBCS-FF 0. identify the corresponding modules of the CBS in the domain model 1. While there are un-selected COTS products OR no appropriate COTS product is found 2. choose a COTS product 3. check if all n modules in the domain model have mappings from this COTS product 4. if step 3 is false, then insert the COTS product into the working list L w ¼½ðd 1 ; dm 1 Þ; ; ðd u ; dm u ÞŠ 5. else get information on the COTS products from the sources 6. if it satisfies the non-functional requirements, then an appropriate COTS product is found 7. end While 8. if no appropriate COTS product is found and L w is not empty, then 9. sort the list L w in descending order according to cardðdm i Þ 10. While there are COTS products in the sorted list and no suitable COTS product is found 11. get the head of the sorted list 12. retrieve information on the COTS product from the source 13. check the non-functional requirements of the COTS product 14. if it satisfies the non-functional requirements then a suitable COTS product is found 15. end While In the DBCS method, since there are mappings between the COTS products and the domain model, all of the COTS modules that are functionally appropriate for the CBS can be obtained directly through these mappings. The first-fit strategy of the DBCS method tries to find the first COTS product that functionally satisfies most of the modules of the CBS and also the non-functional requirements. The second half of the algorithm is similar to that of Algorithm DA-FF. The only difference is the time required for retrieving the information on the COTS products from the sources, which can affect the performance of the algorithm. The time required by step 0 is nt n. Let t c be the average time required for checking whether there is a mapping between a module of the COTS product and the domain model. The scale of t c is seconds to minutes. The time required by step 3 is nt c because it is only required to check whether there are mappings from each of the n modules in the domain model to the COTS products. The time required by step 5 is t r. The time required by step 6 is nvt v because only the n modules of the COTS products which satisfy the functional requirements of the CBS are required to be checked. The time required by step 9 is O(u log u ), where u is cardðl w Þ: The time required by step 12 is t r. The time required by step 13 is vt v cardðdm i Þ: The other steps require minimal time and can be ignored in the computation of the total effort. The total processing time of an appropriate COTS product denoted by c i is t F DBðc i Þ¼nt c þ t r þ nvt v : Next, we consider the two important cases presented earlier as follows. For case (1), let there be w COTS products (w. 0) that fit all n modules. The expected time for identifying an appropriate COTS product is then t F DBðaverageÞ ¼nt n þðnt c þ t r þ nvt v Þ ðw=x þ w=ðx 2 1Þþ þ 1Þ: ð5þ ð5aþ For case (2), the worst case occurs when all x of the COTS products have some mappings to some of the n modules in the domain model, but none of them satisfy the nonfunctional requirements of the CBS. Then, L w ¼ ½ðd 1 ; dm 1 Þ; ðd 2 ; dm 2 Þ; ; ðd x ; dm x ÞŠ: The total time required for step 3 is nt c x. The time required by sorting the list in step 9 is O(x log x ). The total time required for retrieving the information on the x COTS products from the sources is xt r. The total time required for checking the non-functional requirements is vt v S x 1 cardðdm i Þ: