Hybrid Clustering Approach for Software Module Clustering


1 K Kishore C, 2 Dr. K. Ramani, 3 Anoosha G
1,3 Assistant Professor, 2 Professor
1,2 Dept. of IT, Sree Vidyanikethan Engineering College, Tirupati
3 Dept. of CSE, Annamacharya Institute of Technology and Sciences, Tirupati
1 mr.c.kishore@gmail.com, 2 ramanidileep@yahoo.com, 3 anua13@gmail.com

Abstract---Maintenance is one of the main software creation activities in terms of allocated resources. As existing software ages, newly developed systems are built to improve upon existing systems. As software evolves, its modularization structure degrades, and at some point it becomes a challenging task to maintain the software in the future. Recently, clustering techniques have been used to help with the issues of software evolution and maintenance. It is a well-known fact that a well-modularized software system is easy to understand and maintain. Software module clustering is an important task during software maintenance whose main goal is to achieve well-modularized software. In recent times, this problem has been recast as a search-based software engineering problem. All previous work on software module clustering used a single-objective formulation of the problem. That is, the twin objectives of high cohesion and low coupling have been combined into a single objective called Modularization Quality. We introduce a hybrid clustering approach which improves the modular structure of the software system. We present the results and compare them with those obtained from the existing single-objective formulation on 7 real-world module clustering problems. The results of this empirical study indicate that the hybrid clustering approach produces better solutions than the existing single-objective approach on most of the problems studied.

Index Terms---Search Based Software Engineering, Software Module Clustering, Modularization Quality.

I. INTRODUCTION

Most interesting software systems are large and complex, which makes their structure difficult to understand. One of the reasons for this complexity is that source code contains many entities, such as classes and modules, that depend on each other in intricate ways (e.g., procedure calls, variable references). Even once a software engineer understands a system's structure, it is difficult to preserve this understanding, because the structure tends to change during maintenance.

To solve the above problem, the reverse engineering research community has developed techniques to partition the software system's structure into meaningful subsystems called clusters. Generally, a subsystem consists of a collection of collaborating source code resources that implement a feature or provide a service to the rest of the system. Source code resources that are found in subsystems include classes, modules and possibly other subsystems.

Software module clustering involves partitioning connected software system modules into groups of clusters according to predetermined criteria. The criterion followed in this paper is that clusters are to be strongly connected internally (high cohesion) and weakly connected externally (low coupling), as shown in Figure 1. Creating multiple clusters of high cohesion and low coupling is better than creating a single cluster of relatively low cohesion but zero coupling.

Fig. 1 Example of a possible clustering of a simple MDG

Software module clustering [10] is an important and challenging task in software engineering. It is widely believed that a well-modularized software system is easier to develop and maintain. There are many ways to solve the module clustering problem. Mancoridis et al. [1] suggested the search-based approach to software module clustering, and this paper also follows the search-based approach.
In the search-based approach, the attributes of a good modular decomposition are formulated as objectives, the evaluation of which as a fitness function [3] guides a search-based optimization algorithm. All previous work on the software module clustering problem inspired by it has used a single-objective formulation of the problem. That is, the twin objectives of high cohesion and low coupling have been combined into a single objective called Modularization Quality (MQ).

This paper introduces a hybrid clustering approach for software module clustering and presents results that show how this approach can yield better results than the existing software clustering approaches [1], [9]. The primary contributions of this paper are as follows:

(www.ijedr.org) 181

The Hybrid Clustering Approach for software module clustering is introduced.
An empirical study of the effectiveness and performance of the existing and proposed approaches is presented.

The rest of this paper is organized as follows: Section 2 presents the automated software module clustering process. Section 3 presents background and related work on software module clustering. Section 4 introduces the Hybrid Clustering Approach for modularization. Section 5 presents the results. Section 6 concludes.

II. AUTOMATED SOFTWARE MODULE CLUSTERING PROCESS

Figure 2 shows the process of software module clustering [6], [7]. The first step in the clustering process is to extract the module-level dependencies from the source code and store the resulting information in a database by using a source code analysis tool. After all of the module-level dependencies have been stored in the database, a script is executed to query the database, filter the query results, and produce, as output, a textual representation of the Module Dependency Graph (MDG). For now, consider the MDG as a weighted directed graph that represents the modules (classes) in the system as nodes and the relations between modules as weighted directed edges.

Once the MDG is created, Bunch [2], [5] applies its clustering algorithms to the MDG and creates a partitioned MDG. The clusters in the partitioned MDG represent subsystems, where each subsystem contains one or more modules from the source code. The goal of Bunch's clustering algorithms is to determine a partition of the MDG that represents meaningful subsystems. After the partitioned MDG is created, we use graph drawing tools such as Graphviz (dotty) to visualize the results.

Fig. 2 Automatic Software Modularization Environment

III. BACKGROUND AND RELATED WORK

Software module clustering is an important and challenging task in software engineering.
It is a well-known fact that a well-modularized software system is easier to develop and maintain. Many approaches have been applied successfully to software module clustering. Hill climbing [4] is one of these approaches, and it initiated the development of the Bunch tool [2], [5] for automated software module clustering. Several other search techniques [10] have been applied, including genetic algorithms. These experiments have all shown that the other techniques are outperformed by hill climbing in both result quality and execution time.

Fitness function

A fitness function [3] needs to be defined in order to formulate a software engineering problem as a search problem. One possible way to measure fitness is to determine cohesion and coupling values for each of the clusters. The cohesion could be measured as:

$$ \mathrm{Cohesion} = \frac{N_i}{P_i} \qquad (1) $$

where $N_i$ is the number of inner edges and $P_i$ is the number of possible inner edges. The coupling could be measured in a similar way as:

$$ \mathrm{Coupling} = \frac{N_o}{P_o} \qquad (2) $$

where $N_o$ is the number of edges connected to the outside of the cluster and $P_o$ is the possible number of connections that can be made to the outside. We may then measure the fitness of each cluster from its cohesion and coupling values (equation (3); the formula itself is not preserved in this transcription). Our overall clustering fitness can be measured as the mean over all clusters:

$$ F = \frac{1}{n} \sum_{k=1}^{n} F_k \qquad (4) $$

where $F_k$ is the fitness of cluster $k$ from equation (3) and $n$ is the number of clusters.
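The cohesion and coupling ratios described above can be illustrated with a minimal Python sketch. The paper does not say how the numbers of possible inner and outer edges are counted, so the sketch assumes an unweighted directed MDG without self-loops: a cluster of s modules has s(s-1) possible inner edges and 2·s·(N-s) possible outer edges, where N is the total number of modules. The function name and these counting rules are assumptions made for illustration, and the step that combines the two values into a per-cluster fitness is deliberately left out.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source module, target module)


def cohesion_and_coupling(
    edges: Iterable[Edge], cluster: Set[str], all_modules: Set[str]
) -> Tuple[float, float]:
    """Return (cohesion, coupling) of one cluster as edge-count ratios.

    Assumption (not from the paper): the MDG is directed and has no
    self-loops, so 'possible' edges are counted as ordered module pairs.
    """
    edges = list(edges)
    size = len(cluster)
    outside = len(all_modules) - size

    # Inner edges: both endpoints inside the cluster.
    inner = sum(1 for s, t in edges if s in cluster and t in cluster)
    # Outer edges: exactly one endpoint inside the cluster.
    outer = sum(1 for s, t in edges if (s in cluster) != (t in cluster))

    possible_inner = size * (size - 1)      # ordered pairs inside the cluster
    possible_outer = 2 * size * outside     # ordered pairs crossing the border

    cohesion = inner / possible_inner if possible_inner else 0.0
    coupling = outer / possible_outer if possible_outer else 0.0
    return cohesion, coupling


if __name__ == "__main__":
    mdg = [("a", "b"), ("b", "a"), ("b", "c")]
    print(cohesion_and_coupling(mdg, {"a", "b"}, {"a", "b", "c"}))
```

A cluster whose modules all reference each other gets cohesion 1.0, and a cluster with no edges to the rest of the system gets coupling 0.0, which matches the intuition of "strongly connected internally, weakly connected externally".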

Modularization Quality

In this paper, we used the Modularization Quality (MQ) measure [8], introduced by Mancoridis et al. The intra-edges are those for which the source and target of the edge lie inside the same cluster. The inter-edges are those for which the source and target lie in distinct clusters. MQ is the sum, over all clusters, of a ratio of intra-edges to inter-edges called the Modularization Factor ($MF_k$) for cluster $k$. $MF_k$ can be defined as follows:

$$ MF_k = \begin{cases} 0, & \text{if } i = 0, \\ \dfrac{i}{i + \frac{1}{2} j}, & \text{if } i > 0, \end{cases} \qquad (5) $$

where $i$ is the weight of the intra-edges and $j$ is that of the inter-edges, that is, $j$ is the sum of edge weights for all edges that originate or terminate in cluster $k$. The reason for the occurrence of the term $\frac{1}{2} j$ in the above equation (rather than merely $j$) is to split the penalty of each inter-edge across the two clusters that are connected by that edge. If the MDG is unweighted, then the weights are set to 1. MQ can be calculated in terms of MF as

$$ MQ = \sum_{k=1}^{n} MF_k \qquad (6) $$

where $n$ is the number of clusters.

IV. HYBRID CLUSTERING APPROACH

The Hybrid Clustering Approach considers the following set of objectives: the number of clusters should be maximized, the Modularization Quality should be maximized, the sum of intra-edges of all clusters should be maximized, the sum of inter-edges of all clusters should be minimized, and the number of isolated clusters should be minimized.

The intra-edges, inter-edges, and MQ are used to measure the quality of the partitioned system. An isolated cluster is a cluster that contains only one module. From experience, isolated clusters are uncommon in well-modularized decompositions, so the Hybrid Clustering Approach discourages them by including the number of isolated clusters as an objective to be minimized. The aim of the Hybrid Clustering Approach is to capture the attributes of a good clustering. That is, a good clustering has the maximum amount of cohesion (intra-edges are maximized) and the minimum amount of coupling (inter-edges are minimized).
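As an illustration of the MF and MQ definitions and of the five objectives listed above, the following Python sketch computes all five values for a given partition. It is a sketch under stated assumptions rather than the authors' implementation: the function name, the edge-list input and the dictionary keys are hypothetical, each inter-edge is counted once in each of the two clusters it touches (following the definition of j), and the inter-edge sum is returned as a positive number to be minimized.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WeightedEdge = Tuple[str, str, float]  # (source, target, weight)


def hybrid_objectives(
    edges: Iterable[WeightedEdge], assignment: Dict[str, int]
) -> Dict[str, float]:
    """Compute the five objective values for a partition of an MDG.

    `assignment` maps every module name to a cluster id (hypothetical input
    format chosen for this sketch).
    """
    cluster_ids = set(assignment.values())
    intra = {k: 0.0 for k in cluster_ids}  # i: intra-edge weight per cluster
    inter = {k: 0.0 for k in cluster_ids}  # j: inter-edge weight per cluster

    for src, dst, weight in edges:
        k_src, k_dst = assignment[src], assignment[dst]
        if k_src == k_dst:
            intra[k_src] += weight
        else:
            # An inter-edge originates in one cluster and terminates in
            # another, so it contributes to j of both clusters.
            inter[k_src] += weight
            inter[k_dst] += weight

    # MF_k = 0 if i = 0, else i / (i + j / 2); MQ is the sum of all MF_k.
    mq = sum(
        intra[k] / (intra[k] + inter[k] / 2.0)
        for k in cluster_ids
        if intra[k] > 0
    )

    sizes = Counter(assignment.values())
    isolated = sum(1 for size in sizes.values() if size == 1)

    return {
        "intra_edges": sum(intra.values()),   # to be maximized
        "inter_edges": sum(inter.values()),   # to be minimized
        "clusters": float(len(cluster_ids)),  # to be maximized
        "mq": mq,                             # to be maximized
        "isolated_clusters": float(isolated), # to be minimized
    }


if __name__ == "__main__":
    mdg = [("a", "b", 1.0), ("b", "a", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
    partition = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(hybrid_objectives(mdg, partition))
```

For an unweighted MDG, pass a weight of 1 for every edge, as the text prescribes. A search algorithm would evaluate this vector for each candidate partition instead of a single scalar fitness.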
The approach should not, however, put all modules into a single cluster (hence the number of clusters is maximized), nor should it produce a series of isolated clusters (hence the number of isolated clusters is minimized). Since MQ is a well-studied objective function, it is also included as an objective of the Hybrid Clustering Approach. This is one of the attractive aspects of the Hybrid Clustering Approach: one can always include other candidate single objectives as one of the multiple objectives to be optimized. The MQ value tends to increase when there are more clusters in the system, so it also makes sense to include the number of clusters in the modularization as an objective.

To illustrate the Hybrid Clustering Approach, consider the Module Dependency Graph in Figure 3. The set of objective values for the Hybrid Clustering Approach is as follows:

Sum of intra-edges of all clusters (cohesion): 9
Sum of inter-edges of all clusters (coupling): -6
Total number of clusters: 3
Modularization Quality: 2.207
Total number of isolated clusters: 0

Fig. 3 The Module Dependency Graph after clustering by the Hybrid Clustering Approach

V. EXPERIMENTAL RESULTS

Experiments were conducted on seven real-world clustering systems, which are shown in Table 1. Each MDG was executed 20 times independently. There are two different ways to get the average MQ value. The hill-climbing algorithm gives only one solution in each run, so the average MQ can be calculated directly from the solutions. However, the two-archive algorithm produces a set of solutions. The solution with the highest MQ is chosen to be the best solution in each run.

Thus, the average MQ of the solutions obtained from the Hybrid Clustering Approach is estimated using the representatives from the 20 runs. This is the method used to collect the MQ values from the experiment.

TABLE 1: THE SYSTEMS STUDIED

Name     Nodes  Edges  Description
Mtunis   20     57     An operating system for educational purposes written in the Turing language
Ispell   24     103    Software for spelling and typographical error correction in files
Rcs      29     163    Revision Control System used to manage multiple revisions of files
Bison    37     179    General-purpose parser generator for converting grammar descriptions into C programs
Grappa   86     295    Genome rearrangement analyzer under parsimony and other phylogenetic algorithms
Bunch    116    365    Software clustering tool (essential Java classes only)
Incl     174    360    Graph drawing tool

Table 2 presents the results comparing the Hybrid Clustering Approach and the Hill-Climbing approach, using the MQ value as the assessment criterion. There is good evidence to suggest that the Hybrid Clustering Approach outperforms the Hill-Climbing approach: it gives higher MQ values in five of the seven problems (all except bison and incl).

TABLE 2: COMPARISON OF SOLUTIONS FOUND BY THE HYBRID CLUSTERING APPROACH AND THE HILL-CLIMBING APPROACH USING MQ VALUE AS THE ASSESSMENT CRITERION

         Hybrid Clustering Approach    Hill-Climbing
Name     Mean      STD                 Mean      STD
mtunis   2.642     0.158               2.216     0.158
ispell   2.435     0.143               2.159     0.1666
rcs      2.994     0.246               2.186     0.354
bison    2.556     0.108               2.646     0.103
grappa   14.016    0.132               12.478    0.103
bunch    10.390    0.207               9.048     0.001
incl     12.602    0.001               14.041    0.0700

Fig. 4 Clustered MDG of mtunis system using hill-climbing algorithm
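The collection method and the comparison in Table 2 can be sketched in a few lines of Python. The mean MQ values below are copied from Table 2; the function names and the shape of the per-run input are assumptions made for illustration, and the per-run MQ lists in the usage example are made-up placeholders, not experimental data.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Mean MQ per system from Table 2: (Hybrid Clustering Approach, Hill-Climbing).
TABLE_2_MEAN_MQ: Dict[str, Tuple[float, float]] = {
    "mtunis": (2.642, 2.216),
    "ispell": (2.435, 2.159),
    "rcs": (2.994, 2.186),
    "bison": (2.556, 2.646),
    "grappa": (14.016, 12.478),
    "bunch": (10.390, 9.048),
    "incl": (12.602, 14.041),
}


def mean_of_best_mq(runs: Sequence[Sequence[float]]) -> float:
    """Average the best MQ of each run.

    Each inner sequence holds the MQ values of the solutions found in one
    run; the solution with the highest MQ represents that run.
    """
    return mean(max(run) for run in runs)


def systems_where_hybrid_wins(table: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of the systems where the hybrid mean MQ exceeds hill-climbing."""
    return [name for name, (hybrid, hill) in table.items() if hybrid > hill]


if __name__ == "__main__":
    # Placeholder MQ values for three runs, for illustration only.
    print(mean_of_best_mq([[2.1, 2.6], [2.4, 2.5], [2.7]]))
    print(systems_where_hybrid_wins(TABLE_2_MEAN_MQ))
```

Comparing means alone ignores the standard deviations reported in Table 2; a statistical test over the 20 runs per system would be needed to judge whether each difference is significant.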

Fig. 5 Mtunis system applying MCA approach

VI. CONCLUSION AND FUTURE WORK

This paper introduces the Hybrid Clustering Approach to software module clustering and compares its results with those of the existing Hill-Climbing approach on 7 real-world module clustering problems. The results indicate that the Hybrid Clustering Approach produces better results than the existing Hill-Climbing approach on five of the seven problems. Future work could consider other possible objectives for better modularization.

REFERENCES

[1] Kata Praditwong, Mark Harman, and Xin Yao, Software Module Clustering as a Multi-Objective Search Problem, IEEE Trans. Software Eng., vol. 37, no. 2, March/April 2011.
[2] B.S. Mitchell and S. Mancoridis, On the Automatic Modularization of Software Systems Using the Bunch Tool, IEEE Trans. Software Eng., vol. 32, no. 3, pp. 193-208, Mar. 2006.
[3] Baresel, H. Sthamer, and M. Schmidt, Fitness function design to improve evolutionary structural testing, in GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1329-1336, San Francisco, CA 94104, USA, 9-13 July 2002, Morgan Kaufmann Publishers.
[4] K. Mahdavi, M. Harman, and R.M. Hierons, A Multiple Hill Climbing Approach to Software Module Clustering, Proc. IEEE Int'l Conf. Software Maintenance, pp. 315-324, Sept. 2003.
[5] S. Mancoridis, B.S. Mitchell, Y.-F. Chen, and E.R. Gansner, Bunch: A Clustering Tool for the Recovery and Maintenance of Software System Structures, Proc. IEEE Int'l Conf. Software Maintenance, pp. 50-59, 1999.
[6] C. Kishore, Asadi Srinivasulu, Multi-Objective Approach for Software Module Clustering, International Journal of Advanced Research in Computer Science and Software Engineering, Vol. 3, Issue 2, March 2012.
[7] C. Kishore, Asadi Srinivasulu, Anusha G, Comparative Study of Software Module Clustering Algorithms: Hill-Climbing, MCA and ECA, International Journal of Advanced Research in Computer Engineering and Technology.
[8] Chandrakanth P, Anusha G, Kishore C, Software Module Clustering using Single and Multi-Objective Approaches, International Journal of Advanced Research in Computer Engineering and Technology, Vol. 1(10), December 2012.
[9] Chandrakanth P, Anusha G, Kishore C, ESBA: Enhanced Search Based Approach for Software Module Clustering, International Journal of Advanced Research in Computer Engineering and Technology, Vol. 2(10), October 2013.
[10] https://www.cs.drexel.edu/~spiros/bunch/mitchellphd.pdf