Evaluating the Evolution of a C Application

Elizabeth Burd, Malcolm Munro
Liz.Burd@dur.ac.uk
The Centre for Software Maintenance
University of Durham
South Road
Durham, DH1 3LE, UK

Abstract

This paper describes a case study in which versions of software are used to track the actual changes made to a software application. The analysis of 30 sequential versions of the gcc C compiler is described. The results, where possible, are highlighted using graphical representations of the change process. The discussion of results aims to identify some of the reasons for the specific change features identified. The overall objective of the approach is to gain a more detailed understanding of how and where the change processes take place. In the future this will allow the change processes to be characterised, so that eventually a number of metrics can be identified. These metrics will ultimately be used to assess the future maintainability of software applications.

Introduction

Despite the cost implications of software maintenance, it is generally perceived as having a low profile within the software community. Management has often in the past placed little emphasis on maintenance-related activities. Small advances have been made in combating these problems, but high profile maintenance tasks such as the year 2000 problem have been successful at highlighting the issues.

Software maintenance is made difficult by the age of the software requiring maintenance. The age of the software means that documentation has often been lost or is out of date. Furthermore, staff turnover and constant demands for changes, due to user enhancements or environmental changes, exacerbate the problems. Often the code is the only true and accurate description of the functionality of the software. Unfortunately, however, constant corrective maintenance that is not supported by structural improvements tends to make the software more difficult to maintain in the future.
This paper describes some of the current work being conducted within the Centre for Software Maintenance at the University of Durham. Specifically, the paper describes an ongoing study into the evolution of C programs. The objectives of this work are to gain a deeper understanding of how applications evolve and the reasons for their doing so. Furthermore, the outcome of this study is expected to lead to a more detailed understanding of different maintenance approaches and to some form of qualitative review which will highlight the most beneficial types of maintenance approach for supporting software evolution.

Approach

In order to evaluate the results of the process of software change, a number of case studies have been carried out. The case studies have involved the analysis of a number of large software applications. This paper describes one of these studies, in which the GNU compiler gcc is examined. In total 30 sequential versions of the software have been examined, totalling over 9 million lines of C.

In order to examine the evolution process, the changes carried out to the software over time are investigated by analysing different versions of the same software. Specifically, our current work concentrates on gaining an understanding of the process of evolution in a number of main areas. These are:

- General change features within the code, such as the size of changes, the time taken to make the modification and the number of source code files involved within a change.
- Changes in the calling of procedures (including additions, deletions and movement of procedures within the call structure).
- Changes in data usage (including additions, deletions and movement of data items across procedures).

The C is analysed using an in-house C analyser developed by Kinloch [Kinloch93]. The analyser produces a fine-grained intermediate representation of programs written in the C language, from which program call graphs and flow-sensitive data flow, definition-use and control dependence views can be constructed. The approach extends the work of Harrold and Malloy [Harrold91] by dealing with constructs such as pointer and structure variables and value-returning functions with pass-by-value and pointer parameters. Kinloch refers to the analyser as CCG, the Combined C Graph.

The analysis of the gcc application, which is currently ongoing, will eventually assess changes in each of the above representations. At present, however, we have performed only a detailed high-level analysis investigating changes to files and changes to the calling structure within the files.

Comparisons between versions are made in a number of ways, the most basic of which involves the use of the UNIX utility diff. Diff is particularly useful for identifying the actual changes that were made between versions; thus it is possible to ensure that a change involves a modification of the source code rather than changes within the comments or file headers.

A more advanced approach to comparison between versions is to use the dominator metric and to investigate how the value of this metric changes over the lifetime of the software. Work on dominance trees has been carried out by the Department of Informatica e Sistemistica at the University of Naples [Cilitile97], the Fraunhofer Institute for Experimental Software Engineering [Girard97] and the Centre for Software Maintenance [Burd96a, Burd96b].
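The check that a change touches source code rather than only comments or file headers can be illustrated with a short sketch. It is not the procedure used in the study, which relied on the UNIX diff utility; it substitutes Python's standard difflib module and assumes that stripping comments and blank lines with a regular expression is adequate. The function names are hypothetical.

```python
import difflib
import re

# Matches C comments and string/char literals; literals are matched only so
# that comment markers inside them are left untouched.
_C_TOKEN = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)


def strip_c_comments(source):
    """Replace each comment with a space, keeping literals unchanged."""
    def replace(match):
        text = match.group(0)
        return " " if text.startswith("/") else text
    return _C_TOKEN.sub(replace, source)


def code_lines(source):
    """Return the non-blank, comment-free lines with edge whitespace removed."""
    stripped = strip_c_comments(source)
    return [line.strip() for line in stripped.splitlines() if line.strip()]


def source_changed(old, new):
    """True if the two versions differ in code, not only in comments."""
    return code_lines(old) != code_lines(new)


def code_diff(old, new):
    """Unified diff of the comment-free code of two versions."""
    return list(difflib.unified_diff(
        code_lines(old), code_lines(new), "old", "new", lineterm=""))


if __name__ == "__main__":
    v1 = "/* release 1 */\nint f(void) { return 1; }\n"
    v2 = "/* release 2: header updated */\nint f(void) { return 1; }\n"
    v3 = "/* release 3 */\nint f(void) { return 2; }\n"
    print(source_changed(v1, v2))  # comment-only change
    print(source_changed(v1, v3))  # code change
    print("\n".join(code_diff(v1, v3)))
```

A file would then count as changed in a version only when source_changed reports a difference against the previous version.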
The approach is primarily used as a means of providing an abstraction of, in this case, the call relations within source code. The dominance trees essentially represent dependencies between code modules in the form of a tree. The dominance relations are defined in the following way.

In a call-directed-acyclic-graph (CDAG) a node px dominates a node py if and only if every path from the initial node x of the graph to py spans px.

In a CDAG a node px directly dominates a node py if and only if all the nodes that dominate py dominate px.

In a CDAG there is a relation of strong direct dominance between the nodes px and py if and only if px directly dominates py and px is the only node that calls py.

The expression of dominance through the use of the strong and direct dominance relations provides an indication of the complexity of the calling structure of the code. The greater its complexity, the harder the software is to understand and therefore the harder it is to change. Therefore, the higher the proportion of direct dominance relations (the more complex kind), the harder the software is to maintain. By tracking the process of evolution using these relations it is possible to gain an understanding of the changes in dominance complexity that are occurring within the code over time.

Dominance trees can be used to express potential objects within the code [Burd96b]. For instance, each subgraph within the dominance tree can be considered as a potential object. Dominance trees are often used in this way to give an indication of how source code can be restructured. Thus, through the examination of the trees an indication of the modular nature of the code can be obtained. Changes in the modularisation of the code, such as the overall number of objects and their composition, also provide important information regarding the changing nature of software.
For this reason other investigations on the source code are performed, such as investigating the number of nodes in specific dominance trees and the overall number of dominance trees per version of the software.

Results

In order to give an overview of the gcc application, an indication will now be given of the changes that occur to the software over time at the file level. In this case only changes to the source code are recorded. The figure on the left (Figure 1) shows all 30 versions of the application. Each column represents a different version of the software. Versions are represented sequentially, with the oldest version to the left and the most recent version (2.8.1) to the right. Each of the rows represents a different file within the application. The shaded boxes represent the files that have been changed within the specific version. The files are sorted in order of the number of changes: those at the top change most frequently, and those at the bottom are changed very infrequently.
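The layout of such a change history can be sketched in a few lines. The following is an illustration only, assuming the set of changed files per version is already known (for example from a diff-based comparison); the version labels and file sets are invented placeholders, not data from the study, and the function names are hypothetical.

```python
def change_matrix(changed_by_version):
    """Build rows of (file, [changed in v1?, changed in v2?, ...]).

    `changed_by_version` maps each version label, oldest first, to the set
    of files changed in that version. Rows are sorted so that the most
    frequently changed files come first.
    """
    versions = list(changed_by_version)
    files = set().union(*changed_by_version.values())
    counts = {
        f: sum(f in changed_by_version[v] for v in versions) for f in files
    }
    order = sorted(files, key=lambda f: (-counts[f], f))
    rows = [(f, [f in changed_by_version[v] for v in versions]) for f in order]
    return versions, rows


def render(rows):
    """Draw one text row per file: '#' for changed, '.' for unchanged."""
    return "\n".join(
        f"{name:<12}" + "".join("#" if flag else "." for flag in flags)
        for name, flags in rows
    )


if __name__ == "__main__":
    # Placeholder data for illustration only.
    changes = {
        "v1": {"version.c", "a.c"},
        "v2": {"version.c"},
        "v3": {"version.c", "a.c", "b.c"},
    }
    versions, rows = change_matrix(changes)
    print(render(rows))
```

Each output row corresponds to a row of the figure, and each character position to a version column.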

From the figure it can be seen that the changes have a number of characteristics. For instance, the top row is shaded for each version of the software. This means that this particular file has been modified in every version. The file is actually version.c, which prints out the version number of the application being used, so in this case it should be expected to change on each occasion. It is, however, the only file to change on each occasion.

The columns that are most heavily shaded represent major changes to the software. Those columns that contain only a few changes may represent, for instance, the result of small bug corrections. It is interesting to see that the majority of the changes are made to relatively few of the files. This is especially true when major changes to the software are discounted. Specifically, 3 or 4 files seem to be changed in each version of the software. It is therefore likely that these files are in most need of preventative maintenance, as they either represent the core functions of the application or are hard to understand. Where the software is difficult to understand, mistakes may be made during the process of updating it, and such files therefore often require bug fixes. An investigation into this area is an ongoing direction of the research.

The visual representation of such a change history provides an important guide to assist the preventative maintenance process. The chart above allows visual identification of the files that change frequently, and these might be the files at which maintenance work is best targeted. However, it would also be of interest to see how many changes, and in what detail, are made to each of the files.
One possible solution to this problem would be to use colour to categorise the degree to which each file has been changed in each version. For instance, red boxes could indicate that a large number of changes had been made. In this way a row with a large number of red boxes would be readily identifiable as the place where the major proportion of the maintenance activity occurred.

Figure 1: 30 versions of the gcc application
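The colour coding suggested above amounts to mapping a change count onto a small number of bands. The sketch below shows one way this might be done; the band boundaries and colour names are arbitrary assumptions chosen for illustration, and the paper does not specify any.

```python
import bisect

# Hypothetical band boundaries, in changed lines per file and version.
BOUNDS = [1, 50, 200]
COLOURS = ["white", "yellow", "orange", "red"]


def change_colour(lines_changed):
    """Map a change count to a colour band (0 changes maps to white)."""
    return COLOURS[bisect.bisect_right(BOUNDS, lines_changed)]


if __name__ == "__main__":
    for count in (0, 3, 75, 400):
        print(count, change_colour(count))
```

With these bands, a file with no changed lines stays white and one with 200 or more changed lines is drawn in red.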

Figure 2 shows a graph of the changes made to the C application. The 30 versions of the software are represented across the horizontal axis, whereas the vertical axes show, firstly, the number of source code files involved within a change, per version, and secondly the number of months between each release. The graph shows a fair degree of correspondence between the number of changes per version and the time between each release.

Figure 2: Number of Changes and Time to Make Them (chart titled "Showing Number of Changes and Time to make Modifications"; horizontal axis: version; vertical axes: Changes, Months)

The figure identifies the presence of 5 major changes within the software. Studies in evolution indicate that when such major changes occur within software its complexity increases [Lehman97]. Furthermore, other specific patterns can also be identified from the graph. For instance, a small number of changes are often performed soon after the initial release of a major change. It is assumed that these relate to minor corrections such as bug fixes. Later changes show significant increases in the number of files involved within a change. This increase appears to be followed soon afterwards by a new major change.

The above results give an indication of the changes that are occurring to the application as a whole. A lower-level analysis will now be given of the changes that are occurring within the files, in particular at the calling level. By way of illustration, this paper will now describe the results of the study of evolutionary change on a specific file within the gcc application, combine.c. The results of the analysis showed that there were no additions to the functions within the C code. The changes that did occur, however, were related to the calls between the functions. Within the study combine.c was updated 20 times over the 30 versions studied.
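Changes to the calls between functions can be detected mechanically by comparing the call counts of two versions. The following is a minimal sketch, assuming each version's call graph is available as a mapping from (caller, callee) pairs to the number of call sites. It distinguishes added calls, removed calls, and increases or decreases in the number of calls between two functions. The function names and example data are hypothetical and are not taken from combine.c or from the CCG analyser.

```python
from collections import Counter


def classify_call_changes(old_calls, new_calls):
    """Compare two call-count mappings and group the differing edges.

    Each mapping goes from (caller, callee) to the number of call sites.
    """
    changes = {"added": [], "removed": [], "increased": [], "decreased": []}
    for edge in sorted(set(old_calls) | set(new_calls)):
        before = old_calls.get(edge, 0)
        after = new_calls.get(edge, 0)
        if before == 0 and after > 0:
            changes["added"].append(edge)
        elif before > 0 and after == 0:
            changes["removed"].append(edge)
        elif after > before:
            changes["increased"].append(edge)
        elif after < before:
            changes["decreased"].append(edge)
    return changes


if __name__ == "__main__":
    # Invented call counts for two consecutive versions of one file.
    old = Counter({("f", "g"): 2, ("f", "h"): 1, ("g", "h"): 3})
    new = Counter({("f", "g"): 3, ("g", "h"): 1, ("h", "k"): 1})
    for kind, edges in classify_call_changes(old, new).items():
        print(kind, edges)
```

Only added and removed calls alter the set of edges in the call graph, so those are the categories most likely to affect dominance relations; increases and decreases change call counts between functions that were already connected.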
However, only 5 of these changes resulted in a change to the call graph. The degree of change to the call graph differed greatly between the releases, varying from a single change to 168 accountable changes. From the analysis of the call graph, changes were broadly categorised into 4 types: the addition of a new call, the removal of a call, an increase in the number of calls between two specific functions, and a decrease in the number of calls between two specific functions. Each of these changes has the potential to change the dominance relations within the code, but will not necessarily do so. The results showed that, of the five changes to the call graph, 3 resulted in a change to the dominance relations within the code.

It was indicated above that changes within the dominance relations have the potential to express changes in the maintainability of the application. An increase in direct dominance relations tends to mean a decrease in the comprehensibility of the code, and therefore the maintenance process will be harder to perform in the future. Similarly, an increase in the number of strong dominance relations expresses the reverse process. The results of the analysis showed that there is an inverse relation between an increase in the number of direct dominance relations and an increase in the number of strong dominance relations. Thus, in general it was found that when one increased, the change in the other was small. For instance, a change within one version shows a change of 3 relations from strong to direct dominance, whereas the reverse change accounted for 25 relations.

Figure 3: Changes to the dominance relations (chart titled "Cumulative total of Dominance Relation Changes"; horizontal axis: version, with labels 2.5.4, 2.5.5, 2.7.0, 2.7.2, 2.8.0; series: s->d, d->s)

Figure 3 shows the cumulative changes over a number of versions. It is interesting to note the high increase in strong dominance relations between versions 2.7.2 and 2.8.0. It would appear that a preventative maintenance process was performed during this major release. This process was not found to occur when 2.7.0 was released.

Conclusions and Further Work

This paper has described some of the initial findings from a study of the gcc application. It has identified a number of hypotheses regarding the changes to the application over time. Further work will now be performed to verify whether these hypotheses are correct. Furthermore, the analysis has so far been carried out only at the file and calling structure levels. More detailed analysis is soon to be performed to identify changes within, for instance, the data structures.

References

[Burd96a] Burd E.L., Munro M., Wezeman C., Analysing Large COBOL Programs: the extraction of reusable modules, Proceedings of the International Conference on Software Maintenance, California, IEEE Press, 1996.

[Burd96b] Burd E.L., Munro M., Wezeman C., Extracting Reusable Modules from Legacy Code: Considering issues of module granularity, Proceedings of the 3rd Working Conference on Reverse Engineering, California, IEEE Press, 1996.

[Cilitile97] Cimitile A., De Lucia A., Di Lucca G.A., Fasolino A.R., Identifying Objects in Legacy Systems, International Workshop on Program Comprehension, IEEE Press, 1997.

[Girard97] Girard J-F., Koschke R., Finding Components in a Hierarchy of Modules: a step towards architectural understanding, International Conference on Software Maintenance, IEEE Press, 1997.

[Harrold91] Harrold M., Malloy B., A Unified Interprocedural program representation for a maintenance environment, Proceedings of the Conference on Software Maintenance, Italy, IEEE Press, 1991.

[Kinloch93] Kinloch D., Munro M., A Combined Representation for the Maintenance of C Programs, 2nd Workshop of Program Comprehension, WPC 93, Italy, IEEE Press, 1993.

[Lehman97] Lehman M.M., Ramil J.F., Wernick P.D., Perry D.E., 'Metrics and Laws of Software Evolution - the nineties view', Symposium on Software Metrics, IEEE Press, Nov 1997.