
MEASURING SOFTWARE COMPLEXITY USING SOFTWARE METRICS 1 2

Xenos M., Tsalidis C., Christodoulakis D.
Computer Technology Institute, Patras, Greece

In this paper we present a user-friendly framework and a methodology for analysing and interpreting the results extracted by applying software metrics. This methodology can be used for fast and efficient software quality control, without requiring expertise in the metrics used or in the particular software.

1 INTRODUCTION

Nowadays a large number of software projects fail to meet their prespecified requirements regarding time, budget and specifications, and their maintenance effort is higher than their implementation effort. Thus, there is a great need for software metrics to aid in overcoming this "software crisis". The results of software metrics, however, are not used efficiently to direct the actions which would lead to the improvement of the software's quality, because the metrics' results are not fully analysed and interpreted.

To perform measurements we use the ATHENA metrics environment, which has already been presented in a former paper /7/. The overall architecture of ATHENA is depicted in figure 1. When measuring a specific programming language, the grammar processor (gp) processes the language's specifications, described in the language specification file (LSF), and accordingly produces a metrics processor (mp) for this language. The metrics processor then processes source files and produces the RESULTS file,

1 This paper was supported in part by the QUICHE Project in the AIM framework.
2 Proceedings of the 3rd European Conference on Software Quality, sponsored by EOQ (European Organization for Quality), Madrid, Spain, pp. 223-234, 1992.

holding the results of all measurements. The graphics processor (gxp) processes the RESULTS file in order to present the results graphically. The report processor (rp) processes the RESULTS file and generates a report text file. Finally, the database processor (dbp) saves the RESULTS file in a database.

Figure 1: The ATHENA metrics environment (gp, mp, gxp, rp and dbp processors operating on the LSF, source, RESULTS, report, graphics and database files).

The main goal of this paper is to demonstrate how the measurement results extracted by ATHENA can be analysed and interpreted 3. It provides a methodology used to evaluate software and locate its problem areas. Section 2 includes a brief presentation of the metrics which were used in this methodology. Section 3 contains the analysis of the metrics, some ways of interpreting the results, and the methodology. Finally, in section 4, future goals and open problems are briefly discussed.

3 A tool to perform the methodology has also been developed.

2 PRESENTATION OF METRICS

Although ATHENA is a metrics-free 4 environment, in our methodology we use three well-known metrics: Halstead's "Software Complexity Measure" /4/, McCabe's "Cyclomatic Complexity Metric" /5/ and Tsai's "Data Structure Complexity Metric" /6/. A brief presentation follows.

2.1 Software Complexity Measure

Halstead considers a program as a collection of tokens (operators or operands). The measurements of the program are based on counts of those tokens. Four basic quantities are proposed:

n1: the number of distinct operators.
n2: the number of distinct operands.
N1: the total occurrences of operators.
N2: the total occurrences of operands.

Halstead proposed some additional quantities. Those presented in table 1, where n = n1 + n2 is the vocabulary, are used for the interpretation and the methodology of this paper.

Quantity's Name      Evaluation Formula
Program's Length     N = N1 + N2
Length's Estimator   N̂ = n1 log2 n1 + n2 log2 n2
Language's Level     λ = N (log2 n) (2 n2 / (n1 N2))^2
Effort               E = N n1 N2 (log2 n) / (2 n2)
Time Estimation      T = E / S, where S ≈ 18

Table 1: Halstead's Quantities

4 We can use any new metric, provided that we formally describe it.
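As an illustration of table 1, the quantities can be computed directly from the four basic counts. The following Python sketch is our own (it is not part of ATHENA); it assumes the vocabulary n = n1 + n2 and the value S = 18 from the table:

```python
from math import log2

def halstead(n1, n2, N1, N2):
    """Halstead quantities from the four basic counts (table 1).

    n1, n2: distinct operators / operands
    N1, N2: total occurrences of operators / operands
    """
    n = n1 + n2                                    # vocabulary
    N = N1 + N2                                    # program's length
    N_hat = n1 * log2(n1) + n2 * log2(n2)          # length's estimator
    lam = N * log2(n) * (2 * n2 / (n1 * N2)) ** 2  # language's level
    E = N * n1 * N2 * log2(n) / (2 * n2)           # effort
    T = E / 18                                     # time estimation, S = 18
    return {"N": N, "N_hat": N_hat, "lambda": lam, "E": E, "T": T}
```

For example, a routine with n1 = 10, n2 = 20, N1 = 100 and N2 = 80 gives N = 180 and N̂ ≈ 119.7; comparing the two is exactly question 4 of table 2.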

2.2 Cyclomatic Complexity Metric

The cyclomatic complexity metric, proposed by McCabe, is used, as its name implies, to measure the complexity of a program. The measurements take place on the control graph of the program. The possible statement cases in a control graph are: sequentially executed statements, a while-do statement, a do-while statement, a switch statement and an if-then-else statement. All other commands can be derived from the above. The graphs 5 for each of them are shown in figure 2.

Figure 2: McCabe's statement cases (control subgraphs for sequence, while-do, do-while, switch and if-then-else).

McCabe defined the cyclomatic number of a graph as:

V(G) = e - n + 2p

where:
e is the number of edges,
n is the number of nodes, and
p is the number of connected components.

Additionally, McCabe defined the essential complexity as:

ev = V(G) - m

where m is the number of subgraphs of the control graph which have a unique input and output node.

5 The control graph for the switch statement is based on our experience using metrics.

2.3 Data Structure Complexity Metric

This metric, proposed by Tsai, counts the complexity of the data structures used by the program and is, therefore, applicable before the program's implementation. To apply this metric, the measured software must use data structures D of the form:

D ::= F(D1, ..., Dn), with n ≥ 1

where F is the operator 6 which builds the data structure, the Di are references to previously defined data structures which are used to build the structure D, and n is their number. Tsai's metric takes as input a series of data structures Q of the form:

Q = {Di | Di ::= Fj(Ds1, ..., Dsk(i))}

where the Fj are the data structure build operators and the Dsx (1 ≤ x ≤ k(i)) are references to previously defined data structures. First of all, a multigraph 7 which shows the references among the structure definitions is built. Afterwards, the graph is reduced by deleting the nodes with a unique input and output. Then the strongly connected components are located and a monomial is defined for each of them as:

S(K) = (V + E) x^L

where V is the number of nodes of the component, E is the number of edges, and L is the number of simple circular paths in the component. We define an ordering relation ≼ as follows:

If Ki ≠ Kj, then Ki ≼ Kj if and only if there is a path from some node of Kj to a node of Ki.

It can easily be proven that ≼ is a partial ordering. There is a non-empty set M:

M = {Ki | there is no Kj such that Kj ≼ Ki}

Now, for each component, a polynomial of its complexity can be calculated as follows:

C(K) = Σ {C(Ki) : ∀ Ki ≼ K} + S(K)

This polynomial expresses the complexity of each data structure used by the program. Examples of graphs and their complexities are illustrated in figure 3.

Figure 3: Tsai's example graphs (multigraphs with complexities S = 4x, S = 6x^2 and S = 6x^4).

6 An internal function.
7 In order to allow circular definitions.
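Both graph quantities above are easy to compute mechanically. The sketch below is our own illustration (not ATHENA code): `cyclomatic_number` implements V(G) = e - n + 2p, using a small union-find to obtain p, and `monomial`/`add_poly` represent Tsai's complexity polynomials as {exponent: coefficient} dicts:

```python
def cyclomatic_number(nodes, edges):
    """McCabe's V(G) = e - n + 2p for a control graph.

    p, the number of connected components, is found with union-find,
    treating the graph as undirected.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    p = len({find(v) for v in nodes})
    return len(edges) - len(nodes) + 2 * p

def monomial(V, E, L):
    """Tsai's S(K) = (V + E) x**L as an {exponent: coefficient} dict."""
    return {L: V + E}

def add_poly(p, q):
    """Sum of two complexity polynomials."""
    out = dict(p)
    for exp, c in q.items():
        out[exp] = out.get(exp, 0) + c
    return out

# the if-then-else subgraph of figure 2: two branches that join again
print(cyclomatic_number([1, 2, 3, 4], [(1, 2), (1, 3), (2, 4), (3, 4)]))  # 2

# a component with 2 nodes, 2 edges and 1 simple circular path gives
# S = 4x, as in the first example of figure 3; adding a 6x**2 component
# gives the polynomial 4x + 6x**2
print(add_poly(monomial(2, 2, 1), monomial(3, 3, 2)))  # {1: 4, 2: 6}
```

A straight-line routine gives V(G) = 1, and each decision adds one to it.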

3 INTERPRETATION OF METRICS AND PRESENTATION OF THE METHODOLOGY

Before presenting the methodology, it is important to interpret the metric results. In the following section we combine all the partial interpretations in order to evaluate the software's quality. A brief interpretation of the most significant quantities follows.

n1: The number of unique operators is limited (depending upon the programming language) and tends to stabilise as the program's length grows. Since each Goto label is a unique operator, a value of n1 which exceeds the average means, in most cases, non-structured programming with low quality.

N and N̂: N counts the program's actual length. N̂, on the other hand, ignores any code impurities. Thus, a correlation between them indicates a program without code impurities. Actually, things are not so simple, because this correlation is affected by the program's size. Christensen's /1/ measurements showed the best correlation for modules between 2000 and 4000 tokens. Additionally, our measurements indicate the best correlation for subroutines ranging between 300 and 800 tokens.

λ: The language's level (λ) is expected to be similar for every program written in the same language, but that does not occur; in fact, it depends upon the use of the language by the individual programmer. Comparing existing estimated averages of λ for every commonly used language with the λ resulting for each measured program can give an evaluation of "how well the language is used".

Effort and Time: Funami's /3/ measurements indicate a 98% correlation between a program's effort and the number of bugs, as measured during the program's testing phase. Additionally, our experiments showed a strong correlation between estimated and real time.

V(g) and ev: As V(g) grows, the program's complexity grows and, in most cases, its quality decreases. Measurements performed by McCabe show the number 10 as an upper limit for the cyclomatic complexity of a part of software. A program with V(g) >> 10 had better be restructured in order to become less complex. Every program's cyclomatic complexity can be reduced 8 until V(g) = ev. Additionally, it can be proven that the essential complexity of every structured program is equal to one. This means that structured programs can be reduced down to any cyclomatic complexity we choose (with a minimum of one). For non-structured programs, however, the essential complexity is an indicator of their ability to be reduced down to an accepted value.

Tsai's polynomials: In the previous section, metric results are polynomials of the form a0 + a1 x + ... + an x^n. According to the data structure types that can occur in it, a program is distinguished into two basic categories: a) non-circular data structures, which do not contain recursive (circular) data definitions, and b) circular data structures, which are recursively defined and whose definitions contain circular references. In Tsai's metric, exponents measure complexity due to circular definitions, and coefficients measure complexity due to non-circular definitions.

3.1 The Methodology

An expert scientist could view the tables which contain a program's metric results and evaluate its quality, but this uses his experience rather than an automated method. To develop a program which will perform the

8 The value of ev shows us the best complexity which can be achieved by breaking the program down into subroutines.

same task was very difficult: there are many different cases to be tested, resulting in a very complex program. In this section we present the main idea of the methodology used, without entering into technical details.

The measuring process is illustrated in figure 4. It shows how the three metrics are applied and how their results are collected for each module. Halstead's and McCabe's metrics are applied to every routine and Tsai's metric to every module. Then an evaluation of the module's quality takes place. This process is repeated for every module. The overall results are statistically analysed, first routine by routine and then module by module. Thus, a conclusion about the overall program's quality can be drawn, and additionally its problematic modules and routines can be located. The analysis is aided by statistics stored in the database from previous measurements of other programs. The ATHENA environment aided the automation of the methodology by providing the necessary tools.

Figure 4: The measuring process (Halstead and McCabe results collected per routine, Tsai results per module, followed by the analysis of each module's output).

An outlined presentation of this methodology is given in the

following steps:

Step 1: Apply Halstead's and McCabe's metrics to a routine. Store the results in the database.
Step 2: Repeat step 1 until the end of the module.
Step 3: Apply Tsai's metric to the module. Store the results in the database.
Step 4: Repeat steps 1-3 until the end of the project.
Step 5: Statistically analyse the results for every routine. Store the results in the database.
Step 6: Statistically analyse the results for every module. Store the results in the database.
Step 7: Interpret the stored results using the output from the statistical analysis.
Step 8: Cross-examine the results with those from other projects.
Step 9: Conclude about the project's quality (using the results from steps 5-8). Locate problematic routines and modules.

Table 2 presents the questions asked for every module. Of course, the actual questions are not so simple. The first six questions deal with routine evaluation and the last one with module evaluation. The order of the first six represents the sequence of the automation and the way their results contribute to the final result. The analysis of the answers, based on the previously presented interpretations, is also not so simple. For instance, when asking "are N and N̂ correlated?" one does not answer "yes" or "no", but gives a number showing the degree of their correlation. Based on the above analysis, this degree is higher for the pair (N = 25 and N̂ = 50, with N̂ = 2N) than for the pair (N = 600 and N̂ = 800, with N̂ = (4/3)N).
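The numeric guidelines quoted in section 3 (N between 300 and 800 tokens, V(g) ≤ 10 with ev = 1, n1 near the language average) can be bundled into a per-routine screening step of the kind the automation performs. The sketch below is our own; in particular, the 20% tolerance for "near the average" is an assumed value:

```python
def routine_checks(N, N_hat, Vg, ev, n1, n1_avg):
    """Screen one routine against the guideline values quoted in the text.

    Thresholds: 300 <= N <= 800 (best N / N_hat correlation range),
    V(g) <= 10 and ev == 1 (McCabe's limit, structured code), and
    n1 within 20% of the language average (the 20% is an assumption).
    """
    return {
        "size_ok": 300 <= N <= 800,
        "complexity_ok": Vg <= 10 and ev == 1,
        "n1_ok": abs(n1 - n1_avg) <= 0.2 * n1_avg,
        "impurity_ratio": N_hat / N,  # close to 1 -> few code impurities
    }

flags = routine_checks(N=500, N_hat=520, Vg=8, ev=1, n1=40, n1_avg=42)
print(flags["size_ok"], flags["complexity_ok"], flags["n1_ok"])  # True True True
```

A real implementation would, as the text notes, replace the yes/no answers with degrees (e.g. a size-adjusted correlation of N and N̂) before the statistical analysis of steps 5-6.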

#  Quantity              Analysis
1  N                     is it between 300 and 800?
2  V(g) and ev           is V(g) <= 10 and ev = 1?
3  n1                    is it near the average?
4  N and N̂               are they correlated?
5  λ                     is it near the expected one?
6  E and T               are they at low levels?
7  Tsai's polynomials    are they simple or complex?

Table 2: Simple questions

4 CONCLUSION

We have presented three metrics, their interpretation, a methodology for evaluating the quality of software, and the environment under which this methodology was automated. The presentation of the methodology has only been outlined, in order to keep the paper's size low. Including more metrics in it is a future goal. Making the automated program simpler is another future goal and is already in progress.

REFERENCES

/1/ Christensen K., Fitsos G., Smith C., "A Perspective on Software Science", IBM Systems Journal, Vol. 20, No. 4, (1981)

/2/ Christodoulakis D., Tsalidis C., "Design Principles of the ATHENA Software Maintainability Tool", Microprocessing and Microprogramming, The Euromicro Journal, Vol. 28, No. 1-5, (1990)

/3/ Funami Y., Halstead M., "A Software Physics Analysis of Akiyama's Debugging Data", Symposium on Computer Software Engineering, Polytechnic Institute of New York, (1976)

/4/ Halstead M., "Elements of Software Science", Elsevier, New York, (1977)

/5/ McCabe T., "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, pp. 308-320, (1976)

/6/ Tsai W., Lopez M., Rodriguez W., Volonik D., "An Approach to Measuring Data Structure Complexity", IEEE COMPSAC 86, pp. 240-246, (1986)

/7/ Tsalidis C., Christodoulakis D., Maritsas D., "ATHENA: A Software Measurement and Metrics Environment", Software Maintenance: Research and Practice, (1991)