Software metrics for open source systems Niklas Kurkisuo, 26389 Emil Sommarstöm, 25697

Contents

Introduction
1 Open Source Software Development
1.1 Open Source Development advantages/disadvantages
1.2 Evolution of OSD Systems
2 The Linux Operating System Kernel
2.1 Observation on the system growth of the Linux kernel
3 Conclusion
4 Open Source Metric tools
4.1 Exuberant Ctags
4.2 CCCC
5 References

Introduction

In this paper we discuss metrics and the development of open source systems, and how they differ from traditional in-house system development and traditional metrics. We focus on the Linux kernel but also make comparisons to other open source systems such as FreeBSD. We chose the Linux kernel because of its size and widespread use, and because it has displayed steady growth over the past years. A problem with open source systems is that there are usually many authors, no real development plan and usually no extensive testing plan. Because of this, metrics are not always used as extensively as in traditional system development. We also look at some open source tools that can be used to gather different metrics.

1 Open Source Software Development

Software development for open source systems differs from traditional system development. The single most important requirement of an open source software system is that its source code must be freely available to anyone, so that they can use the code for their own purposes. There are two different types of open source software (OSS) development. The first type is similar to corporate or in-house development: the software is written by the project's own staff and later released as open source. An example is the Netscape web browser (Mozilla). In the second type a person or group starts a project and then lets anybody who wants to contribute to it; in other words, the project is open source from the beginning. Usually, such a project begins with a single developer who has a personal goal or vision. Typically, that person will begin work on the system either from scratch or by using other open source projects. For example, Linux grew out of Linus Torvalds's work with the Minix operating system. The latter of the two types is the one we are interested in, because it can be viewed as a genuine open source system.

1.1 Open Source Development advantages/disadvantages

When the author is ready to invite others into the project, he or she makes the code base available and development proceeds. Anyone may contribute towards the development of the system, but the originator/owner is free to decide which contributions become part of the official release. The open source development (OSD) model differs from traditional in-house commercial development processes in several fundamental ways. Commercial projects usually have a set goal and a deadline by which to complete the product, whereas an open source developer might work on the project only when he or she feels like it. Open source contributors also work on what interests them, so the result might not always be what the original author wants. For the same reason it is not easy to apply metrics to an open source project: contributors may not be interested in testing, code restructuring or the quality of their code. Often "as long as it works" describes the quality of the code, especially for uninteresting modules. The reason is that contributors work in their free time and may find these tasks less exciting than writing new code.

Open source projects do not have the same pressure on their schedules. This might seem to be a disadvantage, but it means that software quality is not affected by time pressure: the developer can release the software when he or she thinks it is ready. Code quality and standards may vary, since code is contributed by many people, but the project owner does not need to accept a contribution. Code quality is maintained largely by "massively parallel debugging" (i.e., many developers each using each other's code) rather than by systematic testing or other planned, prescriptive approaches, although many projects do have official guidelines.

Unstable code is common, as developers are eager to submit their newest contributions to the project. Some OSD projects, including Linux, address this issue by maintaining two concurrent development paths: a development release path contains new or experimental features, and a stable release path contains mostly updates and bug fixes relative to the previous stable release. When features have proven stable or mature they are migrated into the stable release. FreeBSD differs from Linux in the way it handles contributions. Linux accepts code into its releases quite freely, whereas FreeBSD subjects contributions to much more stringent testing by the core development team. Because of this, FreeBSD's development process has many of the qualities found in an in-house project. As a result, FreeBSD tends to support fewer devices, and its development proceeds much more slowly than that of Linux.

1.2 Evolution of OSD Systems

We have been examining the growth and evolution patterns of OSD projects to see how they compare to previous studies on the evolution of large proprietary software systems developed using more traditional in-house processes. There are now many large OSD systems that have been in existence for a number of years and have achieved widespread use, including two that we have investigated in some detail: the Linux operating system kernel (2,200,000 lines of code) and the VIM text editor (150,000 lines of code). Naively, we had expected that, since the evolution of OSD software is usually much less structured and less carefully planned than traditional in-house development, Lehman's laws would apply; that is, as the system grew, the rate of growth would slow, with the system growth approximating an inverse square curve. Indeed, the maintainers of the Perl project have recently undertaken a massive redesign and restructuring of the core system, since the project owners felt that the current system had become almost unmaintainable. However, as we explain below, this is not at all what we found with Linux.

2 The Linux Operating System Kernel

Linux is a Unix-like operating system that has gained much support in recent years. Many even think Linux might have the power to challenge commercial operating systems such as Windows and Unix. Linux has already taken a large bite of the Unix market share. Although Windows still dominates the market, Linux and its supporters have not given up hope of gaining a share of it. In recent years Linux has gained support from large companies such as IBM. Linux was originally written by Linus Torvalds and subsequently worked on by hundreds of other developers. It was originally written for the Intel architecture but has since been ported to other platforms, such as PowerPC, SPARC and even PDAs. The first official release of the kernel, version 1.0, occurred in March 1994. This release contained 487 source code files comprising over 165,000 lines of code, including comments and blank lines. Since then the kernel has been maintained as two releases: a stable release and a development release, the latter of which contains experimental and untested code. The stable release contains fixes for the bugs that were found in the development release.

Since its first release the Linux kernel has grown in size, and it now contains over 2 million lines of code.

2.1 Observation on the system growth of the Linux kernel

Following the paper "Evolution in Open Source Software: A Case Study" by Michael W. Godfrey and Qiang Tu, we look at how Linux has grown over the years. They measured various aspects of the growth using a variety of tools; see section 4, Open Source Metric tools, for some of them. By source file we mean a file whose name ends with .c or .h and that appears in the distribution package. Note that a system build generates additional source files; these are ignored. We first examined how the system has grown using several common metrics. For example, Fig. 1 shows the growth in size of the compressed tar files for the full kernel release, and Fig. 2 shows the growth of the number of lines of code (LOC). We also measured the growth of the number of source files and the growth of the number of global functions, variables, and macros; however, we have omitted the graphs for these measurements for the sake of brevity, as the growth patterns they show are very similar to those of Fig. 1 and Fig. 2.

Figure 1. Growth of the compressed tar file for the full Linux kernel source release.
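One possible way to obtain the counts of global functions, variables, and macros mentioned above is to summarise the index file written by Ctags (section 4.1). The following is a minimal sketch, not the procedure used in the cited study. It assumes the extended tags file format of Exuberant Ctags, in which each entry ends with `;"` followed by a tab and a kind field (for C, `f` for functions, `v` for variables, `d` for macro definitions). The function name `count_tag_kinds` and the sample entries are hypothetical.

```python
from collections import Counter

def count_tag_kinds(tags_text):
    """Count tag entries per kind in the text of a ctags 'tags' file."""
    counts = Counter()
    for line in tags_text.splitlines():
        if not line or line.startswith("!_TAG_"):
            continue  # skip blank lines and pseudo-tag header lines
        # Extension fields follow the ';"' marker that ends the tag address.
        _, marker, fields = line.partition(';"\t')
        if not marker:
            continue  # entry without extension fields: kind is unknown
        kind = fields.split("\t")[0]
        if kind.startswith("kind:"):  # long form of the kind field
            kind = kind[len("kind:"):]
        counts[kind] += 1
    return counts

# Hypothetical tags entries for two small C files (not real kernel data).
sample_tags = (
    '!_TAG_FILE_FORMAT\t2\t/extended format/\n'
    'BUFSIZE\tbuf.h\t/^#define BUFSIZE 512$/;"\td\n'
    'buf_init\tbuf.c\t/^void buf_init(void)$/;"\tf\n'
    'buf_len\tbuf.c\t/^int buf_len;$/;"\tv\n'
    'buf_free\tbuf.c\t/^void buf_free(void)$/;"\tf\tfile:\n'
)

kinds = count_tag_kinds(sample_tags)
print(kinds["f"], kinds["v"], kinds["d"])  # functions, variables, macros
```

Running the same count on the tags file of each release would give one data point per release for each of the three measures.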

Figure 2. Growth in the number of lines of code measured using two methods: the Unix command wc -l, and an awk script that removes comments and blank lines.

We will now show the growth of the major subsystems.

Figure 3. Growth of the major subsystems (development releases only).
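The two counting methods named in the caption of Figure 2 can be illustrated with a short sketch. It is written in Python rather than awk and is not the script used in the cited study. The function name `count_lines` is hypothetical, and the comment handling is a simplification: comment markers that appear inside string literals are not recognised.

```python
import re

# Matches /* ... */ block comments (possibly spanning lines) and // comments.
# Simplification: comment markers inside string literals are not handled.
_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def count_lines(source):
    """Return (raw_lines, code_lines) for the text of a C source file."""
    # Raw count: every line, as wc -l reports for a file ending in a newline.
    raw = len(source.splitlines())
    # Replace each comment by the newlines it contained, so that code sharing
    # a line with a comment still counts as one line.
    stripped = _COMMENT.sub(lambda m: "\n" * m.group(0).count("\n"), source)
    # Code count: lines that are non-blank once comments are removed.
    code = sum(1 for line in stripped.splitlines() if line.strip())
    return raw, code

sample = (
    "/* header\n"
    "   comment */\n"
    "#include <stdio.h>\n"
    "\n"
    "int main(void) { // entry\n"
    "    return 0; /* done */\n"
    "}\n"
)

print(count_lines(sample))  # (7, 4)
```

The gap between the two numbers is the share of comments and blank lines, which is why the two curves in Figure 2 differ.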

Figure 4. Percentage of total system LOC for each major subsystem (development releases only).

From these figures we see that Linux has experienced steady growth over the past years and that the growth has been greatest in the drivers subsystem.

3 Conclusion

The Linux operating system kernel is a very successful example of a large software system in widespread use that has been developed using an open source development (OSD) model. Using several metrics, we have seen that Linux has grown over the years, and that it has grown superlinearly. This strong growth rate seems surprising given (a) its large size (over two million lines of code including comments and blank lines), (b) its development model (a highly collaborative and geographically distributed set of developers, many of whom contribute their time and effort for free), and (c) previously published research that suggests that the growth of large software systems tends to slow down as the systems become larger. Open source projects seem to manage without a strict development plan and with minimal use of metrics. This phenomenon may be explained by the enthusiasm of the people working on the project.
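Whether a series of size measurements grows superlinearly or slows down can be checked with a simple calculation. The sketch below is one possible check, not the analysis of the cited study: it computes the growth rate between consecutive releases and fits a least-squares line through those rates. A positive slope means the growth rate is increasing (superlinear growth); a negative slope means growth is slowing. The function names and both data series are invented for illustration and are not measurements of any real system.

```python
def growth_rates(samples):
    """samples: (days_since_first_release, lines_of_code) pairs sorted by day.
    Returns (midpoint_day, lines_per_day) for each consecutive pair."""
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        rates.append(((t0 + t1) / 2, (s1 - s0) / (t1 - t0)))
    return rates

def rate_trend(samples):
    """Least-squares slope of growth rate against time.
    Needs at least three samples taken on distinct days."""
    points = growth_rates(samples)
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_r = sum(r for _, r in points) / n
    num = sum((t - mean_t) * (r - mean_r) for t, r in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den

# Invented series, for illustration only.
accelerating = [(0, 100_000), (500, 250_000), (1000, 500_000),
                (1500, 850_000), (2000, 1_350_000)]
slowing = [(0, 100_000), (500, 200_000), (1000, 250_000), (1500, 275_000)]

print(rate_trend(accelerating) > 0)  # True: growth rate is increasing
print(rate_trend(slowing) < 0)       # True: growth rate is decreasing
```

Working on rates between consecutive releases keeps the check usable when releases are unevenly spaced in time, as kernel releases are.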

4 Open Source Metric tools

We now present some tools that can be used to calculate metrics. All of them are licensed as open source. To count lines of code you can use the Unix command wc -l, which gives the raw line count, including blank lines and comment lines. Many scripts are available that omit comment lines and blank lines.

4.1 Exuberant Ctags

Ctags generates an index (or tag) file of language objects found in source files that allows these items to be quickly and easily located by a text editor or other utility. A tag signifies a language object for which an index entry is available (or, alternatively, the index entry created for that object). Tag generation is supported for the following languages: Assembler, AWK, ASP, BETA, Bourne/Korn/Zsh Shell, C, C++, COBOL, Eiffel, Fortran, Java, Lisp, Lua, Make, Pascal, Perl, PHP, Python, REXX, Ruby, S-Lang, Scheme, Tcl, Vim, and YACC.

4.2 CCCC

CCCC is a tool written by Tim Littlefair as part of his PhD research. It analyzes C++ and Java files and generates a report on various metrics of the code. Metrics supported include lines of code, McCabe's complexity, and the metrics proposed by Chidamber & Kemerer and Henry & Kafura.

There are many other free tools that can be used when programming. There are also many commercial options available, but as open source projects usually have a small budget you might want to consider the use of free software.

5 References

- Evolution in Open Source Software: A Case Study, by Michael W. Godfrey and Qiang Tu
- Software Metrics: A Rigorous & Practical Approach, by Fenton & Pfleeger
- www.sourceforge.net
