Estimating Development Effort in FOSS. Source Software Projects by MSR

Similar documents
Comparison between SLOCs and number of files as size metrics for software evolution analysis 1

Learning about software development with Kibana dashboards

Comparison between SLOCs and number of files as size metrics for software evolution analysis

Applying Social Network Analysis to the Information in CVS Repositories

A comparison between MediaWiki, TWiki and XWiki communities Feb 1st, / 20

Towards a theoretical model for software growth

Open Source Software Developer and Project Networks

Facilitating Social Network Studies of FLOSS using the OSSNetwork Environment

Is the product of two integers positive, negative, or zero? How can you tell? ACTIVITY: Multiplying Integers with the Same Sign

Linux System Administration, level 2

Public-Service Announcement

Evolution and Growth in Large Libre Software Projects

Red Hat Mobile Application Platform Hosted 3

Filtering Bug Reports for Fix-Time Analysis

Public-Service Announcement

Corporate Involvement of Libre Software: Study of Presence in Debian Code over Time

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

Roles and Responsibilities of Maintainers

CS159. Nathan Sprague

Slice Intelligence!

Mining Large Software Compilations over Time: Another Perspective of Software Evolution

Managing Open Bug Repositories through Bug Report Prioritization Using SVMs

FLOSSmole, FLOSShub and the SRDA Repositories

Adams, Bram 21 MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Anbalagan, Prasanth

HOW TO STAND OUT IN DEVOPS

Logiciel Libre TP 1 Project Presentation

Guide to depositing research in 02, the UOC s institutional repository SENDING PUBLICATIONS TO THE O2 REPOSITORY FROM GIR

SqlC tutorial. 12 th may Revision 1.0. Visit us at

View the full TurnItIn report. Submit the file to TurnItIn for originality checking

Why So Complicated? Simple Term Filtering and Weighting for Location-Based Bug Report Assignment Recommendation

Revitalizing OSS Contributions and Participation across Mozilla

Cross-project defect prediction. Thomas Zimmermann Microsoft Research

CDL s Web Archiving System

Chapter III Applying Social Network Analysis Techniques to Community-Driven Libre Software Projects

Red Hat Process Automation Manager 7.0 Planning a Red Hat Process Automation Manager installation

Information Literacy

PRODUCING EDUCATIONAL RESOURCES IN THE LIBRE WAY : THE EDUKALIBRE PROJECT

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

A Case Study on the Impact of Global Participation on Mailing Lists Communications of Open Source Projects

Using git to download and update BOUT++

Git AN INTRODUCTION. Introduction to Git as a version control system: concepts, main features and practical aspects.

Organizing and Summarizing Data

Chapter 1: Fundamentals

Git AN INTRODUCTION. Introduction to Git as a version control system: concepts, main features and practical aspects.

Day05 A. Young W. Lim Sat. Young W. Lim Day05 A Sat 1 / 14

FREEWAT. FREE and open source software tools for WATer resource management. FREEWAT platform v.1.0 and User Manual v.1.0

Guide for depositing Final Bachelor's Degree Project Final Master's Degree Project- Practicum in O2, the UOC's institutional repository

CUSTOMER PORTAL USER MANUAL. Marketing

Openbravo Oracle quick-start. installation guide

Heaps with merging. We can implement the other priority queue operations in terms of merging!

Using Machine Learning to Identify Security Issues in Open-Source Libraries. Asankhaya Sharma Yaqin Zhou SourceClear

Integrated Impact Analysis for Managing Software Changes. Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk

Declarative Modeling for Cloud Deployments

Upgrading & Updating Your Computer

JBoss Operations Network 3.1 Development - REST API

Red Hat CloudForms 4.0 Monitoring, Alerts, and Reporting

High Impact Online Communications. Calendaring Campaigns

Red Hat Virtualization 4.1

1. Which of these Git client commands creates a copy of the repository and a working directory in the client s workspace. (Choose one.

Study on Classifiers using Genetic Algorithm and Class based Rules Generation

TEMPLATE-BASED HIRE INSTRUCTIONS - HR PREPARERS

Laugh When Disaster Strikes

SECTION 10 CONTRACTING FOR PROFESSIONAL SERVICES CONSULTANT COMPETITIVE NEGOTIATION ACT (CCNA)

License. Introduction to Version Control with Git. Local Version Control Systems. Why Use Version Control?

Tips & Tricks: Vault QualityDocs Dashboards and Reports. October 22, 2014

How things work college course/turing machine quiz/testbank/mirror

TWO-PHASE COMMIT ATTRIBUTION 5/11/2018. George Porter May 9 and 11, 2018

Photo by SortOfNatural - Creative Commons Attribution-NonCommercial License

Lab 1: Introduction to data

Day06 A. Young W. Lim Mon. Young W. Lim Day06 A Mon 1 / 16

Lesson 21: Comparing Linear and Exponential Functions Again

Red Hat Application Migration Toolkit 4.0

CS6450: Distributed Systems Lecture 13. Ryan Stutsman

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Practice Problems for the Final

The State of Streaming Video in Professional and Scholarly Communications. STM Digital Publishing, London - December 5, 2017

A Quantitative Study of Social Organisation in Open Source Software Communities

Using GitHub to open up your software project

Agency admins should only see reports courses/users within their agency. Example. A course has 100 users, 35 of which are in Agency Y.

Real-Time Programming with GNAT: Specialised Kernels versus POSIX Threads

A Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression

Building a GNU/Linux distribution with DevOps in mind

SL 7.0 Tools for VB Exam.

How to Export and Import Turnitin Rubrics

Red Hat Process Automation Manager 7.0 Executing a business process in Business Central

The Cost of Going it Alone Dave Neary

Top 10 SAS Functions in A brief summary of SAS Communities Survey - by Flora Fang Liu

PRIMARY-BACKUP REPLICATION

How to Take the CI/CD Plunge

Monitoring the software quality in FairRoot. Gesellschaft für Schwerionenforschung, Plankstrasse 1, Darmstadt, Germany

Red Hat JBoss Enterprise Application Platform 7.0

How likely are you to recommend venue name to a friend?

The interns will be assigned to assist our R&D / IT professionals in the following exciting enabling technology projects.

LED display manager documentation

Validating customer demand

Kalibro (Version 1.0) User Guide

BENEVOL 2008 : the 7th Belgian-Netherlands software evolution workshop proceedings, December 11-12, 2008, Eindhoven : informal pre-proceedings

Red Hat Application Migration Toolkit 4.2

Characterization of the Wikipedia Traffic

Ch6: The Normal Distribution

Transcription:

Estimating Development Effort in Free/Open Source Software Projects by MSR A Case Study of OpenStack Gregorio Robles, Jesús M. González-Barahona, Carlos Cervigón, Andrea Capiluppi, Daniel Izquierdo-Cortázar Universidad Rey Juan Carlos, Brunel University, Bitergia S.L. MSR 2014, Hyderabad, June 1st 2014

(cc) 2014 The authors Some rights reserved. This work licensed under Creative Commons Attribution-ShareAlike License. To view a copy of full license, see http://creativecommons.org/licenses/by-sa/3.0/ or write to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA. Some of the figures have been taken from the Internet Source, and author and licence if known, is specified. For those images, fair use applies.

11th Mining Software Repositories Working Conference The question is... How much money did developing a software cost? Robles, Gonza lez-barahona, Cervigo n, Capiluppi, Izquierdo

How do we measure (past) effort? (I) In a traditional environment, we just ask management. They (have to) know!

How do we measure (past) effort? (and II) How about Free/Open Source Software?

First (maybe bad) idea We can track developers in all repositories and sum all their micro-contributions

Our approach Using a convenient characterization of developers we may obtain a good (and simple) effort estimation model

Model: We divide developers in two groups Full-time and non-full-time developers

Why? Free/Open Source contributions are highly skewed If we identify the most contributing ones and correctly assign effort to them, we will get a good estimation

The Model 1 We define a time period T 2 Any developer with more commits than a threshold t in a time period T is considered a full-time developer: { T if full-time Effort (in person-months) = T x/t else, x = commits in T

An example 1 A project has 3 developers 2 Their number of commits in the last 3 months is 100, 10, 3 3 Let s assume timespan T = 3 months and threshold t = 30 commits 4 The effort spent in this project in the last three months is 3 * (1 + 1/3 + 3/30) = 3 * 1.433 = 4.3 person-months

Case study: OpenStack We know the project very well and we know people there.

What we are looking for What is the best value for the timespan T? What is the best value for the threshold t?

Naïve Approach T = 1 release = 6 months t = 1 commit (used by the OpenStack community as a naïve estimation)

How have we proceeded? We obtained data from their git repository Looking for the timespan T: 6 months (following OpenStack s release periodicity) Looking for the threshold t: We asked (active) developers in a survey on their status: are you full-time in OpenStack? More than 100 responses obtained! (> 10% of the OpenStack active developers, statistically representative of the entire population)

Precision, recall and so on...

Interestingly... For the purpose of our study, false positives and false negatives compensate each other!

False positives and false negatives compensate each other!

Results (I) Effort since beginning of project: 8,655 person-month (naïve: 17,400 person-month) For t = 12 commits in T = 6 months

Results (and II) Effort for the last release: 2,634 person-month (naïve: 5,916 person-month) 440 person-months each month 250 professional developers work on OpenStack hired by companies For t = 12 commits in T = 6 months

Summary Given a repository, we provide a simple way of estimating past development effort Two parameters have to be determined: timespan T and threshold t A value of threshold t of 12 has been obtained as the best for OpenStack

Preview of current work Best threshold t value for following projects Linux: 18 commits every 6 months 653 out of 3,665 (17.8%) developers responded MediaWiki: 29 commits every 6 months 95 out of 605 (15.7%) developers responded WebKit: 17-25 commits every 6 months 86 out of 690 (12.5%) developers responded Moodle: 15 commits every 6 months 43 out of 174 (24.7%) developers responded

Estimating Development Effort in Free/Open Source Software Projects by MSR A Case Study of OpenStack Gregorio Robles, Jesús M. González-Barahona, Carlos Cervigón, Andrea Capiluppi, Daniel Izquierdo-Cortázar Universidad Rey Juan Carlos, Brunel University, Bitergia S.L. MSR 2014, Hyderabad, June 1st 2014