Empirical Study on Impact of Developer Collaboration on Source Code
Akshay Chopra, University of Waterloo, Waterloo, Ontario
Parul Verma, University of Waterloo, Waterloo, Ontario
Sahil Puri, University of Waterloo, Waterloo, Ontario

ABSTRACT
Software development is a collaborative effort in which developers work with each other to create or maintain applications and other software components. Because multiple developers work together to achieve the project goals, this collaboration may affect the quality of the final software, which can be measured in terms of the number of bugs or defects. In this paper, we analyze the effect of developer collaboration on the bug proneness of software by studying 50 open source Java projects on GitHub that have considerable project history. We treat each Java file as an individual class and analyze its history to determine how much collaboration among developers was involved. We then link this collaboration to the defects traced to those files to find empirically the effect of collaboration on software quality. We also examine whether the amount of developer collaboration and defect proneness are influenced by project characteristics such as source lines of code (SLOC) and project age. Our findings can be summarized as follows: (a) in a large percentage of projects, the majority of the code has been maintained by three or fewer developers; (b) more collaboration on a single file leads to more bugs being logged for that file; (c) as the number of developers increases while the lines of code decrease and the project age increases, more maintenance work is done on the project than new feature development.
KEYWORDS
Developer collaboration, bugs, defect proneness, software quality, human factors in software engineering

University of Waterloo, Waterloo, Ontario. 2018. David R. Cheriton School of Computer Science.

1 INTRODUCTION
Source code is the final product of the efforts of a single developer or of several developers who collaborate to build a particular product. Since most projects are large in scale, they are beyond what a single individual can develop. To accomplish this complex task, the development effort is often split across teams of individuals, each responsible for one or more (less complex) concerns of the development effort. The evolution of version control systems and platforms such as GitHub and SVN, as well as issue tracking systems, has made it possible to capture traces of these collaborative activities among developers.

Both open and closed source projects involve collaboration between developers working in different parts of the world, and hence their management is a challenging task. Moreover, changes in the workforce of teams and companies also affect the level of collaboration in developing a particular feature. Collaboration among developers is further influenced by the organizational structure and by the architecture of the software being developed, which may change as the project evolves. Thus, software development requires coordination and communication among developers and teams. The dynamic structure of developer collaboration can be measured by mining version control systems and analyzing the activity of the developers.

The extent of collaboration among developers in a project may also be related to the quality of the software product. In software engineering, software quality is most often linked to the defects encountered in the system, which are tracked by version control systems and bug tracking systems.
In the absence of bug tracking systems, version control systems have become a common place to find where bugs have occurred in the past, or even to predict where bugs might occur in the future. In 1995, Brooks [1] showed that, at the organizational level, teams with low interdependencies between each other tend to reduce their defect density and hence improve the quality of the software. However, Caglayan et al. [2] showed that the network structure at the source code level is different from the organizational structure. Cataldo et al. studied the effects of team structure in a project on project quality; one of their findings was that team structures differ from source code level collaboration structures. Hence, the influence of the source code network structure on defects may differ from the influence of the organizational structure.

In this paper, we analyze the impact of collaboration among developers on the quality of software modules using 50 highly rated open source Java projects available on GitHub. We assume that, in the absence of a bug tracking system, the quality of a project can be determined by analyzing the commit messages of its source code repository using bug heuristics such as the keywords bug, error, fix, etc. We analyze the effect of developer collaboration on software quality at the change level. We first go through the git history of all the files in a project to find the extent of developer collaboration, and then check the commit messages to see how many defects were reported in each file. To mitigate threats arising from the data, we report relative bug metrics instead of absolute bug metrics, which gives a better understanding of the consequences of developer collaboration.

The major contributions of this research paper are:
(1) We empirically study the effects of developer collaboration on software quality for 50 large scale open source projects.
(2) We relate the extent of developer collaboration to various project metrics such as project age and SLOC.
2 RESEARCH QUESTIONS

2.1 Research Question 1
What is the density of developer collaboration in a single project? That is, how many files per project have contributions from how many developers?
Motivation: Developers work together during software development and maintenance to resolve issues and implement features. Large open source Java projects contain many classes that implement the required functionality. According to common Java design practice, each class is defined in a separate file. In large projects, many developers collaborate to implement a piece of functionality, so a class may be the work of a single developer or of several developers working together. Through this research question, we identify the percentage contribution of each developer group in the project.

2.2 Research Question 2
Do concurrent updates from multiple developers result in more bugs than occur in classes maintained by fewer developers?
Motivation: The structure of developers' collaboration activity may affect the quality of the final product in terms of a higher number of defects. Since developer collaboration is common in large software projects, it is worthwhile to understand the effect of collaboration on defect proneness. Commit history and commit messages are a good indicator for identifying bug fix commits using the bug heuristics found in the software engineering literature, which look for words such as bug, fix, error, etc. to identify the fixes done. For the files changed during a bug fix, we check whether those files were changed by multiple developers or by a single developer.

2.3 Research Question 3
Is there any notable correlation between project characteristics and developer collaboration?
Motivation: Various project characteristics may have a direct impact on developer collaboration, and we want to see whether any correlation exists between them. The characteristics that we evaluate include the age of the project and its source lines of code.

3 DATASET DESCRIPTION
As part of the research, we collected data on 50 projects from GitHub. The projects were first sorted by their ratings, and only projects that have more than 80% Java files and more than 2,000 commits were chosen for further analysis. The project files were collected using the GitHub REST API. The data collection process included 2 steps:

(1) Using the python git, we collected the following features for each project and stored the data for each project as a JSON object for further processing. The project features extracted in step 1 are: project name, project URL, project SLOC and source code files. Figure 1 is a snapshot of the data gathered after step 1.

Figure 1: Features extracted after Step 1 in Data Collection Process

(2) The data stored as a JSON object after step 1 was further processed to gather more detailed information:
(a) Number of unique developers in a project, from the commit history.
(b) Whether a file is buggy, based on the keywords bug, fix, issue, close and error in commit messages.
(c) Distribution of source files based on the number of developers who worked on them.
(d) Project start date, based on the repository creation date.
(e) Total bugs, total SLOC, and the mean of bugs and SLOC associated with each developer group.

The data extracted for each project as part of step 2 was also stored as a JSON object. Figure 2 shows a snapshot of the JSON object created in step 2.
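The step 2 processing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the commit record layout (`author`, `message`, `files`), the function names `is_bug_fix` and `summarize_project`, the output keys, and the use of case-insensitive substring matching for the keywords are all hypothetical choices made for the example.

```python
import re
from collections import defaultdict

# Keywords the paper lists for flagging bug-fix commits.
BUG_KEYWORDS = ("bug", "fix", "issue", "close", "error")
# Assumption: case-insensitive substring matching on the commit message.
BUG_PATTERN = re.compile("|".join(BUG_KEYWORDS), re.IGNORECASE)


def is_bug_fix(message):
    """Return True if the commit message contains any of the bug keywords."""
    return BUG_PATTERN.search(message) is not None


def summarize_project(commits):
    """Aggregate per-file developer sets and bug-fix counts.

    Each commit is a hypothetical record of the form
    {"author": str, "message": str, "files": [str, ...]}.
    Only .java files are counted, since each Java file is treated as a class.
    """
    developers_per_file = defaultdict(set)
    bugs_per_file = defaultdict(int)
    for commit in commits:
        buggy = is_bug_fix(commit["message"])
        for path in commit["files"]:
            if not path.endswith(".java"):
                continue
            developers_per_file[path].add(commit["author"])
            if buggy:
                bugs_per_file[path] += 1

    # distribution[n]: number of files touched by exactly n distinct developers.
    distribution = defaultdict(int)
    bugs_by_group = defaultdict(int)
    for path, developers in developers_per_file.items():
        n = len(developers)
        distribution[n] += 1
        bugs_by_group[n] += bugs_per_file.get(path, 0)

    return {
        "unique_developers": len({c["author"] for c in commits}),
        "distribution": dict(distribution),
        "buggy_files": sorted(p for p, count in bugs_per_file.items() if count > 0),
        "mean_bugs_by_group": {n: bugs_by_group[n] / distribution[n] for n in distribution},
    }


if __name__ == "__main__":
    sample = [
        {"author": "alice", "message": "Add parser", "files": ["A.java", "B.java"]},
        {"author": "bob", "message": "Fix NPE in parser", "files": ["A.java"]},
        {"author": "alice", "message": "Update docs", "files": ["README.md"]},
    ]
    print(summarize_project(sample))
```

In this sketch a file is grouped by the number of distinct authors that ever touched it, which matches the paper's notion of a developer group; how the original tool resolved author identities (for example, one person committing under several names) is not stated in the text.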
Figure 2: Features extracted after Step 2 in Data Collection Process

distributiondict is a dictionary that stores information about the number of unique developers and the source code files they worked on; e.g., 1736 of 1969 files were written by a single developer, 202 of 1969 files were written by a group of two developers, and so on.

4 DATA CHARACTERISTICS
This section explains the characteristics of the data that we collected for the analysis. The characteristics are:

Project vs SLOC
This section shows the projects that we collected for analysis and their source lines of code. SLOC across projects varied from 40,000 to 1,200,000 lines of code, with an average of 200,000 lines. Figure 3 shows the plot of the projects against their SLOC.

Figure 3: Project vs Source line of code (sloc)

Project vs Buggy File Ratio
This section gives a picture of the buggy files in each project. The share of buggy files in a project varied from 10% to 100% of project files. Files are classified as buggy based on the keywords bug, fix, error, issue and close in the commit messages of the files in a project. The Bug File Ratio is defined as:

Bug File Ratio = No. of Bug Files in a project / Total number of files in a project

Figure 4 shows the distribution of the bug file ratio across projects. Figure 5 shows the distribution of the number of bugs across projects.

Figure 4: Projects vs Bug File Ratio
Figure 5: Project vs Total bugs
Figure 6: Project vs Total Bugs in a project

Project vs Unique Developers
This section shows the distribution of the number of unique developers who have worked on individual projects in the dataset. The number of unique developers per project ranged from 2 to 159, and the average number of developers is 45. Figure 7 shows the distribution of the number of unique developers.

Figure 7: Projects vs Unique authors in a project

Project vs Age
This section shows the age distribution of all the projects included in the analysis. Project age is calculated as the difference between the current time and the time when the project repository was created. Figure 8 shows the age distribution across all projects.

Figure 8: Project vs Age (in days)

As the age of a project increases, more developers start working on that project, as shown in Figure 9.

Figure 9: Number of unique authors in a project increases with increase in project age

5 ANALYSIS
This section discusses the insights gathered from the analysis of the 50 projects described in the previous section. We performed exploratory analysis on the data to discover trends in developer collaboration and link them to project characteristics such as lines of code, project age and bug proneness of the code.

(1) SLOC Distribution for different projects
In this analysis, we studied the amount of collaboration in the 50 projects and attempted to find what percentage of a project has been edited by how many developers. To perform this analysis, we used the following metric:

Figure 10: SLOC Distribution = Files worked upon by unique no. of developers / Total Files

SLOC distribution is thus the share of a project maintained by a specific developer group, where a developer group refers to the Java classes maintained by 'n' developers and 'n' can vary from 1 to the total number of unique authors for the project. In Figure 10, the X-axis represents the different projects used for the analysis. The Y-axis shows the developer collaboration metric as explained above. The legend gives the 'n' value for the developer collaboration; e.g., Series1 denotes the developer collaboration for 1 developer, and so on. As can be seen from the graph, the major share of the SLOC distribution lies within n <= 3. So, in a large percentage of projects, the majority of the code has been maintained by 3 or fewer developers.

(2) Author Distribution for SLOC Ratio
This analysis supports the previous one by calculating the SLOC worked on by different author groups. Here, the X-axis shows the number of developers working on a Java class. The Y-axis shows the SLOC Ratio for the specific developer group. Each series represents a different project used for this analysis. As can be observed from the trend lines added for three projects, the SLOC ratio for a developer group decreases as the number of developers in the group increases. This supports the previous finding that the majority of a project's code is maintained by three or fewer developers working on a Java class.

(3) Developer Collaboration vs Bugs per SLOC
In this analysis, we calculated the bugs per source line of code for Java classes that have been worked on by varying numbers of developers. In Figure 12, the X-axis represents the number of developers working on a Java class. For example, 1 represents the Java classes that were created and maintained by a single developer, and 3 stands for the Java classes that were created and maintained by three developers. The Y-axis represents the bugs per SLOC found for the corresponding Java classes. As can be seen from the four trend lines in the chart, as the number of developers working on a single Java class increases, the number of bugs per SLOC also increases. This suggests that more developer collaboration on a single Java class increases its probability of having bugs compared with classes that have been modified by fewer developers.

(4) Mean Bugs for Developer Distribution
This analysis calculates the mean number of bugs for the different developer groups. It is done to observe the pattern in the bugs logged for Java classes according to the number of developers who collaborated on each class. In Figure 13, the X-axis shows the different developer groups, where 1 stands for the Java classes that have been worked on by a single developer. The Y-axis in turn shows the mean number of bugs that have been logged for those files.
As can be observed from the added trendlines, as the number of developers working on a file increases, the mean number of bugs also increases. Hence, this points to the pattern that more collaboration on a single file leads to more bugs being logged for that file.

Figure 11: SLOC Ratio = SLOC of Developer Group / Total SLOC of project
Figure 12: Bugs per SLOC vs No. of unique developers for 30 projects (considering max of 10 developers)
Figure 13: Mean no. of Bugs vs No. of unique developers

(5) Number of Developers vs Lines of Code in a project and Project Age
This analysis compares the number of developers working on various projects with the density of source code in those projects. The X-axis in the figure 6 chart denotes the number of developers working on a project. The Y-axis denotes the lines of code in the project. There are two series in the chart. The orange series shows the SLOC for the different projects. The trendline for this series shows that as the number of developers working on a project increases, the net source lines of code tend to decrease. This was counter-intuitive, as we initially thought that more developers would result in more code being written for the project. We therefore used another dimension, the project age, which is represented by the grey series in the chart. It can be seen that as the number of developers on a project increases, the project age also increases. Taken together, the two series suggest that as the number of developers increases while the lines of code decrease and the project age increases, more maintenance work is done on the project than new feature development.

6 RELATED WORK
In past years, many researchers have focused on finding the effect of developer collaboration on the quality of software. Caglayan et al. [2] investigated the evolution of the developer collaboration network over time during a release of a large-scale project. Moreover, Nagappan et al. [3] in 2008 studied the relationship between the structure of the organization and the quality of the software. They conclude their study with a list of organizational metrics that should be considered when structuring teams and organizations in order to reduce the number of defects arising in the software and improve its overall quality.

7 THREATS TO VALIDITY
Our biggest threat to validity is the way bug fix commits are identified. We iterated over the commit tree of a repository, searched for the keywords bug, issue, fix, close or error, and marked each matching commit as a bug fix commit to be used in further analysis. We deliberately chose these words as heuristics because we wanted to capture the issues that developers continuously face in an ongoing development process. However, such a choice carries a threat of over-estimation, as the descriptiveness of commit logs varies across projects.
Moreover, the source lines of code (SLOC) considered in our analysis refer to the line count at the head of the git repository and not at the commit level. Also, the total number of developers in a project is calculated on the basis of the distinct committers who have contributed to a branch. It is possible that a repository has only a few committers, which may not represent the actual number of developers or the developers who worked on a particular file. For example, the project CoreNLP has a total of only 2 committers while it has a total SLOC of 759,702. It is highly unlikely that only 2 developers contributed this much code; it is more likely that other developers worked on the project but the final commits to the master branch were made by only 2 individuals. Also, we consider only the master branch of the software repositories and not all branches, which may lead to bias in the dataset.

Moreover, our work might also be prone to over-estimation due to forks, since we took Java projects from only one data source, GitHub, and projects on GitHub are easy to fork. This may lead to a potential increase of very similar projects in our dataset. However, we manually analyzed each repository to check whether a fork of it had already been chosen for analysis; hence, to the best of our knowledge, there are no duplicate repositories in the dataset.

The repositories selected from GitHub have at least 2,000 commits and are relatively stable. Hence, the results published in this work cannot be generalized to younger repositories with fewer than 2,000 commits, or to repositories developed in a professional setting in companies with strict code commit rules.
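The over-estimation threat from keyword matching discussed in this section can be illustrated with a small sketch. It contrasts plain substring matching with a stricter whole-word variant on a handful of made-up commit messages. The text does not say which matching rule the study used, so both patterns, the helper `count_flagged`, and the sample messages are assumptions for illustration only.

```python
import re

KEYWORDS = ("bug", "fix", "issue", "close", "error")

# Substring matching: a keyword inside a longer word still counts,
# so "prefix" matches "fix" and "debugger" matches "bug".
SUBSTRING = re.compile("|".join(KEYWORDS), re.IGNORECASE)

# Hypothetical stricter variant: whole words only, allowing a few
# common inflections such as "fixes", "fixed", "closes", "errors".
WHOLE_WORD = re.compile(
    r"\b(?:" + "|".join(KEYWORDS) + r")(?:s|es|ed|d)?\b", re.IGNORECASE
)


def count_flagged(messages, pattern):
    """Count the commit messages that the given pattern flags as bug fixes."""
    return sum(1 for message in messages if pattern.search(message))


if __name__ == "__main__":
    messages = [
        "Fix null pointer in parser",
        "Add prefix option to CLI",
        "Improve debugger output",
        "Closes #12: handle read errors",
        "Update README",
    ]
    print("substring :", count_flagged(messages, SUBSTRING))
    print("whole word:", count_flagged(messages, WHOLE_WORD))
```

Whole-word matching is not automatically more correct: it can still flag commits that only mention a keyword in passing, and it can miss fixes whose messages use none of the keywords. The sketch only shows that the number of flagged commits depends on the matching rule.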
8 FUTURE WORK AND CONCLUSION
This work can be extended to analyze more diverse repositories from all kinds of sources and ages. Since this work analyzed Java repositories alone, it would be interesting to see the results for other languages such as Python, C++, etc. We also considered open source projects from GitHub only. Projects from other version control systems such as SVN may add diversity and yield more general results. As explained in the previous section, equating the number of developers with the number of committers is problematic, as seen in the case of CoreNLP. A better mechanism for finding the actual number of developers may help give a truer picture of the impact of developer collaboration on software quality.

As part of this project, we analyzed 50 open source Java projects (see the Appendix) with varied project characteristics and inferred the impact of developer collaboration on the bug proneness of the source code. We converted the data from each project into a JSON object using python git. By parsing and analyzing those objects, we determined that the major share of source code was added by three or fewer developers. Another observation from the analysis is that higher collaboration on a source file leads to more errors being logged for that file. In addition, we observed that as the project age increased along with the number of developers, the source code density (SLOC) decreased, which points to the inference that there was more maintenance and support activity than new feature implementation.

9 ACKNOWLEDGMENTS
Many thanks to Professor Michael Godfrey for his invaluable comments and feedback on the research methodology and data analysis.
The authors would also like to thank the University of Waterloo for providing the computing resources and related infrastructure, which allowed us to run our tool in parallel on multiple systems.

10 APPENDIX
Figure 14 shows the details of the dataset that we used. It includes the links to the project repositories and the bug characteristics of those projects.

Figure 14: Characteristics of the project taken into this research

REFERENCES
[1] F. P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering.
[2] A. M. Bora Caglayan, Ayse Basar Bener, Emergence of developer teams in the collaboration network.
[3] N. Nagappan, E. M. Maximilien, T. Bhat, and L. Williams, Realizing quality improvement through test driven development: Results and experiences of four industrial teams, Empirical Softw. Engg., vol. 13, June.
[4] B. Çaglayan and A. B. Bener, Effect of developer collaboration activity on software quality in two large scale projects, Journal of Systems and Software, vol. 118.
[5] S. Alhassan, B. Caglayan, and A. Bener, Do more people make the code more defect prone?: Social network analysis in OSS projects.
[6] A. Meneely, L. Williams, W. Snipes, and J. Osborne, Predicting failures with developer networks and social network analysis, in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '08/FSE-16, (New York, NY, USA), ACM.
[7] M. Pinzger, N. Nagappan, and B. Murphy, Can developer-module networks predict failures?, in Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '08/FSE-16, (New York, NY, USA), pp. 2-12, ACM.
[8] G. Madey, V. Freeh, and R. Tynan, The open source software development phenomenon: An analysis based on social network theory.
[9] T. Zimmermann and N. Nagappan, Predicting defects with program dependencies, in rd International Symposium on Empirical Software Engineering and Measurement, Oct.
[10] E. J. Weyuker, T. J. Ostrand, and R. M. Bell, Using developer information as a factor for fault prediction, in Predictor Models in Software Engineering, PROMISE '07: ICSE Workshops, International Workshop on, pp. 8-8, May.
[11] F. Eichinger, K. Böhm, and M. Huber, Mining edge-weighted call graphs to localise software bugs, in Machine Learning and Knowledge Discovery in Databases (W. Daelemans, B. Goethals, and K. Morik, eds.), (Berlin, Heidelberg), Springer Berlin Heidelberg, 2008.
Indian Journal of Science and Technology, Vol 9(32), DOI: 10.17485/ijst/2016/v9i32/100187, August 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Towards The Adoption of Modern Software Development
More informationBugzillaMetrics - Design of an adaptable tool for evaluating user-defined metric specifications on change requests
BugzillaMetrics - A tool for evaluating metric specifications on change requests BugzillaMetrics - Design of an adaptable tool for evaluating user-defined metric specifications on change requests Lars
More informationIMPACT OF DEPENDENCY GRAPH IN SOFTWARE TESTING
IMPACT OF DEPENDENCY GRAPH IN SOFTWARE TESTING Pardeep kaur 1 and Er. Rupinder Singh 2 1 Research Scholar, Dept. of Computer Science and Engineering, Chandigarh University, Gharuan, India (Email: Pardeepdharni664@gmail.com)
More informationWhite Box Testing with Object Oriented programming
White Box Testing with Object Oriented programming Mrs.Randive R.B 1, Mrs.Bansode S.Y 2 1,2 TPCT S College Of Engineering,Osmanabad Abstract:-Software testing is one of the best means to affirm the quality
More informationQuantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study
Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Jadson Santos Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte, UFRN Natal,
More informationTowards Better Understanding of Software Quality Evolution Through Commit Impact Analysis
Towards Better Understanding of Software Quality Evolution Through Commit Impact Analysis Sponsor: DASD(SE) By Mr. Pooyan Behnamghader 5 th Annual SERC Doctoral Students Forum November 7, 2017 FHI 360
More informationCost Effectiveness of Programming Methods A Replication and Extension
A Replication and Extension Completed Research Paper Wenying Sun Computer Information Sciences Washburn University nan.sun@washburn.edu Hee Seok Nam Mathematics and Statistics Washburn University heeseok.nam@washburn.edu
More information24 th Annual Research Review
24 th Annual Research Review April 4-6 2017 Towards Better Understanding of Software Quality Evolution Through Commit-Impact Analysis Pooyan Behnamghader USC CSSE pbehnamg@usc.edu Commit-Impact Analysis
More informationIteration vs Recursion in Introduction to Programming Classes: An Empirical Study
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 16, No 4 Sofia 2016 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.1515/cait-2016-0068 Iteration vs Recursion in Introduction
More informationPre-Requisites: CS2510. NU Core Designations: AD
DS4100: Data Collection, Integration and Analysis Teaches how to collect data from multiple sources and integrate them into consistent data sets. Explains how to use semi-automated and automated classification
More informationBRANCH COVERAGE BASED TEST CASE PRIORITIZATION
BRANCH COVERAGE BASED TEST CASE PRIORITIZATION Arnaldo Marulitua Sinaga Department of Informatics, Faculty of Electronics and Informatics Engineering, Institut Teknologi Del, District Toba Samosir (Tobasa),
More informationMapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I.
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Mapping Bug Reports to Relevant Files and Automated
More informationFurther Thoughts on Precision
Further Thoughts on Precision David Gray, David Bowes, Neil Davey, Yi Sun and Bruce Christianson Abstract Background: There has been much discussion amongst automated software defect prediction researchers
More informationOpen Research Online The Open University s repository of research publications and other research outputs
Open Research Online The Open University s repository of research publications and other research outputs The Smart Book Recommender: An Ontology-Driven Application for Recommending Editorial Products
More informationEvolving SQL Queries for Data Mining
Evolving SQL Queries for Data Mining Majid Salim and Xin Yao School of Computer Science, The University of Birmingham Edgbaston, Birmingham B15 2TT, UK {msc30mms,x.yao}@cs.bham.ac.uk Abstract. This paper
More informationHOW AND WHEN TO FLATTEN JAVA CLASSES?
HOW AND WHEN TO FLATTEN JAVA CLASSES? Jehad Al Dallal Department of Information Science, P.O. Box 5969, Safat 13060, Kuwait ABSTRACT Improving modularity and reusability are two key objectives in object-oriented
More informationFrom Passages into Elements in XML Retrieval
From Passages into Elements in XML Retrieval Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles
More informationQoS Management of Web Services
QoS Management of Web Services Zibin Zheng (Ben) Supervisor: Prof. Michael R. Lyu Department of Computer Science & Engineering The Chinese University of Hong Kong Dec. 10, 2010 Outline Introduction Web
More informationRobustness of Centrality Measures for Small-World Networks Containing Systematic Error
Robustness of Centrality Measures for Small-World Networks Containing Systematic Error Amanda Lannie Analytical Systems Branch, Air Force Research Laboratory, NY, USA Abstract Social network analysis is
More informationDevNet: Exploring Developer Collaboration in Heterogeneous Networks of Bug Repositories
DevNet: Exploring Collaboration in Heterogeneous Networks of Bug Repositories Song Wang, Wen Zhang, 3, Ye Yang, 2, Qing Wang, 2 Laboratory for Internet Software Technologies, Institute of Software, Chinese
More informationBug Triaging: Profile Oriented Developer Recommendation
Bug Triaging: Profile Oriented Developer Recommendation Anjali Sandeep Kumar Singh Department of Computer Science and Engineering, Jaypee Institute of Information Technology Abstract Software bugs are
More informationDEVELOPING AN INTELLIGENCE ANALYSIS PROCESS THROUGH SOCIAL NETWORK ANALYSIS
DEVELOPING AN INTELLIGENCE ANALYSIS PROCESS THROUGH SOCIAL NETWORK ANALYSIS Todd Waskiewicz and Peter LaMonica Air Force Research Laboratory Information and Intelligence Exploitation Division {Todd.Waskiewicz,
More informationChurrasco: Supporting Collaborative Software Evolution Analysis
Churrasco: Supporting Collaborative Software Evolution Analysis Marco D Ambros a, Michele Lanza a a REVEAL @ Faculty of Informatics - University of Lugano, Switzerland Abstract Analyzing the evolution
More informationAn Empirical Study of the Effect of File Editing Patterns on Software Quality
An Empirical Study of the Effect of File Editing Patterns on Software Quality Feng Zhang, Foutse Khomh, Ying Zou, and Ahmed E. Hassan School of Computing, Queen s University, Canada {feng, ahmed}@cs.queensu.ca
More informationMining Crash Fix Patterns
Mining Crash Fix Patterns Jaechang Nam and Ning Chen Department of Computer Science and Engineering The Hong Kong University of Science and Technology China {jcnam,ning@cse.ust.hk ABSTRACT During the life
More informationAutomatized Generating of GUIs for Domain-Specific Languages
Automatized Generating of GUIs for Domain-Specific Languages Michaela Bačíková, Dominik Lakatoš, and Milan Nosáľ Technical University of Košice, Letná 9, 04200 Košice, Slovakia, (michaela.bacikova, dominik.lakatos,
More informationAn study of the concepts necessary to create, as well as the implementation of, a flexible data processing and reporting engine for large datasets.
An study of the concepts necessary to create, as well as the implementation of, a flexible data processing and reporting engine for large datasets. Ignus van Zyl 1 Statement of problem Network telescopes
More informationInternational Journal for Management Science And Technology (IJMST)
Volume 4; Issue 03 Manuscript- 1 ISSN: 2320-8848 (Online) ISSN: 2321-0362 (Print) International Journal for Management Science And Technology (IJMST) GENERATION OF SOURCE CODE SUMMARY BY AUTOMATIC IDENTIFICATION
More informationBugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs
BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs Cesar Couto, Pedro Pires, Marco Tulio Valente, Roberto Bigonha, Andre Hora, Nicolas Anquetil To cite this version: Cesar
More informationAppendix to The Health of Software Engineering Research
Appendix to The Health of Software Engineering Research David Lo School of Information Systems Singapore Management University Singapore davidlo@smu.edu.sg Nachiappan Nagappan and Thomas Zimmermann Research
More informationInternational Journal of Advanced Research in Computer Science and Software Engineering
Volume 3, Issue 5, May 213 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Code Reusability
More informationEmployer Reporting. User Guide
Employer Reporting This user guide provides an overview of features supported by the Employer Reporting application and instructions for viewing, customizing, and exporting reports. System Requirements
More informationINTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) NEED FOR DESIGN PATTERNS AND FRAMEWORKS FOR QUALITY SOFTWARE DEVELOPMENT
INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)
More informationHybrid Feature Selection for Modeling Intrusion Detection Systems
Hybrid Feature Selection for Modeling Intrusion Detection Systems Srilatha Chebrolu, Ajith Abraham and Johnson P Thomas Department of Computer Science, Oklahoma State University, USA ajith.abraham@ieee.org,
More informationDecision Making Procedure: Applications of IBM SPSS Cluster Analysis and Decision Tree
World Applied Sciences Journal 21 (8): 1207-1212, 2013 ISSN 1818-4952 IDOSI Publications, 2013 DOI: 10.5829/idosi.wasj.2013.21.8.2913 Decision Making Procedure: Applications of IBM SPSS Cluster Analysis
More information(S)LOC Count Evolution for Selected OSS Projects. Tik Report 315
(S)LOC Count Evolution for Selected OSS Projects Tik Report 315 Arno Wagner arno@wagner.name December 11, 009 Abstract We measure the dynamics in project code size for several large open source projects,
More informationPrimitives for Active Internet Topology Mapping: Toward High-Frequency Characterization
Primitives for Active Internet Topology Mapping: Toward High-Frequency Characterization Robert Beverly, Arthur Berger, Geoffrey Xie Naval Postgraduate School MIT/Akamai February 9, 2011 CAIDA Workshop
More information3 Prioritization of Code Anomalies
32 3 Prioritization of Code Anomalies By implementing a mechanism for detecting architecturally relevant code anomalies, we are already able to outline to developers which anomalies should be dealt with
More informationINTELLIGENT SUPERMARKET USING APRIORI
INTELLIGENT SUPERMARKET USING APRIORI Kasturi Medhekar 1, Arpita Mishra 2, Needhi Kore 3, Nilesh Dave 4 1,2,3,4Student, 3 rd year Diploma, Computer Engineering Department, Thakur Polytechnic, Mumbai, Maharashtra,
More informationTERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES
TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES Mu. Annalakshmi Research Scholar, Department of Computer Science, Alagappa University, Karaikudi. annalakshmi_mu@yahoo.co.in Dr. A.
More informationMulti-Project Software Engineering: An Example
Multi-Project Software Engineering: An Example Pankaj K Garg garg@zeesource.net Zee Source 1684 Nightingale Avenue, Suite 201, Sunnyvale, CA 94087, USA Thomas Gschwind tom@infosys.tuwien.ac.at Technische
More informationPredicting Bugs. by Analyzing History. Sunghun Kim Research On Program Analysis System Seoul National University
Predicting Bugs by Analyzing History Sunghun Kim Research On Program Analysis System Seoul National University Around the World in 80 days Around the World in 8 years Predicting Bugs Severe consequences
More informationA Study on a Development Environment for Software Traceability Management
1,a) 1,b) 1,c) TERAS A Study on a Development Environment for Software Traceability Management NORITOSHI ATSUMI 1,a) TAKASHI KOBAYASHI 1,b) HIROAKI TAKADA 1,c) Abstract: Software traceability management
More informationA Firewall Architecture to Enhance Performance of Enterprise Network
A Firewall Architecture to Enhance Performance of Enterprise Network Hailu Tegenaw HiLCoE, Computer Science Programme, Ethiopia Commercial Bank of Ethiopia, Ethiopia hailutegenaw@yahoo.com Mesfin Kifle
More informationIntroduction to Data Mining
Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationAn Empirical Analysis of Communities in Real-World Networks
An Empirical Analysis of Communities in Real-World Networks Chuan Sheng Foo Computer Science Department Stanford University csfoo@cs.stanford.edu ABSTRACT Little work has been done on the characterization
More informationCHAPTER 4 HUMAN FACTOR BASED USER INTERFACE DESIGN
CHAPTER 4 HUMAN FACTOR BASED USER INTERFACE DESIGN 4.1 Introduction Today one of the most important concerns is how to use the system with effectiveness, efficiency and satisfaction. The ease or comfort
More informationSoftware Quality Understanding by Analysis of Abundant Data (SQUAAD)
Software Quality Understanding by Analysis of Abundant Data (SQUAAD) By Pooyan Behnamghader Advisor: Barry Boehm ARR 2018 March 13, 2018 1 Outline Motivation Software Quality Evolution Challenges SQUAAD
More informationStandard Glossary of Terms used in Software Testing. Version 3.2. Foundation Extension - Usability Terms
Standard Glossary of Terms used in Software Testing Version 3.2 Foundation Extension - Usability Terms International Software Testing Qualifications Board Copyright Notice This document may be copied in
More informationA Comparison of Maps Application Programming Interfaces
A Comparison of Maps Application Programming Interfaces Ana Isabel Fernandes, Miguel Goulão, Armanda Rodrigues CITI/FCT, Universidade Nova de Lisboa Quinta da Torre, 2829-516 CAPARICA, PORTUGAL ai.fernandes@campus.fct.unl.pt,
More informationBetter Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web
Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl
More informationA Framework for Source Code metrics
A Framework for Source Code metrics Neli Maneva, Nikolay Grozev, Delyan Lilov Abstract: The paper presents our approach to the systematic and tool-supported source code measurement for quality analysis.
More informationData Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005
Data Mining with Oracle 10g using Clustering and Classification Algorithms Nhamo Mdzingwa September 25, 2005 Abstract Deciding on which algorithm to use, in terms of which is the most effective and accurate
More informationStructural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques
Structural Analysis of Paper Citation and Co-Authorship Networks using Network Analysis Techniques Kouhei Sugiyama, Hiroyuki Ohsaki and Makoto Imase Graduate School of Information Science and Technology,
More informationComparison of FP tree and Apriori Algorithm
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn: 2278-800X, www.ijerd.com Volume 10, Issue 6 (June 2014), PP.78-82 Comparison of FP tree and Apriori Algorithm Prashasti
More informationSNS College of Technology, Coimbatore, India
Support Vector Machine: An efficient classifier for Method Level Bug Prediction using Information Gain 1 M.Vaijayanthi and 2 M. Nithya, 1,2 Assistant Professor, Department of Computer Science and Engineering,
More informationAchieving Right Automation Balance in Agile Projects
Achieving Right Automation Balance in Agile Projects Vijayagopal Narayanan Vijayagopal.n@cognizant.com Abstract When is testing complete and How much testing is sufficient is a fundamental questions that
More informationEVALUATION OF THE USABILITY OF EDUCATIONAL WEB MEDIA: A CASE STUDY OF GROU.PS
EVALUATION OF THE USABILITY OF EDUCATIONAL WEB MEDIA: A CASE STUDY OF GROU.PS Turgay Baş, Hakan Tüzün Hacettepe University (TURKEY) turgaybas@hacettepe.edu.tr, htuzun@hacettepe.edu.tr Abstract In this
More informationStatistical Modeling of Huffman Tables Coding
Statistical Modeling of Huffman Tables Coding S. Battiato 1, C. Bosco 1, A. Bruna 2, G. Di Blasi 1, and G.Gallo 1 1 D.M.I. University of Catania - Viale A. Doria 6, 95125, Catania, Italy {battiato, bosco,
More informationApplying Auto-Data Classification Techniques for Large Data Sets
SESSION ID: PDAC-W02 Applying Auto-Data Classification Techniques for Large Data Sets Anchit Arora Program Manager InfoSec, Cisco The proliferation of data and increase in complexity 1995 2006 2014 2020
More informationAutomatic Merging of Specification Documents in a Parallel Development Environment
Automatic Merging of Specification Documents in a Parallel Development Environment Rickard Böttcher Linus Karnland Department of Computer Science Lund University, Faculty of Engineering December 16, 2008
More informationA Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo
A Navigation-log based Web Mining Application to Profile the Interests of Users Accessing the Web of Bidasoa Turismo Olatz Arbelaitz, Ibai Gurrutxaga, Aizea Lojo, Javier Muguerza, Jesús M. Pérez and Iñigo
More informationEmpirical Software Engineering. Empirical Software Engineering with Examples. Classification. Software Quality. precision = TP/(TP + FP)
Empirical Software Engineering Empirical Software Engineering with Examples a sub-domain of software engineering focusing on experiments on software systems devise experiments on software, in collecting
More informationAn Information-Theoretic Approach to the Prepruning of Classification Rules
An Information-Theoretic Approach to the Prepruning of Classification Rules Max Bramer University of Portsmouth, Portsmouth, UK Abstract: Keywords: The automatic induction of classification rules from
More informationDEVELOPING A COMPLEXITY METRIC FOR INNER CLASSES
DEVELOPING A COMPLEXITY METRIC FOR INNER CLASSES 1 SIM HUI TEE, 2 RODZIAH ATAN, 3 ABDUL AZIM ABD GHANI 1 Faculty of Creative Multimedia, Multimedia University, Cyberjaya, Malaysia 2,3 Faculty of Computer
More informationA P2P-based Incremental Web Ranking Algorithm
A P2P-based Incremental Web Ranking Algorithm Sumalee Sangamuang Pruet Boonma Juggapong Natwichai Computer Engineering Department Faculty of Engineering, Chiang Mai University, Thailand sangamuang.s@gmail.com,
More informationUnderstanding Semantic Impact of Source Code Changes: an Empirical Study
Understanding Semantic Impact of Source Code Changes: an Empirical Study Danhua Shao, Sarfraz Khurshid, and Dewayne E. Perry Electrical and Computer Engineering, The University of Texas at Austin {dshao,
More informationA Prospect of Websites Evaluation Tools Based on Event Logs
A Prospect of Websites Evaluation Tools Based on Event Logs Vagner Figuerêdo de Santana 1, and M. Cecilia C. Baranauskas 2 1 Institute of Computing, UNICAMP, Brazil, v069306@dac.unicamp.br 2 Institute
More information