Code duplication: a challenge for software quality assurance?

Foutse Khomh, S.W.A.T, http://swat.polymtl.ca/

JHotDraw

Code duplication can be

Example of code duplication
Duplication to experiment without risking the stability of the system

8 general cloning patterns

Hardware variations

Platform variations

Experimental variations

Boiler-plating

API/Library protocols

Language idioms

Bug workarounds

Replicate & Specialize

Types of Code Clone
Type-1
  Clone A: for (int j = 0; j < 5; j++) { sum = sum + a[j]; }
  Clone B: for (int j = 0; j < 5; j++) { sum = sum + a[j]; }
Type-2
  Clone A: for (int j = 0; j < 5; j++) { sum = sum + a[j]; }
  Clone B: for (int id = 0; id < 5; id++) { sum = sum + a[id]; }
Type-3
  Clone A: for (int j = 0; j < 5; j++) { sum = sum + a[j]; }
  Clone B: for (int id = 0; id < 5; id++) { sum = sum + a[id]; d = sum * c; }
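To make the difference between the three clone types concrete, here is a minimal Python sketch that classifies a pair of fragments. It is a simplification and not how Simian or CCFinder work: the regular-expression tokenizer, the tiny keyword set, and the 0.7 similarity threshold are all assumptions chosen to fit the loop example on this slide.

```python
import re
from difflib import SequenceMatcher

# Assumed, deliberately tiny keyword set; a real tool would use a full lexer.
KEYWORDS = {"for", "int", "if", "else", "while", "return", "new"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|--|[^\s\w]")


def tokens(code):
    """Split code into tokens, discarding whitespace and layout."""
    return TOKEN_RE.findall(code)


def normalize_identifiers(toks):
    """Replace each distinct identifier with a placeholder (id0, id1, ...)."""
    names = {}
    out = []
    for tok in toks:
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            out.append(names.setdefault(tok, f"id{len(names)}"))
        else:
            out.append(tok)
    return out


def clone_type(a, b, threshold=0.7):
    """Classify a fragment pair; `threshold` is an assumed Type-3 cut-off."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "Type-1"  # identical apart from whitespace
    na, nb = normalize_identifiers(ta), normalize_identifiers(tb)
    if na == nb:
        return "Type-2"  # identical after renaming identifiers
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return "Type-3"  # similar, with added or removed statements
    return "not a clone"


clone_a = "for( int j = 0; j < 5; j ++ ){ sum = sum + a[j]; }"
renamed = "for( int id = 0; id < 5; id ++ ){ sum = sum + a[id]; }"
extended = "for( int id = 0; id < 5; id ++ ){ sum = sum + a[id]; d = sum * c; }"
print(clone_type(clone_a, clone_a), clone_type(clone_a, renamed), clone_type(clone_a, extended))
```

Applied to the slide's three pairs, the sketch reports Type-1, Type-2 and Type-3 in that order.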

Is cloning a good practice?

Cloning can introduce bugs
Cloning increases maintenance effort

Late Propagation in Software Clones

Clone Evolution
Clone genealogy: the change history of a clone pair
Consistent change: one or both of the clones change, but the clone pair is preserved
Inconsistent change: one or both of the clones change independently, destroying the clone pair
[Figure: clone genealogy of Clone A and Clone B, with consistent and inconsistent changes marked]

Late Propagation (LP)
Two steps:
1. An inconsistent change that diverges the clone pair (the diverging change)
2. A consistent change that re-synchronizes the clone pair (the re-synchronizing change)
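The two-step definition can be sketched as a scan over a genealogy. The representation below is an assumption made for illustration: each entry records a revision number and whether the two fragments still form a clone pair after that revision. The example values reuse the ArgoUML revision numbers from the example slide, read as "in sync at 595, diverged at 602, re-synchronized at 604".

```python
def find_late_propagations(states):
    """Return (diverging_revision, resync_revision) pairs in a genealogy.

    `states` is a chronological list of (revision, in_sync) tuples, where
    in_sync tells whether the fragments form a clone pair after that revision.
    """
    found = []
    diverged_at = None
    previous = True  # assumed: a genealogy starts with the pair in sync
    for revision, in_sync in states:
        if previous and not in_sync:
            diverged_at = revision  # step 1: diverging change
        elif not previous and in_sync and diverged_at is not None:
            found.append((diverged_at, revision))  # step 2: re-synchronizing change
            diverged_at = None
        previous = in_sync
    return found


genealogy = [(595, True), (602, False), (604, True)]
print(find_late_propagations(genealogy))  # [(602, 604)]
```

A genealogy that diverges and never re-synchronizes yields no pair, which matches the definition: without the second step there is no late propagation.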

Why is Late Propagation Risky?
Late Propagation can be risky because failure to propagate changes between clones in a clone pair can lead to faults
8-21% of genealogies contain a Late Propagation

Late Propagation With Propagation
Example from ArgoUML
Revision 595
  Clone A: addField(new UMLComboBox(typeModel), 1,0,0);
  Clone B: addField(new UMLComboBox(classifierModel), 2,0,0);
Revisions 602 and 604
  Clone A: addField(new UMLComboBoxNavigator(this, NavClass, new UMLComboBox(typeModel)), 1,0,0);
  Clone B: addField(new UMLComboBoxNavigator(this, NavClass, new UMLComboBox(classifierModel)), 2,0,0);

Late Propagation Without Propagation
Example from Ant
Revision 270250
  Clone A: if (destfile == null) { destfile = new File(destDir, file.getName()); }
Revision 270264
  Clone A: if (m_destfile == null) { m_destfile = new File(m_destDir, m_file.getName()); }
Revision 271109
  Clone A: if (destfile == null) { destfile = new File(destDir, file.getName()); }
Clone B (unchanged): if (destfile == null) { destfile = new File(destDir, file.getName()); }

Types of Late Propagation
Propagation category               LP type  Modified during diverging change  Modified during the period of divergence  Modified during re-synchronizing change
Propagation always occurs          LP1      A                                 A                                         B
Propagation always occurs          LP2      A                                 A and B                                   B
Propagation always occurs          LP3      A                                 A                                         A and B
Propagation may or may not occur   LP4      A                                 A and B                                   A
Propagation may or may not occur   LP5      A                                 A and B                                   A and B
Propagation may or may not occur   LP6      A and B                           A and B                                   A or B
Propagation may or may not occur   LP7      A and B                           A and B                                   A and B
Propagation never occurs           LP8      A                                 A                                         A
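The table reads naturally as a lookup from "which clones were modified in each phase" to an LP type. The sketch below encodes it that way; the function name, the string encoding of clone sets ("A", "B", "AB"), and the short category labels are assumptions for illustration. It also assumes that when only one clone is touched by the diverging change, that clone is the one labelled A.

```python
A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")

# Key: (modified in diverging change, modified during the period of
# divergence, modified in re-synchronizing change).
LP_TYPES = {
    (A, A, B): "LP1",
    (A, AB, B): "LP2",
    (A, A, AB): "LP3",
    (A, AB, A): "LP4",
    (A, AB, AB): "LP5",
    (AB, AB, A): "LP6",  # the table lists "A or B" for the last phase
    (AB, AB, B): "LP6",
    (AB, AB, AB): "LP7",
    (A, A, A): "LP8",
}

CATEGORY = {
    "LP1": "always", "LP2": "always", "LP3": "always",
    "LP4": "may or may not", "LP5": "may or may not",
    "LP6": "may or may not", "LP7": "may or may not",
    "LP8": "never",
}


def classify(diverging, during, resync):
    """Return (LP type, propagation category), or None if no row matches."""
    key = (frozenset(diverging), frozenset(during), frozenset(resync))
    lp_type = LP_TYPES.get(key)
    return None if lp_type is None else (lp_type, CATEGORY[lp_type])


print(classify("A", "A", "A"))     # ('LP8', 'never')
print(classify("AB", "AB", "AB"))  # ('LP7', 'may or may not')
```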

Research Questions
RQ1: Are there different types of Late Propagation?
RQ2: Are some types of Late Propagation more fault-prone than others?
RQ3: Which type of Late Propagation experiences the highest proportion of faults?

Subject Systems
System   # LOC  # Revisions  # Genealogies (CCFinder)  # LP (CCFinder)  # Genealogies (Simian)  # LP (Simian)
ArgoUML  3.1M   18k          14k                       1.1k             111                     23
Ant      2.3M   1.0M         30k                       4.7k             461                     80
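As a quick check, the counts in this table reproduce the "8-21% of genealogies contain a Late Propagation" figure quoted earlier. The short Python calculation below uses the rounded values as printed on the slide, so the percentages are approximate.

```python
# (genealogies, late-propagation genealogies) per system and detector,
# taken from the rounded figures on the slide.
counts = {
    ("ArgoUML", "CCFinder"): (14_000, 1_100),
    ("ArgoUML", "Simian"): (111, 23),
    ("Ant", "CCFinder"): (30_000, 4_700),
    ("Ant", "Simian"): (461, 80),
}


def lp_share(genealogies, late_propagations):
    """Percentage of genealogies that contain a late propagation."""
    return 100.0 * late_propagations / genealogies


shares = {key: round(lp_share(*value), 1) for key, value in counts.items()}
for (system, tool), share in shares.items():
    print(f"{system} / {tool}: {share}%")
```

The four shares come out at roughly 7.9%, 20.7%, 15.7% and 17.4%, that is, between about 8% and 21%.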

Our Approach

Mining the SVN
Use J-Rex to mine the SVN
Heuristics used to identify the reason for a commit (Mockus et al., 2000)
Snapshots of all revisions to each Java file are stored in an XML file
Test files are removed
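The slide does not list the heuristics themselves. A common form of such heuristics is keyword matching on commit messages, sketched below in Python. The keyword list, the test-file naming rules, and both function names are hypothetical and are not taken from J-Rex or from the study.

```python
import re

# Hypothetical keyword list for illustration only.
FIX_PATTERN = re.compile(r"\b(fix(e[sd])?|bug|defect|fault|patch)\b", re.IGNORECASE)


def is_fault_fix(message):
    """Guess whether a commit message describes a fault-fixing change."""
    return bool(FIX_PATTERN.search(message))


def is_test_file(path):
    """Guess whether a Java file is test code (assumed naming conventions)."""
    name = path.rsplit("/", 1)[-1]
    return name.startswith("Test") or name.endswith("Test.java") or "/test/" in path


print(is_fault_fix("Fixed NPE when destfile is null"))  # True
print(is_fault_fix("Add prefix option to the panel"))   # False
```

The word boundaries in the pattern keep words such as "prefix" from being counted as fixes; a real classifier would still need manual validation on a sample of commits.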

Detecting Clones
Contents of each method revision extracted into individual files
Perform clone detection once on all snapshots
Two existing clone detection tools are used: Simian (text-based) and CCFinder (token-based)

Building Clone Genealogies
Build clone genealogies using the existing clone list
Query the SVN using diff to track changes to each clone in a clone pair over time
If a change modifies one of the clones in a clone pair, query the clone list for a matching clone
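The tracking step can be sketched as follows, assuming the per-revision text of both clones is already available. In this Python sketch the `is_clone_pair` predicate stands in for the lookup in the clone list; here it is a plain whitespace-insensitive text comparison, which is an assumption and not what Simian or CCFinder do. The example data are shortened versions of the destfile snippet shown earlier, with Clone B assumed unchanged.

```python
def label_changes(history, is_clone_pair):
    """Label each change to a clone pair as consistent or inconsistent.

    `history` is a chronological list of (revision, text_a, text_b).
    Revisions that touch neither clone are skipped.
    """
    labels = []
    for (_, prev_a, prev_b), (revision, a, b) in zip(history, history[1:]):
        if a == prev_a and b == prev_b:
            continue  # neither clone was modified in this revision
        label = "consistent" if is_clone_pair(a, b) else "inconsistent"
        labels.append((revision, label))
    return labels


def same_text(a, b):
    """Stand-in clone check: equal after collapsing whitespace."""
    return " ".join(a.split()) == " ".join(b.split())


original = "if (destfile == null) { destfile = new File(destDir, file.getName()); }"
renamed = "if (m_destfile == null) { m_destfile = new File(m_destDir, m_file.getName()); }"
history = [
    (270250, original, original),
    (270264, renamed, original),
    (271109, original, original),
]
print(label_changes(history, same_text))
# [(270264, 'inconsistent'), (271109, 'consistent')]
```

An inconsistent change followed by a consistent one is exactly the late-propagation pattern defined earlier.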

RQ1: Are there different types of Late Propagation?

[Figure: Breakdown of LP Type by System. x-axis: LP types (LP1 to LP8); y-axis: percentage of all LP occurrences (0% to 80%); series: ArgoUML - Simian, ArgoUML - CCFinder, Ant - Simian, Ant - CCFinder]
There is representation from multiple types of Late Propagation and across all categories of Late Propagation

RQ2: Are some types of Late Propagation more fault-prone than others?
Part 1: Is Late Propagation fault-prone?
Part 2: Are specific types of Late Propagation more fault-prone?

Analysis Method
Fisher's exact test
Odds Ratio (OR): $OR = \frac{p/(1-p)}{q/(1-q)}$
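The slide does not define p and q. The Python sketch below assumes p is the proportion of fault-containing genealogies in the Late Propagation group and q the same proportion in the non-LP group, so that the odds ratio reduces to a*d / (b*c) on a 2x2 table. The Fisher's exact test is a plain standard-library version that sums the probabilities of all tables no more likely than the observed one, which is one common two-sided convention. The counts in the example are made up.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """OR for the table [[a, b], [c, d]].

    a: LP with faults, b: LP without faults,
    c: non-LP with faults, d: non-LP without faults.
    With p = a/(a+b) and q = c/(c+d), (p/(1-p)) / (q/(1-q)) = a*d / (b*c).
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # given fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    low, high = max(0, row1 - (n - col1)), min(row1, col1)
    observed = prob(a)
    return sum(prob(x) for x in range(low, high + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical counts, not data from the study.
print(odds_ratio(3, 1, 1, 3))                         # 9.0
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))   # 0.4857
```

An odds ratio above 1 means faults are more likely in the LP group. The p-value says whether that difference is statistically significant, which is why some system and tool combinations are omitted from the charts that follow.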

Part 1: Is Late Propagation Fault-prone?
[Figure: LP vs. Non-LP Odds Ratios. y-axis: odds ratio (0 to 4); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder. ArgoUML - Simian is omitted because it is not statistically significant]
In all significant cases, the odds ratio is greater than 1. Therefore, Late Propagation genealogies are more fault-prone than non-Late Propagation genealogies

Part 2: Are specific types of Late Propagation more fault-prone?
[Figure: Odds Ratios Between Each LP Type and Non-LP Genealogies. x-axis: LP type (LP1 to LP8); y-axis: odds ratio (0 to 16); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder. Note: ArgoUML - Simian is omitted because it is not statistically significant]

RQ2 Observations
Some Late Propagation types are not more fault-prone than non-Late Propagation genealogies (i.e. odds ratio < 1)
Some types that make up a small proportion of Late Propagation instances have a very high odds ratio
LP7 and LP8 occur frequently but have low odds ratios
Each type of Late Propagation has a different level of fault-proneness

RQ3: Which type of Late Propagation experiences the highest proportion of faults?

[Figure: Percentage of Fault Occurrences Broken Down by LP Type. x-axis: LP type (LP1 to LP8); y-axis: percentage of fault occurrences (0% to 80%); series: Ant - Simian, ArgoUML - CCFinder, Ant - CCFinder. Note: ArgoUML - Simian is omitted because it is not statistically significant]

RQ3 Observations
LP7 and LP8 contribute a large proportion of the faults but have lower odds ratios (RQ2)
When faults occur, they occur in large numbers
Overall, LP7 and LP8 are the most dangerous, with the other types being system-dependent in their fault-proneness
The proportion of faults is different for each Late Propagation type

Discussion
In general, Late Propagation genealogies are more fault-prone than non-Late Propagation genealogies
LP7 and LP8 are the riskiest, in terms of their fault-proneness and magnitude of faults
  LP8 contains no propagation of changes
  LP7 may or may not contain any propagation of changes
Fault-proneness and fault occurrence depend on both the Late Propagation type and the system

Mutation and Migration occur during Clone Evolution
Clone mutation
Clone migration
These two phenomena further increase the risk of bugs in Late Propagation genealogies

Acknowledgment
Cloning pattern illustrations are from Nicolas Bettenburg, Cloning Considered Harmful Considered Harmful?