MTAT.03.159: Software Testing Lecture 04: Static Analysis (Inspection) and Defect Estimation, Mutation Testing (Textbook Ch. 10 & 12) Spring 2015 Dietmar Pfahl email: dietmar.pfahl@ut.ee
Structure of Lecture 04 Static Analysis Defect Estimation Lab 4 -- Mutation Testing
Static Analysis Document Review (manual): different types (Lab 4) Static Code Analysis (automatic): structural properties / metrics / etc. (Lab 5)
Question What is better? Review (Inspection) or Test?
Review Metrics Basic Size of review items Pages, LOC Review time & effort Number of defects found Number of slipping defects found later Derived Defects found per review time or effort Defects found per artifact (review item) size Size per time or effort
Empirical Results: Inspection & Test [Figure: boxplots comparing effectiveness and efficiency of defect detection for Design and Code artifacts; inspection entries: Doc Reading, Formal Inspection; testing entries: Functional Testing, Structural Testing] Source: Runeson, P.; Andersson, C.; Thelin, T.; Andrews, A.; Berling, T., "What do we know about defect detection methods?", IEEE Software, vol. 23, no. 3, pp. 82-90, May-June 2006
Inspections Empirical Results Requirements defects: no data; but reviews are potentially good, since finding defects early is cheaper Design defects: reviews are both more efficient and more effective than testing Code defects: functional or structural testing is more efficient and effective than reviews; the techniques may be complementary regarding the types of faults they find Generally, reported effectiveness is low: reviews find 25-50% of an artifact's defects; testing finds 30-60% of the defects in the code
Defect Containment Measures [Table: for each detection phase (Reviews, Test, Usage) -- fixing cost per defect and total fixing cost]
Relative Cost of Faults [Figure: cost of fixing a fault rises steeply by phase; a fault fixed during maintenance costs up to 200 times as much as one fixed during requirements] Source: Davis, A.M., Software Requirements: Analysis and Specification (1990)
Reviews complement testing
Summary: Why Review? Main objective: detect faults Other objectives: inform, educate, learn from (others') mistakes, improve! (Undetected) faults may negatively affect software quality during all steps of the development process!
Defect Causal Analysis (DCA) [Process diagram:] Organizational processes define Software Construction (Analyse / Design / Code / Rework), which produces an Artifact. Defect (Fault) Detection (Review / Test) finds defects in the artifact; defects are fixed and recorded in the Defect Database. A sample of defects is extracted for the Causal Analysis Meeting, which proposes actions; the Action Team Meeting prioritizes and implements actions, which feed back into the organizational processes.
Inspection Process Origin: Michael Fagan (IBM, early 1970s) Approach: checklist-based Phases: Overview, Preparation, Meeting, Rework, Follow-up Fault searching happens at the meeting (synergy!) Roles: author (designer), reader (coder), tester, moderator Defect classification: major and minor
Reading Techniques Ad hoc Checklist-based Defect-based Usage-based Perspective-based
Ad-hoc / Checklist-based / Defect-based Reading
Usage-Based Reading Source: Thelin, T.; Runeson, P.; Wohlin, C., "Prioritized Use Cases as a Vehicle for Software Inspections", IEEE Software, vol. 20, no. 4, pp. 30-33, July-August 2003 1. Prioritize use cases (UCs) 2. Select the UC with the highest priority 3. Track the UC's scenario through the document under review 4. Check whether the UC's goals are fulfilled, the needed functionality is provided, interfaces are correct, and so on (report issues detected) 5. Select the next UC
Usage-Based Reading Comparison of UBR with Checklist-Based Reading (CBR)
Perspective-based Reading User Designer Tester Scenarios Purpose Decrease overlap (redundancy) Improve effectiveness
Structure of Lecture 04 Static Analysis Defect Estimation Lab 4 -- Mutation Testing
Quality Prediction Quality defined as: undetected #defects Based on product and process properties: Quality = f(code size or complexity) Quality = f(code changes) Quality = f(test effort) Quality = f(detected #defects) Based on detected defects: capture-recapture models, reliability growth models
Capture-Recapture Defect Estimation
Capture-Recapture Defect Estimation Situation: two inspectors are assigned to inspect the same product d1: #defects detected by Inspector 1 d2: #defects detected by Inspector 2 d12: #defects detected by both inspectors Nt: total #defects (detected and undetected) Nr: remaining #defects (undetected) Nt = (d1 * d2) / d12 Nr = Nt - (d1 + d2 - d12)
Capture-Recapture Example Situation: two inspectors are assigned to inspect the same product d1: 50 defects detected by Inspector 1 d2: 40 defects detected by Inspector 2 d12: 20 defects detected by both inspectors Nt = (d1 * d2) / d12 = (50 * 40) / 20 = 100 total defects (detected and undetected) Nr = Nt - (d1 + d2 - d12) = 100 - (50 + 40 - 20) = 30 remaining defects (undetected)
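The two-inspector estimate above can be sketched in a few lines of Python (function names are illustrative, not from any tool):

```python
# Capture-recapture (two-inspector) defect estimation, as on the slide above.

def estimate_total_defects(d1, d2, d12):
    """Total defects Nt = (d1 * d2) / d12."""
    if d12 == 0:
        raise ValueError("no overlap between inspectors: estimate undefined")
    return d1 * d2 / d12

def estimate_remaining_defects(d1, d2, d12):
    """Remaining (undetected) defects Nr = Nt - (d1 + d2 - d12)."""
    n_t = estimate_total_defects(d1, d2, d12)
    return n_t - (d1 + d2 - d12)

# Slide example: d1 = 50, d2 = 40, d12 = 20
print(estimate_total_defects(50, 40, 20))      # 100.0
print(estimate_remaining_defects(50, 40, 20))  # 30.0
```

Note that the estimate degenerates when the overlap d12 is zero, which is why real capture-recapture models add correction terms.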
Advanced Capture-Recapture Models Four basic models used for inspections Difference: degrees of freedom Prerequisites for all models: All reviewers work independently of each other It is not allowed to inject or remove faults during inspection (Bonus Task in Lab 4)
Advanced Capture-Recapture Models

Model | Detection probability equal across defects? | ... across reviewers? | Estimator
M0    | Yes | Yes | Maximum-likelihood
Mt    | Yes | No  | Maximum-likelihood, Chao's estimator
Mh    | No  | Yes | Jackknife, Chao's estimator
Mth   | No  | No  | Chao's estimator
Mt Model Maximum-likelihood estimation: Mt = total marked animals (= faults) at the start of the t'th sampling interval Ct = total number of animals (= faults) sampled during interval t Rt = number of recaptures in the sample Ct An approximation of the maximum-likelihood estimate of the population size (N) is: N = SUM(Ct * Mt) / SUM(Rt) First resampling: M1 = 50 (first inspector), C1 = 40 (second inspector), R1 = 20 (duplicates) N = (40 * 50) / 20 = 100 Second resampling: M2 = 70 (first and second inspector), C2 = 40 (third inspector), R2 = 30 (duplicates) N = (40 * 50 + 40 * 70) / (20 + 30) = 4800 / 50 = 96 Third resampling: M3 = 80 (first, second and third inspector), C3 = 30 (fourth inspector), R3 = 30 (duplicates) N = (2000 + 2800 + 30 * 80) / (20 + 30 + 30) = 7200 / 80 = 90
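The running Mt estimate in the worked example can be sketched as follows, assuming each resampling step is given as an (Mt, Ct, Rt) triple (the function name is illustrative):

```python
# Running Mt-model estimate: N ~= SUM(Ct * Mt) / SUM(Rt) over all
# resampling steps seen so far, as in the worked example above.

def mt_estimate(samples):
    """samples: list of (M_t, C_t, R_t) triples; returns the N estimate."""
    numerator = sum(c * m for m, c, r in samples)
    denominator = sum(r for m, c, r in samples)
    return numerator / denominator

samples = [(50, 40, 20), (70, 40, 30), (80, 30, 30)]
print(mt_estimate(samples[:1]))  # 100.0 after the first resampling
print(mt_estimate(samples[:2]))  # 96.0 after the second
print(mt_estimate(samples[:3]))  # 90.0 after the third
```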
Structure of Lecture 04 Static Analysis Defect Estimation Lab 4 -- Mutation Testing
Lab 4 Document Inspection & Defect Prediction Lab 4 (week 31: Mar 28 - Mar 31) Doc. Inspection (10%) Lab 4 Instructions Lab 4 & Sample Documentation Submission deadlines: Monday lab: Sunday, 03 Apr, 23:59 Tuesday labs: Monday, 04 Apr, 23:59 Wednesday labs: Tuesday, 05 Apr, 23:59 Thursday labs: Wednesday, 06 Apr, 23:59 Penalties apply for late delivery: 50% penalty if submitted up to 24 hours late; 100% penalty if submitted more than 24 hours late Documentation: Requirements List (User Stories), Specification (2 screens, 1 text)
Lab 4 Document Inspection & Defect Prediction (cont'd) Phase A: individual student work -- inspection of the Specification (excerpt: 2 screens & text) against the Requirements (6 user stories); each student produces an issue list Phase B: pair work -- the pair consolidates the two issue lists into one Consolidated Issue List (at least 8 defects) Table columns: ID, Description, Location, Type, Severity Remaining Defects Estimation
Lab 4 Document Inspection & Defect Prediction (cont'd) Lab 4 (week 31: Mar 30 - Apr 02) Defect Estimation (4% Bonus) Lab 4 Bonus Task Instructions Lab 4 & SciLab Scripts Phase C: the Consolidated Issue List (from Phase B) serves as input file (defects) to three SciLab scripts: Script 1 (M0), Script 2 (Mt), Script 3 (Mh) Report: input data, the 3 estimates, discussion of assumptions
Structure of Lecture 04 Static Analysis Defect Estimation Lab 4 -- Mutation Testing
Mutation Testing (Defect-Based Testing)
Assessing Test Suite Quality Idea: I make n copies of my program, each copy seeded with a known number of (unique) faults Assume the introduced faults are exactly like real faults in every way I run my test suite on the programs with seeded faults... ... and the tests reveal 20% of the introduced faults What can I infer about my test suite?
Mutation Testing Procedure 1. Take a program and test data generated for that program 2. Create a number of similar programs (mutants), each differing from the original in a small way 3. Run the original test data through the mutants 4. If the tests detect all changes, the mutants are dead and the test suite is adequate Otherwise: create more test cases and iterate steps 2-4 until a sufficiently high number of mutants is killed
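The steps above can be sketched as a small harness: programs are plain functions, a test is just an input, and a mutant is killed when its output differs from the original's. All names here (sign, mutant_ge, mutant_neg, mutation_score) are illustrative, not from any mutation tool:

```python
# Toy mutation-testing harness illustrating steps 2-4 of the procedure.

def sign(x):                 # original program
    return 1 if x > 0 else (-1 if x < 0 else 0)

def mutant_ge(x):            # relational-operator mutant: > changed to >=
    return 1 if x >= 0 else (-1 if x < 0 else 0)

def mutant_neg(x):           # constant mutant: -1 changed to 1
    return 1 if x > 0 else (1 if x < 0 else 0)

def mutation_score(original, mutants, tests):
    """Fraction of mutants killed by at least one test input."""
    killed = sum(any(original(t) != m(t) for t in tests) for m in mutants)
    return killed / len(mutants)

tests = [5, -3]
print(mutation_score(sign, [mutant_ge, mutant_neg], tests))  # 0.5: mutant_ge survives
tests.append(0)              # iterate: add a test case that distinguishes > from >=
print(mutation_score(sign, [mutant_ge, mutant_neg], tests))  # 1.0: all mutants killed
```

The iteration in the last two lines mirrors the "create more test cases and repeat" loop: the surviving mutant tells the tester exactly which behaviour (here, x = 0) the suite does not yet exercise.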
Mutation Testing Assumptions Competent programmer hypothesis: Programs are nearly correct Real faults are small variations from the correct program => Mutants are reasonable models of real faulty programs Coupling effect hypothesis: Tests that find simple faults also find more complex faults Even if mutants are not perfect representatives of real faults, a test suite that kills mutants is good at finding real faults too
Mutation Testing Terminology Mutant: new version of the program with a small deviation (= fault) from the original version Killed mutant: new version detected by the test suite Live mutant: new version not detected by the test suite
Examples of Mutation Operations Change relational operator (<, >, ...) Change logical operator (||, &&, ...) Change arithmetic operator (*, +, -, ...) Change constant name / value Change variable name / initialisation Change (or even delete) a statement
Example Mutants Original: if (a || b) c = a + b; else c = 0; Mutant 1 (logical operator): if (a && b) c = a + b; else c = 0; Mutant 2 (arithmetic operator): if (a || b) c = a * b; else c = 0;
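Transcribing the slide's mutants into Python makes it easy to see which inputs kill which mutant (the original condition is assumed to be "a || b"; function names are illustrative):

```python
# The slide's example mutants as Python functions; an input kills a mutant
# when the computed c differs from the original's.

def original(a, b):
    return a + b if (a or b) else 0

def logical_mutant(a, b):     # || changed to &&
    return a + b if (a and b) else 0

def arithmetic_mutant(a, b):  # + changed to *
    return a * b if (a or b) else 0

# (a=1, b=0) kills both mutants: original -> 1, && mutant -> 0, * mutant -> 0
print(original(1, 0), logical_mutant(1, 0), arithmetic_mutant(1, 0))
# (a=1, b=1) kills only the * mutant: 2 vs 2 vs 1
print(original(1, 1), logical_mutant(1, 1), arithmetic_mutant(1, 1))
```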
Types of Mutants Stillborn mutants: syntactically incorrect, killed by the compiler, e.g., x = a ++ b Trivial mutants: killed by almost any test case Equivalent mutants: always behave the same as the original program, e.g., x = a + b and x = a - (-b) None of the above are interesting from a mutation testing perspective Interesting mutants are those that behave differently from the original program and for which we do not (yet) have test cases that identify them (i.e., that cover those specific changes)
Equivalent Mutants Original: if (a == 2 && b == 2) c = a + b; else c = 0; Mutant: if (a == 2 && b == 2) c = a * b; else c = 0; (the mutated branch only executes when a = b = 2, and 2 + 2 == 2 * 2) Original: int index = 0; while (...) { ...; index++; if (index == 10) break; } Mutant: int index = 0; while (...) { ...; index++; if (index >= 10) break; } (index increases by 1 each iteration, so == 10 and >= 10 trigger at the same point)
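Equivalence can sometimes be demonstrated by brute force over a small input domain, as this sketch of the slide's first pair shows (the exhaustive search is only evidence, not a proof, for unbounded domains):

```python
# Brute-force check that the slide's first mutant is equivalent: the "+" vs
# "*" branch only executes when a == 2 and b == 2, where 2 + 2 == 2 * 2.

def original(a, b):
    return a + b if (a == 2 and b == 2) else 0

def mutant(a, b):
    return a * b if (a == 2 and b == 2) else 0

# Search for any input that distinguishes the two versions.
diffs = [(a, b) for a in range(-10, 11) for b in range(-10, 11)
         if original(a, b) != mutant(a, b)]
print(diffs)  # [] -- no test in this domain can kill the mutant
```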
Program Example nbrs = new int[range]; public int max(int[] a) { int imax = 0; for (int i = 1; i <= range; i++) if (a[i] > a[imax]) imax = i; return imax; } The program returns the index of the array element with the maximum value. Test cases (a[0], a[1], a[2] -> max): TC1: 1, 2, 3 -> 2 TC2: 1, 3, 2 -> 1 TC3: 3, 1, 2 -> 0
Variable Name Mutant (a[i] changed to i) nbrs = new int[range]; public int max(int[] a) { int imax = 0; for (int i = 1; i <= range; i++) if (i > a[imax]) imax = i; return imax; } Test cases (a[0], a[1], a[2] -> max): TC1: 1, 2, 3 -> 2 TC2: 1, 3, 2 -> 2 (original: 1, so TC2 kills this mutant) TC3: 3, 1, 2 -> 0
Relational Operator Mutant (> changed to >=) nbrs = new int[range]; public int max(int[] a) { int imax = 0; for (int i = 1; i <= range; i++) if (a[i] >= a[imax]) imax = i; return imax; } Test cases (a[0], a[1], a[2] -> max): TC1: 1, 2, 3 -> 2 TC2: 1, 3, 2 -> 1 TC3: 3, 1, 2 -> 0 The existing test cases do not kill this mutant. Need a test case with two identical max entries in a[.], e.g., (1, 3, 3)
Variable Operator Mutant (loop counting changed) nbrs = new int[range]; public int max(int[] a) { int imax = 0; for (int i = 0; i < range; i++) if (a[i] > a[imax]) imax = i; return imax; } Test cases (a[0], a[1], a[2] -> max): TC1: 1, 2, 3 -> 2 TC2: 1, 3, 2 -> 1 TC3: 3, 1, 2 -> 0 Need a test case detecting the wrong loop counting
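The relational-operator mutant and the killing test case from the slides can be checked directly in a Python transcription of the max-index program (assuming the intended loop runs over indices 1..len(a)-1, with len(a) standing in for range):

```python
# Python transcription of the max-index example and its relational-operator
# mutant (> changed to >=). The table's test cases leave the mutant alive;
# an input with two identical maximum entries, e.g. (1, 3, 3), kills it.

def max_index(a):
    """Index of the (first) maximum element -- the original program."""
    imax = 0
    for i in range(1, len(a)):
        if a[i] > a[imax]:
            imax = i
    return imax

def max_index_mutant(a):
    imax = 0
    for i in range(1, len(a)):
        if a[i] >= a[imax]:        # mutated comparison
            imax = i
    return imax

# TC1-TC3 from the slide: the mutant survives all of them.
for tc in ([1, 2, 3], [1, 3, 2], [3, 1, 2]):
    assert max_index(tc) == max_index_mutant(tc)

# (1, 3, 3): original keeps the first maximum (index 1), the mutant takes
# the last one (index 2) -- the mutant is killed.
print(max_index([1, 3, 3]), max_index_mutant([1, 3, 3]))  # 1 2
```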
Next 2 Weeks Lab 4: Document Inspection and Defect Prediction (must work in pairs to get full marks!) Lecture 5: Test Lifecycle, Documentation and Organisation Additional reading: textbook chapters 10 and 12 (available via OIS)