Regression Testing

Midterm
  - Wednesday Oct. 27, 7pm, room 142
  - In class, closed-book exam
  - Includes all the material covered up to (but not including) symbolic execution
  - Need to understand the concepts, know the basic terminology
  - Will answer questions about the midterm the week before the exam
Reading assignment: Invariant Based Testing
  - Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin, "Dynamically Discovering Likely Program Invariants to Support Program Evolution," IEEE Transactions on Software Engineering, vol. 27, no. 2, Feb. 2001, pp. 99-123.
  - Michael Harder, Jeff Mellen, and Michael D. Ernst, "Improving Test Suites via Operational Abstraction," 25th International Conference on Software Engineering, Portland, Oregon, May 2003, pp. 60-71.
  - Last reading assignment before the exam

Why is regression testing a problem?
  - Large systems can take a long time to retest
    - e.g., 6 months of regression testing before every release
  - Sometimes it is difficult and time consuming to create the tests
  - Sometimes it is difficult and time consuming to evaluate the tests
    - e.g., may not be able to automatically determine if the results are correct for existing and/or new test cases
    - may require a person in the loop (GUI and simulation examples) to create and evaluate the results
  - Cost of testing can prevent software improvements
Regression Testing
  - Primarily selecting from existing test cases
  - Plus, adding some new test cases
  - Perhaps, deleting or updating some old test cases
    - Usually view an update as deletion plus addition
  - Trying to instill confidence that changes are correct
    - new functionality and corrected/modified functionality behave as they should
    - unchanged functionality is indeed unchanged

Automated support for regression testing
  - Test environment or infrastructure support
    - Specification of test cases and results, e.g., JUnit
    - Capture and replay, especially for GUI components
  - Test data selection support
    - Select a subset of the existing test cases. Based on what?
    - Select new test data to exercise new functionality. Based on what?
    - Typically coverage criteria and functional test cases
Steps in regression testing
  - Given: a program P, originally tested with test set T producing results R, and a modified version of the program, P'
  - Identify the changes to P
  - Select T', a subset of T, to re-execute on P'
  - Test P' with T' and reestablish correctness of P' with respect to T'
  - Create new tests T'' as necessary

Regression Testing Selection Criteria
  - Rothermel and Harrold
    - Developed a framework for analytically comparing regression testing criteria
    - Later used this framework to experimentally compare regression testing criteria
Regression Testing Criteria
  - A test case t ∈ T is fault-revealing if it produces incorrect outputs for P'
    - In general, cannot determine which elements of T are fault-revealing
  - A test case t ∈ T is modification-revealing if it produces different outputs for P' than for P
    - The modification-revealing test cases over-approximate the fault-revealing test cases
    - In general, cannot determine which elements of T are modification-revealing
  - A test case t ∈ T is modification-traversing if it executes a statement in P that has changed
    - Modification-traversing over-approximates modification-revealing
    - Can be computed

Retest selection
  - Some alternatives:
    - Fault-revealing (conservative and precise): impossible to compute
    - Modification-revealing (conservative, but not precise): also impossible to compute
    - Modification-traversing: easier to compute
    - Retest-all (conservative, but not precise): trivial to compute
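The modification-traversing criterion is computable because it needs only coverage recorded on the original program and a list of changed statements. A minimal sketch, assuming a hypothetical coverage map (test name to the set of statement numbers it executed on P) and a hypothetical set of changed statement numbers; neither data structure comes from the slides:

```python
def select_modification_traversing(coverage, changed):
    """Return the tests whose recorded coverage touches a changed statement."""
    changed = set(changed)
    return {test for test, stmts in coverage.items() if changed & set(stmts)}


# Hypothetical coverage recorded while running T on the original program P.
coverage = {
    "t1": {1, 2, 3},
    "t2": {1, 4, 5},
    "t3": {1, 2, 6},
}
# Hypothetical statements of P that were modified or deleted in the new version.
changed = {4, 6}

selected = select_modification_traversing(coverage, changed)
print(sorted(selected))
```

Here t1 is left out because it never reaches statement 4 or 6. Retest-all corresponds to returning every key of the coverage map regardless of the change.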
SAFE Regression Testing Criteria
  - If the selection criterion is safe, then all of the existing test cases that could expose a fault have been selected
  - In other words, (T - T') cannot uncover any faults in the system
  - For each t ∈ (T - T'), either
    - t is no longer in the domain, or
    - the statements executed by t are not impacted by the changes to the code
    - i.e., no statements in the execution of these test cases were changed

An Empirical Study
  - T. L. Graves, M. J. Harrold, J. M. Kim, A. Porter and G. Rothermel, "An Empirical Study of Regression Test Selection Techniques," ACM Transactions on Software Engineering and Methodology, 10 (2), April 2001, pp. 184-208.
  - Experiment to evaluate:
  - Fault detection effectiveness
    - Regression testing is usually not more effective than the original test set
    - Retest-all has good fault detection effectiveness, but may not be cost effective
  - Cost effectiveness
    - Are there techniques with similar fault detection effectiveness whose analysis costs significantly less than executing the test cases it eliminates?
    - Cost to compute T' versus the cost of executing (T - T'), assuming little or no loss in effectiveness
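The cost-effectiveness question above reduces to one comparison: the cost of the selection analysis against the cost of running the tests it eliminates. A minimal sketch, assuming hypothetical per-test run costs in minutes and a hypothetical analysis cost; the numbers are illustrative and not from the study:

```python
def selection_pays_off(analysis_cost, run_costs, selected):
    """True if computing the selection costs less than running the eliminated tests."""
    saved = sum(cost for test, cost in run_costs.items() if test not in selected)
    return analysis_cost < saved


# Hypothetical cost (in minutes) of executing and evaluating each test in T.
run_costs = {"t1": 30.0, "t2": 45.0, "t3": 20.0, "t4": 60.0}
# Hypothetical selected subset; t1 and t3 are eliminated, saving 50 minutes.
selected = {"t2", "t4"}

print(selection_pays_off(12.0, run_costs, selected))
print(selection_pays_off(80.0, run_costs, selected))
```

The comparison is only meaningful under the slide's assumption that the selected subset loses little or no fault detection effectiveness.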
Programs studied
  - 7 C++ programs from Siemens
    - 138-516 LOC
    - Many versions of each: 9-41 versions
    - Each version had one seeded fault
  - 2 larger programs
    - 6 KLOC / 33 versions / multiple faults
    - 49 KLOC / 5 versions / multiple faults

Test pools, test suites, test cases
  - Test pools
    - Test cases with known edge coverage
  - 1000 edge-coverage test suites
    - Test cases selected randomly from the pool until edge coverage is achieved
    - Assume n_k test cases are needed for the kth suite
  - 1000 non-edge-coverage test suites
    - Selected randomly from the pool
    - The kth test suite has n_k test cases, so each non-edge-coverage suite has a "buddy" edge-coverage suite of the same size
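One way to picture the suite construction is to draw test cases from the pool in random order, keep each one that covers a new edge, and then draw a random buddy suite of the same size. The sketch below is an assumption about the mechanics, not the study's actual procedure; the pool contents and function names are hypothetical:

```python
import random


def edge_coverage_suite(pool, rng):
    """Draw tests in random order, keeping each one that covers a new edge."""
    target = set().union(*pool.values())  # every edge the pool can cover
    covered, suite = set(), []
    order = sorted(pool)
    rng.shuffle(order)
    for test in order:
        if not pool[test] <= covered:  # the test adds at least one new edge
            suite.append(test)
            covered |= pool[test]
        if covered == target:
            break
    return suite


def buddy_suite(pool, size, rng):
    """Random suite of the same size, with no coverage requirement."""
    return rng.sample(sorted(pool), size)


# Hypothetical pool: test name -> set of control-flow edges it covers.
pool = {
    "t1": {"a-b", "b-d"},
    "t2": {"a-c", "c-d"},
    "t3": {"a-b"},
    "t4": {"a-c", "c-d", "d-e"},
    "t5": {"b-d", "d-e"},
}
rng = random.Random(0)
suite = edge_coverage_suite(pool, rng)
buddy = buddy_suite(pool, len(suite), rng)
print(suite, buddy)
```

Pairing each coverage-based suite with a random suite of equal size lets suite size be separated from coverage when comparing results.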
Regression testing techniques studied
  - Minimization: select test cases from the test suite so that every edge or node associated with the change is exercised
    - often resulted in a single test case
  - Safe: every test case in a suite that exercises a statement that has been deleted, modified, or is new
    - How do we know if a test case will exercise new statements?

Dealing with new code
  [Figure: control-flow graph with nodes a, c, f]
Dealing with new code
  [Figure: control-flow graph with nodes a, b, c, f]
  - Test cases that exercise any of the immediate predecessor nodes of a new statement are assumed to exercise the new statement

Regression testing techniques studied (continued)
  - Data flow: every test case in a test suite that exercises a def-use pair affected by a deleted or modified statement
    - Not quite safe
    - Not full dependence
  - Random: select 25%/50%/75% of the test cases in a suite, chosen randomly
  - Retest-all
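The predecessor rule for new code can be sketched on a small control-flow graph. The graph, node names, and coverage data below are hypothetical (they are not the graph from the slide's figure), and the sketch assumes coverage was recorded per node on the original program:

```python
def predecessors_of_new(cfg, new_nodes):
    """Old nodes that have at least one new node as an immediate successor."""
    new_nodes = set(new_nodes)
    return {
        src
        for src, successors in cfg.items()
        if src not in new_nodes and new_nodes & set(successors)
    }


def select_for_new_code(cfg, new_nodes, coverage):
    """Tests assumed to exercise new code: they reach one of its predecessors."""
    preds = predecessors_of_new(cfg, new_nodes)
    return {test for test, nodes in coverage.items() if preds & set(nodes)}


# Hypothetical CFG of the modified program; node "b" was inserted between c and f.
cfg = {"a": ["c", "d"], "c": ["b"], "b": ["f"], "d": ["f"], "f": []}
new_nodes = {"b"}
# Hypothetical node coverage recorded on the original program (no "b" yet).
coverage = {"t1": {"a", "c", "f"}, "t2": {"a", "d", "f"}}

print(sorted(predecessors_of_new(cfg, new_nodes)))
print(sorted(select_for_new_code(cfg, new_nodes, coverage)))
```

The rule is an approximation: a test that reaches the predecessor may still branch away from the new statement, so it can select tests that never execute the new code.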
Test case size reduction
  - Random and retest-all select a test suite size that is 25%, 50%, 75%, and 100%, respectively, of T by definition
  - Minimization: ~1% test suite size
  - Safe: ~60% test suite size
  - Data flow: 54% test suite size

Fault detection effectiveness
  - For minimization, random, and retest-all
    - The larger the test suite size, the better the fault detection
    - Improvement diminishes as the % gets higher
  [Figure: bar chart of effectiveness (%), axis 0 to 100, for retest-all (100%), random (75%), random (50%), random (25%), and minimization]
Fault detection effectiveness: safe
  - Safe test suite size averaged 60% of the original, but only performed slightly better than random (75%)
  - There was significant variance in the test suite reduction
    - Some programs resulted in almost no reduction in original test suite size
    - Larger programs tended to have a larger reduction in the test suite size
    - For some programs the payoff was significant; best case: 5% of the test cases were required
  - Size reduction often depended on where a change was located
    - Changes to leaf nodes/components required few test cases
    - Changes to root node/component required all test cases
  [Figure: component hierarchy with nodes a, b, c, d, e, f]
As Reported by Microsoft for Regression Testing
  To perform test selection, CRANE uses Microsoft's Echelon test prioritization scheme [22]. Echelon analyzes differences between two binaries, at a basic binary block level, and then uses previously archived code coverage information to identify tests that will trigger execution through the maximum number of changed binary blocks. Echelon prioritizes the selected tests by the ratio of changed blocks covered per unit of test cost. Tests that add more coverage to the changed code per unit of effort will end up at the top of the list. At this time, we use Echelon as a test prioritization rather than pure test selection tool, i.e. we do not recommend that only the selected tests be executed on a fix, but rather that they are run first.

Another comment about MS processes, describing a regression testing tool, CRANE
  For the situations where certain portions of changed code will be identified as not covered through existing tests, these test gaps are an important indicator of test cost: ideally all changed code should be executed before the release, therefore new tests need to be defined and run. In our tool, a source level view of changes represents this information in the form of green (covered by existing tests) and red (not covered) coloring of all changed lines of code. Our recommendation is that all currently uncovered parts of code have tests developed and executed for them.
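The ordering described above, changed blocks covered per unit of test cost, can be illustrated with a greedy loop. This is a sketch of the general idea only, not Echelon's or CRANE's actual implementation; the block identifiers, costs, and function name are hypothetical:

```python
def prioritize(coverage, costs, changed):
    """Greedy order by newly covered changed blocks per unit of test cost.

    Returns the ordered tests and the changed blocks that no test covers.
    Assumes every cost is positive.
    """
    remaining = set(changed)
    candidates = {test: set(blocks) for test, blocks in coverage.items()}
    order = []
    while remaining and candidates:
        # sorted() makes ties resolve alphabetically, so the result is deterministic.
        best = max(
            sorted(candidates),
            key=lambda t: len(candidates[t] & remaining) / costs[t],
        )
        gain = candidates.pop(best) & remaining
        if not gain:
            break  # no remaining test covers any uncovered changed block
        order.append(best)
        remaining -= gain
    return order, remaining


# Hypothetical archived coverage: test name -> binary blocks it executes.
coverage = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {1}, "t4": {9}}
costs = {"t1": 3.0, "t2": 1.0, "t3": 1.0, "t4": 1.0}
changed = {1, 2, 3, 4, 5}

order, gaps = prioritize(coverage, costs, changed)
print(order, gaps)
```

In this example block 5 is changed but covered by no archived test, so it ends up in the returned gap set, which plays the role of the "red" lines for which new tests would be written.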
Suggested regression testing process
  - Determine the set of safe test cases, T_safe
    - This can be run as a background job
    - Cost is basically irrelevant
  - Select from these safe test cases, T_selected
    - % selected depends on resources available
    - Other studies have suggested selecting the largest changed files
  - Rerun selected test cases
    - Cost of executing test cases
    - Cost of evaluating the results; can often be automated
  - Select new test cases to exercise new functionality
  - In background, may want to rerun (T_safe - T_selected)
  - Recompute coverage

Regression Testing Conclusion
  - Regression testing is a serious problem for some systems
  - Want to reduce test suite size but not fault detection
  - Need prediction models to select test cases
  - Prediction models
    - Safe selection techniques might still return too many test cases
    - Need to combine with prioritization techniques
    - Simple selection techniques, such as LOC and change info, might be sufficient