
SYSTEM TEST

UNIT OBJECTIVE
1. Understand what system testing entails
2. Learn techniques for measuring system quality

SYSTEM TEST
1. Focus is on integrating components and sub-systems to create the system
2. Testing checks on component compatibility, interactions, correctly passing information, and timing

Like Unit Test, activities focus on following uses and data:
1. Typical
2. Boundaries
3. Outliers
4. Failures

Unlike Unit Test:
1. Components may come from many, independent parties
2. Testing becomes a group activity
3. Bespoke development may meet Off-The-Shelf or reused components
4. Testing may move to an independent team altogether

System testing will be a complete and utter waste of effort if the components have not first been thoroughly tested.

INTEGRATING MULTIPLE PARTIES MAY INTRODUCE CONFLICT
System Integration
1. Components may come from multiple, possibly independent, parties
2. Bespoke development may meet Off-The-Shelf or reused components
3. Testing becomes a group activity
4. Testing may move to an independent team altogether

Implications
- Who controls integration readiness? What does lab entry mean?
- Are COTS components trusted?
- How do you assign credit for test results, and who is then responsible for repairs?
- How do you maintain momentum when everyone isn't at the table? When partner priorities are not shared?
- What about open source?

UNLIKE COMPONENTS, SYSTEMS HAVE EMERGENT BEHAVIOR
Some behavior only becomes clear when you put the components together. This has to be tested too, although it can be very hard to plan for in advance! Usually such behavior is identified after the fact, and test suites/cases are refactored to cover it.

TESTING FOCUS
Emphasizes component:
1. Compatibility
2. Interactions
3. Information passing
4. Timing

Integration aims to find misunderstandings one component introduces when it interacts with other components.

Use Cases are a useful testing model. Sequence diagrams form a strong basis for designing these tests: each articulates the inputs required and the expected behaviors and outputs.
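The idea above can be sketched as a small integration test derived from a use-case sequence diagram. This is a hypothetical example: the OrderService and InventoryService classes, their method names, and the "place order" scenario are all invented for illustration, not taken from the lecture.

```python
# Hypothetical sketch: an integration test derived from a sequence diagram
# for a "place order" use case (OrderService -> InventoryService).

class InventoryService:
    def __init__(self, stock):
        self.stock = dict(stock)

    def reserve(self, item, qty):
        # The diagram specifies the reply: True iff stock suffices.
        if self.stock.get(item, 0) >= qty:
            self.stock[item] -= qty
            return True
        return False

class OrderService:
    def __init__(self, inventory):
        self.inventory = inventory

    def place_order(self, item, qty):
        # Message order mirrors the diagram: reserve first, then confirm.
        if self.inventory.reserve(item, qty):
            return {"status": "confirmed", "item": item, "qty": qty}
        return {"status": "rejected", "item": item, "qty": qty}

def test_place_order_interaction():
    inventory = InventoryService({"widget": 5})
    orders = OrderService(inventory)
    # A typical input from the use case ...
    assert orders.place_order("widget", 3)["status"] == "confirmed"
    # ... and a boundary/failure input: only 2 remain, so 3 must be rejected.
    assert orders.place_order("widget", 3)["status"] == "rejected"
    assert inventory.stock["widget"] == 2

test_place_order_interaction()
```

The test checks exactly what the slide emphasizes: that the two components pass information correctly and that their interaction matches the expected behavior, not just that each works in isolation.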

ITERATIVE DEVELOPMENT LEADS TO ITERATIVE TESTING
Two senses:
1. Create tests incrementally
2. Run tests iteratively
   - On check-in and branch merge, test all affected modules
   - On check-in, test all modules
   - Per a schedule (e.g. daily), test all modules (Regression Testing)

Each change, especially after a bug fix, should mean adding at least one new test case. It is always best to test as completely as you can after each change, and completely before a release.

REGRESSION TESTING
Reasserting that changes to the system haven't broken previously working parts.

Changes include:
- Enhancements/extensions to modules
- Patches
- Configuration changes

Options:
1. Retest everything
2. Retest using only selected tests: the tester specifies what test cases/suites get run
3. Retest using priority orders: the tester defines priorities against test cases/suites, and tests are run in a greedy manner for some time period. Priorities may be set up against specific system or module versions.
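Option 3 above can be sketched in a few lines. This is an illustrative sketch, not a real test runner: the test names, priorities, and runtime estimates are invented, and a real scheduler would also track version-specific priorities.

```python
def select_tests(test_cases, time_budget):
    """Greedy priority-ordered regression selection (option 3 on the slide):
    run the highest-priority tests that fit within the time budget.
    Each test case is a (name, priority, estimated_runtime) tuple."""
    chosen, used = [], 0.0
    # Highest priority first; runtime breaks ties so cheaper tests go earlier.
    for name, priority, runtime in sorted(
            test_cases, key=lambda t: (-t[1], t[2])):
        if used + runtime <= time_budget:
            chosen.append(name)
            used += runtime
    return chosen

# Hypothetical suite: (name, priority, estimated runtime in minutes).
suite = [("login", 3, 10), ("checkout", 5, 30),
         ("search", 4, 25), ("profile", 1, 5)]
print(select_tests(suite, 60))  # → ['checkout', 'search', 'profile']
```

Note the greedy trade-off: "login" is skipped because the two higher-priority tests consume most of the budget, while the cheap low-priority "profile" still fits.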

OBVIOUS ISSUE WITH REGRESSION TESTING
It becomes very expensive as test suites grow and code change velocities are high, and regression test cycles may keep getting disrupted by continual code changes. It is, however, a great way to put idle machine cycles to work, especially if you own the machines (versus renting them).

"Also as a consequence of the introduction of new bugs, program maintenance requires far more system testing per statement written than any other programming. Theoretically, after each fix one must run the entire batch of test cases previously run against the system, to ensure that it has not been damaged in an obscure way. In practice, such regression testing must indeed approximate this theoretical idea, and it is very costly." - F. Brooks, The Mythical Man-Month

WHO GUARDS THE GUARDS?
Software Quality Assurance depends on good testing processes and test suite definitions. How do you know if the test suite is good?

YOUR TESTING IS GOOD ENOUGH UNTIL A PROBLEM SHOWS THAT IT IS NOT GOOD ENOUGH
It is hard to know when you should feel enough confidence to release the system. Confidence comes, in part, from the subset of possible tests selected.

Picking the subset:
1. Selection based on policy
   - Every statement must be executed at least once
   - Every path must be exercised
   - Crafted from specific end-user use cases (scenario testing)
2. Selection based on testing team experience
3. Run everything each time

[Figure: a 2x2 matrix of Test Quality (low/high) against Software Quality (low/high). Three quadrants report "few defects found"; only high test quality against low software quality reports "many defects found". Finding few defects is therefore only meaningful when test quality is high.]

MUTATION TESTING
Introduce small changes to the code and see if testing catches them.
- Statement Mutation: changes to lines of code (adds, removes, changes)
- Value Mutation: parameter values are modified
- Decision Mutation: changes to control statements

Why? The intent is to identify regions of code that are not tested, and to catch dumb mistyping-related bugs. The focus is checking the effectiveness and accuracy of the testing program.

If testing catches the mutant, it is considered killed (M_K). If testing does not catch the mutant because the different code produces the same accepted result, it is considered an equivalent mutant (M_E).

Measuring mutations (Mutation Score), where M_T is the total number of mutants generated:

    MS = M_K / (M_T - M_E)

Anything less than MS = 1 is generally bad: the test suites need refactoring to kill the surviving non-equivalent mutants.

Mutation Testing is a White Box Testing activity.

Original            Mutant
c = a * b;          c = a + b;
if (a == 10) {      if (a == 11) {
if (a == b) {       if (a != b) {
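A minimal sketch of how a mutation-testing harness might work, assuming no real tool (such as mutmut or PIT): the clamp function, the list of mutations, and the hand-rolled suite are all invented for illustration.

```python
# Minimal mutation-testing sketch: mutate the source text of a function,
# re-exec it, and check whether the test suite kills each mutant.

SOURCE = "def clamp(x):\n    return x if x < 10 else 10\n"

def run_suite(ns):
    clamp = ns["clamp"]
    # Typical, boundary, and outlier inputs, as the unit-test slide advises.
    return clamp(5) == 5 and clamp(10) == 10 and clamp(99) == 10

# (original, replacement) pairs covering the mutation kinds on the slide.
MUTATIONS = [("x < 10", "x <= 10"),   # decision mutation
             ("else 10", "else 11"),  # value mutation
             ("x < 10", "x > 10")]    # decision mutation

killed = 0
for old, new in MUTATIONS:
    ns = {}
    exec(SOURCE.replace(old, new), ns)   # build the mutant program
    if not run_suite(ns):                # suite fails -> mutant killed (M_K)
        killed += 1

total = len(MUTATIONS)                   # M_T
equivalent = 1  # the "<=" mutant is equivalent: both versions return 10 at x == 10
score = killed / (total - equivalent)    # MS = M_K / (M_T - M_E)
print(f"mutation score MS = {score:.2f}")  # → mutation score MS = 1.00
```

The surviving mutant here illustrates why M_E is subtracted: swapping `<` for `<=` changes the code but not its behavior, so no test suite could ever kill it.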

PUTS AND TAKES ON MUTATION TESTING
Advantages
- Can nicely cover the original source
- Finds ambiguities in the source code
- May detect all the faults in the testing regime

Originally proposed in the 1970s; back in vogue because of the explosion in compute capacity.

Disadvantages
- Mutation testing is extremely costly and time consuming to pursue
- Requires generating many mutant programs
- Each mutant runs the original test suite(s), which may involve many test cases, or tests that take a long time, or both
- Creates a potentially huge number of test suite runs
- Requires additional tooling to manage mutation generation and detection

DEFECT DENSITY
Using the past to estimate the future: judges code stability by comparing past numbers of bugs per code measure to presently measured levels.

    BugDensity(release_i) = (BugsFound(prerelease_i) + BugsFound(postrelease_i)) / CodeMeasure(release_i)

Release_i is a candidate for release if the following holds:

    min(BugDensity(release_1..i-1)) <= BugDensity(release_i) <= max(BugDensity(release_1..i-1))

What this means: if the density for the next release's additional code is within the range of prior releases, it is a candidate for release, provided test or development practices haven't improved.

[Figure: bar chart of bug density per release. Values within the historical range indicate Expected Quality; a density well above it (e.g. 9.5) signals Poor Software Quality, and one well below it (e.g. 4) signals Poor Test Coverage/Quality.]
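The release-candidate rule above is easy to express directly. The bug counts and code sizes below are invented for illustration; "KLOC" stands in for whatever code measure (lines, function points) the project has standardized on.

```python
def bug_density(pre_bugs, post_bugs, kloc):
    """BugDensity = (bugs found pre-release + post-release) / code measure."""
    return (pre_bugs + post_bugs) / kloc

def release_candidate(density, history):
    """Candidate iff density falls within the range of prior releases."""
    return min(history) <= density <= max(history)

# Hypothetical densities for releases 1..i-1 (bugs per KLOC of new code).
history = [bug_density(40, 8, 6.0),   # 8.0
           bug_density(55, 10, 8.1),  # ~8.02
           bug_density(30, 5, 5.0)]   # 7.0

next_density = bug_density(33, 6, 5.2)  # 7.5 for the candidate release
print(release_candidate(next_density, history))  # → True
```

A density far above the range would suggest poor software quality; far below it, poor test coverage, exactly the two failure labels on the slide's chart.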

CODE MEASURES
Approach 1: Count lines of code
- Count only executable lines
- Count executable lines plus data definitions
- Count executable lines, data definitions, and comments
- Count lines as physical lines on an input screen
- Count lines as terminated by logical delimiters
- Count only shipped lines
- Count only new/changed lines

Approach 2: Count function points
Using a standard notion of a function/object/component, use a weighted function over:
- Number of external interfaces
- Number of inputs
- Number of outputs
- Internal complexity
- Configuration complexity

The weights may consider:
- Data communications
- Performance
- Transaction rate
- Reusability
- Installation ease
- Operational ease
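Approach 2 amounts to a weighted sum over the counted attributes, scaled by an adjustment factor. The weights and counts below are invented for illustration; they are not calibrated IFPUG values.

```python
# Hedged sketch of a function-point style measure. The weight values are
# assumptions, not standard calibrated weights.
WEIGHTS = {"external_interfaces": 7, "inputs": 4, "outputs": 5,
           "internal_complexity": 10, "configuration_complexity": 6}

def function_points(counts, adjustment=1.0):
    """Raw FP = sum(count * weight); 'adjustment' stands in for factors
    like performance, transaction rate, reusability, and installation ease."""
    raw = sum(counts.get(k, 0) * w for k, w in WEIGHTS.items())
    return raw * adjustment

counts = {"external_interfaces": 2, "inputs": 5, "outputs": 3,
          "internal_complexity": 1, "configuration_complexity": 1}
print(round(function_points(counts, adjustment=1.1), 2))  # → 71.5
```

Unlike line counts, this measure is independent of language and formatting conventions, which is the usual argument for function points as the CodeMeasure in the defect-density formula.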

MEASURING QUALITY: CAPTURE-RECAPTURE
Applies an estimating technique used in predicting wildlife populations (Humphrey, Introduction to the Team Software Process, Addison-Wesley, 2000).

Example: estimating a turtle population (assuming turtles don't migrate). You catch and tag 5 turtles, then release them. You later catch 10 turtles, and two have tags. Assuming the tagged fraction of the second catch matches the tagged fraction of the population:

    5 turtles / Total # of turtles = 2 turtles / 10 turtles
    Total # of turtles = (10 turtles x 5 turtles) / 2 turtles = 25 turtles

For software, this uses data collected by two or more independent collectors, gathered via reviews or tests.

CAPTURE-RECAPTURE
Each collector finds some defects out of the total number of defects, and some of the defects found will overlap.

Method
1. Count the number of defects found by each collector (n_1, n_2)
2. Count the number of intersecting defects found by both collectors (m)
3. Calculate defects found = (n_1 + n_2) - m
4. Estimate total defects = (n_1 x n_2) / m
5. Estimate remaining defects = (n_1 x n_2) / m - ((n_1 + n_2) - m)

If there are multiple collectors, assign n_1 to the highest collected number and set n_2 to the rest of the collected defects. When multiple engineers find the same defect, count it just once. The estimate is tightest when most findings overlap and weakest when there is little overlap.

Figures taken from http://leansoftwareengineering.com/2007/06/05/the-capturerecapture-code-inspection/
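The method above can be sketched as a small estimator. It reuses the slide's turtle numbers as a sanity check; a real inspection would feed in defect lists and compute the overlap from matched defect IDs.

```python
def capture_recapture(n1, n2, m):
    """Two-collector defect estimate, per the method on the slide.
    n1, n2: defects found by each collector; m: defects found by both."""
    if m == 0:
        raise ValueError("no overlap: the estimate is undefined")
    found = n1 + n2 - m            # distinct defects actually found
    total = (n1 * n2) // m         # estimated total defects in the artifact
    remaining = total - found      # estimated defects still lurking
    return found, total, remaining

# The turtle example: 5 tagged (n1), 10 recaptured (n2), 2 overlap (m).
print(capture_recapture(5, 10, 2))  # → (13, 25, 12)
```

The `m == 0` guard reflects the caveat above: with no overlap between collectors, the technique gives no estimate at all, which is itself a warning sign about coverage.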

PERFORMANCE TESTING
Measures the system's capacity to process a specific load over a specific time-span:
- number of concurrent users
- specific number of concurrent transactions

Involves creating and executing an operational profile that reflects the expected pattern of uses. Ideally the system should degrade gracefully rather than collapse under load. Under load, issues like protocol overhead or timing take center stage.

There are several types of performance tests:
1. Load: aims to assess compliance with nonfunctional requirements
2. Stress: identifies system capacity limits
3. Spike: testing involving rapid swings in load
4. Endurance (or Soak): continuous operation at a given load
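A load test can be sketched as a driver that runs an operational profile at a fixed concurrency and reports throughput and latency. The `handle_request` stub and the numbers below are assumptions standing in for a real system under test; real tools (JMeter, Locust, k6) add ramp-up, spikes, and soak durations for the other test types.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(payload):
    """Stand-in for the system under test (an assumption for this sketch)."""
    time.sleep(0.01)          # simulate ~10 ms of service time
    return payload * 2

def load_test(concurrent_users, requests_per_user):
    """Drive a fixed operational profile; report throughput and worst latency."""
    latencies = []
    def user(uid):
        for i in range(requests_per_user):
            start = time.perf_counter()
            handle_request(i)
            latencies.append(time.perf_counter() - start)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        list(pool.map(user, range(concurrent_users)))
    elapsed = time.perf_counter() - start
    total_requests = concurrent_users * requests_per_user
    return total_requests / elapsed, max(latencies)

throughput, worst = load_test(concurrent_users=5, requests_per_user=10)
print(f"{throughput:.0f} req/s, worst latency {worst * 1000:.1f} ms")
```

Re-running with ever-higher `concurrent_users` until throughput flattens or latency explodes turns this load test into a stress test, locating the capacity limit the slide describes.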

WHY DO PERFORMANCE TESTING?
- The requirements demand demonstration that the system meets performance criteria, since some performance-related nonfunctional requirements exist
- It can compare two systems to find which performs better
- It can identify which parts of the system are the weak links
- It can identify workloads that cause the system to perform badly

Multiple Dimensions to Optimize
Typical optimizations focus on:
1. Throughput or Concurrency: getting the most data processed; greatest number of simultaneous transactions
2. Server response time
3. Service request round-trip time
4. Server utilization