Multivariate Visualization in Observation-Based Testing


David Leon, Andy Podgurski, and Lee J. White
Electrical Engineering and Computer Science Department, Case Western Reserve University, Olin Building, Cleveland, Ohio, USA
dzl@po.cwru.edu, andy@eecs.cwru.edu, leew@eecs.cwru.edu

ABSTRACT
We explore the use of multivariate visualization techniques to support a new approach to test data selection, called observation-based testing. Applications of multivariate visualization are described, including: evaluating and improving synthetic tests; filtering regression test suites; filtering captured operational executions; comparing test suites; and assessing bug reports. These applications are illustrated by the use of correspondence analysis to analyze test inputs for the GNU GCC compiler.

Keywords
Software testing, observation-based testing, multivariate visualization, multivariate data analysis, data visualization, correspondence analysis.

1 INTRODUCTION
The traditional paradigm for testing software is to construct test cases that cause runtime events that are likely to reveal certain kinds of defects if they are present. Examples of such events include: the use of program features; execution of statements, branches, loops, functions, or other program elements; flow of data between statements or procedures; program variables taking on boundary values or other special values; message passing between objects or processes; GUI events; and synchronization events. It is generally feasible to construct test cases to induce events of interest if the events involve a program's external interfaces, as in functional testing (black-box testing, specification-based testing). However, it is often extremely difficult to create tests that induce specific events internal to a program, as required in structural testing (glass-box testing, code-based testing). For this reason, functional testing is the primary form of testing used in practice. Structural testing, if it is employed at all, usually takes the form of assessing the degree of structural coverage achieved by functional tests, that is, the extent to which the tests induce certain internal events. Structural coverage is assessed by profiling the executions induced by functional tests, that is, by instrumenting or monitoring the program under test in order to collect data about the degree of coverage achieved. If necessary, the functional tests are augmented in an ad hoc manner to improve structural coverage.

The difficulty of constructing test data to induce internal program events suggests an alternative paradigm for testing software. This form of testing, which we call observation-based testing, emphasizes what is relatively easy to do and de-emphasizes what is difficult to do. It calls for first obtaining a large amount of potential test data as expeditiously as possible, e.g., by constructing functional tests, simulating usage scenarios, capturing operational inputs, or reusing existing test suites.
The potential test data is then used to run a version of the software under test that has been instrumented to produce execution profiles characterizing the program's internal events. Next, the potential test data and/or the profiles it induces are analyzed in order to filter the test data: select a smaller set of test data that induces events of interest or that has other desirable properties. To enable large volumes of potential test data to be analyzed inexpensively, the analysis techniques that are used must be fully or partially automated. Finally, the output resulting from the selected tests is checked for conformance to requirements. This last step typically requires manual effort, either in checking actual output or in determining expected output.

Many forms of execution profiling can be used in observation-based testing. For example, one may record the occurrences of any of the kinds of program events that have traditionally been of interest in testing. Typically, a profile takes the form of a vector of event counts, although other forms, such as a call graph, may be used in observation-based testing. Since execution profiles are often very large (ones with thousands of event counts are common), automated help is essential for analyzing them. In structural testing, profiles are usually summarized by computing simple coverage measures, such as the number of program statements that were executed at least once during testing.

However, more sophisticated multivariate data analysis techniques can extract additional information from profile data. For example, [10] and [11] report experiments in which automatic cluster analysis of branch traversal profiles, used together with stratified random sampling, increased the accuracy of software reliability estimates, because it tended to isolate failures in small clusters. Among the most promising multivariate data analysis techniques for use in observation-based testing are multivariate visualization techniques like correspondence analysis and multidimensional scaling. In essence, these computer-intensive techniques project many-dimensional execution profiles onto a two-dimensional display, producing a scatter plot that preserves important relationships between the profiles. This permits a human user to visually observe these relationships and, with the aid of interactive tools, to explore their significance for software testing.

We present the initial results of a project whose goal is to explore the potential applications of multivariate visualization techniques to software testing and, ultimately, to develop a methodology for employing them. Section 2 gives an overview of two applicable multivariate visualization techniques: correspondence analysis and multidimensional scaling. Section 3 describes several applications of multivariate visualization in observation-based testing. Section 4 presents a case study in which correspondence analysis is applied to a large data set. Future research is discussed in Section 5. Related work is surveyed in Section 6. Section 7 concludes.

2 VISUALIZATION TECHNIQUES
In this section, we present a brief overview of two multivariate visualization techniques that are applicable to observation-based software testing: correspondence analysis and multidimensional scaling. Both techniques are distinguished by their ability to handle data of large volume and high dimensionality. Correspondence analysis is used in the case study described in Section 4.

Correspondence Analysis
Correspondence analysis is one of many names for a data analysis and visualization technique that has been independently discovered in many fields [3]. It is used to analyze an n-dimensional data set and represent it in a few dimensions with the least possible loss of information. This representation allows the user to visually analyze the relationships between different data points, and also between these points and the original dimensions. One way to think about calculating the correspondence analysis display is to first fit a line through the n-dimensional space so as to maximize the variance of the points along the direction of the line. Then the coordinates of the points are modified so as to take away any variance along this direction, and the process is repeated. The projections of the original points onto the fitted lines (axes) correspond to the points' coordinates on the display. The displayed points are called row points, because they correspond to rows in the data matrix. Correspondence analysis assigns weights to both points and dimensions, e.g., to compensate for differences in measurement units. Display points representing the original dimensions, which are called column points, can also be displayed in order to show how the position of row points is affected by various dimensions. Correspondence analysis is usually computed using singular value decomposition (SVD) [7]. SVD is a well-known matrix decomposition technique for which there are efficient, highly optimized algorithms.

In observation-based testing, the input data for correspondence analysis is a matrix in which each row corresponds to an execution of the software under test and each column corresponds to a profile feature. Each row of the matrix can be considered the coordinates of a point in a space with as many dimensions as there are profile features. Analyzing this matrix with correspondence analysis yields low-dimensional displays that show the relationships between the test points.
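To make the SVD-based computation concrete, the following sketch (ours, not part of the original study; it assumes NumPy and a count matrix with no all-zero rows or columns) computes row and column principal coordinates from a matrix of event counts. For the data set described in Section 4, the rows would be the individual compiler executions and the columns the profiled functions.

```python
import numpy as np

def correspondence_analysis(counts, n_axes=2):
    """Correspondence analysis of a non-negative count matrix via SVD.

    `counts` has one row per execution and one column per profile feature
    (e.g., per-function call counts); rows and columns are assumed to have
    non-zero totals.  Returns principal coordinates of the rows (executions)
    and columns (features) on the first `n_axes` principal axes."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                                   # correspondence matrix
    r = P.sum(axis=1)                                 # row masses (weights)
    c = P.sum(axis=0)                                 # column masses (weights)
    # Standardized residuals; subtracting the outer product removes the
    # trivial dimension, and the weighting compensates for feature scales.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U[:, :n_axes] * sv[:n_axes]) / np.sqrt(r)[:, None]
    col_coords = (Vt[:n_axes].T * sv[:n_axes]) / np.sqrt(c)[:, None]
    return row_coords, col_coords
```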
Multidimensional Scaling
Multidimensional scaling is the name for a family of techniques used to project a set of n-dimensional data points onto a plane, given only a matrix of dissimilarities between the points [1]. This matrix is computed from a data matrix by applying a dissimilarity metric (e.g., Euclidean or Manhattan distance) to each pair of rows. The positions of the points on the display are such that the distance between any pair of points reflects as closely as possible the degree of dissimilarity between them. A simple approach is to select a starting configuration of points and evaluate how closely it approximates the input. The points are then moved to decrease the error, and the process is repeated until a minimum is found. This results in a very flexible approach to finding a display, since there are different ways of calculating the dissimilarity matrix, computing the error, and finding a solution.
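As an illustration of this iterative scheme, the toy sketch below (ours, not the implementation used in the study; it assumes SciPy, Euclidean dissimilarities, and a fixed step size) minimizes raw stress by gradient descent. Production MDS codes use more robust update rules such as SMACOF.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mds_display(data, n_iter=300, step=0.01, seed=0):
    """Toy multidimensional scaling: place the rows of `data` in the plane so
    that display distances approximate their Euclidean dissimilarities,
    by gradient descent on the raw stress  sum_{i<j} (d_ij - delta_ij)^2."""
    delta = squareform(pdist(data, metric='euclidean'))   # dissimilarity matrix
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=1e-2, size=(len(delta), 2))      # starting configuration
    for _ in range(n_iter):
        d = squareform(pdist(X)) + 1e-12                  # current display distances
        ratio = (d - delta) / d
        np.fill_diagonal(ratio, 0.0)
        # Gradient of the raw stress with respect to the point positions.
        grad = 2.0 * (ratio.sum(axis=1)[:, None] * X - ratio @ X)
        X -= step * grad                                  # move points to reduce the error
    return X
```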

3 APPLICATIONS
Multivariate visualization techniques like correspondence analysis and multidimensional scaling enable a tester to visualize the distribution of execution profiles induced by a set of potential test cases. They also reveal significant features of the corresponding population of executions (these population features should not be confused with profile features). Typical examples of such features include unusual executions, clusters of similar executions, and regions of the profile space without any executions. In addition, visualization often reveals other features of an execution population that are visually striking but whose significance is not immediately obvious. Upon further investigation, such features may turn out to be significant for testing, as we shall see in Section 4.

Besides revealing the distribution and features of an execution population, multivariate visualization techniques provide means of comparing two or more execution populations and of relating an individual execution to other executions. These capabilities have several applications in observation-based testing, which are described in the remainder of this section. They include: evaluating synthetic test data; filtering regression test suites; filtering captured operational executions; comparing test suites; and assessing bug reports.

Evaluating Synthetic Test Data
Multivariate visualization techniques can be used to evaluate and improve a set of test data derived synthetically, e.g., by constructing functional tests or by simulating usage scenarios. As mentioned in the Introduction, it is customary to evaluate such test data by measuring the degree of structural coverage it achieves. Visualization techniques can go further by revealing relationships among test cases. Because the displays produced by these techniques are computed from all columns of a data matrix, they can reveal relationships involving multiple event counts.

An outlier or isolated point in a display indicates a test case that induces unusual behavior. Such test cases are usually desirable, because they exercise aspects of a program not exercised by other test cases. A dense cluster of points in a display suggests that the corresponding test data is redundant with respect to the kinds of events that have been profiled. It may be beneficial to eliminate most of the redundant tests and replace them with more varied ones. In order to decide whether this is appropriate, it is necessary to carry out a more detailed analysis of the cluster, for example, by examining other views of the data or by using a different form of profiling. It is desirable for a visualization tool for use in software testing to support such analysis interactively.

An empty region R in the display may indicate that the test set fails to exercise important behaviors of the software under test. In this case, it is desirable to augment the test set with one or more tests whose profiles yield points in R when displayed. However, it is possible that there are no inputs to the software that will produce profiles in R, and in general it is undecidable whether there is any input to a program that will produce a profile with specified values. One approach to augmenting a test set is to trawl for suitable inputs: obtain additional inputs from any source (e.g., from beta testing); execute the software on them to produce new profiles; display the new profiles together with the original ones; and observe whether any of the new points fall into region R. Another approach to augmenting a test set is to obtain a characterization of the kinds of profiles that would yield points in R and then attempt to construct one or more test cases that produce such profiles. Ideally, a visualization tool for use in testing would produce such a characterization of a display region on request.
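The trawling check can be approximated mechanically. The sketch below is only an illustration (it reuses the hypothetical correspondence_analysis function from the Section 2 sketch and treats R as an axis-aligned rectangle on the first principal plane): it recomputes the display for the old and new profiles together and reports which new executions land in R.

```python
import numpy as np

def trawl_region(old_counts, new_counts, x_range, y_range):
    """Report which newly obtained executions fall into a (rectangular)
    display region R that was empty for the original test set.

    The display is recomputed for old and new profiles together; returned
    indices refer to rows of `new_counts`."""
    combined = np.vstack([old_counts, new_counts])
    row_coords, _ = correspondence_analysis(combined, n_axes=2)
    new_pts = row_coords[len(old_counts):]
    inside = ((new_pts[:, 0] >= x_range[0]) & (new_pts[:, 0] <= x_range[1]) &
              (new_pts[:, 1] >= y_range[0]) & (new_pts[:, 1] <= y_range[1]))
    return np.flatnonzero(inside)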
A display may reveal features of a test set whose explanation requires further investigation. These may be indicated by point sets with distinctive shapes. For example, the displays shown in Section 4 exhibit linear, curvilinear, and triangular point sets, among others. To determine whether the test set adequately exercises the software under test, it may be important to understand the factors that underlie such display features. To discover these factors, it may suffice to examine the documentation for the corresponding test cases. If such documentation does not exist or is not sufficient, it is necessary to conduct a detailed analysis of the profiles produced by the test cases and of the software under test.

Because correspondence analysis can simultaneously display points representing the rows of the data matrix and points representing the columns, it can reveal the extent to which the position of a point is explained by individual profile features. Essentially, a row point is attracted to the column points corresponding to the most prominent features in its profile (those with high values). By considering the affinity of row points for certain column points, the tester can gain an understanding of which events characterize a test case. A set of column points that are close together corresponds to a set of profile features (event counts) that are correlated with each other. The correlations between these features might be explained by an unobserved, latent variable or factor (in the sense of factor analysis [8]). This can be confirmed only by detailed analysis of the software under test.

Filtering Regression Test Suites
A notable special case of evaluating synthetic test data is analyzing a regression test suite in order to eliminate redundant test cases or to add new tests necessary to exercise new features of the software under test. Several authors have proposed techniques for identifying a minimal or safe subset of a regression test suite, e.g., see [5,13]. For this to be worthwhile, the cost of the analysis it entails must be less than the cost of running and evaluating the tests that are eliminated. Multivariate visualization techniques are applicable to this problem when the cost of evaluating regression tests dominates the cost of running them, e.g., because the tests must be evaluated manually. To apply these techniques, the executions induced by the original test suite must be profiled. Visualization is then used to select a subset that spans the range of tests in the original suite but that contains no redundant tests. This is done as follows. All outliers or isolated points are selected. One representative is chosen from each roughly elliptical cluster of points. With other features of the display, one representative is selected from each region and extremity of the feature.
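The cluster-representative step of this procedure can be roughed out automatically, although it is intended to be interactive. The following sketch is our illustration only (the use of k-means via scikit-learn and the fixed cluster count are assumptions, not part of the described method); it keeps the test whose display point lies closest to each cluster centre.

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_cluster_representatives(coords, test_ids, n_clusters=10):
    """Keep one regression test per cluster of display points: the test whose
    point lies closest to each cluster centre.  Outliers and other display
    features would still be picked out interactively."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(coords)
    keep = []
    for k in range(n_clusters):
        members = np.flatnonzero(km.labels_ == k)
        dists = np.linalg.norm(coords[members] - km.cluster_centers_[k], axis=1)
        keep.append(test_ids[members[np.argmin(dists)]])
    return keep
```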

Multivariate visualization techniques permit a variety of criteria to be used in filtering regression tests, since they can be used with any kind of profile.

Filtering Captured Operational Inputs
A serious problem with synthetic test data is that it does not reflect the way the software under test will be used in the field. Even if it reveals defects, it may not reveal those having a significant impact on the software's reliability as perceived by users. By contrast, operational testing (beta testing, field testing) does reflect the way software is used in the field, and it may also reduce the amount of in-house testing (alpha testing) that software developers must do. In operational testing, the software to be tested is provided to some of its intended users to employ as they see fit over an extended period. The advantages of operational testing are somewhat offset by the fact that beta users often fail to observe or report failures, because they are unfamiliar with the software's specification and because testing is not their primary occupation. This problem can be addressed by using a capture/replay tool to capture executions in the field, so they can later be replayed and examined in detail by trained testing personnel. If many executions are captured, it may be practical to examine only a fraction of them in this way. Rather than examining a random sample of executions, it is desirable to filter the captured sample to identify executions with unusual characteristics that may be associated with failure. Multivariate visualizations can be used to filter operational executions in much the same way they can be used to filter regression test suites.

Comparing Potential Test Suites
Multivariate visualization techniques also provide a means of comparing test suites derived in different ways. As such, they are potentially useful both to practitioners and to researchers. For example, a set of synthetically generated tests can be compared with captured beta-test executions in order to see how well the synthetic tests approximate operational usage. Such a comparison might be used to modify testing procedures to better reflect patterns of operational usage. Captured executions obtained from different user populations can be compared visually in order to understand differences in their usage patterns that should be addressed in future testing.

Assessing Bug Reports
Software development organizations often have such a backlog of bug reports about a product that when a new report comes in, they cannot address it immediately. Rather, they must prioritize it and focus on repairing the high-priority bugs first. Multivariate visualization provides a means by which a developer can gain insight into the significance of a newly reported bug. This requires the developer to maintain a large repository of operational executions of the product, captured from a random sample of user sites. If a new bug report includes an input that elicits a failure, the execution E the input induces can be profiled, and this profile can be displayed together with profiles of the captured executions in the repository. The executions that are close to E in the display can then be identified, replayed, and examined to determine whether they also fail in the same way. If the repository reflects the way the software is used in the field, this procedure will indicate the relative frequency with which the bug causes failures in the field.
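A minimal sketch of the lookup step follows (ours; the use of Euclidean distance between normalized profiles is an assumption). Given the profile of the failing execution and the repository's profile matrix, it ranks captured executions by similarity so the nearest ones can be replayed first.

```python
import numpy as np

def nearest_captured_executions(failure_profile, repository_profiles, k=20):
    """Rank captured operational executions by how similar their profiles are
    to the profile E of a newly reported failing execution, so that the
    closest ones can be replayed and examined first."""
    rep = repository_profiles / repository_profiles.sum(axis=1, keepdims=True)
    fail = failure_profile / failure_profile.sum()
    dists = np.linalg.norm(rep - fail, axis=1)   # distance between relative-frequency profiles
    return np.argsort(dists)[:k]                 # indices of the k most similar executions
```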
4 CASE STUDY
In this section we present a case study illustrating several of the applications of multivariate visualization described in Section 3. In this case study, correspondence analysis is used to analyze two sets of inputs to the C-language compiler of the GNU Compiler Collection (GCC) [2]. The profile data consists of function call counts as reported by GNU's function coverage profiler, gprof. That is, each time the compiler was executed, the number of times each of the compiler's functions was called was recorded. The execution platform was a Sun Ultra 5 workstation running SunOS 5.7.

One set of inputs was the test suite for GCC 2.95. (The test suite for the exact version used was not publicly available at the time.) This set of inputs executed the C compiler 6064 times, yielding just as many profiles. A second set of inputs was included for comparison with this test suite. It consists of publicly available programs for which the source code is also available. Most of them come from either the GNU project or the X Windows consortium X.Org [14]. These programs were selected to represent a wide variety of applications. Among them are some file, shell, and compression utilities, a compiler (GCC), a debugger (gdb), a text editor (Emacs), some X Windows programs, an AI program (GNU Chess), and some network daemons and clients. A total of 32 programs were included, adding 1807 more compilations, for a total of 7871 executions. For all of these programs, the default makefiles were used, including optimization choices, etc. A total of 2370 different functions were called during at least one of these executions. The result was a data matrix of 7871 rows by 2370 columns. Calculating the correspondence analysis display for this data set takes roughly one hour on a Pentium III 450 with 256 MB of memory.

Interpreting the Correspondence Analysis Display
Once correspondence analysis has been carried out, one ends up with a set of axes and the data points' coordinates on these axes. The axes are called principal axes. They are ordered with respect to the amount of variance in the data they account for, the most important being the first principal axis. The plane defined by the first and second principal axes is called the first principal plane; the third and fourth principal axes define the second principal plane, and so on. It is also possible to take different pairs of axes, though the planes they define have no special names.
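For illustration, a display such as the ones discussed below can be drawn from the coordinates produced by the correspondence-analysis sketch in Section 2 (a hypothetical helper, assuming Matplotlib; plotting the second principal plane would use axes 3 and 4, i.e. axis_x=2 and axis_y=3, with at least four axes retained).

```python
import matplotlib.pyplot as plt

def plot_principal_plane(row_coords, col_coords, axis_x=0, axis_y=1):
    """Scatter plot of one principal plane: executions as round (row) points
    and profiled functions as square (column) points."""
    fig, ax = plt.subplots()
    ax.scatter(row_coords[:, axis_x], row_coords[:, axis_y],
               s=10, marker='o', label='executions (row points)')
    ax.scatter(col_coords[:, axis_x], col_coords[:, axis_y],
               s=25, marker='s', label='functions (column points)')
    ax.set_xlabel('principal axis %d' % (axis_x + 1))
    ax.set_ylabel('principal axis %d' % (axis_y + 1))
    ax.legend()
    return ax
```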

[Figure 1: (a) First principal plane for a subset of the GCC data, including row points (round) and column points (squares). (b) Names of the functions for the lower cluster of column points.]

Figure 1a shows the first principal plane for a subset of the GCC data. This is a scatter plot using the first principal axis as the x axis and the second as the y axis. Each round point in this figure corresponds to an execution of the GCC compiler. The distances between points in the figure reflect the n-dimensional distances between the corresponding profiles. That is, if two points are far apart in the display, then the corresponding executions have very different profiles.

As mentioned previously, correspondence analysis can also provide information about the reasons for a point's location in the display. That is, it identifies which of the point's features were most important in determining its placement. Consider Figure 1a. It contains two sets of points: the round points are row points (test cases), whereas the square points are column points. That is, each square point corresponds to one column of the input matrix, which in turn represents one of the features being profiled, in this case one function. The column points provide two pieces of information. First, the distances between column points can be interpreted in the same way as those for row points. That is, if two column points are far apart, they are essentially unrelated, while points that are close together represent a set of functions whose call counts were linearly related in all runs. These sets of related functions are called factors (see Section 3). On the other hand, when two functions are related in a non-linear way, they will be separated, and the row points will be arranged in a curve between these two column points, representing the relationship between these functions. In Figure 1a, one can see a few different factors in the display: one to the left, one on the bottom, and some others around the cloud of row points. The names of the functions in the box can be seen in Figure 1b.

The second piece of information given by the column points arises from their relationship with the row points. If a row point is close to a column point, it means that the execution represented by the row point used the function represented by the column point more often than average. For example, the executions on the bottom of Figure 1a used the functions listed in Figure 1b very often. Any row point can be interpreted as a linear combination of column points, since the location of each row point corresponds to how many times each of the functions was called. This means that the position of a row point in the display is a weighted average of the positions of all column points. For example, if an execution used only functions f and g, and it used them the same number of times, its point would be halfway between f's point and g's point. A point that is very close to the origin represents an execution that did not stress the functions that are represented in that plane. On a different principal plane, it might be very far from the origin, since different factors will be seen there.
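This barycentric reading of the display can be checked numerically. In the usual formulation, each row point is the weighted average of the column points taken in standard coordinates (principal coordinates divided by the singular value of each axis), with the execution's relative call frequencies as weights. The sketch below is ours and assumes the column coordinates and singular values come from an extended version of the correspondence-analysis sketch in Section 2 that also returns the singular values.

```python
import numpy as np

def row_points_from_column_points(counts, col_coords, singular_values):
    """Transition (barycentric) relation of correspondence analysis: each
    execution's display point is the weighted average of the column points
    in standard coordinates, weighted by its relative call frequencies."""
    profiles = counts / counts.sum(axis=1, keepdims=True)  # row profiles (weights)
    col_standard = col_coords / singular_values            # rescale column points
    return profiles @ col_standard                         # equals the row principal coordinates
```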
Finding Unusual Executions
By looking at the correspondence analysis display of the GCC data, one can immediately identify some points that are far away from all the others. This means that GCC's behavior during these executions was remarkably different from its behavior during most others. For example, Figure 2 displays the first principal plane of the correspondence analysis. There is an outlier on the right side of the display, very far from the rest of the points. This indicates a very special execution, which made GCC behave in an interesting way. This point, and all such points, should be chosen as test cases worth looking at.

After an outlier has been identified, the display is recalculated without taking that point into consideration. The reason for this step is that an outlier greatly influences the display, which makes the representation of all other points less accurate. After recalculating, it is possible to see more of the structure of the data, as shown in Figure 3.

[Figure 2: First principal plane of the GCC data set (initial display).]
[Figure 3: First principal plane after removing one outlier.]
[Figure 4: Second principal plane after removing one outlier.]
[Figure 5: First principal plane after 24 iterations of the jackknifing algorithm, shaded according to optimization level.]

After this step, one can look for more outliers. If none are found in the first principal plane, one can look for them in other principal planes. For example, there are no obvious outliers in Figure 3, but looking at the second principal plane (Figure 4), one can see another outlier. This is because even though the first principal plane is free of outliers, there may still be points that strongly influence the remaining dimensions.

Fortunately, there is a technique related to correspondence analysis, called jackknifing, that identifies outliers automatically. Jackknifing checks each point to determine whether removing it would cause a principal axis to rotate more than 45 degrees. If so, that point is considered an outlier and is actually removed. A new display is then calculated, and the process is repeated to identify all outliers. Although jackknifing itself is very fast, recomputing the display is more expensive. The whole process can be run overnight and requires no human intervention. Figure 5 shows the display after removing outliers 24 times in this manner. Eighty-two outliers were identified and removed in total. (More than one outlier can be identified in any one step.)

A problem occurs when a plane has a small cluster of points that is distant from the rest. For example, as will be seen below, Figure 8 contains a small cluster of points on the right. Technically, none of these points is an outlier, since they are not unique: there are four of them in that region. One may instead label the entire cluster as an outlier and remove all its points. Instead of checking all of the executions in the cluster for conformance to requirements, one or two of them might be checked. Outlier clusters have an adverse effect on the jackknifing algorithm: removing any one of the points by itself might not influence the display, so none of them will be labeled as an outlier. To date, our only means of identifying clusters of outliers is by visual inspection. It is better to do this before running the jackknifing algorithm, so that the algorithm can run on a more accurate display. Figure 9 shows the result of removing the outlier cluster from Figure 8.
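A single pass of the jackknife check might look like the sketch below (ours, not the tool used in the study; it tests only the rotation of the first principal axis and recomputes one decomposition per candidate point, which is consistent with the observation that recomputing the display is the expensive part).

```python
import numpy as np

def jackknife_outliers(counts, angle_threshold_deg=45.0):
    """One pass of the jackknife-style check: flag an execution if deleting
    its row rotates the first principal axis by more than the threshold."""
    def first_axis(N):
        P = N / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
        # Guard against features unused in the reduced data set.
        S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, np.maximum(c, 1e-12)))
        _, _, Vt = np.linalg.svd(S, full_matrices=False)
        return Vt[0]                               # direction of the first principal axis

    v_full = first_axis(counts)
    outliers = []
    for i in range(counts.shape[0]):
        v_i = first_axis(np.delete(counts, i, axis=0))
        cos = abs(v_full @ v_i)                    # the sign of an axis is arbitrary
        angle = np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))
        if angle > angle_threshold_deg:
            outliers.append(i)
    return outliers
```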

[Figure 6: First principal plane after 24 iterations of the jackknifing algorithm, shaded according to optimization level. (a) shows only points representing test suite runs; (b) shows only points representing user programs.]

Comparing and Augmenting Test Suites
The GCC test suite turns out not to cover all of the functions of the compiler. Just by adding the user programs, 27 functions were executed that were not exercised by the test suite. This suggests that the test suite might be improved by adding more test cases. Figure 5 shows the correspondence analysis for the whole GCC data set. This includes both the existing GCC test suite and the set of user programs compiled for comparison. To make it easier to differentiate between these two data sets, Figure 6a shows the same display with only the test suite points plotted, and Figure 6b plots only the user program points. Putting the two together yields Figure 5.

Considering Figures 6a and 6b, it is obvious that the test suite compilations behave very differently from user program compilations. In particular, the test suite programs concentrate towards the top left part of the display, while the user programs are closer to the bottom of the display. This confirms the common knowledge that handmade test cases may not reflect the way the program is actually utilized by end users.

Examining these pictures, one can see that the test suite stresses the compiler functions at the top left of the display. There are a few user programs that lie in that region, so it is a good thing that region has been tested. On the other hand, many user programs lie in a region at the bottom of the display, where there are no test suite programs covering that combination of functions. This indicates that the test suite could be augmented by adding one or more of these user programs, or by designing tests that produce this behavior.

Another way to approach this problem is to assume there is no test suite and pick a set of test cases from the executions in Figure 5. Intuitively, one or more test cases should be picked from each region, since executions in different regions have different behavior. One can also look at the features of the population and select tests from each cluster, etc. This process is repeated for subsequent principal planes, being careful to mark the tests that have already been chosen from higher planes. This way more features can be taken into account without introducing redundant tests.

Identifying Significant Features in the Display
By looking at Figure 5, one can see distinct features in the display. These correspond to ways in which the factors in the display are related in the population. For example, consider the dark points in Figure 6a. All of these points exhibit a similar correlation between the factor at the bottom of the display and the factor at the left. This means that for these points, the two factors have an inverse linear relationship. But notice that this correlation is different for different shades, while it is basically nonexistent for the points in Figure 6b. This means that these different groups of executions of the compiler behave in remarkably different ways, showing that the relationships between the components of the compiler are not static but depend on the specific attributes of the input. Moreover, this shows that these different behaviors are discrete, since there is essentially no middle ground. When selecting test cases, it is desirable to test all such program behaviors, and therefore one needs to select tests from each region of the display.
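One crude way to automate "one or more test cases per region" is to bin each principal plane into a grid and keep one execution per occupied cell, skipping executions already selected on an earlier plane. The sketch below is only our illustration of that idea; the grid granularity and the extremity-first ordering are assumptions, not part of the described procedure.

```python
import numpy as np

def select_by_region(coords_per_plane, n_bins=8):
    """Pick one execution per occupied grid cell of each principal plane,
    skipping executions already chosen on an earlier plane.
    `coords_per_plane` is a list of (n_executions, 2) coordinate arrays."""
    chosen, taken = [], set()
    for coords in coords_per_plane:
        lo, hi = coords.min(axis=0), coords.max(axis=0)
        cells = np.floor((coords - lo) / (hi - lo + 1e-12) * n_bins).astype(int)
        seen_cells = set()
        for idx in np.argsort(-np.abs(coords).sum(axis=1)):  # visit extremities first
            cell = tuple(cells[idx])
            if int(idx) in taken or cell in seen_cells:
                continue
            chosen.append(int(idx))
            taken.add(int(idx))
            seen_cells.add(cell)
    return chosen
```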

[Figure 7: First principal plane when analyzing only runs with no optimization. Black points indicate test suite runs; light points indicate user programs.]
[Figure 8: First principal plane when analyzing only tests with optimization enabled.]
[Figure 9: First principal plane after removing the first cluster of outliers from the set of points with optimization.]

In order to understand this data set better, it was necessary to determine what these regions represent. It was first noticed that there were differences between test suite programs and user programs. Then, different explanations were explored for the regions in the display. In the end, the simplest one proved to be appropriate: the regions represent the level of optimization of the compiler during that execution. Figures 5, 6a, and 6b are shaded according to optimization levels, with black being runs that were not optimized. Different optimization levels require different behaviors from the compiler, and this fact is represented in the display. Selecting test cases from each region of the display therefore ensures that each optimization level of the compiler is tested. Although the need for such tests should be obvious to someone knowledgeable about compilers, the test selection procedure we have described does not require such knowledge. This is especially important when the workings of the software under test are poorly understood.

Once we have determined that executions with and without optimization behave very differently, we can choose to examine them separately. Figure 7 shows the first principal plane for points without optimization, and Figure 8 shows the points with optimization. Unlike Figures 6a and 6b, these displays are calculated separately, so there is no relationship between the displays in Figures 7 and 8. Figure 7 shows that test suite and user programs are still fairly disjoint. Additional features can be seen in the data, both linear and otherwise. Again, this suggests there are different kinds of executions that should be checked. Figure 8 shows a cluster of points to the right. It can be seen that these points stress some aspect of the compiler in a special way. It turns out that these points correspond to test cases for testing the built-in memcpy functions, using high optimization. So again, the display suggests meaningful test cases. After removing that cluster of outliers, we get Figure 9.

Eliminating Redundant Executions
The correspondence analysis display can point a tester to sets of tests that might be redundant. Tests that are far apart in the display have very different profiles; likewise, tests that are close together have similar profiles. One has to be careful, though: similarity in this case does not mean equality. It simply indicates that the tests coincide with respect to the factors displayed in the current principal plane. Therefore, when looking for similar executions, it is necessary to check whether the points coincide in several planes. Once a group of seemingly similar executions has been found, it is possible to analyze it further to establish which points really are redundant. This can be done by comparing the profiles with standard statistical techniques, or by examining this smaller group again with correspondence analysis or another multivariate visualization technique. Using correspondence analysis on this smaller set of data would allow the tester to look at the differences between its points without the display being influenced by the differences between the other points. Moreover, looking at the display's column points allows the tester to see what these differences are and decide whether they are meaningful, or whether the tests really are redundant.
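The profile-level confirmation step can be sketched as follows (ours; cosine similarity of normalized profiles and the 0.99 threshold are arbitrary stand-ins for "standard statistical techniques").

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def likely_redundant_pairs(counts, candidate_idx, threshold=0.99):
    """Compare the full profiles of a group of seemingly similar executions,
    independent of any particular principal plane, and report pairs whose
    relative call frequencies are almost identical."""
    sub = counts[candidate_idx].astype(float)
    sub = sub / sub.sum(axis=1, keepdims=True)
    sim = 1.0 - squareform(pdist(sub, metric='cosine'))   # pairwise cosine similarity
    pairs = []
    for a in range(len(candidate_idx)):
        for b in range(a + 1, len(candidate_idx)):
            if sim[a, b] >= threshold:
                pairs.append((candidate_idx[a], candidate_idx[b]))
    return pairs
```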

[Figure 10: Binary tree data set. Light points indicate failures.]

5 FUTURE RESEARCH
In order to develop a comprehensive methodology for the use of multivariate visualization in observation-based testing, it is necessary to understand how visualizations are affected by the type of software being studied, the types of defects it contains, the type of profiles that are generated, and the type of visualization technique that is employed. We therefore plan to conduct a substantial empirical study that will address these issues. In this study, different visualization techniques are to be used to analyze a variety of profile types from several representative types of software.

One particularly important issue is the extent to which multivariate visualization can distinguish executions that exhibit failures from other executions. The case study presented in Section 4 did not address this issue, because we did not know which, if any, inputs caused the GCC compiler to fail. However, we have preliminary evidence from other programs that multivariate visualization can distinguish failure behavior. Consider, for example, the correspondence analysis display in Figure 10. It was computed from line-count profiles of a simple program that implements a binary search tree. During execution of this program, a number of key-value pairs are inserted and deleted, and the results are printed out at intervals. The program contains a defect that was deliberately placed in its deletion routine. A second program was used to check whether the tree program's output was correct. The black points in Figure 10 correspond to successful executions, while the light points indicate failed executions. This display clearly separates many failures from successful executions. Consequently, if tests are selected from the major regions of the point cloud, the defect is certain to be revealed. Interestingly, statement coverage would not necessarily reveal this defect, because the defect affects only deletions of internal nodes and causes a failure only if the tree's contents are printed out soon after an internal node is deleted. (Subsequent deletions can cause coincidental correctness.) In Figure 10, the left corner of the point cloud corresponds to executions with only insertions; on the right are executions with many deletions. On the top are executions with insertions and deletions of internal nodes. We shall investigate whether the separation of failures and successful executions exhibited by the binary tree data set is common with other, more substantial programs and different forms of profiling.

6 RELATED WORK
A number of authors have addressed topics closely related to observation-based testing and multivariate visualization of execution profiles. Hanson et al. describe the use of several multivariate analysis techniques, including star plots and principal components analysis, for studying usage of UNIX operating system commands in support of user-interface enhancement [4]. Podgurski et al. examine the use of the multivariate analysis technique cluster analysis for improving the accuracy and efficiency of software reliability estimation [10,11]. In this application, program executions are captured in the field and later replayed and profiled. The profiles are then clustered to obtain a stratified sampling design for estimating reliability.
Podgurski et al. report experiments in which cluster analysis of branch-traversal profiles permitted the failure frequency of several programs to be estimated, using stratified random sampling, more accurately than it could be with simple random sampling. This result was explained by the fact that cluster analysis of profiles isolated some program failures in very small clusters. Reps et al. explore the use of a type of execution profile called a path spectrum for discovering Year 2000 problems and related issues, and they propose several other applications of path spectra to software maintenance and testing [12]. They also describe a prototype system called DynaDiff for comparing path spectra, which produces graphical representations of individual spectra to facilitate their comparison. Harrold et al. evaluated several types of program spectra (profiles) empirically, to determine how well they indicate the occurrence of execution failures [6]. They observed that failures were likely to be indicated by differences in complete-path spectra, path-count spectra, and branch-count spectra. They also observed that differences in such spectra are more likely to indicate failures than are differences in execution-trace spectra. Pavlopoulou and Young describe how monitoring residual test coverage in software that is deployed or undergoing beta testing can be used to validate the thoroughness of testing in the development environment [9]. They describe a prototype system that monitors residual statement coverage in Java programs, and they present performance measurements that suggest the performance impact of monitoring is acceptable.

7 CONCLUSION
We have described a new approach to testing, called observation-based testing, which calls for obtaining a large amount of potential test data as expeditiously as possible and then filtering that data to obtain a much smaller subset on which to actually evaluate the software under test. Filtering potential test cases involves profiling the executions they induce and then analyzing the resulting profiles with automated help. We have proposed the use of multivariate visualization techniques for analyzing profiles, described several applications of the techniques, and presented a case study in which correspondence analysis was used to analyze potential test cases for the GCC compiler.

REFERENCES
1. Borg, I. and Groenen, P. Modern Multidimensional Scaling: Theory and Applications. Springer.
2. GCC. The GCC Home Page. Free Software Foundation.
3. Greenacre, M.J. Theory and Applications of Correspondence Analysis. Academic Press.
4. Hanson, S.J., Kraut, R.E., and Farber, J.M. Interface design and multivariate analysis of UNIX command use. ACM Transactions on Office Information Systems 2, 1 (March 1984).
5. Harrold, M.J., Gupta, R., and Soffa, M.L. A methodology for controlling the size of a test suite. ACM Transactions on Software Engineering and Methodology 2, 3 (July 1993).
6. Harrold, M.J., Rothermel, G., Wu, R., and Yi, L. An empirical investigation of program spectra. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (Montreal, Canada, June 1998).
7. Harville, D.A. Matrix Algebra From a Statistician's Perspective. Springer-Verlag.
8. Krzanowski, W.J. Principles of Multivariate Analysis: A User's Perspective. Oxford Science Publications.
9. Pavlopoulou, C. and Young, M. Residual test coverage monitoring. Proceedings of the 21st International Conference on Software Engineering (Los Angeles, CA, May 1999), ACM Press.
10. Podgurski, A., Masri, W., McCleese, Y., Wolff, F.G., and Yang, C. Estimation of software reliability by stratified sampling. ACM Transactions on Software Engineering and Methodology 8, 9 (July 1999).
11. Podgurski, A. and Yang, C. Partition testing, stratified sampling, and cluster analysis. Proceedings of the First ACM Symposium on Foundations of Software Engineering (Los Angeles, CA, December 1993), ACM Press.
12. Reps, T., Ball, T., Das, M., and Larus, J. The use of program profiling for software maintenance with applications to the Year 2000 Problem. Proceedings of the 6th European Software Engineering Conference and 5th ACM SIGSOFT Symposium on the Foundations of Software Engineering (Zurich, Switzerland, September 1997), ACM Press.
13. Rothermel, G. and Harrold, M.J. A safe, efficient regression test algorithm. IEEE Transactions on Software Engineering 6, 10 (April 1997).
14. X.Org. X.Org.


2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 2386 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 6, JUNE 2006 The Encoding Complexity of Network Coding Michael Langberg, Member, IEEE, Alexander Sprintson, Member, IEEE, and Jehoshua Bruck,

More information

Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited

Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Adaptive Waveform Inversion: Theory Mike Warner*, Imperial College London, and Lluís Guasch, Sub Salt Solutions Limited Summary We present a new method for performing full-waveform inversion that appears

More information

A DH-parameter based condition for 3R orthogonal manipulators to have 4 distinct inverse kinematic solutions

A DH-parameter based condition for 3R orthogonal manipulators to have 4 distinct inverse kinematic solutions Wenger P., Chablat D. et Baili M., A DH-parameter based condition for R orthogonal manipulators to have 4 distinct inverse kinematic solutions, Journal of Mechanical Design, Volume 17, pp. 150-155, Janvier

More information

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data

An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data An Intelligent Clustering Algorithm for High Dimensional and Highly Overlapped Photo-Thermal Infrared Imaging Data Nian Zhang and Lara Thompson Department of Electrical and Computer Engineering, University

More information

Minsoo Ryu. College of Information and Communications Hanyang University.

Minsoo Ryu. College of Information and Communications Hanyang University. Software Reuse and Component-Based Software Engineering Minsoo Ryu College of Information and Communications Hanyang University msryu@hanyang.ac.kr Software Reuse Contents Components CBSE (Component-Based

More information

BRANCH COVERAGE BASED TEST CASE PRIORITIZATION

BRANCH COVERAGE BASED TEST CASE PRIORITIZATION BRANCH COVERAGE BASED TEST CASE PRIORITIZATION Arnaldo Marulitua Sinaga Department of Informatics, Faculty of Electronics and Informatics Engineering, Institut Teknologi Del, District Toba Samosir (Tobasa),

More information

This blog addresses the question: how do we determine the intersection of two circles in the Cartesian plane?

This blog addresses the question: how do we determine the intersection of two circles in the Cartesian plane? Intersecting Circles This blog addresses the question: how do we determine the intersection of two circles in the Cartesian plane? This is a problem that a programmer might have to solve, for example,

More information

Dr. N. Sureshkumar Principal Velammal College of Engineering and Technology Madurai, Tamilnadu, India

Dr. N. Sureshkumar Principal Velammal College of Engineering and Technology Madurai, Tamilnadu, India Test Case Prioritization for Regression Testing based on Severity of Fault R. Kavitha Assistant Professor/CSE Velammal College of Engineering and Technology Madurai, Tamilnadu, India Dr. N. Sureshkumar

More information

Part 5. Verification and Validation

Part 5. Verification and Validation Software Engineering Part 5. Verification and Validation - Verification and Validation - Software Testing Ver. 1.7 This lecture note is based on materials from Ian Sommerville 2006. Anyone can use this

More information

How to re-open the black box in the structural design of complex geometries

How to re-open the black box in the structural design of complex geometries Structures and Architecture Cruz (Ed) 2016 Taylor & Francis Group, London, ISBN 978-1-138-02651-3 How to re-open the black box in the structural design of complex geometries K. Verbeeck Partner Ney & Partners,

More information

Topics in Software Testing

Topics in Software Testing Dependable Software Systems Topics in Software Testing Material drawn from [Beizer, Sommerville] Software Testing Software testing is a critical element of software quality assurance and represents the

More information

Exploratory Data Analysis EDA

Exploratory Data Analysis EDA Exploratory Data Analysis EDA Luc Anselin http://spatial.uchicago.edu 1 from EDA to ESDA dynamic graphics primer on multivariate EDA interpretation and limitations 2 From EDA to ESDA 3 Exploratory Data

More information

Texture Mapping using Surface Flattening via Multi-Dimensional Scaling

Texture Mapping using Surface Flattening via Multi-Dimensional Scaling Texture Mapping using Surface Flattening via Multi-Dimensional Scaling Gil Zigelman Ron Kimmel Department of Computer Science, Technion, Haifa 32000, Israel and Nahum Kiryati Department of Electrical Engineering

More information

Chemometrics. Description of Pirouette Algorithms. Technical Note. Abstract

Chemometrics. Description of Pirouette Algorithms. Technical Note. Abstract 19-1214 Chemometrics Technical Note Description of Pirouette Algorithms Abstract This discussion introduces the three analysis realms available in Pirouette and briefly describes each of the algorithms

More information

Software Testing Fundamentals. Software Testing Techniques. Information Flow in Testing. Testing Objectives

Software Testing Fundamentals. Software Testing Techniques. Information Flow in Testing. Testing Objectives Software Testing Fundamentals Software Testing Techniques Peter Lo Software Testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding.

More information

Unit Testing as Hypothesis Testing

Unit Testing as Hypothesis Testing Unit Testing as Hypothesis Testing Jonathan Clark September 19, 2012 You should test your code. Why? To find bugs. Even for seasoned programmers, bugs are an inevitable reality. Today, we ll take an unconventional

More information

MONIKA HEINER.

MONIKA HEINER. LESSON 1 testing, intro 1 / 25 SOFTWARE TESTING - STATE OF THE ART, METHODS, AND LIMITATIONS MONIKA HEINER monika.heiner@b-tu.de http://www.informatik.tu-cottbus.de PRELIMINARIES testing, intro 2 / 25

More information

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

More information

Statistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010

Statistical Models for Management. Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon. February 24 26, 2010 Statistical Models for Management Instituto Superior de Ciências do Trabalho e da Empresa (ISCTE) Lisbon February 24 26, 2010 Graeme Hutcheson, University of Manchester Principal Component and Factor Analysis

More information

Testing! Prof. Leon Osterweil! CS 520/620! Spring 2013!

Testing! Prof. Leon Osterweil! CS 520/620! Spring 2013! Testing Prof. Leon Osterweil CS 520/620 Spring 2013 Relations and Analysis A software product consists of A collection of (types of) artifacts Related to each other by myriad Relations The relations are

More information

White Paper. Abstract

White Paper. Abstract Keysight Technologies Sensitivity Analysis of One-port Characterized Devices in Vector Network Analyzer Calibrations: Theory and Computational Analysis White Paper Abstract In this paper we present the

More information

A METRIC BASED EVALUATION OF TEST CASE PRIORITATION TECHNIQUES- HILL CLIMBING, REACTIVE GRASP AND TABUSEARCH

A METRIC BASED EVALUATION OF TEST CASE PRIORITATION TECHNIQUES- HILL CLIMBING, REACTIVE GRASP AND TABUSEARCH A METRIC BASED EVALUATION OF TEST CASE PRIORITATION TECHNIQUES- HILL CLIMBING, REACTIVE GRASP AND TABUSEARCH 1 M.Manjunath, 2 N.Backiavathi 1 PG Scholar, Department of Information Technology,Jayam College

More information

Principal Component Analysis of Lack of Cohesion in Methods (LCOM) metrics

Principal Component Analysis of Lack of Cohesion in Methods (LCOM) metrics Principal Component Analysis of Lack of Cohesion in Methods (LCOM) metrics Anuradha Lakshminarayana Timothy S.Newman Department of Computer Science University of Alabama in Huntsville Abstract In this

More information

Test Data Generation based on Binary Search for Class-level Testing

Test Data Generation based on Binary Search for Class-level Testing Test Data Generation based on Binary Search for Class-level Testing Sami Beydeda, Volker Gruhn University of Leipzig Faculty of Mathematics and Computer Science Department of Computer Science Applied Telematics

More information

Two-Dimensional Visualization for Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California

Two-Dimensional Visualization for Internet Resource Discovery. Shih-Hao Li and Peter B. Danzig. University of Southern California Two-Dimensional Visualization for Internet Resource Discovery Shih-Hao Li and Peter B. Danzig Computer Science Department University of Southern California Los Angeles, California 90089-0781 fshli, danzigg@cs.usc.edu

More information

Higher-order Testing. Stuart Anderson. Stuart Anderson Higher-order Testing c 2011

Higher-order Testing. Stuart Anderson. Stuart Anderson Higher-order Testing c 2011 Higher-order Testing Stuart Anderson Defining Higher Order Tests 1 The V-Model V-Model Stages Meyers version of the V-model has a number of stages that relate to distinct testing phases all of which are

More information

Modeling with Uncertainty Interval Computations Using Fuzzy Sets

Modeling with Uncertainty Interval Computations Using Fuzzy Sets Modeling with Uncertainty Interval Computations Using Fuzzy Sets J. Honda, R. Tankelevich Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, CO, U.S.A. Abstract A new method

More information

CFDnet: Computational Fluid Dynamics on the Internet

CFDnet: Computational Fluid Dynamics on the Internet CFDnet: Computational Fluid Dynamics on the Internet F. E. Ham, J. Militzer and A. Bemfica Department of Mechanical Engineering Dalhousie University - DalTech Halifax, Nova Scotia Abstract CFDnet is computational

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Visualizing Multi-Dimensional Functions in Economics

Visualizing Multi-Dimensional Functions in Economics Visualizing Multi-Dimensional Functions in Economics William L. Goffe Dept. of Economics and International Business University of Southern Mississippi Hattiesburg, MS 3946 Bill.Goffe@usm.edu June, 1999

More information

Lecture 15 Software Testing

Lecture 15 Software Testing Lecture 15 Software Testing Includes slides from the companion website for Sommerville, Software Engineering, 10/e. Pearson Higher Education, 2016. All rights reserved. Used with permission. Topics covered

More information

Chapter 3. Requirement Based System Test Case Prioritization of New and Regression Test Cases. 3.1 Introduction

Chapter 3. Requirement Based System Test Case Prioritization of New and Regression Test Cases. 3.1 Introduction Chapter 3 Requirement Based System Test Case Prioritization of New and Regression Test Cases 3.1 Introduction In this chapter a new prioritization technique has been proposed with two new prioritization

More information

Towards Cohesion-based Metrics as Early Quality Indicators of Faulty Classes and Components

Towards Cohesion-based Metrics as Early Quality Indicators of Faulty Classes and Components 2009 International Symposium on Computing, Communication, and Control (ISCCC 2009) Proc.of CSIT vol.1 (2011) (2011) IACSIT Press, Singapore Towards Cohesion-based Metrics as Early Quality Indicators of

More information

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY

A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY KARL L. STRATOS Abstract. The conventional method of describing a graph as a pair (V, E), where V and E repectively denote the sets of vertices and edges,

More information

Bar Graphs and Dot Plots

Bar Graphs and Dot Plots CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs

More information

Optimization I : Brute force and Greedy strategy

Optimization I : Brute force and Greedy strategy Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean

More information

Data Preprocessing. Slides by: Shree Jaswal

Data Preprocessing. Slides by: Shree Jaswal Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data

More information

Interactive Campaign Planning for Marketing Analysts

Interactive Campaign Planning for Marketing Analysts Interactive Campaign Planning for Marketing Analysts Fan Du University of Maryland College Park, MD, USA fan@cs.umd.edu Sana Malik Adobe Research San Jose, CA, USA sana.malik@adobe.com Eunyee Koh Adobe

More information

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill 447 A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE (Extended Abstract) Gyula A. Mag6 University of North Carolina at Chapel Hill Abstract If a VLSI computer architecture is to influence the field

More information

Boundary/Contour Fitted Grid Generation for Effective Visualizations in a Digital Library of Mathematical Functions

Boundary/Contour Fitted Grid Generation for Effective Visualizations in a Digital Library of Mathematical Functions Boundary/Contour Fitted Grid Generation for Effective Visualizations in a Digital Library of Mathematical Functions Bonita Saunders Qiming Wang National Institute of Standards and Technology Bureau Drive

More information

Using surface markings to enhance accuracy and stability of object perception in graphic displays

Using surface markings to enhance accuracy and stability of object perception in graphic displays Using surface markings to enhance accuracy and stability of object perception in graphic displays Roger A. Browse a,b, James C. Rodger a, and Robert A. Adderley a a Department of Computing and Information

More information

Verification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1

Verification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1 Verification and Validation Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1 Verification vs validation Verification: "Are we building the product right?. The software should

More information

Review of Regression Test Case Selection Techniques

Review of Regression Test Case Selection Techniques Review of Regression Test Case Selection Manisha Rani CSE Department, DeenBandhuChhotu Ram University of Science and Technology, Murthal, Haryana, India Ajmer Singh CSE Department, DeenBandhuChhotu Ram

More information

MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar

MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar Fyrirlestrar 31 & 32 Structural Testing White-box tests. 27/1/25 Dr Andy Brooks 1 Case Study Dæmisaga Reference Structural Testing

More information

NUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING

NUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING NUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING Dmitry Potapov National Research Nuclear University MEPHI, Russia, Moscow, Kashirskoe Highway, The European Laboratory for

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

Learning to Learn: additional notes

Learning to Learn: additional notes MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2008 Recitation October 23 Learning to Learn: additional notes Bob Berwick

More information

A Vision System for Automatic State Determination of Grid Based Board Games

A Vision System for Automatic State Determination of Grid Based Board Games A Vision System for Automatic State Determination of Grid Based Board Games Michael Bryson Computer Science and Engineering, University of South Carolina, 29208 Abstract. Numerous programs have been written

More information

Fault Class Prioritization in Boolean Expressions

Fault Class Prioritization in Boolean Expressions Fault Class Prioritization in Boolean Expressions Ziyuan Wang 1,2 Zhenyu Chen 1 Tsong-Yueh Chen 3 Baowen Xu 1,2 1 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093,

More information

A Course in Machine Learning

A Course in Machine Learning A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling

More information

DI TRANSFORM. The regressive analyses. identify relationships

DI TRANSFORM. The regressive analyses. identify relationships July 2, 2015 DI TRANSFORM MVstats TM Algorithm Overview Summary The DI Transform Multivariate Statistics (MVstats TM ) package includes five algorithm options that operate on most types of geologic, geophysical,

More information

Feature Selection Using Principal Feature Analysis

Feature Selection Using Principal Feature Analysis Feature Selection Using Principal Feature Analysis Ira Cohen Qi Tian Xiang Sean Zhou Thomas S. Huang Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign Urbana,

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis

More information

Chapter 4. Clustering Core Atoms by Location

Chapter 4. Clustering Core Atoms by Location Chapter 4. Clustering Core Atoms by Location In this chapter, a process for sampling core atoms in space is developed, so that the analytic techniques in section 3C can be applied to local collections

More information