Multivariate Visualization in Observation-Based Testing
David Leon, Andy Podgurski, and Lee J. White
Electrical Engineering and Computer Science Department, Case Western Reserve University, Olin Building, Cleveland, Ohio, USA
dzl@po.cwru.edu, andy@eecs.cwru.edu, leew@eecs.cwru.edu

ABSTRACT
We explore the use of multivariate visualization techniques to support a new approach to test data selection, called observation-based testing. Applications of multivariate visualization are described, including: evaluating and improving synthetic tests; filtering regression test suites; filtering captured operational executions; comparing test suites; and assessing bug reports. These applications are illustrated by the use of correspondence analysis to analyze test inputs for the GNU GCC compiler.

Keywords
Software testing, observation-based testing, multivariate visualization, multivariate data analysis, data visualization, correspondence analysis.

1 INTRODUCTION
The traditional paradigm for testing software is to construct test cases that cause runtime events that are likely to reveal certain kinds of defects if they are present. Examples of such events include: the use of program features; execution of statements, branches, loops, functions, or other program elements; flow of data between statements or procedures; program variables taking on boundary values or other special values; message passing between objects or processes; GUI events; and synchronization events. It is generally feasible to construct test cases to induce events of interest if the events involve a program's external interfaces, as in functional testing (black-box testing, specification-based testing). However, it is often extremely difficult to create tests that induce specific events internal to a program, as required in structural testing (glass-box testing, code-based testing). For this reason, functional testing is the primary form of testing used in practice.
Structural testing, if it is employed at all, usually takes the form of assessing the degree of structural coverage achieved by functional tests, that is, the extent to which the tests induce certain internal events. Structural coverage is assessed by profiling the executions induced by functional tests, that is, by instrumenting or monitoring the program under test in order to collect data about the degree of coverage achieved. If necessary, the functional tests are augmented in an ad hoc manner to improve structural coverage.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. ICSE 2000, Limerick, Ireland. ACM /00/06 $5.00.

The difficulty of constructing test data to induce internal program events suggests an alternative paradigm for testing software. This form of testing, which we call observation-based testing, emphasizes what is relatively easy to do and de-emphasizes what is difficult to do. It calls for first obtaining a large amount of potential test data as expeditiously as possible, e.g., by constructing functional tests, simulating usage scenarios, capturing operational inputs, or reusing existing test suites. The potential test data is then used to run a version of the software under test that has been instrumented to produce execution profiles characterizing the program's internal events. Next, the potential test data and/or the profiles it induces are analyzed in order to filter the test data: select a smaller set of test data that induces events of interest or that has other desirable properties.
To enable large volumes of potential test data to be analyzed inexpensively, the analysis techniques that are used must be fully or partially automated. Finally, the output resulting from the selected tests is checked for conformance to requirements. This last step typically requires manual effort, either in checking actual output or in determining expected output. Many forms of execution profiling can be used in observation-based testing. For example, one may record the occurrences of any of the kinds of program events that have traditionally been of interest in testing. Typically, a profile takes the form of a vector of event counts, although other forms, such as a call graph, may be used in observation-based testing. Since execution profiles are often very large (ones with thousands of event counts are common), automated help is essential for analyzing them. In structural testing, profiles are usually summarized by computing simple coverage measures, such as the number
of program statements that were executed at least once during testing. However, more sophisticated multivariate data analysis techniques can extract additional information from profile data. For example, [10] and [11] report experiments in which automatic cluster analysis of branch traversal profiles, used together with stratified random sampling, increased the accuracy of software reliability estimates, because it tended to isolate failures in small clusters. Among the most promising multivariate data analysis techniques for use in observation-based testing are multivariate visualization techniques like correspondence analysis and multidimensional scaling. In essence, these computer-intensive techniques project many-dimensional execution profiles onto a two-dimensional display, producing a scatter plot that preserves important relationships between the profiles. This permits a human user to visually observe these relationships and, with the aid of interactive tools, to explore their significance for software testing. We present the initial results of a project whose goal is to explore the potential applications of multivariate visualization techniques to software testing and, ultimately, to develop a methodology for employing them. Section 2 gives an overview of two applicable multivariate visualization techniques: correspondence analysis and multidimensional scaling. Section 3 describes several applications of multivariate visualization in observation-based testing. Section 4 presents a case study in which correspondence analysis is applied to a large data set. Future research is discussed in Section 5. Related work is surveyed in Section 6. Section 7 concludes.

2 VISUALIZATION TECHNIQUES
In this section, we present a brief overview of two multivariate visualization techniques that are applicable to observation-based software testing: correspondence analysis and multidimensional scaling.
Both techniques are distinguished by their ability to handle data of large volume and high dimensionality. Correspondence analysis is used in the case study described in Section 4.

Correspondence Analysis
Correspondence analysis is one of many names for a data analysis and visualization technique that has been independently discovered in many fields [3]. It is used to analyze an n-dimensional data set and represent it in fewer dimensions with the least possible loss of information. This representation allows the user to visually analyze the relationships between different data points, and also between these points and the original dimensions. One way to think about calculating the correspondence analysis display is to first fit a line through the n-dimensional space so as to maximize the variance of the points along the direction of the line. Then the coordinates of the points are modified so as to take away any variance along this direction, and the process is repeated. The projections of the original points onto the fitted lines (axes) correspond to the points' coordinates on the display. The displayed points are called row points, because they correspond to rows in the data matrix. Correspondence analysis assigns weights to both points and dimensions, e.g., to compensate for differences in measurement units. Display points representing the original dimensions, which are called column points, can also be displayed in order to show how the position of row points is affected by various dimensions. Correspondence analysis is usually computed using singular value decomposition (SVD) [7]. SVD is a well-known matrix decomposition technique for which there are efficient, highly optimized algorithms. In observation-based testing, the input data for correspondence analysis is a matrix where each row corresponds to an execution of the software under test, and each column corresponds to a profile feature.
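As a concrete illustration, the standard SVD-based computation can be sketched as follows. This is a minimal Python sketch using a toy count matrix, not real profile data; it follows the usual correspondence-analysis formulation (standardized residuals of the correspondence matrix), which the paper references but does not spell out.

```python
import numpy as np

def correspondence_analysis(counts):
    """Project rows and columns of a nonnegative count matrix onto
    shared principal axes, returning principal coordinates for both."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                    # correspondence matrix
    r = P.sum(axis=1)                  # row masses (weights)
    c = P.sum(axis=0)                  # column masses (weights)
    # Standardized residuals: weighting compensates for row/column totals.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: row and column points share one display space.
    row_pts = (U * sing) / np.sqrt(r)[:, None]
    col_pts = (Vt.T * sing) / np.sqrt(c)[:, None]
    return row_pts, col_pts

# Toy example: 4 "executions" profiled over 3 "event counts".
profiles = np.array([[10, 2, 1],
                     [9, 3, 1],
                     [1, 8, 7],
                     [2, 7, 8]])
row_pts, col_pts = correspondence_analysis(profiles)
# The first two columns of row_pts give the first principal plane.
```

Executions 0 and 1 (and likewise 2 and 3) have similar profiles, so their row points land close together on the first principal axis, while the two groups land far apart.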
Each row of the matrix can be considered as the coordinates of a point in a space with as many dimensions as there are profile features. Analyzing this matrix with correspondence analysis yields low-dimensional displays that show the relationships between the test points.

Multidimensional Scaling
Multidimensional scaling is the name for a family of techniques used to project a set of n-dimensional data points onto a plane, given only a matrix of dissimilarities between the points [1]. This matrix is computed from a data matrix by applying a dissimilarity metric (e.g., Euclidean or Manhattan distance) to each pair of rows. The positions of the points on the display are such that the distance between any pair of points reflects as closely as possible the degree of dissimilarity between them. A simple approach is to select a starting configuration of points and evaluate how closely this approximates the input. The points are then moved to decrease the error, and the process is repeated until a minimum is found. This results in a very flexible approach to finding a display, since there are different ways of calculating the dissimilarity matrix, computing the error, and finding a solution.

3 APPLICATIONS
Multivariate visualization techniques like correspondence analysis and multidimensional scaling enable a tester to visualize the distribution of execution profiles induced by a set of potential test cases. They also reveal significant features of the corresponding population of executions (such features should not be confused with profile features). Typical examples include unusual executions, clusters of similar executions, and regions of the profile space without any executions. In addition, visualization often reveals other features of an execution
population that are visually striking but whose significance is not immediately obvious. Upon further investigation, such features may turn out to be significant for testing, as we shall see in Section 4. Besides revealing the distribution and features of an execution population, multivariate visualization techniques provide means of comparing two or more execution populations and of relating an individual execution to other executions. These capabilities of multivariate visualization techniques have several applications in observation-based testing, which are described in the remainder of this section. These include: evaluating synthetic test data; filtering regression test suites; filtering captured operational executions; comparing test suites; and assessing bug reports.

Evaluating Synthetic Test Data
Multivariate visualization techniques can be used to evaluate and improve a set of test data derived synthetically, e.g., by constructing functional tests or by simulating usage scenarios. As mentioned in the Introduction, it is customary to evaluate such test data by measuring the degree of structural coverage it achieves. Visualization techniques can go further by revealing relationships among test cases. Because the displays produced by these techniques are computed based on all columns of a data matrix, they can reveal relationships involving multiple event counts. An outlier or isolated point in a display indicates a test case that induces unusual behavior. Such test cases are usually desirable, because they exercise aspects of a program not exercised by other test cases. A dense cluster of points in a display suggests that the corresponding test data is redundant with respect to the kinds of events that have been profiled. It may be beneficial to eliminate most of the redundant tests and replace them with more varied ones.
In order to decide whether this is appropriate, it is necessary to carry out more detailed analysis of the cluster, for example, by examining other views of the data or by using a different form of profiling. It is desirable for a visualization tool for use in software testing to support such analysis interactively. An empty region R in the display may indicate that the test set fails to exercise important behaviors of the software under test. In this case, it is desirable to augment the test set with one or more tests whose profiles yield points in R when displayed. However, it is possible that there are no inputs to the software that will produce profiles in R, and in general it is undecidable whether there is any input to a program that will produce a profile with specified values. One approach to augmenting a test set is to trawl for suitable inputs: obtain additional inputs from any source (e.g., from beta testing); execute the software on them to produce new profiles; display the new profiles together with the original ones; and observe whether any of the new points fall into region R. Another approach to augmenting a test set is to obtain a characterization of the kinds of profiles that would yield points in R and then attempt to construct one or more test cases that produce such profiles. Ideally, a visualization tool for use in testing would produce such a characterization of a display region on request. A display may reveal features of a test set whose explanation requires further investigation. These may be indicated by point sets with distinctive shapes. For example, the displays shown in Section 4 exhibit linear, curvilinear, and triangular point sets, among others. To determine whether the test set adequately exercises the software under test, it may be important to understand the factors that underlie such display features. To discover these factors, it may suffice to examine the documentation for the corresponding test cases.
If this does not exist or is not sufficient, it is necessary to conduct a detailed analysis of the profiles produced by the test cases and of the software under test. Because correspondence analysis can simultaneously display points representing the rows of the data matrix and points representing the columns, it can reveal the extent to which the position of a point is explained by individual profile features. Essentially, a row point is attracted to column points corresponding to the most prominent features in its profile (those with high values). By considering the affinity of row points for certain column points, the tester can gain an understanding of which events characterize a test case. A set of column points that are close together corresponds to a set of profile features (event counts) that are correlated with each other. The correlations between these features might be explained by an unobserved, latent variable or factor (in the sense of factor analysis [8]). This can be confirmed only by detailed analysis of the software under test.

Filtering Regression Test Suites
A notable special case of evaluating synthetic test data is analyzing a regression test suite in order to eliminate redundant test cases or to add new tests necessary to exercise new features of the software under test. Several authors have proposed techniques for identifying a minimal or safe subset of a regression test suite, e.g., see [5,13]. For this to be worthwhile, the cost of the analysis it entails must be less than the cost of running and evaluating the tests that are eliminated. Multivariate visualization techniques are applicable to this problem when the cost of evaluating regression tests dominates the cost of running them, e.g., because the tests must be evaluated manually. To apply these techniques, the executions induced by the original test suite must be profiled.
Visualization is then used to select a subset that spans the range of tests in the original suite but which contains no redundant tests. This is done as follows. All outliers or isolated points are selected. One representative is chosen from each roughly elliptical cluster
of points. For other display features, one representative is selected from each region and extremity of the feature. Multivariate visualization techniques permit a variety of criteria to be used in filtering regression tests, since they can be used with any kind of profile.

Filtering Captured Operational Inputs
A serious problem with synthetic test data is that it does not reflect the way that the software under test will be used in the field. Even if it reveals defects, it may not reveal those having a significant impact on the software's reliability as it is perceived by users. By contrast, operational testing (beta testing, field testing) does reflect the way software is used in the field, and it also may reduce the amount of in-house testing (alpha testing) software developers must do. In operational testing, the software to be tested is provided to some of its intended users to employ as they see fit over an extended period. The advantages of operational testing are somewhat offset by the fact that beta users often fail to observe or report failures, because they are unfamiliar with the software's specification and because testing is not their primary occupation. This problem can be addressed by using a capture/replay tool to capture executions in the field, so they can later be replayed and examined in detail by trained testing personnel. If many executions are captured, it may be practical to examine only a fraction of them in this way. Rather than examining a random sample of executions, it is desirable to filter the captured sample to identify executions with unusual characteristics that may be associated with failure. Multivariate visualizations can be used to filter operational executions in much the same way they can be used to filter regression test suites.

Comparing Potential Test Suites
Multivariate visualization techniques also provide a means of comparing test suites derived in different ways.
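The filtering procedure sketched above (keep isolated points, keep one representative per dense cluster) can be approximated with a simple greedy pass over display coordinates. This is an illustrative stand-in, not the authors' tool; the `radius` parameter is an assumption chosen for the toy data.

```python
import numpy as np

def filter_tests(points, radius):
    """Greedy one-pass filtering of 2-D display points.

    A test is kept if it lies farther than `radius` from every test
    kept so far; nearby points are treated as redundant. Isolated
    points (outliers) are always kept, since nothing lies near them.
    """
    kept = []
    for i, p in enumerate(points):
        if all(np.linalg.norm(p - points[j]) > radius for j in kept):
            kept.append(i)
    return kept

# Toy display: a dense cluster near the origin plus one outlier.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
selected = filter_tests(pts, radius=0.5)
# One representative survives from the cluster, plus the outlier.
```

Only the representative tests in `selected` would then need to be run and checked manually, which is where the savings come from when evaluation cost dominates.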
Such comparisons are potentially useful both to practitioners and to researchers. For example, a set of synthetically generated tests can be compared with captured beta-test executions in order to see how well the synthetic tests approximate operational usage. Such a comparison might be used to modify testing procedures to better reflect patterns of operational usage. Captured executions obtained from different user populations can be compared visually in order to understand differences in their usage patterns that should be addressed in future testing.

Assessing Bug Reports
Software development organizations often have such a backlog of bug reports about a product that when a new report comes in, they cannot address it immediately. Rather, they must prioritize it and focus on repairing the high-priority bugs first. Multivariate visualization provides a means by which a developer can gain insight into the significance of a newly reported bug. This requires the developer to maintain a large repository of operational executions of the product, captured from a random sample of user sites. If a new bug report includes an input that elicits a failure, the execution E the input induces can be profiled, and this profile can be displayed together with profiles of the captured executions in the repository. The executions that are close to E in the display can then be identified, replayed, and examined to determine if they also fail in the same way. If the repository reflects the way the software is used in the field, this procedure will indicate the relative frequency with which the bug causes failures in the field.

4 CASE STUDY
In this section we present a case study illustrating several of the applications of multivariate visualization described in Section 3. In this case study, correspondence analysis is used to analyze two sets of inputs to the C-language compiler of the GNU Compiler Collection (GCC), version [2].
The profile data consists of function call counts as reported by GNU's function coverage profiler, gprof. That is, each time the compiler was executed, the number of times each of the functions of the compiler was called was recorded. The execution platform was a Sun Ultra 5 workstation running SunOS 5.7. One set of inputs was the test suite for GCC 2.95 (the test suite for the version under study was not publicly available at the time). A second set of inputs was included to compare with this test suite. It consists of publicly available programs for which the source code is also available. Most of them come from either the GNU project or the X Windows consortium X.Org [14]. These programs were selected to represent a wide variety of applications. Among them are some file, shell, and compression utilities, a compiler (GCC), a debugger (gdb), a text editor (Emacs), some X windows programs, an AI program (GNU Chess), and some network daemons and clients. A total of 32 programs were included, adding 1807 more compilations, for a total of 7871 executions. (The test suite inputs alone executed the C compiler 6064 times, yielding just as many profiles.) For all of these programs, the default makefiles were used, including optimization choices, etc. A total of 2370 different functions were called during at least one of these executions. The result was a data matrix of 7871 rows by 2370 columns. Calculating the correspondence analysis display for this data set takes roughly one hour on a Pentium III 450 with 256 MB of memory.

Interpreting the Correspondence Analysis Display
Once correspondence analysis has been carried out, one ends up with a set of axes and the data points' coordinates on these axes. The axes are called principal axes. They are ordered with respect to the amount of variance in the data they account for, with the most important being the first principal axis.
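This ordering can be read directly off the singular values of the standardized matrix. The sketch below (using an illustrative profile matrix, not the GCC data) computes each axis's share of the total variance, often called inertia in correspondence analysis:

```python
import numpy as np

# Illustrative profile matrix: rows are executions, columns are
# function call counts (not taken from the GCC study).
N = np.array([[5.0, 1.0, 0.0],
              [4.0, 2.0, 1.0],
              [0.0, 3.0, 6.0],
              [1.0, 2.0, 5.0]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
sing = np.linalg.svd(S, compute_uv=False)
inertia = sing**2 / (sing**2).sum()   # variance share of each axis
# SVD returns singular values in decreasing order, so the first
# principal axis accounts for the largest share.
```

In a real analysis, a steep drop in `inertia` after the first few axes is what justifies looking mainly at the first one or two principal planes.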
The plane defined by the first and second principal axes is called the first principal plane; the third and fourth principal axes define the second principal plane, and so on. It is also possible to take different pairs of axes,
though the planes they define have no special names.

Figure 1: (a) First principal plane for a subset of the GCC data, including row points (round) and column points (squares). (b) Names of the functions for the lower cluster of column points.

Figure 1a shows the first principal plane for a subset of the GCC data. This is a scatter plot using the first principal axis as the x axis and the second as the y axis. Each round point in this figure corresponds to an execution of the GCC compiler. The distances between points in the figure reflect the n-dimensional distance between the corresponding profiles. That is, if two points are far apart in the display, then the corresponding executions have very different profiles. As mentioned previously, correspondence analysis can also provide information about the reasons for a point's location in the display. That is, it identifies which of the point's features were most important in determining its placement. Consider Figure 1a. This contains two sets of points: the round points correspond to row points (test cases), whereas the square points are column points. That is, each of those square points corresponds to one column of the input matrix, which in turn represents one of the features being profiled, in this case one function. The column points provide two pieces of information. First, the distances between column points can be interpreted in the same way as those for row points. That is, if two column points are far apart, they are essentially unrelated, while points that are close together represent a set of functions whose call counts were linearly related in all runs. These sets of related functions are called factors (see Section 3). On the other hand, when two functions are related in a non-linear way, they will be separated, and the row points will be arranged in a curve between these two column points, representing the relationship between these functions.
In Figure 1a, one can see a few different factors in the display, one to the left, one on the bottom, and some others around the cloud of row points. The names of the functions in the box can be seen in Figure 1b. The second piece of information given by the column points arises from their relationship with the row points. If a row point is close to a column point, it means that the execution represented by the row point used the function represented by the column point more often than average. For example, the executions on the bottom of Figure 1a used the functions listed in Figure 1b very often. Any row point can be interpreted as a linear combination of column points, since the location of each row point corresponds to how many times each of the functions was called. This means that the position of a row point in the display is a weighted average of the positions of all column points. For example, if an execution used only functions f and g, and it used them the same number of times, its point would be halfway between f's point and g's point. A point that is very close to the origin is an execution that did not stress the functions that are represented in that plane. On different principal planes, it might be very far from the origin, since different factors will be seen.

Finding Unusual Executions
By looking at the correspondence analysis display of the GCC data, one can immediately identify some points that are far away from all the others. This means that GCC's behavior during these executions was remarkably different from its behavior during most others. For example, Figure 2 displays the first principal plane of the correspondence analysis. There is an outlier on the right side of the display, very far from the rest of the points. This indicates a very special execution, which made GCC behave in an interesting way. This point, and all such points, should be chosen as test cases worth looking at.
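The paper identifies such far-away points by visual inspection. As a rough automated stand-in, one might flag display points unusually far from the bulk of the data, as in this sketch; the `factor` threshold is an arbitrary assumption, not a rule from correspondence analysis itself.

```python
import numpy as np

def flag_outliers(points, factor=4.0):
    """Flag 2-D display points unusually far from the bulk of the data.

    A point is flagged when its distance from the median center exceeds
    `factor` times the median of all such distances (a simple robust
    heuristic, since medians are barely affected by the outliers).
    """
    center = np.median(points, axis=0)
    d = np.linalg.norm(points - center, axis=1)
    return np.where(d > factor * np.median(d))[0]

# A tight cloud of executions plus one far-away execution.
pts = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], [0.0, 0.2],
                [0.1, 0.0], [9.0, 9.0]])
outliers = flag_outliers(pts)
```

Each flagged execution would then be replayed and its output checked, since unusual profiles are the ones most worth a tester's attention.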
After an outlier has been identified, the display is recalculated without taking that point into consideration. The reason for this step is that an outlier greatly influences the display, which makes the representation of all other points less accurate. After recalculating, it is possible to
see more of the structure of the data, as shown in Figure 3. After this step, one can look for more outliers. If none are found in the first principal plane, one can look for them in other principal planes. For example, there are no obvious outliers in Figure 3. But looking at the second principal plane (Figure 4), one can see another outlier. This is because even though the first principal plane is free of outliers, it does not mean there are no points influencing the remaining dimensions. Fortunately, there is a technique related to correspondence analysis, called jackknifing, that identifies outliers automatically. Jackknifing checks each point to determine whether removing it would cause a principal axis to rotate more than 45 degrees. If so, that point is considered an outlier and is actually removed. A new display is then calculated and the process is repeated to identify all outliers. Although jackknifing is very fast, recomputing the display is more expensive. The whole process can be run overnight and requires no human intervention. Figure 5 shows the display after removing outliers 24 times in this manner. Eighty-two outliers were identified and removed in total. (More than one outlier can be identified in any one step.) A problem occurs when a plane has a small cluster of points that is distant from the rest. For example, as will be seen below, Figure 8 contains a small cluster of points on the right. Technically, none of these points is an outlier, since they are not unique; there are four of them in that region. One may instead label the entire cluster as an outlier and remove all its points. Instead of checking all of the executions in the cluster for conformance to requirements, one or two of them might be checked instead. Outlier clusters have an adverse effect on the jackknifing algorithm.

Figure 2: First principal plane of the GCC data set (initial display).
Figure 3: First principal plane after removing one outlier.
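The leave-one-out rule described above can be sketched as follows. This is an illustrative reading of the jackknifing step on a toy data set, recomputing only the first principal axis; it is not the authors' implementation.

```python
import numpy as np

def first_axis(X):
    """Direction of maximum variance (first principal axis) of centered X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

def jackknife_outliers(X, max_angle_deg=45.0):
    """Flag points whose removal rotates the first principal axis sharply.

    Following the rule in the text: leave one point out, recompute the
    axis, and flag the point if the axis turns by more than 45 degrees.
    """
    base = first_axis(X)
    flagged = []
    for i in range(len(X)):
        axis = first_axis(np.delete(X, i, axis=0))
        cos = abs(np.dot(base, axis))      # axes are sign-ambiguous
        angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
        if angle > max_angle_deg:
            flagged.append(i)
    return flagged

# Four ordinary executions along one axis, plus one extreme execution
# that dominates the display until it is removed.
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 100.0]])
flagged = jackknife_outliers(X)
```

As the text notes, a tight cluster of several distant points defeats this rule: removing any one of them barely moves the axis, so none is flagged individually.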
Removing any of the points by itself might not influence the display, so none of these points will be labeled as an outlier. To date, our only means of Figure 4: Second principal plane after removing one outlier. Figure 5: First principal plane after 24 iterations of the jackknifing algorithm. Shaded according to optimization level 6 121
Figure 6: First principal plane after 24 iterations of the jackknifing algorithm, shaded according to optimization level. (a) shows only points representing test suite runs; (b) shows only points representing user programs.

identifying clusters of outliers is by visual inspection. It is better to do this before running the jackknifing algorithm, so this algorithm can run on a more accurate display. Figure 9 shows the result of removing the outlier cluster from Figure 8.

Comparing and Augmenting Test Suites

The GCC test suite turns out not to cover all of the functions of the compiler. Just by adding the user programs, 27 functions were executed that were not exercised by the test suite. This suggests that the test suite might be improved by adding more test cases. Figure 5 shows the correspondence analysis for the whole GCC data set. This includes both the existing GCC test suite and the set of user programs compiled for comparison. To make it easier to differentiate between these two data sets, Figure 6a shows the same display with only the test suite points plotted, and Figure 6b plots only the user program points. Putting the two together yields Figure 5.

Considering Figures 6a and 6b, it is obvious that the test suite compilations behave very differently than user program compilations. In particular, the test suite programs seem to concentrate towards the top left part of the display, while the user programs are closer to the bottom of the display. This confirms the common knowledge that handmade test cases may not reflect the way the program is actually utilized by end users.

Examining these pictures, one can see that the test suite stresses the compiler functions at the top left of the display. There are a few user programs that lie in that region, so it is a good thing that region has been tested. On the other hand, many user programs lie in a region at the bottom of the display, where there are no test suite programs covering that combination of functions. This indicates that the test suite could be augmented by adding one or more of these user programs, or by designing tests that produce this behavior.

Another way to approach this problem is by assuming there is no test suite and picking a set of test cases from the executions in Figure 5. Intuitively, one or more test cases should be picked from each region, since executions in different regions have different behavior. One can also look at the features of the population and select tests from each cluster, etc. This process is repeated for subsequent principal planes, being careful to mark the tests that have been chosen from higher planes. This way more features can be taken into account without introducing redundant tests.

Identifying Significant Features in the Display

By looking at Figure 5, one can see distinct features in the display. These correspond to ways in which the factors in the display are related in the population. For example, consider the dark points in Figure 6a. All of these points exhibit a similar correlation between the factor at the bottom of the display and the factor at the left. This means that for these points, the two factors have an inverse linear relationship. But notice that this correlation is different for different shades, while it is basically nonexistent for the points in Figure 6b. This means that these different groups of executions of the compiler behave in remarkably different ways, showing that the relationships between the components of the compiler are not static but depend on the specific attributes of the input. Moreover, this shows that these different behaviors are discrete, since there is essentially no middle ground. When selecting test cases, it is desirable to test all such program behaviors, and therefore one needs to select tests from each region of the display.

In order to understand this data set better, it was necessary to determine what these regions represent. It was first noticed that there were differences between test suite programs and user programs. Then, different explanations were explored for the regions in the display. In the end, the simplest one proved to be appropriate: the regions represent the level of optimization of the compiler during that execution. Figures 5, 6a, and 6b are shaded according to optimization levels, with black being runs that were not optimized. Different optimization levels require different behaviors from the compiler, and this fact is represented in the display. Selecting test cases from each region of the display therefore ensures that each optimization level of the compiler is tested. Although the need for such tests should be obvious to someone knowledgeable about compilers, the test selection procedure we have described does not require such knowledge. This is especially important when the workings of the software under test are poorly understood.

Figure 7: First principal plane when analyzing only runs with no optimization. Black points indicate test suite runs; light points indicate user programs.

Once we have determined that executions with and without optimization behave very differently, we can choose to examine them separately. Figure 7 shows the first principal plane for points without optimization, and Figure 8 shows the points with optimization. Unlike Figures 6a and 6b, these displays are calculated separately, so there is no relationship between the displays in Figures 7 and 8. Figure 7 shows that test suite and user programs are still fairly disjoint. Additional features can be seen in the data, both linear and otherwise. Again, this suggests there are different kinds of executions that should be checked. Figure 8 shows a cluster of points to the right.

Figure 8: First principal plane when analyzing only tests with optimization enabled.

Figure 9: First principal plane after removing the first cluster of outliers from the set of points with optimization.
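The region-based selection described above can be sketched programmatically. The following is a minimal, hypothetical Python sketch (plain NumPy, not the authors' tooling): it computes row principal coordinates by correspondence analysis of an executions-by-functions count matrix, then greedily picks representatives from distinct regions of the first principal plane. The farthest-point heuristic stands in for the visual "pick from each region" step.

```python
import numpy as np

def correspondence_analysis(counts, n_components=2):
    """Row principal coordinates of an executions-by-functions count
    matrix, via SVD of the matrix of standardized residuals."""
    P = counts / counts.sum()
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    # principal coordinates: singular vectors rescaled by row mass
    return (U[:, :n_components] * sv[:n_components]) / np.sqrt(r)[:, None]

def pick_representatives(coords, k):
    """Greedy farthest-point selection: one test per distinct region."""
    chosen = [int(np.argmax(np.linalg.norm(coords, axis=1)))]
    while len(chosen) < k:
        dists = np.linalg.norm(coords[:, None] - coords[chosen], axis=2)
        chosen.append(int(np.argmax(dists.min(axis=1))))
    return chosen
```

Repeating the selection on later principal planes, while marking tests already chosen, mirrors the procedure in the text.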
It can be seen that these points stress some aspect of the compiler in a special way. It turns out that these points correspond to test cases for testing the built-in memcpy functions, using high optimization. So again, the display suggests meaningful test cases. After removing that cluster of outliers, we get Figure 9.

Eliminating Redundant Executions

The correspondence analysis display can point a tester to sets of tests that might be redundant. Tests that are far apart in the display have very different profiles. Likewise, tests that are close together have similar profiles. One has to be careful, though. Similarity in this case does not mean equality. It simply indicates that the tests coincide with respect to the factors displayed in the current principal plane. Therefore, when looking for similar executions, it is necessary to check whether the points coincide in several planes. Once a group of seemingly similar executions has been found, it is possible to further analyze it to establish which points really are redundant. This can be done by comparing the profiles with standard statistical techniques, or by examining this smaller group again with correspondence analysis or another multivariate visualization technique. Using correspondence analysis on this smaller set of data would allow the tester to look at the differences between its points without the display being influenced by the differences between the other points. Moreover, looking at the display's column points allows the tester to see what these differences are and decide whether they are meaningful, or whether the tests really are redundant.

5 FUTURE RESEARCH

In order to develop a comprehensive methodology for the use of multivariate visualization in observation-based testing, it is necessary to understand how visualizations are affected by the type of software being studied, the types of defects it contains, the type of profiles that are generated, and the type of visualization technique that is employed. We therefore plan to conduct a substantial empirical study that will address these issues. In this study, different visualization techniques are to be used to analyze a variety of profile types from several representative types of software.

Figure 10: Binary tree data set. Light points indicate failures.

One particularly important issue is the extent to which multivariate visualization can distinguish executions that exhibit failures from other executions. The case study presented in Section 4 did not address this issue, because we did not know which, if any, inputs caused the GCC compiler to fail. However, we have preliminary evidence from other programs that multivariate visualization can distinguish failure behavior. Consider for example the correspondence analysis display in Figure 10. It was computed from line-count profiles of a simple program that implements a binary search tree. During execution of this program, a number of key-value pairs are inserted and deleted, and the results are printed out at intervals. The program contains a defect that was deliberately placed in its deletion routine. A second program was used to check whether the tree program's output was correct. The black points in Figure 10 correspond to successful executions, while the light points indicate failed executions. This display clearly separates many failures from successful executions.
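Line-count profiles like those behind Figure 10 can be gathered without modifying the program under test. The sketch below is a minimal, hypothetical Python stand-in for such instrumentation (the study's actual profiler is not shown in the paper); it uses the interpreter's trace hook to count how many times each source line executes:

```python
import sys
from collections import Counter

def line_count_profile(func, *args):
    """Run func(*args) and return a Counter mapping
    (filename, line number) -> execution count."""
    counts = Counter()
    def tracer(frame, event, arg):
        if event == "line":
            counts[(frame.f_code.co_filename, frame.f_lineno)] += 1
        return tracer          # keep tracing within this frame
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)     # always remove the hook
    return counts
```

Each execution's Counter, expanded over a common set of lines, becomes one row of the profile matrix that the visualization is computed from.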
Consequently, if tests are selected from the major regions of the point cloud, the defect is certain to be revealed. Interestingly, statement coverage would not necessarily reveal this defect, because the defect affects only deletions of internal nodes and causes a failure only if the tree's contents are printed out soon after an internal node is deleted. (Subsequent deletions can cause coincidental correctness.) In Figure 10, the left corner of the point cloud corresponds to executions with only insertions; on the right are executions with many deletions. On the top are executions with insertions and deletions of internal nodes. We shall investigate whether the separation of failures and successful executions exhibited by the binary tree data set is common with other, more substantial programs and different forms of profiling.

6 RELATED WORK

A number of authors have addressed topics closely related to observation-based testing and multivariate visualization of execution profiles. Hanson et al. describe the use of several multivariate analysis techniques, including star plots and principal components analysis, for studying usage of UNIX operating system commands in support of user-interface enhancement [4]. Podgurski et al. examine the use of the multivariate analysis technique cluster analysis for improving the accuracy and efficiency of software reliability estimation [10, 11]. In this application, program executions are captured in the field and later replayed and profiled. The profiles are then clustered to obtain a stratified sampling design for estimating reliability. Podgurski et al. report experiments in which cluster analysis of branch-traversal profiles permitted the failure frequency of several programs to be estimated, using stratified random sampling, more accurately than it could be with simple random sampling. This result was explained by the fact that cluster analysis of profiles isolated some program failures in very small clusters.
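The stratified scheme of Podgurski et al. [10, 11] is easy to illustrate. In this hypothetical sketch (cluster contents and the failure oracle are invented for illustration), each profile cluster is sampled and its observed failure rate is weighted by the cluster's share of the population:

```python
import random

def stratified_failure_estimate(clusters, per_cluster, is_failure):
    """Stratified estimate of failure frequency: sample each profile
    cluster, weight its observed failure rate by cluster size."""
    total = sum(len(c) for c in clusters)
    estimate = 0.0
    for cluster in clusters:
        k = min(per_cluster, len(cluster))
        sample = random.sample(cluster, k)
        rate = sum(1 for e in sample if is_failure(e)) / k
        estimate += rate * (len(cluster) / total)
    return estimate
```

Because every cluster is sampled, failures isolated in very small clusters are still observed, which is exactly why the stratified design outperformed simple random sampling in the cited experiments.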
Reps et al. explore the use of a type of execution profile called a path spectrum for discovering Year 2000 problems and related issues, and they propose several other applications of path spectra to software maintenance and testing [12]. They also describe a prototype system called DynaDiff for comparing path spectra, which produces graphical representations of individual spectra to facilitate their comparison. Harrold et al. evaluated several types of program spectra (profiles) empirically, to determine how well they indicate the occurrence of execution failures [6]. They observed that failures were likely to be indicated by differences in complete-path spectra, path-count spectra, and branch-count spectra. They also observed that differences in such spectra are more likely to indicate failures than are differences in execution-trace spectra. Pavlopoulou and Young describe how monitoring residual test coverage in software that is deployed or undergoing beta testing can be used to validate the thoroughness of testing in the development environment [9]. They describe a prototype system that monitors residual statement coverage in Java programs, and they present performance measurements that suggest the performance impact of monitoring is acceptable.

7 CONCLUSION

We have described a new approach to testing, called observation-based testing, which calls for obtaining a large amount of potential test data as expeditiously as possible and then filtering that data to obtain a much smaller subset on which to actually evaluate the software under test. Filtering potential test cases involves profiling the executions they induce and then analyzing the resulting profiles with automated help. We have proposed the use of multivariate visualization techniques for analyzing profiles, described several applications of the techniques, and presented a case study in which correspondence analysis was used to analyze potential test cases for the GCC compiler.

REFERENCES
1. Borg, I. and Groenen, P. Modern Multidimensional Scaling: Theory and Applications. Springer.
2. GCC. The GCC Home Page. Free Software Foundation.
3. Greenacre, M.J. Theory and Applications of Correspondence Analysis. Academic Press.
4. Hanson, S.J., Kraut, R.E., and Farber, J.M. Interface design and multivariate analysis of UNIX command use. ACM Transactions on Office Information Systems 2, 1 (March 1984).
5. Harrold, M.J., Gupta, R., and Soffa, M.L. A methodology for controlling the size of a test suite. ACM Transactions on Software Engineering and Methodology 2, 3 (July 1993).
6. Harrold, M.J., Rothermel, G., Wu, R., and Yi, L. An empirical investigation of program spectra. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (Montreal, Canada, June 1998).
7. Harville, D.A. Matrix Algebra From a Statistician's Perspective. Springer-Verlag.
8. Krzanowski, W.J. Principles of Multivariate Analysis: A User's Perspective. Oxford Science Publications.
9. Pavlopoulou, C. and Young, M. Residual test coverage monitoring. Proceedings of the 21st International Conference on Software Engineering (Los Angeles, CA, May 1999), ACM Press.
10. Podgurski, A., Masri, W., McCleese, Y., Wolff, F.G., and Yang, C. Estimation of software reliability by stratified sampling. ACM Transactions on Software Engineering and Methodology 8, 9 (July 1999).
11. Podgurski, A. and Yang, C. Partition testing, stratified sampling, and cluster analysis. Proceedings of the First ACM Symposium on Foundations of Software Engineering (Los Angeles, CA, December 1993), ACM Press.
12. Reps, T., Ball, T., Das, M., and Larus, J. The use of program profiling for software maintenance with applications to the Year 2000 Problem. Proceedings of the 6th European Software Engineering Conference and 5th ACM SIGSOFT Symposium on the Foundations of Software Engineering (Zurich, Switzerland, September 1997), ACM Press.
13. Rothermel, G. and Harrold, M.J. A safe, efficient regression test algorithm. IEEE Transactions on Software Engineering 6, 10 (April 1997).
14. X.Org. X.Org.
More informationA GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY
A GRAPH FROM THE VIEWPOINT OF ALGEBRAIC TOPOLOGY KARL L. STRATOS Abstract. The conventional method of describing a graph as a pair (V, E), where V and E repectively denote the sets of vertices and edges,
More informationBar Graphs and Dot Plots
CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs
More informationOptimization I : Brute force and Greedy strategy
Chapter 3 Optimization I : Brute force and Greedy strategy A generic definition of an optimization problem involves a set of constraints that defines a subset in some underlying space (like the Euclidean
More informationData Preprocessing. Slides by: Shree Jaswal
Data Preprocessing Slides by: Shree Jaswal Topics to be covered Why Preprocessing? Data Cleaning; Data Integration; Data Reduction: Attribute subset selection, Histograms, Clustering and Sampling; Data
More informationInteractive Campaign Planning for Marketing Analysts
Interactive Campaign Planning for Marketing Analysts Fan Du University of Maryland College Park, MD, USA fan@cs.umd.edu Sana Malik Adobe Research San Jose, CA, USA sana.malik@adobe.com Eunyee Koh Adobe
More informationA CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill
447 A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE (Extended Abstract) Gyula A. Mag6 University of North Carolina at Chapel Hill Abstract If a VLSI computer architecture is to influence the field
More informationBoundary/Contour Fitted Grid Generation for Effective Visualizations in a Digital Library of Mathematical Functions
Boundary/Contour Fitted Grid Generation for Effective Visualizations in a Digital Library of Mathematical Functions Bonita Saunders Qiming Wang National Institute of Standards and Technology Bureau Drive
More informationUsing surface markings to enhance accuracy and stability of object perception in graphic displays
Using surface markings to enhance accuracy and stability of object perception in graphic displays Roger A. Browse a,b, James C. Rodger a, and Robert A. Adderley a a Department of Computing and Information
More informationVerification and Validation. Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1
Verification and Validation Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 22 Slide 1 Verification vs validation Verification: "Are we building the product right?. The software should
More informationReview of Regression Test Case Selection Techniques
Review of Regression Test Case Selection Manisha Rani CSE Department, DeenBandhuChhotu Ram University of Science and Technology, Murthal, Haryana, India Ajmer Singh CSE Department, DeenBandhuChhotu Ram
More informationMSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar
MSc Software Testing and Maintenance MSc Prófun og viðhald hugbúnaðar Fyrirlestrar 31 & 32 Structural Testing White-box tests. 27/1/25 Dr Andy Brooks 1 Case Study Dæmisaga Reference Structural Testing
More informationNUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING
NUMERICAL METHODS PERFORMANCE OPTIMIZATION IN ELECTROLYTES PROPERTIES MODELING Dmitry Potapov National Research Nuclear University MEPHI, Russia, Moscow, Kashirskoe Highway, The European Laboratory for
More informationTDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.
Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide
More informationLearning to Learn: additional notes
MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2008 Recitation October 23 Learning to Learn: additional notes Bob Berwick
More informationA Vision System for Automatic State Determination of Grid Based Board Games
A Vision System for Automatic State Determination of Grid Based Board Games Michael Bryson Computer Science and Engineering, University of South Carolina, 29208 Abstract. Numerous programs have been written
More informationFault Class Prioritization in Boolean Expressions
Fault Class Prioritization in Boolean Expressions Ziyuan Wang 1,2 Zhenyu Chen 1 Tsong-Yueh Chen 3 Baowen Xu 1,2 1 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093,
More informationA Course in Machine Learning
A Course in Machine Learning Hal Daumé III 13 UNSUPERVISED LEARNING If you have access to labeled training data, you know what to do. This is the supervised setting, in which you have a teacher telling
More informationDI TRANSFORM. The regressive analyses. identify relationships
July 2, 2015 DI TRANSFORM MVstats TM Algorithm Overview Summary The DI Transform Multivariate Statistics (MVstats TM ) package includes five algorithm options that operate on most types of geologic, geophysical,
More informationFeature Selection Using Principal Feature Analysis
Feature Selection Using Principal Feature Analysis Ira Cohen Qi Tian Xiang Sean Zhou Thomas S. Huang Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign Urbana,
More informationRegression III: Advanced Methods
Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis
More informationChapter 4. Clustering Core Atoms by Location
Chapter 4. Clustering Core Atoms by Location In this chapter, a process for sampling core atoms in space is developed, so that the analytic techniques in section 3C can be applied to local collections
More information