The Care and Feeding of Wild-Caught Mutants
Michael Vaughn and David Bingham Brown
December 19th, 2015

Abstract

We propose and implement a technique for providing more thorough mutation testing for software test suites by mining publicly available source code repositories. We derive wild-caught mutants from these repositories and investigate their effectiveness in evaluating software test suites compared to the traditional mutators used in mutation testing.

1 Introduction

One of the biggest threats to validity in debugging research is the size and quality of the defect corpus used for evaluation. Existing techniques for empirical evaluation of code analyzers, test methodologies, and other debugging tools and techniques are fairly simplistic. Rigorous scientific analysis requires well-documented and repeatable test conditions, leading to defect corpora that are either contrived, as is the case for the hand-introduced bugs in the Siemens Suite, or limited to small batches of painstakingly isolated bugs from real projects, such as those René Just produced for his studies[1]. A reasonable question to ask, then, is whether a more robust and scalable means of reproducibly introducing software defects could be devised.

Mutation analysis has gained traction in recent years as a means of evaluating the coverage of test suites. In particular, research has shown[1] that mutation coverage is a useful predictor of a test suite's effectiveness, independent of code coverage. However, as René Just showed, 17% of software faults could not be accounted for with normal mutants. The simplicity of the model makes it useful in industrial settings, where a degree of scientific validity can be sacrificed in order to obtain a reasonably useful tool.
Given the observation that certain real-world defects are disjoint from the class of defects that these mutations can generate, however, basic mutation analysis can be too artificial to find general use as a tool for scientific testing of engineering tools. We expand on the core idea of mutation testing (automated bug introduction as a means of evaluating the quality of a test suite) by developing a suite of tools for inserting novel, human-generated mutants into a codebase, and by analyzing the utility of those tools for test suite evaluation.

To accomplish this, we scoured GitHub[2], a large, publicly accessible source code hosting service, for small, single-line commits. Working on the assumption that the primary reason to commit a single-line change is to fix a bug, we extract these commits and create a reverse patch from each one: if the commit exists to fix a bug, reversing the patch should insert the fixed bug into the codebase it is applied to. To ensure that the commits are actually applicable to other codebases, we extract these potential mutants in an identifier-agnostic way; that is, we maintain keywords and operators from the programming language used, but treat identifiers as wildcards to be matched to the host codebase upon insertion.

We provide a wild mutant extraction and insertion toolchain: a repository scraping tool; mutgen, a mutant extraction tool; and mutins, a mutant insertion tool. We provide experimental evidence using source control projects and test suites in the C language (though the toolchain does, at present, support most languages that lack semantic value for whitespace) to demonstrate the utility of our technique.

2 Development

All tools developed as part of the project can be found at mutants.

2.1 Repository Mining

To obtain mutants, we decided to mine code from public GitHub repositories. To start out with, we decided to investigate C repositories, as the language's comparatively simple syntax and semantics lead us to believe that a greater proportion of C's reverse patches should be applicable compared to more (syntactically) complex object-oriented languages.

In order to obtain a reasonably sized set of patches, we decided to target repositories with the most commits. However, the GitHub search API does not expose the number of commits. As a heuristic, we decided to instead select the repositories with the most forks, as GitHub's API does allow users to order search results by fork count. We felt this heuristic was reasonable, as projects with a large number of forks are generally subject to broad interest, and thus a higher rate of development. Empirically, this assumption appears warranted, as the top 20 projects by this metric include the Linux kernel, memcached, and Redis.

To perform the scraping, we created two automated tools. The first was a small Python script which automatically submitted small batches of search queries to the GitHub server and built a list of all projects matching the query.
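The fork-ordered, paginated search described above can be sketched as follows. This is a minimal illustration rather than the authors' script: it builds request URLs for the GitHub REST search endpoint directly instead of going through OctoHub, the function names and the page size of 25 are hypothetical, and actually sending the requests (and pausing between batches to respect rate limits) is left out.

```python
from urllib.parse import urlencode

# GitHub REST endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def fork_ranked_query(page, per_page=25, language="c"):
    """Build the URL for one page of repositories ordered by fork count.

    per_page=25 is an illustrative choice, not a value from the paper.
    """
    params = {
        "q": f"language:{language}",  # restrict results to one language
        "sort": "forks",              # rank by number of forks
        "order": "desc",              # most-forked repositories first
        "per_page": per_page,
        "page": page,                 # pages are numbered from 1
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def query_batch(page_count, per_page=25):
    """URLs for the first page_count pages, to be sent a few at a time."""
    return [fork_ranked_query(p, per_page) for p in range(1, page_count + 1)]


# 25 pages of 25 results would cover a list of 625 projects.
urls = query_batch(25)
```

Keeping URL construction separate from the network call makes the pagination logic easy to check without touching the API.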
To simplify the query script, we used OctoHub[3], a small Python library which lets users programmatically build API queries. We then built a script to send small numbers of paginated queries, to avoid running afoul of GitHub's API limits. By doing this, we were able to build a list of the top 625 C projects, in descending order of fork count. We used the top 50 of these (comprising some 850 million lines of commits) for our experiments.

Once this was done, we created a small program which checked out each repository on the list and then performed the repository scraping locally. Our local scraper is a Python script which iterates backwards through the commit history, outputting each diff (in unified diff format), along with the revision number and commit message, to a text file. The construction of this scraper was simplified by using GitPython[4], which provides a robust object-oriented implementation of the relevant git functionality, such as reading commit histories and constructing diffs.

2.2 Mutant Extraction: mutgen

We developed mutgen, the second element of our toolchain, to extract potential wild mutants from the unified diff files scraped from GitHub. mutgen identifies potential mutants by isolating single-line changes from the source control commits it reads as input. Both lines of each identified commit are tokenized with help from language specification files, provided at the command line, that detail the language's keywords and operators. The tokenizer ignores whitespace and uses rules that are simple yet (largely) universal among programming languages for processing keywords, identifiers, literals, and whitespace: operators are consumed greedily, while keywords and identifiers must be separated by operators or whitespace.

    Usage: mutgen [options]
    Options:
      --help             display this text
      -k KEYWORD_FILE    load language keywords from KEYWORD_FILE
      -o OPERATOR_FILE   load language operators from OPERATOR_FILE
      -i INPUT_FILES...  extract mutants from INPUT_FILES
      -x EXTRACT_FILE    store extracted mutants in EXTRACT_FILE

Figure 1: mutgen command line options

    (a) Invalid potential mutant
        - if (x && y)
        + if (x)

    (b) Valid potential mutant
        - if (x)
        + if (x && y)

    (c) Mutant extracted from (b)
        + :if .( $1 .&& $-1 .)
        - :if .( $1 .)

Figure 2: Example potential mutants

Once the two lines (the before and after lines) are tokenized, mutgen analyzes both to ensure that, once matched, it is possible to generate the before line from the after line. mutins is not yet robust enough to synthesize identifiers or literals, so mutgen requires that potential mutants not need any new information to be synthesized; that is, the before state must be derivable solely from identifiers and literals matched in the after state.

Figure 2 shows example single-line commits as processed by mutgen. Figure 2a would be discarded because the before state cannot be generated from the after state (the identifier y is unique to the before state, and thus cannot be recovered from the after state). Figure 2c shows the tokenized mutant fully extracted. In the mutant extract language of mutgen and mutins, : indicates a keyword, . an operator, and $ an identifier. Keywords and operators contain their literal value, while identifiers are given an index to be used in converting the after state to the before state, with -1 indicating that the identifier (like the y in Figure 2b) is unused.
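The tokenization and validity check described in this section can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not mutgen's actual implementation: the keyword and operator lists are small hypothetical subsets, literals are treated like identifiers, and identifier indices are assigned in order of first appearance in the before line.

```python
# Hypothetical, abbreviated language specification for C.
KEYWORDS = {"if", "else", "while", "for", "return"}
OPERATORS = ["&&", "||", "==", "!=", "<=", ">=", "(", ")", "{", "}",
             ";", "<", ">", "=", "!", "+", "-", "*", "/", "%", "&", "|"]
# Trying longer operators first makes operator matching greedy.
OPS_BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def operator_at(line, i):
    """Return the longest operator starting at position i, or None."""
    return next((o for o in OPS_BY_LENGTH if line.startswith(o, i)), None)


def tokenize(line):
    """Split a line into (kind, text) tokens, ignoring whitespace."""
    tokens, i = [], 0
    while i < len(line):
        if line[i].isspace():
            i += 1
            continue
        op = operator_at(line, i)
        if op is not None:
            tokens.append(("op", op))
            i += len(op)
            continue
        # A word runs until the next whitespace character or operator.
        j = i + 1
        while (j < len(line) and not line[j].isspace()
               and operator_at(line, j) is None):
            j += 1
        word = line[i:j]
        tokens.append(("kw" if word in KEYWORDS else "id", word))
        i = j
    return tokens


def extract(before, after):
    """Return (after_pattern, before_pattern), or None if the before line
    needs an identifier that cannot be matched in the after line."""
    b, a = tokenize(before), tokenize(after)
    index = {}  # identifier text -> index, numbered from 1
    for kind, text in b:
        if kind == "id" and text not in index:
            index[text] = len(index) + 1
    after_ids = {text for kind, text in a if kind == "id"}
    if not set(index) <= after_ids:
        return None

    def render(tokens):
        prefix = {"kw": ":", "op": "."}
        return " ".join(
            prefix[kind] + text if kind in prefix
            else "$" + str(index.get(text, -1))  # -1: unused in before state
            for kind, text in tokens)

    return render(a), render(b)
```

Under these assumptions, `extract("if (x)", "if (x && y)")` reproduces the two lines of Figure 2c, while `extract("if (x && y)", "if (x)")` returns `None`, matching the rejection of Figure 2a.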
Our initial expectation that valid mutants would be rare proved untrue: our initial run of mutgen over the scraped corpus yielded almost three million mutants, more than it was reasonably possible to evaluate. After manually combing through a subset of the mutants produced by the initial run, we added heuristics to mutgen to cull two classes of commits: those likely to be comments (e.g., those containing several identifiers in a row, indicating that the committed text is more likely to be natural language than a programming language, or those containing several repeated operators, often seen as horizontal lines drawn in comments), and those complex enough to be unlikely to be found in other codebases (that is, those whose after states contain more than four identifiers to be matched). This culling reduced the generated set from roughly three million potential mutants, most of them unusable, to roughly thirty thousand, of which a larger proportion were likely to be matched. Figure 3 shows some example mutants identified by mutgen.¹

    (a)
        - }
        + } else

    (b)
        - while (i < n);
        + while (i < n)

    (c)
        + TMPFILE
        - TMPFILE % 512

Figure 3: Example extracted mutants

Notably, while mutgen (and the rest of the toolchain) has only been tested on the C programming language, the entire system is designed to be usable, via operator and keyword specifications, with virtually any programming language that does not assign semantic value to whitespace² (the companion tool mutins reads these language specifications from the extracted mutant file generated by mutgen). Applied to the entire scraped corpus (approximately 850 million lines of commits), mutgen extracted 29,704 possibly viable mutants in less than ten minutes on desktop-grade hardware.

2.3 Mutant Insertion: mutins

mutins, the final element of the toolchain, reads mutants and language definitions from the mutant extract file generated by mutgen and inserts them into a codebase specified as a list of files. Because many of the generated mutants will cause the resulting code to fail to compile, mutins offers several command line options to facilitate automatic and repeated use of the tool. By default, mutins chooses a mutant at random from the mutant extract file and inserts it at a randomly selected insertion point in the target codebase; it can also be forced to use a specific random number seed (we use the C++ STL's implementation of the Mersenne Twister[5] for pseudo-random number generation).
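Seeding matters because it makes a random insertion repeatable. The following is a minimal sketch of seeded selection, not mutins itself: the function name and its inputs are hypothetical, and restricting the choice to mutants with at least one match is our assumption. CPython's `random.Random` is also a Mersenne Twister, though it should not be expected to reproduce the choices a C++ implementation makes from the same seed.

```python
import random


def choose_insertion(matches_per_mutant, seed=None):
    """Pick (mutant_index, match_index) pseudo-randomly.

    matches_per_mutant[i] is the number of insertion points found for
    mutant i. Returns None when no mutant matches anywhere.
    """
    rng = random.Random(seed)  # same seed, same sequence of choices
    candidates = [m for m, count in enumerate(matches_per_mutant) if count > 0]
    if not candidates:
        return None
    mutant = rng.choice(candidates)
    match = rng.randrange(matches_per_mutant[mutant])
    return mutant, match


# Re-running with the same seed selects the same insertion again.
first = choose_insertion([0, 3, 7, 0, 2], seed=1234)
second = choose_insertion([0, 3, 7, 0, 2], seed=1234)
```

Recording the seed alongside each result is enough to recreate a specific mutated build later, for example when a mutant compiles but behaves unexpectedly.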
Mutant insertion works much like mutant extraction in reverse: mutins tokenizes the input files with the keyword and operator lists provided in the mutant extract file, and then finds possible insertion points by identifying token sequences matching the after state of the chosen mutant. Once an insertion point is selected, the range of text represented by the after state tokens is replaced by text synthesized from the before state tokens, matching identifier to identifier in the synthesized code.

    Usage: mutins [options]
    Options:
      --help             display this text
      -v                 verbose output
      -c                 only count potential matches, do not insert
      -i MATCH_INDEX     insert the match at the specific index
      -r RANDOM_SEED     use RANDOM_SEED to initialize random number generator
                         (this is ignored if the -i option is used)
      -m MUTANT_INDEX    use only the mutant found at the (zero-based) index MUTANT_INDEX
      -x EXTRACT_FILE    load mutants (and language data) from EXTRACT_FILE
      -t TARGET_FILES... attempt to insert mutant into TARGET_FILE
      -b                 skip backup (by default, modified files are copied
                         to file.orig before mutant insertion)

Figure 4: mutins command line options

¹ We posit no explanation why one would need to perform modulo arithmetic on a variable named TMPFILE, but do note that stripping a modulo operation does provide an interesting mutation for use in mutation testing.
² A conversation with Ben Liblit yielded a simple and elegant method to implement support for semantic whitespace à la Python, but this has not yet been implemented.

3 Evaluation

For our experimental validation, we chose to replicate a subset of an experiment that J. H. Andrews, L. C. Briand, and Y. Labiche describe in [6]. Their experiment takes a program from the SIR repository[7] and generates a number of test suites by randomly choosing from the artifact's tests. They then measure the mutation adequacy of each suite by running it over the set of all possible program mutations. By collecting these measurements, they construct a model of the statistical distribution of the mutant detection rate over arbitrary test suites, which they compare to a similarly constructed approximation of the distribution for hand-seeded faults.

3.1 Target Program

While Andrews et al. work with a wide variety of programs from SIR, including the Siemens Suite, we decided to work with Space [7]. Space is an appealing subject for this form of experimentation, as it is a mature piece of software that has been subject to years of production use. Because of this, Space is also the only program Andrews et al. tested which used real faults instead of hand-introduced ones. Thus, we can already get a sense of how wild mutants fare against the test suites' detection rates for real faults. Moreover, at 6,199 lines of code, it is larger than the classical Siemens Suite programs. This size is large enough that mutins can generate 117,744 possible mutants, which is a non-trivial set that is still small enough to explore exhaustively.
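Enumerating the possible mutants of a target program amounts to the pattern matching described in Section 2.3: every position where a mutant's after state matches the token stream is one candidate insertion. The sketch below illustrates that idea under several assumptions of ours: tokens are (kind, text) pairs, identifier slots carry integer indices with -1 as a wildcard, a repeated index must bind to the same identifier, and the replacement is done on tokens rather than on the original text range. All names are hypothetical.

```python
def find_matches(after, tokens):
    """Yield (start, bindings) wherever the after pattern matches."""
    n = len(after)
    for start in range(len(tokens) - n + 1):
        bindings = {}
        for (pkind, pval), (tkind, ttext) in zip(after, tokens[start:start + n]):
            if pkind != tkind:
                break
            if pkind == "id":
                # -1 matches any identifier; other indices must bind consistently.
                if pval != -1 and bindings.setdefault(pval, ttext) != ttext:
                    break
            elif pval != ttext:
                break
        else:
            yield start, bindings


def insert_mutant(after, before, tokens, match_index):
    """Replace the chosen match of `after` with the synthesized `before`."""
    start, bindings = list(find_matches(after, tokens))[match_index]
    replacement = [(kind, bindings[val] if kind == "id" else val)
                   for kind, val in before]
    return tokens[:start] + replacement + tokens[start + len(after):]


def text(tokens):
    return " ".join(t for _, t in tokens)


# The mutant of Figure 2c: after "if ($1 && $-1)", before "if ($1)".
AFTER = [("kw", "if"), ("op", "("), ("id", 1), ("op", "&&"), ("id", -1), ("op", ")")]
BEFORE = [("kw", "if"), ("op", "("), ("id", 1), ("op", ")")]

target = [("kw", "if"), ("op", "("), ("id", "ready"), ("op", "&&"),
          ("id", "armed"), ("op", ")"), ("id", "fire"), ("op", "("),
          ("op", ")"), ("op", ";"), ("kw", "if"), ("op", "("),
          ("id", "a"), ("op", "&&"), ("id", "b"), ("op", ")")]

match_count = len(list(find_matches(AFTER, target)))
mutated = text(insert_mutant(AFTER, BEFORE, target, 1))
```

Counting matches without inserting corresponds to what the `-c` option in Figure 4 reports, and summing those counts over every extracted mutant gives the size of the mutation space for a target.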
3.2 Procedure

Prior to testing, we obtained 5,000 100-case test suites by repeatedly shuffling the list of available test cases and taking the first 100 elements, thus ensuring that no single suite contained duplicate test cases. Next, we ran mutins on Space in order to identify each possible point at which a wild mutant could be inserted. We recorded each possible insertion in a list of entries that could be fed to our suite execution framework at a later time. We then divided the space of 117,744 mutants into batches of six mutants each, as prior experimentation indicated that such a batch could complete in one to two hours. We packaged each batch as part of an HTCondor job, which applied each mutation in sequence and ran all six against each of the 5,000 test suites. Test successes and failures were recorded and sent back from each job.

Once the test results were returned, we performed some simple analyses on them. For each test suite S, we calculated the mutation adequacy score Am(S), the ratio of mutants detected by the suite to the total number of mutants [6]. We also recorded the number of mutants that successfully compiled, in order to get a general sense of the feasibility of mutant insertion.

3.3 Experimentation Framework and Tooling

As the experimental procedure calls for exhaustively building and testing each possible mutant for a subject program, a significant amount of computation time is required to obtain a reasonably informative data set. We decided to use UW's Center for High Throughput Computing, which provides a robust environment for large-scale grid computing via the HTCondor framework [8]. Since each job can be run independently of the others, the parallelization is simply a matter of appropriately packaging the mutator, along with the target program and associated suites.

This packaging posed a significant difficulty, as the computing pool's Linux environments are heterogeneous, and host machines are not guaranteed to have any specific version or build of many non-core programs, if any version is present at all. In particular, many nodes have neither GCC nor the headers needed to build code, which required us to create relocatable binaries of both GCC and glibc that we could pass in as part of the job. Obtaining such a version of GCC is non-trivial, and requires a significant amount of configuration and testing to ensure that the correct versions of libraries are built and that no subtle discrepancies are present in the toolchain.
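Because the jobs are independent, the bookkeeping each one carries out for the procedure of Section 3.2 is small. A minimal sketch of the three steps (suite sampling, batching, and scoring) follows; the function names and the `kills` mapping are hypothetical conveniences, and none of this reflects the actual job scripts.

```python
import random


def sample_suites(test_cases, suite_count, suite_size, seed=0):
    """Build suites by shuffling the test list and taking a prefix,
    so no suite contains the same test case twice."""
    rng = random.Random(seed)
    suites = []
    for _ in range(suite_count):
        pool = list(test_cases)
        rng.shuffle(pool)
        suites.append(pool[:suite_size])
    return suites


def batches(items, size):
    """Split the mutant list into fixed-size batches, one per job."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def adequacy(suite, mutants, kills):
    """Am(S): fraction of mutants detected by at least one test in the suite.

    kills maps a mutant to the set of test cases that fail on it.
    """
    suite_tests = set(suite)
    detected = sum(1 for m in mutants if kills.get(m, set()) & suite_tests)
    return detected / len(mutants)


# Toy data: four mutants, two of which the suite detects.
score = adequacy(["t1", "t2"],
                 ["m1", "m2", "m3", "m4"],
                 {"m1": {"t1"}, "m2": {"t9"}, "m3": {"t2", "t3"}})
```

With the paper's parameters, `sample_suites` would be called with a suite count of 5,000 and a suite size of 100, and `batches` with a size of six.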
A further complication is that HTCondor jobs can be located in an arbitrary directory of the host system. This poses a problem, as a naive build of GCC and glibc may experience various linking and loading errors in this situation. We eventually discovered crosstool-ng [9], a configurable tool intended to create cross-compiler toolchains. After some investigation and experimentation, we were able to correctly build a version of GCC with the desired properties.

As the experiment used the SIR repository [7], we built a Python framework for building and executing arbitrary experiments on its artifacts. By taking advantage of SIR's standardized framework, we constructed a system which can be used to move various objects to staging areas, build test suites, and invoke external tools, such as mutins, to manipulate the source code. We also created a similar set of Bash scripts, with similar functionality, in case Python was unavailable or more robust shell-style functionality was required.

By packaging this with the relocatable GCC, we were effectively able to construct a framework for reproducible software engineering experiments. By packaging the desired artifact from the SIR repository, the Python framework, and the compiler, along with a top-level script that invokes the necessary behaviors, the experimenter can present the experiment in such a way that any interested party can simply obtain the package and begin tweaking and experimenting.

4 Results and Discussion

Given our data, we were able to record a few basic statistical metrics.

    Total mutants                    108,134
    Successfully compiled mutants    20,638
    Compilation rate                 [missing]
    Average Am(S)                    [missing]

Interestingly, nearly 20 percent of the inserted mutants compiled, which was much greater than the rate of less than 5 percent we originally expected. This is still comparatively low; Andrews reported a compilation success rate of 92 percent[6]. However, at scale, mutation still appears feasible, as our set of compiled mutants is roughly twice as large as Andrews's 11,379-mutant set.

Curiously, the wild mutants were far more difficult to detect than either the real-world faults or the mutants generated by Andrews. The average Am(S) values for real faults and generated mutants were recorded as 0.75 and 0.75, respectively, making them nearly 1.5 times easier to catch. Every wild mutant we tested was recorded as being caught by at least one test suite, so each one introduced some fault. This seems to indicate that wild mutants tend to induce more subtle variations than those produced by other operators. Andrews asserts that an Am(S) lower than the rate for real faults is an argument against the realism of hand-seeded faults[6]. However, given the apparently subtle behavior of the reverse patches, we feel that more careful analysis of both our results and Andrews's results is warranted.

5 Threats to Validity

Currently, the most significant threat to validity is the relative immaturity of our data collection and analysis software. In particular, as stated before, HTCondor is a challenging system to develop for; our testing and compilation frameworks required a significant number of false starts and reworkings before we had a successful execution. Thus, our test executors may still have bugs that altered our experiment in some way, or some unforeseen aspect of the execution environment may be altering the behavior of the program in a hard-to-detect way.
6 Related Work

René Just's evaluation of mutation testing's external validity, and his subsequent analysis of the limitations[1] of mutation testing, served as the main impetus for our work. In particular, careful consideration of his discussion of the classes of faults which cannot be expressed in terms of basic mutation operators was our primary inspiration in searching for a more realistic set of mutation operators.

Jia and Harman [10] wrote a robust survey of the history of mutation testing, including a section delineating the various techniques for testing mutation frameworks, along with an overview of the most significant works of mutation evaluation. In addition, they discuss various subject programs used in testing and provide a detailed list of programs used for evaluation, sorted by the number of papers using each. Their work quickly pointed us towards the SIR repository as a viable set of tools for mutation testing. Moreover, after Dr. Liblit told us about James Andrews's mutation framework, we found Andrews's evaluation experiment in the bibliography of Jia and Harman's survey.

7 Future Work

The clearest short-term objective is to continue comparing our mutation tool against the results of Andrews's experiment. With our tools, it should be fairly straightforward to perform the experimental procedure on the other 7 SIR programs he analyzed. Additionally, he performed more sophisticated statistical analyses on his results, such as the statistical significance of variation between test suites as well as between mutation styles. Given our existing framework for exhaustively searching the mutation space of a given program, we feel we can efficiently replicate the rest of his experiment in a matter of weeks.

Another reasonable axis of evaluation is the derivability relationship between the basic operators provided by common mutation frameworks and our wild-caught mutants. Specifically, it is reasonable to inquire what proportion of wild mutants can be derived from some bounded number of applications of a mutation operator. For mutant insertions where both the before and after code can be described as functions that map from a state of the variables the before code touches to a state of the variables touched by the after code, there may be a way to apply syntax-guided synthesis [11] in attempting to derive the mutation.

8 Conclusion

Mutation testing is predicated on software engineering researchers' and practitioners' testing needs. In particular, test suites and bug finders, like all other software, need to be extensively tested and verified, which requires a large corpus of test cases. Mutation testing provides one way of quickly and reproducibly introducing large numbers of faults into a known piece of software. However, as Just demonstrates in [1], simple mutation operators cannot span the full space of software faults. Wild mutants derived from reversed patch data bridge this gap. By reflecting real-world code changes, these mutants will affect code in a manner that was deemed significant in some context.
Moreover, given the vast quantities of publicly available patch data, large sets of candidate operations can be collected and evaluated in a matter of days. Given the one-in-five probability that such a mutant will successfully compile, and their distinctive behavior with respect to test suites, we believe wild mutants are objects deserving further study.

9 Acknowledgements

This research was performed using the compute resources and assistance of the UW-Madison Center For High Throughput Computing (CHTC) in the Department of Computer Sciences. The CHTC is supported by UW-Madison, the Advanced Computing Initiative, the Wisconsin Alumni Research Foundation, the Wisconsin Institutes for Discovery, and the National Science Foundation, and is an active member of the Open Science Grid, which is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science.

References

[1] R. Just, D. Jalali, L. Inozemtseva, M. D. Ernst, R. Holmes, and G. Fraser, "Are mutants a valid substitute for real faults in software testing?" FSE. [Online]. Available: http://homes.cs.washington.edu/~rjust/publ/mutants_real_faults_fse_2014.pdf
[2] GitHub, Inc. (2015). GitHub - where software is built. [Online]. Available: https://www.github.com/
[3] A. Swartz. (2013). OctoHub: Low level Python and CLI interface to GitHub. [Online].
[4] M. Trier. (2008). GitPython. [Online]. Available: GitPython/
[5] M. Matsumoto and T. Nishimura, "Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator," ACM Trans. on Modeling and Computer Simulation. [Online]. Available: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/mt/articles/mt.pdf
[6] J. H. Andrews, L. C. Briand, and Y. Labiche, "Is mutation an appropriate tool for testing experiments?" in Proceedings of the 27th International Conference on Software Engineering, ser. ICSE '05, St. Louis, MO, USA: ACM, 2005. [Online].
[7] H. Do, S. G. Elbaum, and G. Rothermel, "Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact," Empirical Software Engineering: An International Journal, vol. 10, no. 4.
[8] M. Litzkow, M. Livny, and M. Mutka, "Condor - a hunter of idle workstations," in Proceedings of the 8th International Conference of Distributed Computing Systems.
[9] Y. E. Morin. (2013). Crosstool-NG. [Online].
[10] Y. Jia and M. Harman, "An analysis and survey of the development of mutation testing," IEEE Trans. Softw. Eng., vol. 37, no. 5, Sep. 2011. [Online].
[11] R. Alur et al., "Syntax-guided synthesis."
More informationAutomated Documentation Inference to Explain Failed Tests
Automated Documentation Inference to Explain Failed Tests Sai Zhang University of Washington Joint work with: Cheng Zhang, Michael D. Ernst A failed test reveals a potential bug Before bug-fixing, programmers
More informationINTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY
INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK REVIEW PAPER ON IMPLEMENTATION OF DOCUMENT ANNOTATION USING CONTENT AND QUERYING
More informationAdditional Guidelines and Suggestions for Project Milestone 1 CS161 Computer Security, Spring 2008
Additional Guidelines and Suggestions for Project Milestone 1 CS161 Computer Security, Spring 2008 Some students may be a little vague on what to cover in the Milestone 1 submission for the course project,
More informationFrom Whence It Came: Detecting Source Code Clones by Analyzing Assembler
From Whence It Came: Detecting Source Code Clones by Analyzing Assembler Ian J. Davis and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada
More informationChapter 2 Basic Structure of High-Dimensional Spaces
Chapter 2 Basic Structure of High-Dimensional Spaces Data is naturally represented geometrically by associating each record with a point in the space spanned by the attributes. This idea, although simple,
More informationNetwork Programmability with Cisco Application Centric Infrastructure
White Paper Network Programmability with Cisco Application Centric Infrastructure What You Will Learn This document examines the programmability support on Cisco Application Centric Infrastructure (ACI).
More informationEmpirical Studies of Test Case Prioritization in a JUnit Testing Environment
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln CSE Conference and Workshop Papers Computer Science and Engineering, Department of 2004 Empirical Studies of Test Case Prioritization
More informationTreeSearch User Guide
TreeSearch User Guide Version 0.9 Derrick Stolee University of Nebraska-Lincoln s-dstolee1@math.unl.edu March 30, 2011 Abstract The TreeSearch library abstracts the structure of a search tree in order
More informationInternational Journal for Management Science And Technology (IJMST)
Volume 4; Issue 03 Manuscript- 1 ISSN: 2320-8848 (Online) ISSN: 2321-0362 (Print) International Journal for Management Science And Technology (IJMST) GENERATION OF SOURCE CODE SUMMARY BY AUTOMATIC IDENTIFICATION
More informationSample Exam. Advanced Test Automation - Engineer
Sample Exam Advanced Test Automation - Engineer Questions ASTQB Created - 2018 American Software Testing Qualifications Board Copyright Notice This document may be copied in its entirety, or extracts made,
More informationChapter 9. Software Testing
Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of
More informationPart I: Preliminaries 24
Contents Preface......................................... 15 Acknowledgements................................... 22 Part I: Preliminaries 24 1. Basics of Software Testing 25 1.1. Humans, errors, and testing.............................
More informationAn Exploratory Study on Interface Similarities in Code Clones
1 st WETSoDA, December 4, 2017 - Nanjing, China An Exploratory Study on Interface Similarities in Code Clones Md Rakib Hossain Misu, Abdus Satter, Kazi Sakib Institute of Information Technology University
More informationQuantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study
Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Jadson Santos Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte, UFRN Natal,
More informationMTAT : Software Testing
MTAT.03.159: Software Testing Lecture 03: White-Box Testing (Textbook Ch. 5) Spring 2013 Dietmar Pfahl email: dietmar.pfahl@ut.ee Lecture Chapter 5 White-box testing techniques (Lab 3) Structure of Lecture
More informationSFWR ENG 3S03: Software Testing
(Slide 1 of 52) Dr. Ridha Khedri Department of Computing and Software, McMaster University Canada L8S 4L7, Hamilton, Ontario Acknowledgments: Material based on [?] Techniques (Slide 2 of 52) 1 2 3 4 Empirical
More informationPliny and Fixr Meeting. September 15, 2014
Pliny and Fixr Meeting September 15, 2014 Fixr: Mining and Understanding Bug Fixes for App-Framework Protocol Defects (TA2) University of Colorado Boulder September 15, 2014 Fixr: Mining and Understanding
More informationCS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul
1 CS224W Project Write-up Static Crawling on Social Graph Chantat Eksombatchai Norases Vesdapunt Phumchanit Watanaprakornkul Introduction Our problem is crawling a static social graph (snapshot). Given
More informationProgram Partitioning - A Framework for Combining Static and Dynamic Analysis
Program Partitioning - A Framework for Combining Static and Dynamic Analysis Pankaj Jalote, Vipindeep V, Taranbir Singh, Prateek Jain Department of Computer Science and Engineering Indian Institute of
More informationFig 1. Overview of IE-based text mining framework
DiscoTEX: A framework of Combining IE and KDD for Text Mining Ritesh Kumar Research Scholar, Singhania University, Pacheri Beri, Rajsthan riteshchandel@gmail.com Abstract: Text mining based on the integration
More informationHierarchical Addressing and Routing Mechanisms for Distributed Applications over Heterogeneous Networks
Hierarchical Addressing and Routing Mechanisms for Distributed Applications over Heterogeneous Networks Damien Magoni Université Louis Pasteur LSIIT magoni@dpt-info.u-strasbg.fr Abstract. Although distributed
More informationManagement Tools. Management Tools. About the Management GUI. About the CLI. This chapter contains the following sections:
This chapter contains the following sections:, page 1 About the Management GUI, page 1 About the CLI, page 1 User Login Menu Options, page 2 Customizing the GUI and CLI Banners, page 3 REST API, page 3
More informationSimilarities in Source Codes
Similarities in Source Codes Marek ROŠTÁR* Slovak University of Technology in Bratislava Faculty of Informatics and Information Technologies Ilkovičova 2, 842 16 Bratislava, Slovakia rostarmarek@gmail.com
More informationImproving Origin Analysis with Weighting Functions
Improving Origin Analysis with Weighting Functions Lin Yang, Anwar Haque and Xin Zhan Supervisor: Michael Godfrey University of Waterloo Introduction Software systems must undergo modifications to improve
More informationSome Applications of Graph Bandwidth to Constraint Satisfaction Problems
Some Applications of Graph Bandwidth to Constraint Satisfaction Problems Ramin Zabih Computer Science Department Stanford University Stanford, California 94305 Abstract Bandwidth is a fundamental concept
More informationLaboratorio di Programmazione. Prof. Marco Bertini
Laboratorio di Programmazione Prof. Marco Bertini marco.bertini@unifi.it http://www.micc.unifi.it/bertini/ Code versioning: techniques and tools Software versions All software has multiple versions: Each
More informationExploring Performance Tradeoffs in a Sudoku SAT Solver CS242 Project Report
Exploring Performance Tradeoffs in a Sudoku SAT Solver CS242 Project Report Hana Lee (leehana@stanford.edu) December 15, 2017 1 Summary I implemented a SAT solver capable of solving Sudoku puzzles using
More informationGit with It and Version Control!
Paper CT10 Git with It and Version Control! Carrie Dundas-Lucca, Zencos Consulting, LLC., Cary, NC, United States Ivan Gomez, Zencos Consulting, LLC., Cary, NC, United States ABSTRACT It is a long-standing
More informationParallel Algorithms for the Third Extension of the Sieve of Eratosthenes. Todd A. Whittaker Ohio State University
Parallel Algorithms for the Third Extension of the Sieve of Eratosthenes Todd A. Whittaker Ohio State University whittake@cis.ohio-state.edu Kathy J. Liszka The University of Akron liszka@computer.org
More informationAutomatically Locating software Errors using Interesting Value Mapping Pair (IVMP)
71 Automatically Locating software Errors using Interesting Value Mapping Pair (IVMP) Ajai Kumar 1, Anil Kumar 2, Deepti Tak 3, Sonam Pal 4, 1,2 Sr. Lecturer, Krishna Institute of Management & Technology,
More informationWith data-based models and design of experiments towards successful products - Concept of the product design workbench
European Symposium on Computer Arded Aided Process Engineering 15 L. Puigjaner and A. Espuña (Editors) 2005 Elsevier Science B.V. All rights reserved. With data-based models and design of experiments towards
More informationAnalysis Tool Project
Tool Overview The tool we chose to analyze was the Java static analysis tool FindBugs (http://findbugs.sourceforge.net/). FindBugs is A framework for writing static analyses Developed at the University
More informationAdding a Source Code Searching Capability to Yioop ADDING A SOURCE CODE SEARCHING CAPABILITY TO YIOOP CS297 REPORT
ADDING A SOURCE CODE SEARCHING CAPABILITY TO YIOOP CS297 REPORT Submitted to Dr. Chris Pollett By Snigdha Rao Parvatneni 1 1. INTRODUCTION The aim of the CS297 project is to explore and learn important
More informationSOLUTION BRIEF CA TEST DATA MANAGER FOR HPE ALM. CA Test Data Manager for HPE ALM
SOLUTION BRIEF CA TEST DATA MANAGER FOR HPE ALM CA Test Data Manager for HPE ALM Generate all the data needed to deliver fully tested software, and export it directly into Hewlett Packard Enterprise Application
More informationDynamic Test Generation to Find Bugs in Web Application
Dynamic Test Generation to Find Bugs in Web Application C.SathyaPriya 1 and S.Thiruvenkatasamy 2 1 Department of IT, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamilnadu, India. 2 Department
More informationTowards a Taxonomy of Approaches for Mining of Source Code Repositories
Towards a Taxonomy of Approaches for Mining of Source Code Repositories Huzefa Kagdi, Michael L. Collard, Jonathan I. Maletic Department of Computer Science Kent State University Kent Ohio 44242 {hkagdi,
More informationSearching the Deep Web
Searching the Deep Web 1 What is Deep Web? Information accessed only through HTML form pages database queries results embedded in HTML pages Also can included other information on Web can t directly index
More informationHarvard School of Engineering and Applied Sciences CS 152: Programming Languages
Harvard School of Engineering and Applied Sciences CS 152: Programming Languages Lecture 18 Thursday, April 3, 2014 1 Error-propagating semantics For the last few weeks, we have been studying type systems.
More informationRegression Test Case Prioritization using Genetic Algorithm
9International Journal of Current Trends in Engineering & Research (IJCTER) e-issn 2455 1392 Volume 2 Issue 8, August 2016 pp. 9 16 Scientific Journal Impact Factor : 3.468 http://www.ijcter.com Regression
More informationCSCI6900 Assignment 3: Clustering on Spark
DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF GEORGIA CSCI6900 Assignment 3: Clustering on Spark DUE: Friday, Oct 2 by 11:59:59pm Out Friday, September 18, 2015 1 OVERVIEW Clustering is a data mining technique
More informationOptimized Implementation of Logic Functions
June 25, 22 9:7 vra235_ch4 Sheet number Page number 49 black chapter 4 Optimized Implementation of Logic Functions 4. Nc3xe4, Nb8 d7 49 June 25, 22 9:7 vra235_ch4 Sheet number 2 Page number 5 black 5 CHAPTER
More informationFeature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process
Feature Selection Technique to Improve Performance Prediction in a Wafer Fabrication Process KITTISAK KERDPRASOP and NITTAYA KERDPRASOP Data Engineering Research Unit, School of Computer Engineering, Suranaree
More informationUsing Mutation to Automatically Suggest Fixes for Faulty Programs
2010 Third International Conference on Software Testing, Verification and Validation Using Mutation to Automatically Suggest Fixes for Faulty Programs Vidroha Debroy and W. Eric Wong Department of Computer
More informationJoe Wingbermuehle, (A paper written under the guidance of Prof. Raj Jain)
1 of 11 5/4/2011 4:49 PM Joe Wingbermuehle, wingbej@wustl.edu (A paper written under the guidance of Prof. Raj Jain) Download The Auto-Pipe system allows one to evaluate various resource mappings and topologies
More informationRDGL Reference Manual
RDGL Reference Manual COMS W4115 Programming Languages and Translators Professor Stephen A. Edwards Summer 2007(CVN) Navid Azimi (na2258) nazimi@microsoft.com Contents Introduction... 3 Purpose... 3 Goals...
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationChapter 1 Summary. Chapter 2 Summary. end of a string, in which case the string can span multiple lines.
Chapter 1 Summary Comments are indicated by a hash sign # (also known as the pound or number sign). Text to the right of the hash sign is ignored. (But, hash loses its special meaning if it is part of
More informationProfile-Guided Program Simplification for Effective Testing and Analysis
Profile-Guided Program Simplification for Effective Testing and Analysis Lingxiao Jiang Zhendong Su Program Execution Profiles A profile is a set of information about an execution, either succeeded or
More informationA Module Mapper. 1 Background. Nathan Sidwell. Document Number: p1184r1 Date: SC22/WG21 SG15. /
A Module Mapper Nathan Sidwell Document Number: p1184r1 Date: 2018-11-12 To: SC22/WG21 SG15 Reply to: Nathan Sidwell nathan@acm.org / nathans@fb.com The modules-ts specifies no particular mapping between
More informationMubug: a mobile service for rapid bug tracking
. MOO PAPER. SCIENCE CHINA Information Sciences January 2016, Vol. 59 013101:1 013101:5 doi: 10.1007/s11432-015-5506-4 Mubug: a mobile service for rapid bug tracking Yang FENG, Qin LIU *,MengyuDOU,JiaLIU&ZhenyuCHEN
More informationRunning Head: APPLIED KNOWLEDGE MANAGEMENT. MetaTech Consulting, Inc. White Paper
Applied Knowledge Management 1 Running Head: APPLIED KNOWLEDGE MANAGEMENT MetaTech Consulting, Inc. White Paper Application of Knowledge Management Constructs to the Massive Data Problem Jim Thomas July
More informationTesting unrolling optimization technique for quasi random numbers
Testing unrolling optimization technique for quasi random numbers Romain Reuillon David R.C. Hill LIMOS, UMR CNRS 6158 LIMOS, UMR CNRS 6158 Blaise Pascal University Blaise Pascal University ISIMA, Campus
More informationProgramming. We will be introducing various new elements of Python and using them to solve increasingly interesting and complex problems.
Plan for the rest of the semester: Programming We will be introducing various new elements of Python and using them to solve increasingly interesting and complex problems. We saw earlier that computers
More informationSplit-Brain Consensus
Split-Brain Consensus On A Raft Up Split Creek Without A Paddle John Burke jcburke@stanford.edu Rasmus Rygaard rygaard@stanford.edu Suzanne Stathatos sstat@stanford.edu ABSTRACT Consensus is critical for
More informationComparing Implementations of Optimal Binary Search Trees
Introduction Comparing Implementations of Optimal Binary Search Trees Corianna Jacoby and Alex King Tufts University May 2017 In this paper we sought to put together a practical comparison of the optimality
More informationMapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I.
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Mapping Bug Reports to Relevant Files and Automated
More informationIncorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches
Incorporating Satellite Documents into Co-citation Networks for Scientific Paper Searches Masaki Eto Gakushuin Women s College Tokyo, Japan masaki.eto@gakushuin.ac.jp Abstract. To improve the search performance
More informationBug Inducing Analysis to Prevent Fault Prone Bug Fixes
Bug Inducing Analysis to Prevent Fault Prone Bug Fixes Haoyu Yang, Chen Wang, Qingkai Shi, Yang Feng, Zhenyu Chen State Key Laboratory for ovel Software Technology, anjing University, anjing, China Corresponding
More informationIstat s Pilot Use Case 1
Istat s Pilot Use Case 1 Pilot identification 1 IT 1 Reference Use case X 1) URL Inventory of enterprises 2) E-commerce from enterprises websites 3) Job advertisements on enterprises websites 4) Social
More informationSOURCE code repositories hold a wealth of information
466 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 31, NO. 6, JUNE 2005 Automatic Mining of Source Code Repositories to Improve Bug Finding Techniques Chadd C. Williams and Jeffrey K. Hollingsworth, Senior
More informationIn this project, I examined methods to classify a corpus of s by their content in order to suggest text blocks for semi-automatic replies.
December 13, 2006 IS256: Applied Natural Language Processing Final Project Email classification for semi-automated reply generation HANNES HESSE mail 2056 Emerson Street Berkeley, CA 94703 phone 1 (510)
More informationTEST FRAMEWORKS FOR ELUSIVE BUG TESTING
TEST FRAMEWORKS FOR ELUSIVE BUG TESTING W.E. Howden CSE, University of California at San Diego, La Jolla, CA, 92093, USA howden@cse.ucsd.edu Cliff Rhyne Intuit Software Corporation, 6220 Greenwich D.,
More informationAdvanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret
Advanced Algorithms Class Notes for Monday, October 23, 2012 Min Ye, Mingfu Shao, and Bernard Moret Greedy Algorithms (continued) The best known application where the greedy algorithm is optimal is surely
More informationMICRO-SPECIALIZATION IN MULTIDIMENSIONAL CONNECTED-COMPONENT LABELING CHRISTOPHER JAMES LAROSE
MICRO-SPECIALIZATION IN MULTIDIMENSIONAL CONNECTED-COMPONENT LABELING By CHRISTOPHER JAMES LAROSE A Thesis Submitted to The Honors College In Partial Fulfillment of the Bachelors degree With Honors in
More informationAutomated Adaptive Bug Isolation using Dyninst. Piramanayagam Arumuga Nainar, Prof. Ben Liblit University of Wisconsin-Madison
Automated Adaptive Bug Isolation using Dyninst Piramanayagam Arumuga Nainar, Prof. Ben Liblit University of Wisconsin-Madison Cooperative Bug Isolation (CBI) ++branch_17[p!= 0]; if (p) else Predicates
More informationFailure Detection Algorithm for Testing Dynamic Web Applications
J. Vijaya Sagar Reddy & G. Ramesh Department of CSE, JNTUA College of Engineering, Anantapur, Andhra Pradesh, India E-mail: vsreddyj5@gmail.com, ramesh680@gmail.com Abstract - Web applications are the
More informationEquation to LaTeX. Abhinav Rastogi, Sevy Harris. I. Introduction. Segmentation.
Equation to LaTeX Abhinav Rastogi, Sevy Harris {arastogi,sharris5}@stanford.edu I. Introduction Copying equations from a pdf file to a LaTeX document can be time consuming because there is no easy way
More informationGenetic Model Optimization for Hausdorff Distance-Based Face Localization
c In Proc. International ECCV 2002 Workshop on Biometric Authentication, Springer, Lecture Notes in Computer Science, LNCS-2359, pp. 103 111, Copenhagen, Denmark, June 2002. Genetic Model Optimization
More informationEnterprise Management of Windows NT Services
Enterprise Management of Windows NT Services J. Nick Otto notto@parikh.net Parikh Advanced Systems Abstract A problem faced by NT administrators is the management of NT based services in the enterprise.
More informationA Propagation Engine for GCC
A Propagation Engine for GCC Diego Novillo Red Hat Canada dnovillo@redhat.com May 1, 2005 Abstract Several analyses and transformations work by propagating known values and attributes throughout the program.
More information