A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies


Diego Cavalcanti, Dalton Guerrero, Jorge Figueiredo

Software Practices Laboratory (SPLab), Federal University of Campina Grande, Campina Grande, PB, Brazil
{diegot,dalton,abrantes}@dsc.ufcg.edu.br

Abstract. Bug tracking systems, such as Bugzilla, are widely used in software development projects to register and manage bug reports. Based on the information that software users provide in bug reports, developers must identify and locate the defective code. This task, however, can be quite challenging and time consuming. In our research, we investigate ways to develop tools and techniques that help developers bridge the gap between bug reports and defective code effectively and efficiently. Our current research centers on the hypothesis that we can exploit the similarities and dissimilarities between the vocabularies of bug reports and software source code. In this paper, we present a preliminary case study on this subject, developed over the data of the Eclipse IDE project. Using information retrieval techniques, we analyzed more than 4,000 bug reports, which impact more than 3,000 different classes of the software. Our results indicate that almost 90% of the analyzed bug reports impact up to three different classes, and that more than half of the similarities between vocabularies are at most 25%. Therefore, we conclude that it is not a trivial task for developers to relate source code and bug reports without a systematic approach.

1. Introduction

Bug tracking systems, such as Bugzilla, GNATS and JIRA, are widely used in large software development projects. In this kind of system, users and developers register bug reports containing technical information about a problematic software system and free-form text that describes the problems encountered. They can also suggest improvements and comment on existing bug reports [Anvik et al. 2006].
Usually, bug reports are the only source of information about software problems. Developers use them to find bugs in the code and fix them. To do so, programmers have to understand the code and find the specific entities to change. For large projects with several modules, each containing thousands of lines of code, this task has a significant cost [Eisenbarth et al. 2003]. The situation is even worse for newcomers, since they do not yet know the code and must learn about it on their own.

It is simple to locate a bug when the bug report contains low-level information about it, such as patches or stack traces [Bettenburg et al. 2008]. However, only a few bug reports contain this kind of information [Schröter et al. 2010]. In such situations, developers need additional effort, since they have to rely only on free-form text.

This research has been partially supported by MCT/CNPq-14/2009 project, through grant number /

In summary, the problem this paper focuses on is that developers spend great effort analyzing software entities (e.g., classes) to find which ones will be impacted by the maintenance requested in a bug report.

Since we want to focus on bug reports of widely used systems, we use Bugzilla-based repositories, which are used by numerous large projects such as Eclipse, Mozilla, Gnome, RedHat and Apache. Nevertheless, we believe that any approach used with Bugzilla can be generalized to other bug reports with minor changes, since other bug tracking systems store information similar to Bugzilla's.

In our research, we investigate ways to develop tools and techniques that help developers bridge the gap between bug reports and defective code effectively and efficiently. In this paper, we present a preliminary case study on the existing relation between source code and bug reports, especially regarding their vocabularies. Our study is developed over the data of the Eclipse bug data set, which contains more than 4,000 bug reports. Our results indicate that most bug reports impact a small number of different classes and that the similarity between their vocabularies is lower than 50%.

The analysis of bug report and source code vocabularies helped us to improve the treatment of these vocabularies and to propose a technique that recommends the Java classes most likely to be impacted by a bug report, using the report's vocabulary as a descriptor.

The remainder of this paper is organized as follows. Section 2 explains key concepts of Bugzilla-based repositories and information retrieval. Section 3 presents the study performed and the results obtained, followed by Section 4, which describes the proposed technique.
Finally, Section 5 presents related work and Section 6 concludes the paper.

2. Background

2.1. Bugzilla-based repositories

Bug reports from Bugzilla share a similar structure. Figure 1 presents a sample (see footnote 4) of a bug report from Eclipse's Bugzilla repository. Each bug report is identified by a unique id and is composed of four sections: pre-defined fields, free-form text, attachments and dependencies [Anvik et al. 2006].

Pre-defined fields store information about the software with the error (e.g., product, component, version and platform used) and about the bug report itself (e.g., status, priority, target milestone, developer assigned to solve it, keywords and timestamp). Most of the data are supplied by the reporter when the report is filed; the remainder are generated automatically or supplied by the project manager.

The free-form text includes the title of the report, a description and comments. The title is commonly a one-line summary of the bug. The description should contain a detailed account of the bug, steps to reproduce it and any other information that can help developers to identify and solve it. The additional comments record discussions about possible approaches to solving the bug, and pointers to other bug reports that may contain more information about the problem or that appear to be duplicates.

4 Bug # Source: bug.cgi?id=62741

Figure 1. A sample of a Bugzilla bug report from Eclipse

Finally, attachments allow non-textual information (e.g., a screenshot of the error) to be added to the bug report, and dependencies track which bugs are prerequisites for the resolution of other bugs.

2.2. Key Concepts of Information Retrieval

Manning et al. [Manning et al. 2008] define information retrieval (IR) as the activity of finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). In terms of our study, we use IR to extract, treat and analyze the vocabulary of source code entities, which is composed of identifiers, in order to satisfy an information need that comes from bug report descriptions. Before working with IR, we need to understand some key concepts:

Tokens are the most basic units of a document. Usually, tokens are the words of a text split on spaces, excluding certain characters such as punctuation.
Stop words removal is the process of excluding from the vocabulary tokens that represent extremely common words which appear in almost every text (e.g., the, is, a, but). Removing them is important because they add little value when selecting a document that matches a query.
Token normalization is the process of normalizing a token so that matches can occur despite superficial differences in the character sequences (e.g., plug-in and plugin; or Class and class).
Stemming is the process of reducing a token to its root using some heuristic (in our case, Porter's algorithm [Porter 1997]). Stemming is needed because, when processing a text, one generally wants to treat different forms of a word as the same (e.g., program, programs and programming). Moreover, some words are derivationally related and have similar meanings, such as am, are, is and be, and we need to treat them as a single term.

Term is a token that was normalized and stemmed.
Document is a sequence of terms.

3. Study and Results

We performed a preliminary case study on the similarity between source code and bug report vocabularies in order to learn whether bug reports are related to the vocabulary of the classes they impact. In our work, IR documents are source code entities, and their vocabularies are composed of the names of packages, classes, fields and methods.

3.1. Study Setup

To perform the study, we need fixed bug reports, each mapped to the entities impacted by its fix; that is, for each bug report, we need to know which classes were changed to fix it. This way, we can analyze the relation between the vocabulary of a specific bug report and that of its impacted classes.

Downloading bug reports is a trivial task, because the Bugzilla system provides a default URL pattern to export each report in XML format. However, mapping each bug report to the source code entities it has impacted is not as easy, since Bugzilla usually does not store this kind of information. Some work in the literature has done this kind of mapping by mining software repositories [Śliwerski et al. 2005, Fischer et al. 2003, Zimmermann et al. 2007]. These approaches search for references to bug reports in commit messages (e.g., "Fixed" and "bug #53784"). We can use their results as a reference to guide us in this task.

In our study, we used an Eclipse bug data set provided by Schröter et al. [Schröter et al. 2006]. They provide data for three versions of Eclipse, namely versions 2.0, 2.1 and 3.0. We use the latter, which has 3,333 impacted classes mapped to 4,136 bug reports from dec/2003 to dec/

3.2. Similarity between vocabularies of bug reports and source code in the Eclipse project

Using the Eclipse 3.0 bug data set, which contains more than 4,000 bug reports, we plotted the number of classes per bug report (presented in Figure 2).
The chart shows that most bug reports impact few classes, while few bug reports impact many classes (e.g., 113 classes). In fact, 99% of bug reports impact fewer than 20 classes. Figure 3 presents the density of classes per bug report. Almost 60% of bug reports impact only one class, and almost 90% impact one, two or three classes. Therefore, in general, a developer has to identify, for each bug report, fewer than four classes to change among the thousands of classes in the project.

We analyzed the relationship between the vocabularies of bug reports and source code in the simplest way: we extracted both vocabularies; treated them with information retrieval approaches, namely tokenization, stop words removal and stemming; and calculated the cosine similarity of their term vectors. We did not apply any other treatment (e.g., filtering of terms) because we wanted to check whether the most basic approach was enough to achieve good results (i.e., similarity above 75%). However, we discovered that the similarity between vocabularies without any specific treatment is very low.
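The vocabulary treatment and similarity computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the stop-word list is a tiny sample, `crude_stem` is a simplistic suffix stripper standing in for Porter's algorithm, and raw term counts are assumed as vector weights because the text does not state the weighting scheme. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "a", "an", "but", "and", "of", "to", "in", "when"}


def crude_stem(token):
    """Strip a few English suffixes. A stand-in only, NOT Porter's algorithm."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[: -len(suffix)]
            break
    # Collapse a doubled final consonant (e.g. "programm" -> "program").
    if len(token) > 3 and token[-1] == token[-2] and token[-1] not in "aeiou":
        token = token[:-1]
    return token


def to_terms(text):
    """Tokenize, normalize, remove stop words and stem."""
    normalized = text.lower().replace("-", "")  # "Plug-in" -> "plugin"
    tokens = re.findall(r"[a-z]+", normalized)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two raw term-count vectors."""
    a, b = Counter(terms_a), Counter(terms_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical bug report text and class vocabulary, for illustration only.
report = to_terms("The editor crashes when opening a plug-in class")
klass = to_terms("editor plugin loader class open")
print(f"similarity: {cosine_similarity(report, klass):.2f}")
```

A similarity of 1 means both term vectors point in the same direction, and 0 means the two vocabularies share no term, which is why the study can group the resulting values into percentage ranges.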

Figure 2. Number of classes per bug report of Eclipse 3.0 bug data set

Figure 3. Density of classes per bug report of Eclipse 3.0 bug data set

Figure 4 presents the similarities obtained from the analyzed vocabularies. We grouped the similarities into four categories: from 0% to 25%, from 25% to 50%, from 50% to 75% and from 75% to 100%. As we can see, more than half of the similarities fall between 0% and 25%. Moreover, more than 90% of the similarities are lower than 50%.

Figure 4. Summary of similarity of vocabularies from Eclipse 3.0 bug data set

Therefore, we can conclude that processing the vocabularies in the simplest way, without any specific treatment for bug reports and source code, is not enough to predict classes from bug reports. To use bug reports as descriptors of source code, one needs to improve the treatment of both vocabularies. This study helped us to understand the existing relation between the two vocabularies and led us to adapt and improve information retrieval techniques, aiming to increase the accuracy of impacted class prediction.

4. Our Proposed Approach

After analyzing the case study results, we propose a technique to relate bug report and source code vocabularies, aiming to achieve reasonable accuracy in locating the classes impacted by a bug report. Our hypothesis is that we can exploit the similarities and dissimilarities of the vocabularies in order to predict such classes from reports. We rely on the assumption that a bug report description (as free-form text) carries information about the problem domain related to the described bug.

The term software vocabulary comprises all identifier names present in the code [Deissenboeck and Pizka 2006] (e.g., names of packages, classes, fields and methods). They are the primary source of conceptual information for comprehending a software system [Rajlich and Wilde 2002]. These identifiers are chosen by stakeholders and generally represent the problem domain [Haiduc and Marcus 2008].

Figure 5 presents an overview of our technique. It is broken into five steps, as follows:

Step 1: we extract the vocabulary of the source code and apply information retrieval algorithms (e.g., tokenization, stop-words removal and stemming) in order to treat the terms.
Step 2: we index the terms from the classes, using the search engine library Apache Lucene. The index stores statistics about terms in order to make term-based search more efficient, so indexing the terms lets us search the entities' vocabulary efficiently.
Step 3: we extract the vocabulary of the bug report and apply the same treatment as before. However, we do not index the bug report's vocabulary.
Step 4: instead, we use the bug report's vocabulary as a query to search over the vocabulary of the source code entities.
Step 5: we score the classes of the source code and return a ranking of them to the user. For that purpose, we use a combination of the Vector Space Model (VSM) [Salton et al. 1975] of Information Retrieval and the Boolean model [Lashkari et al. 2009] to determine how relevant each class is to the bug report vocabulary (the query).

5. Related Work

Nowadays, bug reports are used for various purposes. Research has been carried out on mining bug repositories to help with software maintenance. Various studies used bug reports to track features over time [Fischer et al. 2003], understand how people describe problems [Ko et al. 2006], extract structural information from bug reports [Bettenburg et al. 2008], automatically assign them to developers [Matter et al. 2009, Anvik et al. 2006], assign artifacts to bug reports [Čubranić and Murphy 2003] and improve their quality [Bettenburg et al. 2007].
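The five steps of the technique in Section 4 (summarized in Figure 5) can be sketched as follows. This is a hedged illustration only: a plain Python dictionary stands in for the Apache Lucene index, the score is a simplified TF-IDF sum rather than Lucene's actual formula, and the way the Boolean and vector space models are combined (a class is a candidate if it contains at least one query term, and candidates are then ranked by weight) is an assumption. Terms are assumed to be already treated (Steps 1 and 3), and all names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(class_terms):
    """Step 2: map each term to {class name: term frequency}."""
    index = defaultdict(dict)
    for class_name, terms in class_terms.items():
        for term, tf in Counter(terms).items():
            index[term][class_name] = tf
    return index


def rank_classes(index, n_classes, query_terms, top_k=3):
    """Steps 4 and 5: query the index with bug report terms and rank classes."""
    scores = defaultdict(float)
    for term, query_tf in Counter(query_terms).items():
        postings = index.get(term)
        if not postings:
            continue  # term occurs in no class, so it matches nothing
        # Rarer terms weigh more (a smoothed inverse document frequency).
        idf = math.log(1 + n_classes / len(postings))
        for class_name, tf in postings.items():
            scores[class_name] += query_tf * tf * idf
    # Highest score first; ties are broken alphabetically by class name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical, already-treated vocabularies of three classes.
classes = {
    "JavaEditor": ["java", "editor", "open", "editor"],
    "Parser": ["java", "parser", "token"],
    "Logger": ["log", "write"],
}
index = build_index(classes)
ranking = rank_classes(index, len(classes), ["editor", "crash", "open"])
print(ranking)
```

Only classes that share at least one term with the bug report appear in the ranking, so a report whose vocabulary does not overlap with the code yields an empty result.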

Figure 5. Our Technique

However, to the best of our knowledge, only two studies [Canfora and Cerulo 2005, Canfora and Cerulo 2006] have proposed an approach similar to ours, which maps entities to bug reports. The two are very similar, differing only in level of granularity (the first retrieves classes and the second, lines of code). Their reported precision ranges from 30% to 78%. We think this range is too widely spread to support a conclusion about the effectiveness of their technique. Our study aims to better understand the behaviour of software and bug report vocabularies and to propose a technique that improves on their results. Moreover, although their main objective is the same as ours, there are differences in the retrieval process, because they do not use the software vocabulary.

6. Conclusion

Our results showed that most bug reports (almost 90%) impact up to three classes. Moreover, we found that, without any specific treatment, the vocabularies of source code and bug reports generally have less than 50% similarity. Therefore, there is a relation between bug reports and source code, although the similarity is low. This led us to propose a technique that adapts and improves information retrieval approaches, aiming to increase the accuracy of impact prediction from a bug report.

As future work, we intend to implement the proposed technique, then experiment with it and improve it. Moreover, we aim to provide the developer community with a tool that automates our approach, predicting which classes should be changed to fix a given bug report.

References

Anvik, J., Hiew, L., and Murphy, G. (2006). Who should fix this bug? In International Conference on Software Engineering.
Bettenburg, N., Just, S., Schröter, A., Weiß, C., Premraj, R., and Zimmermann, T. (2007). Quality of bug reports in Eclipse. In OOPSLA Workshop on Eclipse Technology eXchange.
Bettenburg, N., Premraj, R., Zimmermann, T., and Kim, S. (2008). Extracting structural information from bug reports. In IEEE Working Conference on Mining Software Repositories.

Canfora, G. and Cerulo, L. (2005). Impact analysis by mining software and change request repositories. In IEEE International Symposium on Software Metrics.
Canfora, G. and Cerulo, L. (2006). Fine grained indexing of software repositories to support impact analysis. In IEEE Working Conference on Mining Software Repositories.
Čubranić, D. and Murphy, G. (2003). Hipikat: Recommending pertinent software development artifacts. In International Conference on Software Engineering.
Deissenboeck, F. and Pizka, M. (2006). Concise and consistent naming. Software Quality Journal, 14(3):
Eisenbarth, T., Koschke, R., and Simon, D. (2003). Locating features in source code. IEEE Transactions on Software Engineering.
Fischer, M., Pinzger, M., and Gall, H. (2003). Analyzing and relating bug report data for feature tracking. Published by the IEEE Computer Society.
Haiduc, S. and Marcus, A. (2008). On the use of domain terms in source code. In IEEE International Conference on Program Comprehension.
Ko, A. J., Myers, B. A., and Chau, D. H. (2006). A linguistic analysis of how people describe software problems. In IEEE Symposium on Visual Languages and Human-Centric Computing.
Lashkari, A., Mahdavi, F., and Ghomi, V. (2009). A Boolean model in information retrieval for search engines. In International Conference on Information Management and Engineering.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.
Matter, D., Kuhn, A., and Nierstrasz, O. (2009). Assigning bug reports using a vocabulary-based expertise model of developers. In IEEE Working Conference on Mining Software Repositories.
Porter, M. F. (1997). An algorithm for suffix stripping, pages Morgan Kaufmann Publishers Inc.
Rajlich, V. and Wilde, N. (2002). The role of concepts in program comprehension. In Proceedings of the 10th International Workshop on Program Comprehension, pages IEEE.
Salton, G., Wong, A., and Yang, C. (1975). A vector space model for information retrieval. Journal of the American Society for Information Science, 18(11).
Schröter, A., Bettenburg, N., and Premraj, R. (2010). Do stack traces help developers fix bugs? In IEEE Working Conference on Mining Software Repositories.
Schröter, A., Zimmermann, T., Premraj, R., and Zeller, A. (2006). If your bug database could talk. In International Symposium on Empirical Software Engineering.
Śliwerski, J., Zimmermann, T., and Zeller, A. (2005). When do changes induce fixes? In IEEE Working Conference on Mining Software Repositories.
Zimmermann, T., Premraj, R., and Zeller, A. (2007). Predicting defects for Eclipse. In International Workshop on Predictor Models in Software Engineering. IEEE Computer Society.


More information

Mapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I.

Mapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I. International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Mapping Bug Reports to Relevant Files and Automated

More information

BugzillaMetrics - Design of an adaptable tool for evaluating user-defined metric specifications on change requests

BugzillaMetrics - Design of an adaptable tool for evaluating user-defined metric specifications on change requests BugzillaMetrics - A tool for evaluating metric specifications on change requests BugzillaMetrics - Design of an adaptable tool for evaluating user-defined metric specifications on change requests Lars

More information

SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS

SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS SEMI-AUTOMATIC ASSIGNMENT OF WORK ITEMS Jonas Helming, Holger Arndt, Zardosht Hodaie, Maximilian Koegel, Nitesh Narayan Institut für Informatik,Technische Universität München, Garching, Germany {helming,

More information

An Approach for Mapping Features to Code Based on Static and Dynamic Analysis

An Approach for Mapping Features to Code Based on Static and Dynamic Analysis An Approach for Mapping Features to Code Based on Static and Dynamic Analysis Abhishek Rohatgi 1, Abdelwahab Hamou-Lhadj 2, Juergen Rilling 1 1 Department of Computer Science and Software Engineering 2

More information

Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification

Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification Denys Poshyvanyk, Yann-Gaël Guéhéneuc, Andrian Marcus, Giuliano Antoniol, Václav Rajlich 14 th IEEE International

More information

Can Better Identifier Splitting Techniques Help Feature Location?

Can Better Identifier Splitting Techniques Help Feature Location? Can Better Identifier Splitting Techniques Help Feature Location? Bogdan Dit, Latifa Guerrouj, Denys Poshyvanyk, Giuliano Antoniol SEMERU 19 th IEEE International Conference on Program Comprehension (ICPC

More information

Mining Software Repositories. Seminar The Mining Project Yana Mileva & Kim Herzig

Mining Software Repositories. Seminar The Mining Project Yana Mileva & Kim Herzig Mining Software Repositories Seminar 2010 - The Mining Project Yana Mileva & Kim Herzig Predicting Defects for Eclipse [Zimmermann et al.] SCM Repository Predicting Defects for Eclipse [Zimmermann et al.]

More information

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL

CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVII, Number 4, 2012 CLUSTERING, TIERED INDEXES AND TERM PROXIMITY WEIGHTING IN TEXT-BASED RETRIEVAL IOAN BADARINZA AND ADRIAN STERCA Abstract. In this paper

More information

Structured Information Retrival Based Bug Localization

Structured Information Retrival Based Bug Localization ISSN (online): 2456-0006 International Journal of Science Technology Management and Research Available online at: Structured Information Retrival Based Bug Localization Shraddha Kadam 1 Department of Computer

More information

Filtering Bug Reports for Fix-Time Analysis

Filtering Bug Reports for Fix-Time Analysis Filtering Bug Reports for Fix-Time Analysis Ahmed Lamkanfi, Serge Demeyer LORE - Lab On Reengineering University of Antwerp, Belgium Abstract Several studies have experimented with data mining algorithms

More information

Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization

Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization Shaowei Wang and David Lo School of Information Systems Singapore Management University, Singapore {shaoweiwang.2010,davidlo}@smu.edu.sg

More information

Inside JIRA scheme, everything can be configured, and it consists of. This section will guide you through JIRA Issue and it's types.

Inside JIRA scheme, everything can be configured, and it consists of. This section will guide you through JIRA Issue and it's types. JIRA Tutorial What is JIRA? JIRA is a tool developed by Australian Company Atlassian. It is used for bug tracking, issue tracking, and project management. The name "JIRA" is actually inherited from the

More information

Mining Software Repositories for Software Change Impact Analysis: A Case Study

Mining Software Repositories for Software Change Impact Analysis: A Case Study Mining Software Repositories for Software Change Impact Analysis: A Case Study Lile Hattori 1, Gilson dos Santos Jr. 2, Fernando Cardoso 2, Marcus Sampaio 2 1 Faculty of Informatics University of Lugano

More information

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems

A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems A model of information searching behaviour to facilitate end-user support in KOS-enhanced systems Dorothee Blocks Hypermedia Research Unit School of Computing University of Glamorgan, UK NKOS workshop

More information

Multi-Project Software Engineering: An Example

Multi-Project Software Engineering: An Example Multi-Project Software Engineering: An Example Pankaj K Garg garg@zeesource.net Zee Source 1684 Nightingale Avenue, Suite 201, Sunnyvale, CA 94087, USA Thomas Gschwind tom@infosys.tuwien.ac.at Technische

More information

Mining CVS repositories, the softchange experience

Mining CVS repositories, the softchange experience Mining CVS repositories, the softchange experience Daniel M. German Software Engineering Group Department of Computer Science University of Victoria dmgerman@uvic.ca Abstract CVS logs are a rich source

More information

Identifying Changed Source Code Lines from Version Repositories

Identifying Changed Source Code Lines from Version Repositories Identifying Changed Source Code Lines from Version Repositories Gerardo Canfora, Luigi Cerulo, Massimiliano Di Penta RCOST Research Centre on Software Technology Department of Engineering - University

More information

CS105 Introduction to Information Retrieval

CS105 Introduction to Information Retrieval CS105 Introduction to Information Retrieval Lecture: Yang Mu UMass Boston Slides are modified from: http://www.stanford.edu/class/cs276/ Information Retrieval Information Retrieval (IR) is finding material

More information

Indexing in Search Engines based on Pipelining Architecture using Single Link HAC

Indexing in Search Engines based on Pipelining Architecture using Single Link HAC Indexing in Search Engines based on Pipelining Architecture using Single Link HAC Anuradha Tyagi S. V. Subharti University Haridwar Bypass Road NH-58, Meerut, India ABSTRACT Search on the web is a daily

More information

Automatic Extraction of Bug Localization Benchmarks from History

Automatic Extraction of Bug Localization Benchmarks from History Automatic Extraction of Bug Localization Benchmarks from History Valentin Dallmeier Dept. of Computer Science Saarland University Saarbrücken, Germany dallmeier@cs.uni-sb.de Thomas Zimmermann Dept. of

More information

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion

MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion MIRACLE at ImageCLEFmed 2008: Evaluating Strategies for Automatic Topic Expansion Sara Lana-Serrano 1,3, Julio Villena-Román 2,3, José C. González-Cristóbal 1,3 1 Universidad Politécnica de Madrid 2 Universidad

More information

Exploring the Relationship of History Characteristics and Defect Count: An Empirical Study

Exploring the Relationship of History Characteristics and Defect Count: An Empirical Study Exploring the Relationship of History Characteristics and Defect Count: An Empirical Study Timea Illes-Seifert Institute for Computer Science University of Heidelberg Im Neuenheimer Feld 326, D-69120 Heidelberg

More information

Improving Evolvability through Refactoring

Improving Evolvability through Refactoring Improving Evolvability through Refactoring Jacek Ratzinger, Michael Fischer Vienna University of Technology Institute of Information Systems A-1040 Vienna, Austria {ratzinger,fischer}@infosys.tuwien.ac.at

More information

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014.

International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Volume 1, Issue 2, July 2014. A B S T R A C T International Journal of Advance Foundation and Research in Science & Engineering (IJAFRSE) Information Retrieval Models and Searching Methodologies: Survey Balwinder Saini*,Vikram Singh,Satish

More information

3 Prioritization of Code Anomalies

3 Prioritization of Code Anomalies 32 3 Prioritization of Code Anomalies By implementing a mechanism for detecting architecturally relevant code anomalies, we are already able to outline to developers which anomalies should be dealt with

More information

International Journal for Management Science And Technology (IJMST)

International Journal for Management Science And Technology (IJMST) Volume 4; Issue 03 Manuscript- 1 ISSN: 2320-8848 (Online) ISSN: 2321-0362 (Print) International Journal for Management Science And Technology (IJMST) GENERATION OF SOURCE CODE SUMMARY BY AUTOMATIC IDENTIFICATION

More information

Using Information Retrieval to Support Software Evolution

Using Information Retrieval to Support Software Evolution Using Information Retrieval to Support Software Evolution Denys Poshyvanyk Ph.D. Candidate SEVERE Group @ Software is Everywhere Software is pervading every aspect of life Software is difficult to make

More information

Software Engineering

Software Engineering Software Engineering Lecture 15: Testing and Debugging Debugging Peter Thiemann University of Freiburg, Germany SS 2014 Motivation Debugging is unavoidable and a major economical factor Software bugs cost

More information

modern database systems lecture 4 : information retrieval

modern database systems lecture 4 : information retrieval modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation

More information

A Text Retrieval Approach to Recover Links among s and Source Code Classes

A Text Retrieval Approach to Recover Links among  s and Source Code Classes 318 A Text Retrieval Approach to Recover Links among E-Mails and Source Code Classes Giuseppe Scanniello and Licio Mazzeo Universitá della Basilicata, Macchia Romana, Viale Dell Ateneo, 85100, Potenza,

More information

A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews

A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews Necmiye Genc-Nayebi and Alain Abran Department of Software Engineering and Information Technology, Ecole

More information

Software Metrics based on Coding Standards Violations

Software Metrics based on Coding Standards Violations Software Metrics based on Coding Standards Violations Yasunari Takai, Takashi Kobayashi and Kiyoshi Agusa Graduate School of Information Science, Nagoya University Aichi, 464-8601, Japan takai@agusa.i.is.nagoya-u.ac.jp,

More information

A Statement Level Bug Localization Technique using Statement Dependency Graph

A Statement Level Bug Localization Technique using Statement Dependency Graph A Statement Level Bug Localization Technique using Statement Dependency Graph Shanto Rahman, Md. Mostafijur Rahman, Ahmad Tahmid and Kazi Sakib Institute of Information Technology, University of Dhaka,

More information

311 Predictions on Kaggle Austin Lee. Project Description

311 Predictions on Kaggle Austin Lee. Project Description 311 Predictions on Kaggle Austin Lee Project Description This project is an entry into the SeeClickFix contest on Kaggle. SeeClickFix is a system for reporting local civic issues on Open311. Each issue

More information

Jose Ricardo Esteban Clua Leonardo Murta. Anita Sarma

Jose Ricardo Esteban Clua Leonardo Murta. Anita Sarma Exploratory Data Analysis of Software Repositories via GPU Jose Ricardo Esteban Clua Leonardo Murta Anita Sarma Introduction Who was the last person who edit method Z? Who has expertise in module X? Which

More information

Bug or Not? Bug Report Classification using N-Gram IDF

Bug or Not? Bug Report Classification using N-Gram IDF Bug or Not? Bug Report Classification using N-Gram IDF Pannavat Terdchanakul 1, Hideaki Hata 1, Passakorn Phannachitta 2, and Kenichi Matsumoto 1 1 Graduate School of Information Science, Nara Institute

More information

Improving Bug Triage with Bug Tossing Graphs

Improving Bug Triage with Bug Tossing Graphs Improving Bug Triage with Bug Tossing Graphs Gaeul Jeong Seoul National University gejeong@ropas.snu.ac.kr Sunghun Kim Hong Kong University of Science and Technology hunkim@cse.ust.hk Thomas Zimmermann

More information

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Jadson Santos Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte, UFRN Natal,

More information

Bug tracking. Second level Third level Fourth level Fifth level. - Software Development Project. Wednesday, March 6, 2013

Bug tracking. Second level Third level Fourth level Fifth level. - Software Development Project. Wednesday, March 6, 2013 Bug tracking Click to edit Master CSE text 2311 styles - Software Development Project Second level Third level Fourth level Fifth level Wednesday, March 6, 2013 1 Prototype submission An email with your

More information

Steven Davies Marc Roper Department of Computer and Information Sciences University of Strathclyde. International Workshop on Program Debugging, 2013

Steven Davies Marc Roper Department of Computer and Information Sciences University of Strathclyde. International Workshop on Program Debugging, 2013 1/22 Bug localisation through diverse sources of information Steven Davies Marc Roper Department of Computer and Information Sciences University of Strathclyde International Workshop on Program Debugging,

More information

Recovering Traceability Links between Code and Documentation

Recovering Traceability Links between Code and Documentation Recovering Traceability Links between Code and Documentation Paper by: Giuliano Antoniol, Gerardo Canfora, Gerardo Casazza, Andrea De Lucia, and Ettore Merlo Presentation by: Brice Dobry and Geoff Gerfin

More information

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert

More information

CS 6320 Natural Language Processing

CS 6320 Natural Language Processing CS 6320 Natural Language Processing Information Retrieval Yang Liu Slides modified from Ray Mooney s (http://www.cs.utexas.edu/users/mooney/ir-course/slides/) 1 Introduction of IR System components, basic

More information

EasyChair Preprint. A Study on the Use of IDE Features for Debugging

EasyChair Preprint. A Study on the Use of IDE Features for Debugging EasyChair Preprint 111 A Study on the Use of IDE Features for Debugging Afsoon Afzal and Claire Le Goues EasyChair preprints are intended for rapid dissemination of research results and are integrated

More information

Towards a Taxonomy of Approaches for Mining of Source Code Repositories

Towards a Taxonomy of Approaches for Mining of Source Code Repositories Towards a Taxonomy of Approaches for Mining of Source Code Repositories Huzefa Kagdi, Michael L. Collard, Jonathan I. Maletic Department of Computer Science Kent State University Kent Ohio 44242 {hkagdi,

More information

Collaborative bug triaging using textual similarities and change set analysis

Collaborative bug triaging using textual similarities and change set analysis Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2013 Collaborative bug triaging using textual similarities and change set analysis

More information

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University

Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar Youngstown State University Bonita Sharif, Youngstown State University Improving Stack Overflow Tag Prediction Using Eye Tracking Alina Lazar, Youngstown State University Bonita Sharif, Youngstown State University Jenna Wise, Youngstown State University Alyssa Pawluk, Youngstown

More information

IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES

IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES Meghan Revelle Advisor: David Coppit Department of Computer Science The College of William and Mary Williamsburg, Virginia

More information

Introduction to Information Retrieval

Introduction to Information Retrieval Introduction Inverted index Processing Boolean queries Course overview Introduction to Information Retrieval http://informationretrieval.org IIR 1: Boolean Retrieval Hinrich Schütze Institute for Natural

More information

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs Cesar Couto, Pedro Pires, Marco Tulio Valente, Roberto Bigonha, Andre Hora, Nicolas Anquetil To cite this version: Cesar

More information

Fig 1. Overview of IE-based text mining framework

Fig 1. Overview of IE-based text mining framework DiscoTEX: A framework of Combining IE and KDD for Text Mining Ritesh Kumar Research Scholar, Singhania University, Pacheri Beri, Rajsthan riteshchandel@gmail.com Abstract: Text mining based on the integration

More information

NextBug: a Bugzilla extension for recommending similar bugs

NextBug: a Bugzilla extension for recommending similar bugs Rocha et al. Journal of Software Research and Development (2015) 3:3 DOI 10.1186/s40411-015-0018-x SOFTWARE Open Access NextBug: a Bugzilla extension for recommending similar bugs Henrique Rocha 1*, Guilherme

More information

Automatic Labeling of Issues on Github A Machine learning Approach

Automatic Labeling of Issues on Github A Machine learning Approach Automatic Labeling of Issues on Github A Machine learning Approach Arun Kalyanasundaram December 15, 2014 ABSTRACT Companies spend hundreds of billions in software maintenance every year. Managing and

More information

Multi-Stage Rocchio Classification for Large-scale Multilabeled

Multi-Stage Rocchio Classification for Large-scale Multilabeled Multi-Stage Rocchio Classification for Large-scale Multilabeled Text data Dong-Hyun Lee Nangman Computing, 117D Garden five Tools, Munjeong-dong Songpa-gu, Seoul, Korea dhlee347@gmail.com Abstract. Large-scale

More information

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness

Citation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness UvA-DARE (Digital Academic Repository) Exploring topic structure: Coherence, diversity and relatedness He, J. Link to publication Citation for published version (APA): He, J. (211). Exploring topic structure:

More information

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS 1 WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS BRUCE CROFT NSF Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts,

More information

An Approach to Detecting Duplicate Bug Reports using N-gram Features and Cluster Chrinkage Technique

An Approach to Detecting Duplicate Bug Reports using N-gram Features and Cluster Chrinkage Technique International Journal of Scientific and Research Publications, Volume 4, Issue 5, May 2014 1 An Approach to Detecting Duplicate Bug Reports using N-gram Features and Cluster Chrinkage Technique Phuc Nhan

More information

Impact of Dependency Graph in Software Testing

Impact of Dependency Graph in Software Testing Impact of Dependency Graph in Software Testing Pardeep Kaur 1, Er. Rupinder Singh 2 1 Computer Science Department, Chandigarh University, Gharuan, Punjab 2 Assistant Professor, Computer Science Department,

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

STRICT: Information Retrieval Based Search Term Identification for Concept Location

STRICT: Information Retrieval Based Search Term Identification for Concept Location STRICT: Information Retrieval Based Search Term Identification for Concept Location Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,

More information