Impact Analysis by Mining Software and Change Request Repositories


Impact Analysis by Mining Software and Change Request Repositories

Gerardo Canfora, Luigi Cerulo
RCOST Research Centre on Software Technology
Department of Engineering, University of Sannio
Viale Traiano, Benevento, Italy
{canfora,

Abstract

Impact analysis is the identification of the work products affected by a proposed change request, either a bug fix or a new feature request. In many open-source projects, such as KDE, Gnome, Mozilla, and OpenOffice, change requests and related data are stored in a bug tracking system such as Bugzilla [1]. These data, together with the data stored in a versioning system, such as CVS [2], are a valuable source of information on which useful analyses can be performed. In this paper we propose a method to derive the set of source files impacted by a proposed change request. The method exploits information retrieval algorithms to link the change request description with the set of historical source file revisions impacted by similar past change requests. The method is evaluated by applying it to four open-source projects.

1. Introduction

Nowadays, a huge amount of data regarding version history is freely available in open-source projects (e.g., SourceForge). They provide useful information on the evolution of a software project in terms of who changed what and when. New opportunities for data analysis have been proposed in recent years, not only in the field of software evolution, but also for change propagation [23], fault analysis [11], software complexity [7], software reuse [19], and social networks [17]. These issues have been faced mainly by analyzing the product instead of the process. For example, Graves et al. [11] found that the number of times code has been changed is a better indication of how many faults it will contain than its length; Ying et al. [23] report that historical change patterns can be used to recommend potentially relevant source code to a developer performing a modification task.
In this paper we propose an approach for deriving the set of source files impacted by a proposed change request by using information retrieval techniques. We use the historical data stored in a versioning system, namely CVS [2], and in a bug tracking system, Bugzilla [1]. In the open-source community Bugzilla is usually used to track bugs and to manage enhancement feature requests. Information stored within Bugzilla is semi-structured and largely composed of textual information: a short description, a long description, and a set of discussion comments about the resolution of the change request. Our hypothesis is that the textual data carried by the bug tracking system during the resolution of a change request is a useful descriptor of the impacted files to be considered in the impact analysis of future similar change requests. The approach is useful primarily for the developer, but also for the project manager. A developer can learn which files he/she should work on to resolve a given change request. We show with a case study that the set of files returned by our approach is right in no less than 30% of cases and reaches a maximum of 78%. A project manager can take advantage of this approach to obtain an estimate of which files will be impacted in the next release; this is useful to focus testing effort.
The paper provides two main contributions: we show that the historical information stored in a versioning system and a bug tracking system is useful not only for analyzing software evolution but also for impact analysis; and we verify with four case studies that the information stored in historical change requests is a useful descriptor of source files when used for impact analysis. The paper is organized as follows: the next section gives a review of related work regarding impact analysis and how software repositories have been used in different contexts; the third section introduces our approach and shows how software repositories such as CVS and Bugzilla are used to support impact analysis; the fourth section presents

the application and validation of the approach in four case studies; the fifth section discusses the results obtained; the final section concludes the paper with suggestions for further development.

2. Related work

Traditionally, impact analysis has been performed by statically analyzing the product [5]. Many approaches are based on traceability analysis and dependence analysis. Traceability analysis identifies affected software objects using explicit traceability relationships. Some methods use a traceability matrix to represent relationships, and the impacted objects are inferred by calculating the transitive closure of the matrix. Dependence analysis attempts to assess the effects of a change on semantic dependencies between program entities; one technique is to use static and/or dynamic slicing [14]. Expert judgement and code inspection are also used; however, expert predictions have been shown to be frequently incorrect [16], and source code inspection can be prohibitively expensive [20]. The availability of data on the software process, such as those deriving from software and change repositories, can provide new opportunities for impact analysis. In particular, approaches to predict the impact and propagation of changes can be found in [24, 23]. They use heuristics, such as historical co-change and co-authorship, to derive the set of impacted source files or program entities. A method to evaluate the performance of change propagation heuristics has been introduced in [12]. These methods predict the impacted files starting from a given initial source file; the approach we present in this paper starts from a change request description. Many software engineering artifacts are represented in natural language, and it is not unusual to find in the literature methods that take advantage of information retrieval algorithms and natural language processing approaches.
As an example, in [4] a probabilistic information retrieval model is used to map source code artifacts to documentation, and in [18] the same approach is extended, obtaining better performance, with the use of latent semantic indexing.

3. Background

In the open-source community, project management is usually performed with two systems: a source version control system, such as CVS, and a bug tracking system, such as Bugzilla. The first is used mainly for file sharing and versioning, while the second is used for change management, covering both bugs and new feature requests. In this section an overview of these two systems is given.

3.1. CVS

CVS can handle revisions of textual files by storing the differences between subsequent revisions in a repository. In open-source projects CVS is used to store every file regarding the software system under development: this usually comprises source code, project documentation, internationalization text, artwork, and help text. Every file in the repository evolves through a sequence of revisions; a software release represents a snapshot of the CVS repository. Revisions are stored together with the user id of whoever performed the commit, the date, and a textual block representing the user's comment on the change. The complete history of file revisions can be retrieved. In this paper we focus on the RCS log file, which identifies artifacts in the source repository (i.e., the file name and its path in the source tree) and the list of modification records describing the change history of each artifact from its initial check-in to the current release.
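As a rough illustration, a modification record of this kind might be parsed as follows; the sample record and the exact field layout are assumptions, since real `cvs log` output varies across CVS versions:

```python
import re

# One "cvs log" modification record (format assumed; real output varies).
RECORD = """\
revision 1.42
date: 2004/11/22 10:07:13;  author: jdoe;  state: Exp;  lines: +12 -3
Fix cursor width in textarea, bug #261054
"""

# Header line: date, author, state, and the added/deleted line counts.
HEADER = re.compile(
    r"date: (?P<date>[^;]+);\s+author: (?P<author>[^;]+);\s+"
    r"state: (?P<state>[^;]+);\s+lines: \+(?P<added>\d+) -(?P<deleted>\d+)"
)

def parse_record(text):
    lines = text.strip().splitlines()
    rev = lines[0].split()[1]                  # "revision 1.42" -> "1.42"
    fields = HEADER.match(lines[1]).groupdict()
    fields["revision"] = rev
    fields["comment"] = "\n".join(lines[2:])   # free-text check-in comment
    return fields

rec = parse_record(RECORD)
```

The free-text comment at the end of each record is the part the approach later indexes as a file descriptor.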
A modification record starts with the revision number and contains the following fields: date, the date and time of check-in; author, an identifier of the person who did the check-in; state, one of the following values: exp, meaning experimental, or dead, meaning that the file has been removed; lines, the number of lines added and deleted with respect to the previous version of the file; and a final block of free text that contains informal data entered by the author during the check-in process.

3.2. Bugzilla

Bugzilla is a database for bugs. Its main use is to report bugs and to assign them to the appropriate developers. However, Bugzilla also handles requests for enhancement. In the open-source community it is largely used for proposing changes, assigning tasks, discussing different enhancement solutions, and so on. Developers use Bugzilla to keep a to-do list as well as to prioritize, schedule, and track dependencies. The Bugzilla database is accessible via HTTP, and some of its reports can be exported in XML. A bug, or a request for enhancement, has a life cycle in Bugzilla. When a new bug is discovered it becomes by default UNCONFIRMED, because what happens to a bug when it is first reported depends on who reported it. A QA (Quality Assurance) person, usually one of the maintainers of the project, looks at it and confirms that it exists before it gets turned into a NEW bug. When a bug becomes NEW, a developer will probably look at it and either accept it or give it to someone else. When a bug is fixed it is marked RESOLVED and its resolution is set to FIXED. Other valid resolutions are: INVALID, when the problem described is not a bug; WONTFIX, when the problem described is a bug which will never be fixed; DUPLICATE, when the problem is a duplicate of an existing bug;

and WORKSFORME, when all attempts at reproducing the bug in the current build were futile. In this paper we refer to a Change Request (CR) as an item tracked by Bugzilla that can be either a bug or a request for enhancement. In the following, an excerpt in XML format of the information contained in a CR is shown.

<bug>
  <bugid>261054</bugid>
  <creationts> :07 PST</creationts>
  <shortdesc>cursor size too wide in a textarea</shortdesc>
  <deltats> :18 PST</deltats>
  <product>firefox</product>
  <component>general</component>
  <bugstatus>resolved</bugstatus>
  <resolution>fixed</resolution>
  <reporter>ffes@users.sourceforge.net</reporter>
  <assignedto>aaronleventhal@moonset.net</assignedto>
  <longdesc>
    <who>ffes@users.sourceforge.net</who>
    <bugwhen> :07 PST</bugwhen>
    <thetext>[] the cursor is much wider then in input type []
      This makes it hard to determine []
      Expected Results: The cursor size of the input should also be used in the []</thetext>
  </longdesc>
</bug>

A CR is enclosed in a bug tag, which contains: bugid, a unique identifier assigned by Bugzilla; creationts, the date and time of CR creation; shortdesc, a short description; deltats, the date and time of the last CR state change; product, the product name; component, the component of the system; reporter, who submitted the CR; assignedto, who was assigned the CR for resolution; and longdesc, a structure comprising a long description of the CR (thetext), who submitted it (who), and when (bugwhen). Bugzilla and CVS are often used in the open-source community in a disciplined and standard way [10]. Open-source development processes usually follow guidelines developed around CVS and Bugzilla. One of these is to keep track of the files impacted by a CR. The CVS and Bugzilla databases are not correlated, but a common practice is to include in a CVS commit comment the id number of the CR tracked by Bugzilla. This is a commonly used heuristic to generate a one-to-many relation between CRs and file revisions [8].
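The id-matching heuristic [8] can be sketched as a simple pattern match over check-in comments; the exact commenting conventions differ from project to project, so the pattern below is an assumption:

```python
import re

# Bugzilla ids referenced in a CVS commit comment. Projects write these
# ids in varying styles ("bug 1234", "#1234", ...), so this pattern is
# only an illustrative assumption.
BUG_ID = re.compile(r"(?:[Bb][Uu][Gg]|#)\s*#?\s*(\d{4,6})")

def linked_crs(commit_comment):
    """Return the set of Bugzilla CR ids mentioned in a check-in comment."""
    return {int(m) for m in BUG_ID.findall(commit_comment)}

comment = "Fix cursor size in textarea. Patch by aaron, fixes bug #261054."
```

Applied over the whole RCS log, this yields the one-to-many relation between a CR and the file revisions it impacted.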
4. The proposed approach

4.1. Impact analysis process

Impact analysis is the identification of the work products affected by a proposed change request, either a bug fix or a new feature request. Our approach to impact analysis is based on the hypothesis that the set of revision comments of a file, together with the set of CRs that previously impacted it, is a good descriptor of the file to support impact analysis of new CRs. We use textual similarity to retrieve past CRs similar to a new CR and to compute the impacted files. This is supported by the fact that CVS and Bugzilla are extensively used as tools for sharing knowledge during the software development process. The clarity of textual information, such as bug comments, bug descriptions, feature proposals, and so on, is a critical need in an environment in which no in-person meetings, phone calls, or coffee-break discussions are possible. As a matter of fact, the quality of textual information is high, especially for mature projects such as KDE, Mozilla, and OpenOffice, for which a history of about seven years is stored in their CVS and Bugzilla repositories. Textual similarity is a critical part of our approach. The Information Retrieval community has dealt with text similarity for a long time: given a set of text documents and a user information need represented as a set of words, or more generally as free text, the information retrieval problem is to retrieve all documents relevant to the user [22]. The process depicted in Figure 1 shows our approach. Each work file is the result of a set of revisions in the CVS repository, and each revision can be the impacted modification of a CR in the Bugzilla database. These relationships are depicted in the figure as one-to-many and many-to-one database relations between the Work File, Revision, and Fixed CR entities. A user submits a new CR to the Bugzilla database.
If it is a bug, it is usually composed of a short text description and a long text description in which the steps to reproduce the bug are explained. Otherwise, if it is an enhancement feature request, a description of what is desired is given in the long description. The new CR is then assigned to a developer in order to resolve it. The developer has to comprehend the CR and determine the files of the source tree that will be impacted by it. A developer who has dealt with similar CRs in the past has a wide knowledge of the files involved and can easily locate the files that should be changed. Otherwise, if he/she does not have such knowledge, he/she has to analyze the problem more deeply and usually asks for the help of other developers involved in the process. It is not unusual to find in the discussion comments of a CR topics regarding similar past behaviors resolved in other CRs. Our approach attempts to keep track of similar past CRs that have been correctly resolved and fixed in order to give the developer a first

ranked list of probable impacted files, which can be the starting point for the change.

Figure 1. Impacted files retrieval process

The ranked list is the output of an information retrieval algorithm that compares the query representation of the new CR with the XML representation of the work files indexed through an indexing process. The new CR query representation and the XML file descriptor representation are built by the descriptor builder process shown in Figure 2. A new CR item passes through a number of steps in which some standard query transformations are performed. In the first step, field extractor, the set of textual information used to build the query is extracted. We have considered two alternatives: one in which only the short description of the CR is considered, and another in which the long description is also considered. In some cases a wider query description gives more precise results. The second step, term tokenizer, subdivides the text into the terms that will be considered in the query. A token is a sequence of alphanumeric characters separated by non-alphanumeric characters; we have discarded tokens consisting only of digits. The third step, stemmer, reduces each term to its root: for example, verb conjugations are reduced to the infinitive, and plurals to singulars. This is a common step of an information retrieval process, in which terms are stemmed in order to collapse terms with the same meaning into a single term. In our case we have used the Porter stemmer algorithm for English [21]. The fourth step, stopwords, serves as a filter of common words that are not discriminant for the text.
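The four transformation steps above (field extraction, tokenizing, stemming, stop-word filtering) can be sketched as follows; the stop-word list and the crude suffix stripper are placeholders for the richer vocabulary and the Porter stemmer the approach actually uses:

```python
import re

# Placeholder stop words: a few common English words plus domain terms
# the approach discards (e.g. "bug", "feature", project names).
STOP_WORDS = {"the", "a", "of", "in", "is", "bug", "feature", "kde"}

def tokenize(text):
    # Tokens are alphanumeric runs; purely numeric tokens are discarded.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)
            if not t.isdigit()]

def stem(term):
    # Crude suffix stripping, standing in for the Porter stemmer [21].
    for suffix in ("ing", "ed", "es", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 2:
            return term[: -len(suffix)]
    return term

def cr_descriptor(short_desc):
    """Turn a CR short description into a query term list."""
    terms = [stem(t) for t in tokenize(short_desc)]
    return [t for t in terms if t not in STOP_WORDS]

query = cr_descriptor("Cursor size too wide in a textarea")
```

The same chain is applied to the texts that make up a work-file descriptor, so that queries and documents share one term space.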
We have used a common stop-word vocabulary used in the context of English text retrieval, enriched with a set of words picked from the software system domain. For example, we have discarded the words bug and feature, and words naming the system under consideration, such as kde, koffice, and so on. The final step, new CR descriptor builder, builds the query in the language used by the retrieval engine, which in our case is represented by the set of terms produced in the previous phases.

Figure 2. Descriptors builder process

The XML file descriptor representation passes through similar steps and is built from the fixed CR items in relation with the file and from the revision check-in free-text comments. Also in this case we have considered various alternatives for file representation: one in which only revision comments are included, and another in which fixed CR short and long descriptions are also considered. The ranked impacted file list is retrieved through an information retrieval algorithm that scores each work file on the basis of its relevance to the new CR query representation. In information retrieval, queries and documents are described by a set of index terms. Let T = {t_1, t_2, ..., t_n} denote the set of terms used in the collection of documents. Both a document d and a query q are represented as a vector (x_1, x_2, ..., x_n) with x_i = 1 if t_i belongs to the document/query and x_i = 0 otherwise. In our approach we have used a probabilistic information retrieval model in which the relevance of a document with respect to a query is computed by evaluating P(R | d, q), that is, the probability that a given document d is relevant to a given query q. Different probabilistic models have been proposed in the literature to evaluate this probability in order to score each document in the collection. We have used the model introduced in [13]. It assumes that each term is associated with a topic, and that a document may be about the topic or not. Statistical measures of term occurrences in documents are used to estimate the probability. In particular, a document d is scored with respect to a query q by using the following scoring function:

S(d, q) = Σ_{t ∈ q} W(t)

which sums the weight of each query term with respect to the document d on the basis of a weighting function W. An overview of weighting functions can be found in [6]. We have used the following [13]:

W(t) = (TF_t · (k1 + 1)) / (k1 · ((1 - b) + b · DL / AVDL) + TF_t) · log(N / ND_t)

where TF_t is the frequency of term t in the document, DL is the document length (i.e., number of terms), AVDL is the average document length in the collection, N is the number of documents in the collection, and ND_t is the number of documents in which the term t appears. The constant k1 determines how much the weight reacts to increasing TF_t, and b is a normalization factor.
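The scoring and weighting functions above can be implemented directly; the corpus statistics passed in below are hypothetical:

```python
import math

K1, B = 1.2, 0.75  # the constants k1 and b used in the weighting function

def weight(tf, dl, avdl, n_docs, nd):
    """W(t): term weight as defined above.

    tf: frequency of the term in the document (TF_t)
    dl: document length in terms (DL); avdl: average length (AVDL)
    n_docs: documents in the collection (N)
    nd: documents containing the term (ND_t)
    """
    return ((tf * (K1 + 1))
            / (K1 * ((1 - B) + B * dl / avdl) + tf)
            * math.log(n_docs / nd))

def score(doc_terms, query_terms, avdl, n_docs, doc_freq):
    """S(d, q): sum of W(t) over query terms occurring in the document."""
    dl = len(doc_terms)
    return sum(
        weight(doc_terms.count(t), dl, avdl, n_docs, doc_freq[t])
        for t in query_terms if t in doc_terms
    )
```

For an average-length document containing a query term once, the weight reduces to log(N / ND_t), which matches the intuition that rarer terms discriminate better.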
We use the values k1 = 1.2 and b = 0.75, which are recommended values for generic English text.

4.2. Tool support

We have developed a toolkit to support the process depicted in Figure 1. All steps of the process are completely automated, including the download from the CVS and Bugzilla repositories. In particular, we have developed a Java program that parses a CVS log file, and a script that submits a request to the Bugzilla repository and downloads the output of a set of fixed CRs. We use the WebL scripting language [15], together with its information extraction features, to parse the HTML Bugzilla output and generate an XML version of a CR. The Java program correlates file revisions with Bugzilla CRs and generates the set of work files, represented in XML, ready to be parsed by the indexing engine. The indexing process and the matching algorithm have been implemented in Java using an information retrieval framework named Terrier [3].

5. Case study

We have applied the impact analysis approach in four case studies with different characteristics (Tab. 1). The first three systems come from the KDE open-source project, a Unix window manager with many desktop applications. They include a small system, kcalc, which is a desktop calculator; a medium system, kpdf, which is a pdf file viewer; and a medium-large system, kspread, which is a spreadsheet contained in the koffice environment. The fourth, a medium-large system, firefox, is the Mozilla Internet desktop browser, which has a plugin architecture with many additional features. The size, in terms of work files, of each system is shown in the table, together with the number of CRs fixed in the last six months and the start date of the project.

Table 1. Open-source projects

project | # work files | # CRs fixed in the last six months | starts
kcalc   |              |                                    | Apr 1997
kpdf    |              |                                    | Aug 2002
kspread |              |                                    | Apr 1998
firefox |              |                                    | Sep 2002
The results have been assessed using two widely accepted information retrieval metrics, namely Precision and Recall [22]. Precision is the ratio between the number of relevant documents retrieved for a given query and the total number of documents retrieved for that query. Recall is the ratio between the number of relevant documents retrieved for a given query and the total number of relevant documents for that query. In our case study, recall indicates how many of the truly impacted files have been correctly predicted, and precision indicates how many of the predicted impacted files are right. We use the same methodology used for evaluating an information retrieval algorithm: the retrieved ranked list of documents is considered at different cut levels [22]. A cut level N is the list of the first N documents. For each cut level the behavior of precision and recall is analyzed and traced on a graph. It is worth noting that the performance of the approach

depends on the amount of textual information that describes the work files of the system. A mature system has much more information than a system that has not been released yet. We show this with two analyses. In the first, we consider a snapshot of the system at a given time and use a set of past CRs (fixed in the previous six months) to build an oracle, evaluating the performance with the leave-one-out assessment technique [9]. For a set of CRs we find the impacted work files by using a correlation heuristic based on the presence of the Bugzilla id number in the revision comments of the file [8]. We think this is a good oracle, as CRs managed in Bugzilla basically follow accepted guidelines, one of which is to indicate in the check-in comment the Bugzilla id of the relevant CR. This is generally true for CRs regarding bug fixing; in addition, for the systems we have considered, enhancement requests are usually tracked in the same way. To evaluate the precision and recall for each CR, we build the index without the textual information of the current CR: in particular, we eliminate from the index the revisions impacted by the CR and the description of the CR itself. This guarantees that the impacted files are retrieved only through different CRs whose descriptions are similar to the current CR. In the second analysis, we consider the evolution of the system over time. We sort a set of fixed CRs by their creation date and time, and for each CR we include in the index only those revisions and CRs preceding the creation date and time of the CR under consideration. We do this for the Kspread system in an interval from 1999 to 2003, and we show that precision and recall increase because more information is available to build the work file descriptors.

6. Results

In this section we discuss the results obtained in the two case studies.
Different aspects regarding the information included in the work file descriptor and the new CR descriptor have been considered (Tab. 2). The first, labeled A, considers a partial set of information: revision comments for work files, and the CR short description for the new CR descriptor; it excludes past CR descriptors in work files. The second, labeled B, considers a minimal set of CR descriptors to be included in work files: revision comments and CR short descriptions for work files, and again the CR short description for the new CR descriptor. The third, labeled C, also considers a partial set of information for work files: it excludes the revision comments in work files and considers both the short and the long descriptions of CRs. The last, labeled D, considers the full set of information for both work files and the new CR.

Table 2. Combination of different descriptors

Figure 3. Kcalc results

Figures 3, 4, 5, and 6 show the results of the first type of analysis for each of the four systems analyzed. Each figure shows a set of four precision-recall curves, one for each descriptor combination of Table 2, labeled from A to D. The curves have been traced by observing the top 30 files ranked by the information retrieval algorithm and averaging the precision and recall over the number of CRs considered for each system. In each graph there are 30 points, each one corresponding to a cut level from 1 to 30. The upper limit of 30 has been chosen because, in each system, the number of impacted files is no more than 30. The first set of points should be read as a measure of overall precision, while the last set of points as a measure of overall recall.
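Reading a curve point at cut level N amounts to computing precision and recall over the top N retrieved files; a minimal sketch, with a hypothetical ranked list and oracle set:

```python
def precision_recall_at(ranked_files, relevant, n):
    """Precision and recall over the top-n files of a ranked list."""
    top = ranked_files[:n]
    hits = sum(1 for f in top if f in relevant)
    return hits / len(top), hits / len(relevant)

# Hypothetical retrieval result and oracle set of truly impacted files.
ranked = ["kcalc.cpp", "kcalc_core.cpp", "dlabel.cpp", "main.cpp"]
oracle = {"kcalc.cpp", "main.cpp"}
```

Averaging these pairs over all evaluated CRs, for each n from 1 to 30, yields one precision-recall curve per descriptor combination.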
For example, for Kcalc, the first point means that over the 28 CRs considered for averaging, the first work file retrieved is correct 22 times (78%, label D). The last point in the same example means that over the 28 CRs the top 30 work files retrieved cover the set of correct work files 26 times entirely (i.e., in 26 cases recall is 100%), once at 97%, and once at 50%. The resulting average recall is (26 · 100% + 97% + 50%) / 28 ≈ 98%. A general consideration is that precision and recall grow when more information is used to index work files, and

more terms are used to represent a new CR.

Figure 4. Kpdf results

Figure 5. Kspread results

Figure 6. Firefox results

The curves show that CRs are better descriptors than revision comments in all systems except Kpdf and Firefox, in which the behavior is inverted. We suppose that the motivation lies in the fact that Kpdf has reused code from xpdf, a pdf viewer for X Windows, and Firefox has reused code from the Mozilla and Netscape browsers. Code reuse usually implies a bulk check-in of the reused system into the source tree. This operation carries in only the file revisions of the system being reused, not CRs, so the only source of descriptors is revision comments. Of course, more analyses are needed to support this position. At the first cut level the precision of Kpdf ranges from a maximum of 45% to a minimum of 36%, while at the 30th cut level recall ranges from a maximum of 85% to a minimum of 70%. An overall good performance can be seen for Kcalc: at the first cut level its precision ranges from a maximum of 78% to a minimum of 38%, while at the 30th cut level recall ranges from a maximum of 98% to a minimum of 82%. We suppose this is mainly due to the fact that Kcalc is small and its architecture is quite simple: 3 or 4 modules, each encapsulated in one or two files. If a system is well modularized, then changes impact a small set of work files. This induces a good topic discrimination of work file descriptors, in the sense that different topics belong to different modules. For example, if the topic of a new CR regards the user interface, the impacted files should be the set of files belonging to the user interface module.
Kspread and Firefox exhibit overall similar performance: precision is lower than 39% for Kspread and 36% for Firefox, while recall is lower than 79% for Kspread and 67% for Firefox. Figure 7 shows the results of the second type of analysis performed on Kspread. The figure shows a set of precision and recall curves averaged over a set of 16 fixed CRs. The Kspread project started in April 1998, and as new file revisions are performed over time, new information becomes available to build work file descriptors. We have considered five milestones at which the work file descriptors have been built considering only information available before that time. For example, the time period labeled 2002 considers revision comments and CRs respectively submitted and fixed before 01 January 2002. The graph shows that the information available on 01 January 2000 does not yield good descriptors, while starting from 01 January 2001 the performance for precision, at the first cut level, grows to 44%. For successive time periods (2002 to 2003) no further evident improvement can be seen. This may be because during that period change activities on the software did not produce discriminant descriptors; for example, debugging activities may have dealt with the same topics.
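The time-incremental indexing used in this second analysis, where only revisions and CRs that predate a chosen milestone contribute to the descriptors, can be sketched as follows (data layout and file names hypothetical):

```python
from datetime import datetime

def index_before(history, cutoff):
    """Build work-file descriptors from events strictly before `cutoff`.

    `history` maps file name -> list of (timestamp, descriptor_text)
    pairs taken from revision comments and fixed-CR descriptions; files
    with no history before the cutoff are left out of the index.
    """
    index = {}
    for name, events in history.items():
        texts = [text for when, text in events if when < cutoff]
        if texts:
            index[name] = " ".join(texts)
    return index

history = {
    "kspread_table.cc": [
        (datetime(2000, 3, 1), "fix cell formula recalculation"),
        (datetime(2002, 6, 9), "sorting of selected ranges"),
    ],
    "kspread_doc.cc": [
        (datetime(2002, 8, 2), "crash loading large documents"),
    ],
}
index = index_before(history, cutoff=datetime(2002, 1, 1))
```

Moving the cutoff forward milestone by milestone reproduces the growing-descriptor setting evaluated for Kspread.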

Figure 7. Kspread time incremental results

7. Conclusions

Software and change repositories give new opportunities for software assessment. In this paper an approach to predict impacted files from a change request description has been presented. The approach has given promising results that warrant further improvement and investigation. A direction of improvement that we will follow is to consider different levels of granularity of the impacted entities. In this paper we have considered the file level, but program entities and architectural entities, such as modules, can also be considered. We believe that descriptors at the module level and the program entity level can improve the performance of the retrieval algorithm, especially for large systems. Furthermore, different retrieval algorithms can be considered, and we plan to compare different approaches based on language models. Further investigation of the analysis regarding the evolution of the system over time would also be interesting: in particular, what happens to descriptors between relevant milestones, such as software releases, can be an important indicator to improve the performance of work file descriptors.

References

[1] Bugzilla bug tracking system.
[2] CVS concurrent versions system.
[3] Terrier Information Retrieval Platform. Lecture Notes in Computer Science. Springer.
[4] G. Antoniol, G. Canfora, G. Casazza, A. De Lucia, and E. Merlo. Recovering traceability links between code and documentation. IEEE Trans. Softw. Eng., 28(10), 2002.
[5] R. S. Arnold and S. A. Bohner. Impact analysis: towards a framework for comparison. In ICSM '93: Proceedings of the Conference on Software Maintenance. IEEE Computer Society, 1993.
[6] F. Crestani, M. Lalmas, C. J. van Rijsbergen, and I. Campbell. "Is this document relevant? ...probably": a survey of probabilistic models in information retrieval. ACM Comput. Surv., 30(4), 1998.
[7] S. G. Eick, T. L. Graves, A. F. Karr, J. S. Marron, and A. Mockus. Does code decay?
Assessing the evidence from change management data. IEEE Trans. Softw. Eng., 27(1):1-12, 2001.
[8] M. Fischer, M. Pinzger, and H. Gall. Populating a release history database from version control and bug tracking systems. In International Conference on Software Maintenance (ICSM 2003), Amsterdam, The Netherlands, Sept. 2003.
[9] M. Stone. Cross-validatory choice and assessment of statistical predictions (with discussion). J. of the Royal Statistical Society, Series B, 36, 1974.
[10] K. Fogel and M. Bar. Open Source Development with CVS. Coriolis, 1999.
[11] T. L. Graves, A. F. Karr, J. S. Marron, and H. Siy. Predicting fault incidence using software change history. IEEE Trans. Softw. Eng., 26(7), 2000.
[12] A. Hassan and R. Holt. Predicting change propagation in software systems. In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM 2004). IEEE Computer Society Press, Sept. 2004.
[13] K. S. Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: development and comparative experiments. Inf. Process. Manage., 36(6), 2000.
[14] M. Kamkar. An overview and comparative classification of program slicing techniques. J. Syst. Softw., 31(3), 1995.
[15] T. Kistler and H. Marais. WebL: a programming language for the Web. Computer Networks and ISDN Systems, 30(1-7), Apr. 1998.
[16] M. Lindvall and K. Sandahl. How well do experienced software developers predict software change? J. Syst. Softw., 43(1):19-27, 1998.
[17] L. López, J. González-Barahona, and G. Robles. Applying social network analysis to the information in CVS repositories. In International Workshop on Mining Software Repositories (MSR 2004), at the 26th International Conference on Software Engineering, 2004.
[18] A. Marcus and J. I. Maletic. Recovering documentation-to-source-code traceability links using latent semantic indexing. In International Conference on Software Engineering. IEEE Computer Society Press, May 2003.
[19] A. Michail. Data mining library reuse patterns using generalized association rules.
In ICSE 00: Proceedings of the 22nd international conference on Software engineering, pages ACM Press, [20] S. L. Pfleeger. Software Engineering: Theory and Practice. PrenticeHall, Upper Saddle River, NJ, [21] M. F. Porter. An algorithm for suffix stripping. pages , 1997.

[22] B. Ribeiro-Neto and R. Baeza-Yates. Modern Information Retrieval. Addison-Wesley.
[23] A. T. T. Ying, G. C. Murphy, R. Ng, and M. C. Chu-Carroll. Predicting source code changes by mining revision history. IEEE Trans. Softw. Eng., 30, Sept. 2004.
[24] T. Zimmermann, P. Weisgerber, S. Diehl, and A. Zeller. Mining version histories to guide software changes. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering. IEEE Computer Society, 2004.
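The approach concluded above links the textual description of a new change request to "descriptors" built from the past change requests that impacted each file, and ranks files by textual similarity. The following toy sketch illustrates that idea with a plain TF-IDF/cosine ranker in Python. It is a minimal illustration under assumptions, not the authors' implementation: the file names, descriptor contents, and query are invented, and the paper's actual retrieval models (run on the Terrier platform) differ.

```python
import math
from collections import Counter

# Toy "file descriptors": for each source file, the bag of words
# accumulated from past change requests that impacted it (invented data).
descriptors = {
    "kspread_cell.cc": "cell formula recalculation wrong result sum",
    "kspread_view.cc": "view scrolling repaint toolbar icon",
    "kspread_doc.cc":  "document load save corrupt file format",
}

def tfidf_vectors(docs):
    """Build a TF-IDF vector (dict word -> weight) for each named text."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter()                      # document frequency of each word
    for words in tokenized.values():
        df.update(set(words))
    vectors = {}
    for name, words in tokenized.items():
        tf = Counter(words)
        vectors[name] = {w: (c / len(words)) * math.log(1 + n_docs / df[w])
                         for w, c in tf.items()}
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_files(change_request, descriptors):
    """Rank files by similarity between a new change request and each
    file descriptor built from past change requests."""
    docs = dict(descriptors)
    docs["__query__"] = change_request
    vecs = tfidf_vectors(docs)
    query = vecs.pop("__query__")
    return sorted(((cosine(query, v), name) for name, v in vecs.items()),
                  reverse=True)

ranking = rank_files("sum formula gives wrong result", descriptors)
for score, fname in ranking:
    print(f"{fname}: {score:.3f}")
```

The top-ranked files form the predicted impact set for the incoming request; evaluating how many truly impacted files appear near the top is what precision/recall figures such as Figure 7 measure.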


More information

The goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price.

The goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price. Code Duplication New Proposal Dolores Zage, Wayne Zage Ball State University June 1, 2017 July 31, 2018 Long Term Goals The goal of this project is to enhance the identification of code duplication which

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge Discover hidden information from your texts! Information overload is a well known issue in the knowledge industry. At the same time most of this information becomes available in natural language which

More information

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY

WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.4, April 2009 349 WEIGHTING QUERY TERMS USING WORDNET ONTOLOGY Mohammed M. Sakre Mohammed M. Kouta Ali M. N. Allam Al Shorouk

More information

Inverted Indexes. Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5

Inverted Indexes. Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5 Inverted Indexes Indexing and Searching, Modern Information Retrieval, Addison Wesley, 2010 p. 5 Basic Concepts Inverted index: a word-oriented mechanism for indexing a text collection to speed up the

More information

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR

QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR International Journal of Emerging Technology and Innovative Engineering QUERY RECOMMENDATION SYSTEM USING USERS QUERYING BEHAVIOR V.Megha Dept of Computer science and Engineering College Of Engineering

More information

CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM

CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM CANDIDATE LINK GENERATION USING SEMANTIC PHEROMONE SWARM Ms.Susan Geethu.D.K 1, Ms. R.Subha 2, Dr.S.Palaniswami 3 1, 2 Assistant Professor 1,2 Department of Computer Science and Engineering, Sri Krishna

More information

Lecture 2 Pre-processing Concurrent Versions System Archives

Lecture 2 Pre-processing Concurrent Versions System Archives Lecture 2 Pre-processing Concurrent Versions System Archives Debian Developer Coordinates Distributed Development Distributed Development I hope no one else is editing the same parts of the code as I am.

More information