Can Better Identifier Splitting Techniques Help Feature Location?
|
|
- Brook Parsons
- 5 years ago
- Views:
Transcription
1 Can Better Identifier Splitting Techniques Help Feature Location? Bogdan Dit, Latifa Guerrouj, Denys Poshyvanyk, Giuliano Antoniol SEMERU 19 th IEEE International Conference on Program Comprehension (ICPC 11) Kingston, Ontario, Canada
2 2
3 Textual information embeds domain knowledge 3
4 Textual information embeds domain knowledge About 70% of source code consists of identifiers* * Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software 4 Quality Journal, vol. 14, no. 3, 2006, pp
5 Textual information embeds domain knowledge About 70% of source code consists of identifiers* Identifiers are important source of information for maintenance tasks: traceability link recovery feature location * Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software 5 Quality Journal, vol. 14, no. 3, 2006, pp
6 # of Cumulative Feature Location Papers based on Textual Information 6
7 Related Work on Identifiers Takang et al. (JPL 96) programs with full-word identifiers are more understandable than those with abbreviated ones Lawrie et al. (ICPC 06) full words and recognizable abbreviations lead to better comprehension Binkley et al. (ICPC 09) CamelCase style is easier to recognize than CamelCase style is easier to recognize than underscore 7
8 Related Work on Identifiers Enslen et al. (MSR 09) Samurai: algorithm for splitting identifiers (using tables of identifier frequencies) Guerrouj et al. (JSME 11) TIDIER: algorithm for splitting identifiers (using contextual information) Other related work Deissenboeck and Pizka (SQJ 06), Antoniol et al. (ICSM 07), Haiduc and Marcus (ICPC 08), etc. 8
9 Splitting Identifiers Correctly is Challenging 9
10 Identifier Splitting Algorithms Original Identifier userid setgid print_file2device SSLCertificate MINstring USERID currentsize readadapterobject tolocale imitating DEFMASKBit 10
11 Identifier Splitting Algorithms Original Identifier Camel Case userid user Id setgid set GID print_file2device print file 2 device SSLCertificate SSL Certificate MINstring MI Nstring USERID USERID currentsize currentsize readadapterobject readadapterobject tolocale tolocale imitating imitating DEFMASKBit DEFMASK Bit 11
12 Identifier Splitting Algorithms Original Identifier userid setgid print_file2device SSLCertificate MINstring USERID currentsize Camel Case user Id set GID print file 2 device SSL Certificate MI Nstring USERID currentsize readadapterobject p j tolocale imitating DEFMASKBit Handles underscore and digits readadapterobject Fails at mixed cases tolocale imitating DEFMASK Bit 12 Fails at same case identifiers
13 Identifier Splitting Algorithms Original Identifier Camel Case userid user Id setgid set GID print_file2device print file 2 device SSLCertificate SSL Certificate MINstring MI Nstring USERID USERID currentsize currentsize readadapterobject readadapterobject tolocale tolocale imitating imitating DEFMASKBit DEFMASK Bit 13
14 Identifier Splitting Algorithms Original Identifier Camel Case Samurai userid user Id user Id setgid set GID set GID print_file2device print file 2 device print file 2 device SSLCertificate SSL Certificate SSL Certificate MINstring MI Nstring MIN string USERID USERID USER ID currentsize currentsize current size readadapterobject readadapterobject read adapter object tolocale tolocale tol ocal e imitating imitating imi ta ting DEFMASKBit DEFMASK Bit DEF MASK Bit 14
15 Identifier Splitting Algorithms Original Identifier Camel Case Samurai Splits some cases where CamelCase cannot userid user Id user Id setgid set GID set GID print_file2device print file 2 device print file 2 device SSLCertificate SSL Certificate SSL Certificate MINstring MI Nstring MIN string USERID USERID USER ID currentsize currentsize current size readadapterobject readadapterobject read adapter object tolocale tolocale tol ocal e imitating imitating imi ta ting DEFMASKBit DEFMASK Bit DEF MASK Bit Oversplits 15
16 # of Cumulative Feature Location Papers based on Textual Information 16
17 # of Cumulative Feature Location Papers based on Textual Information Existing feature location techniques use Camel Case splitting 17
18 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition synchronized void print(testresult result, long runtime) throws IOException{ printheader(runtime); printerrors(result); printfailures(result); printfooter(result); } synchronized void print TestResult result long runtime throws IOException printheader runtime printerrors result printfailures result printfooter result User formulate query print TestResult result runtime IOException printheader runtime Generate results printerrors result printfailures result Ranked list printfooter result 18
19 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition print Test Result result run Time IO Exception print itheader run Time print Errors result print Failures result print Footer result print test result result run time io exception print head run time print error result print fail result print foot result print test result... User formulate query Generate results m Ranked list m
20 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition User formulate query Generate results print test result... m m Ranked list 20
21 IR and Dynamic Information FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition User formulate query Generate results Ranked list of executed methods Collect execution trace 21
22 Research Goal Evaluate how advanced splitting techniques impact the performance of feature location techniques 22
23 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition User formulate query Generate results Replace Camel Case with : Samurai Perfect Splitting algorithm (Oracle) Ranked list 23 B etter Worst
24 Extract Identifiers All Identifiers Building the Oracle 24
25 Extract Identifiers All Identifiers Same split? (CamelCase Samurai TIDIER) YES Building the Oracle Concordant Split Identifiers 25
26 Extract Identifiers All Identifiers Same split? (CamelCase Samurai TIDIER) Assume they are correct Building the Oracle YES Manually verified a sample Concordant Split Identifiers 26 Threat to validity
27 Manually Split Identifiers Extract Identifiers Manual Split All Identifiers Same split? (CamelCase Samurai TIDIER) NO Discordant Split Identifiers YES Building the Oracle Concordant Split Identifiers 27
28 Manually Split Identifiers Extract Identifiers Consensus between authors Manual Split All Identifiers Same split? (CamelCase Samurai TIDIER) Checked source NO code Discordant Split Identifiers YES Building the Oracle Concordant Split Identifiers 28
29 Examples: DT, i3, P754, zzz, etc. Identifiers that could not be split Manually Split Identifiers Left unchanged Extract Identifiers Manual Split All Identifiers Same split? (CamelCase Samurai TIDIER) NO Discordant Split Identifiers YES Building the Oracle Concordant Split Identifiers 29
30 Design of the Case Study 30
31 Design of the Case Study RQ: Does a FLT with an advanced splitting algorithm produce better results than the same FLT using the CamelCase splitting algorithm? 31
32 How to Compare two FLTs? 32
33 How to Compare two FLTs? Effectiveness measure for each feature Method IR LSI score M M M M M M M M M Gold set method Effectiveness = 5 33
34 How to Compare two FLTs? Effectiveness measure for each feature Method IR LSI score M M M M M M M M M Gold set method Effectiveness e ess = 2 IRDyn Method LSI score M M M M Method Executed method (from trace) 34
35 Which FLTs are we Comparing? 35
36 Software Systems Rhino 1.6R5 138 classes, 1,870 methods, 32K LOC Eaddy et al. s data* 2 datasets Dataset Size Queries Gold Sets Execution Information Rhino Features 241 Sections of ECMAScript documentation Eaddy et al.* Rhino Bugs 143 Bug title and Eaddy et al.* N/A description (CVS) * Full Execution Traces (from unit tests) 36
37 Software Systems jedit classes, 6.4K methods, 109K LOC 2 datasets t Dataset Size Queries Gold Sets Execution Information jedit Features 64 Feature (or Patch) title and description jedit Bugs 86 Bug title and description SVN SVN Marked Execution Traces Marked Execution Traces Datasets available at: 37
38 Generating the jedit Datasets SVN Commits between v4.2-v4.3 38
39 Generating the jedit Datasets SVN commit message Title + Description = Query
40 Generating the jedit Datasets Changed files Previous Version Compare using Eclipse AST Current Version Modified methods (gold set) 40
41 Presenting the Results 41
42 Presenting the Results Box plot of all effectiveness measure in datasets (e.g., 241 datapoints for Rhino Features ) Average Median 42
43 Rhino Features IR FLTs
44 IR FLTs Similar median and average Rhino Features
45 IR FLTs Rhino Features Similar median and average Rhino Bugs jedit Features jedit Bugs 45
46 IR FLTs Rhino Features Similar median and average Rhino Bugs Datasets with features have better results than datasets with bugs jedit Features jedit Bugs 46
47 IRDyn FLTs N/A Rhino Features Similar median and average Rhino Bugs jedit Features jedit Bugs 47
48 IRDyn FLTs N/A Rhino Features Similar median and average Rhino Bugs Datasets with features have better results than datasets with bugs jedit Features jedit Bugs 48
49 Compare FLTs by Percentages IR Oracle IRCamelCase (Baseline)
50 Compare FLTs by Percentages IR Oracle IRCamelCase (Baseline) /8 2/8 50
51 IR Rhino Features Rhino Bugs jedit Features jedit Bugs 51
52 IR Rhino Features Datasets with features vs. Datasets with bugs Rhino Bugs jedit Features jedit Bugs 52
53 IRDyn N/A Similar trend Rhino Features Rhino Bugs jedit Features jedit Bugs 53
54 Statistical Results Wilcoxon signed-rank test Null hypothesis There is no statistical ttiti significance ifi difference in terms of effectiveness between IR Samurai /IR Oracle and IR CamelCase Alternative hypothesis IR Samurai /IR Oracle has statistically ti ti significantly ifi higher effectiveness than IR CamelCase alpha =
55 IR Rhino Features The only statistical significant result (p=0.05) Rhino Bugs jedit Features jedit B 55
56 Qualitative Results Vocabulary mismatch between queries and code: Name of developers (e.g., Slava, Carlos) Identifiers specific to communication i (e.g., thanks, greetings, annoying) 56
57 Qualitative Results Features are more descriptive than bugs 57
58 Qualitative Results Features are more descriptive than bugs Words join and line are not mentioned 58
59 External Threats to Validity 2 Java applications (different domains) More systems needed Construct Errors may be present in Oracle and gold sets We used data produced by other researchers Internal Subjectivity and bias in building the Oracle Conclusion Non-parametric test: Wilcoxon signed-rank 59
60 Research Questions RQ1 Does IR Samurai outperform IR CamelCase in terms of effectiveness? NO RQ2 Does IR Samurai Dyn outperform IR CamelCase Dyn in terms of effectiveness? NO RQ3 Does IR Oracle outperform IR CamelCase in terms of effectiveness? In some cases (Rhino) RQ4 Does IR Oracle Dyn outperform IR CamelCase Dyn 60 in terms of effectiveness? NO
61 Future Work More systems and datasets Different maintenance tasks Traceability link recovery Consider other splitting algorithms 61
62 Conclusions Advanced splitting technique could improve FLTs We found some empirical evidence Splitting has more impact on IR FLT If execution information is available, it is not necessary to use an advance splitting technique 62
63 Thank you! Questions? William and Mary bdit@cs.wm.edud SEMERU 63
64 References Takang et al. (1996) Takang, A., Grubb, P., and Macredie, R., "The Effects of Comments and Identifier Names on Program Comprehensibility: An Experimental Investigation", Journal of Programming Languages, vol. 4, no. 3, 1996, pp Lawrie et al. (2006) Lawrie, D., Morrell, C., Feild, H., and Binkley, D., "What's in a Name? A Study of Identifiers", in Proc. of IEEE ICPC'06, June , pp Binkley et al. (2009) Binkley, D., Davis, M., Lawrie, D., and Morrell, C., "To CamelCase or Under_score", in Proc. of IEEE ICPC'09, May , pp Enslen et al. (2009) Enslen, E., Hill, E., Pollock, L., and Vijay-Shanker, K., "Mining i Source Code to Automatically Split Identifiers for Software Analysis", in Proc. of IEEE MSR'09, May , pp Guerrouj et al. (2011) Guerrouj, L., Di Penta, M., Antoniol, G., and Guéhéneuc, Y. -G G., "TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques", JSME, vol. to appear,
Integrated Impact Analysis for Managing Software Changes. Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk
Integrated Impact Analysis for Managing Software Changes Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk Change Impact Analysis Software change impact analysis aims at estimating the potentially
More informationAn Efficient Approach for Requirement Traceability Integrated With Software Repository
An Efficient Approach for Requirement Traceability Integrated With Software Repository P.M.G.Jegathambal, N.Balaji P.G Student, Tagore Engineering College, Chennai, India 1 Asst. Professor, Tagore Engineering
More informationAn Efficient Approach for Requirement Traceability Integrated With Software Repository
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 4 (Nov. - Dec. 2013), PP 65-71 An Efficient Approach for Requirement Traceability Integrated With Software
More informationUsing Information Retrieval to Support Software Evolution
Using Information Retrieval to Support Software Evolution Denys Poshyvanyk Ph.D. Candidate SEVERE Group @ Software is Everywhere Software is pervading every aspect of life Software is difficult to make
More informationConfiguring Topic Models for Software Engineering Tasks in TraceLab
Configuring Topic Models for Software Engineering Tasks in TraceLab Bogdan Dit Annibale Panichella Evan Moritz Rocco Oliveto Massimiliano Di Penta Denys Poshyvanyk Andrea De Lucia TEFSE 13 San Francisco,
More informationA Heuristic-based Approach to Identify Concepts in Execution Traces
A Heuristic-based Approach to Identify Concepts in Execution Traces Fatemeh Asadi * Massimiliano Di Penta ** Giuliano Antoniol * Yann-Gaël Guéhéneuc ** * Ecole Polytechnique de Montréal, Canada ** Dept.
More informationParameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms Annibale Panichella 1, Bogdan Dit 2, Rocco Oliveto 3, Massimiliano Di Penta 4, Denys Poshyvanyk 5, Andrea De Lucia
More informationConclave: Writing Programs to Understand Programs
Conclave: Writing Programs to Understand Programs Nuno Ramos Carvalho 1, José João Almeida 1, Maria João Varanda Pereira 2, and Pedro Rangel Henriques 1 1 Departamento de Informática/CCTC Universidade
More informationFLAT 3 : Feature Location & Textual Tracing Tool
FLAT 3 : Feature Location & Textual Tracing Tool Trevor Savage, Meghan Revelle, Denys Poshyvanyk SEMERU Group @ William and Mary Addressed Problem The software developer has to maintain large software
More informationQUICKAR: Automatic Query Reformulation for Concept Location using Crowdsourced Knowledge
QUICKAR: Automatic Query Reformulation for Concept Location using Crowdsourced Knowledge Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,
More informationA Case Study on the Similarity Between Source Code and Bug Reports Vocabularies
A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies Diego Cavalcanti 1, Dalton Guerrero 1, Jorge Figueiredo 1 1 Software Practices Laboratory (SPLab) Federal University of Campina
More informationCombining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification
Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification Denys Poshyvanyk, Yann-Gaël Guéhéneuc, Andrian Marcus, Giuliano Antoniol, Václav Rajlich 14 th IEEE International
More informationNormalizing Source Code Vocabulary
Normalizing Source Code Vocabulary Dawn Lawrie Dave Binkley Christopher Morrell Loyola University Maryland Baltimore MD 21210-2699, USA {lawrie, binkley}@cs.loyola.edu, chm@loyola.edu Keywords: source
More informationNatural Language-based Software Analyses and Tools for Software Maintenance
Natural Language-based Software Analyses and Tools for Software Maintenance Lori Pollock 1, K. Vijay-Shanker 1, Emily Hill 2, Giriprasad Sridhara 1, and David Shepherd 3 1 Computer and Information Sciences,
More informationRecovering Traceability Links between Code and Documentation
Recovering Traceability Links between Code and Documentation Paper by: Giuliano Antoniol, Gerardo Canfora, Gerardo Casazza, Andrea De Lucia, and Ettore Merlo Presentation by: Brice Dobry and Geoff Gerfin
More informationCombining Information Retrieval and Relevance Feedback for Concept Location
Combining Information Retrieval and Relevance Feedback for Concept Location Sonia Haiduc - WSU CS Graduate Seminar - Jan 19, 2010 1 Software changes Software maintenance: 50-90% of the global costs of
More informationOn the use of Relevance Feedback in IR-based Concept Location
On the use of Relevance Feedback in IR-based Concept Location Gregory Gay 1, Sonia Haiduc 2, Andrian Marcus 2, Tim Menzies 1 1 Lane Department of Computer Science, West Virginia University Morgantown,
More informationEnhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation
Enhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation Tathagata Dasgupta, Mark Grechanik: U. of Illinois, Chicago Evan Moritz, Bogdan Dit, Denys Poshyvanyk: College
More informationImpact Analysis of Granularity Levels on Feature Location Technique
Impact Analysis of Granularity Levels on Feature Location Technique Chakkrit Tantithamthavorn, Akinori Ihara, Hideaki Hata, and Ken-ichi Matsumoto Software Engineering Laboratory, Graduate School of Information
More informationQuery Expansion Based on Crowd Knowledge for Code Search
PAGE 1 Query Expansion Based on Crowd Knowledge for Code Search Liming Nie, He Jiang*, Zhilei Ren, Zeyi Sun, Xiaochen Li Abstract As code search is a frequent developer activity in software development
More informationUsing Code Ownership to Improve IR-Based Traceability Link Recovery
Using Code Ownership to Improve IR-Based Traceability Link Recovery Diana Diaz 1, Gabriele Bavota 2, Andrian Marcus 3, Rocco Oliveto 4, Silvia Takahashi 1, Andrea De Lucia 5 1 Universidad de Los Andes,
More informationAccepted Manuscript. The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization
Accepted Manuscript The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization Chakkrit Tantithamthavorn, Surafel Lemma Abebe, Ahmed E. Hassan, Akinori
More informationQuery Expansion via Wordnet for Effective Code Search
Expansion via Wordnet for Effective Code Search Meili Lu, Xiaobing Sun, Shaowei Wang, David Lo Yucong Duan School of Information Engineering, Yangzhou University, Yangzhou, China School of Information
More informationQuery Expansion via Wordnet for Effective Code Search
Expansion via Wordnet for Effective Code Search Meili Lu, Xiaobing Sun, Shaowei Wang, David Lo Yucong Duan School of Information Engineering, Yangzhou University, Yangzhou, China School of Information
More informationSTRICT: Information Retrieval Based Search Term Identification for Concept Location
STRICT: Information Retrieval Based Search Term Identification for Concept Location Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,
More informationA Text Retrieval Approach to Recover Links among s and Source Code Classes
318 A Text Retrieval Approach to Recover Links among E-Mails and Source Code Classes Giuseppe Scanniello and Licio Mazzeo Universitá della Basilicata, Macchia Romana, Viale Dell Ateneo, 85100, Potenza,
More information2. PREVIOUS APPROACHES
Locating Bugs without Looking Back Tezcan Dilshener Michel Wermelinger Yijun Yu Computing and Communications Department, The Open University, United Kingdom ABSTRACT Bug localisation is a core program
More informationFeature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace
Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace Dapeng Liu, Andrian Marcus, Denys Poshyvanyk, Václav Rajlich SEVERE Group @ Incremental Change of Software
More information5/8/09. Tools for Finding Concerns
Tools for Finding Concerns Concepts Google Eclipse Search Find Concept What is the purpose of tools like make, ant, and maven? How are they similar to each other? How are they different from each other?
More informationAn Empirical Study of Identifier Splitting Techniques
Noname manuscript No. (will be inserted by the editor) An Empirical Study of Identifier Splitting Techniques Emily Hill, David Binkley, Dawn Lawrie, Lori Pollock, K. Vijay-Shanker Received: date / Accepted:
More informationA Statement Level Bug Localization Technique using Statement Dependency Graph
A Statement Level Bug Localization Technique using Statement Dependency Graph Shanto Rahman, Md. Mostafijur Rahman, Ahmad Tahmid and Kazi Sakib Institute of Information Technology, University of Dhaka,
More informationSearch-Based Software Maintenance and Evolution
Search-Based Software Maintenance and Evolution Annibale Panichella Advisor: Prof. Andrea De Lucia Dott. Rocco Oliveto Search-Based Software Engineering «The application of meta-heuristic search-based
More informationAmalgamating Source Code Authors, Maintainers, and Change Proneness to Triage Change Requests
Amalgamating Source Code Authors, Maintainers, and Change Proneness to Triage Change Requests Md Kamal Hossen, Huzefa Kagdi Department of Electrical Engineering and Computer Science Wichita State University
More informationWhere Should the Bugs Be Fixed?
Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports Presented by: Chandani Shrestha For CS 6704 class About the Paper and the Authors Publication
More informationClassifying Bug Reports to Bugs and Other Requests Using Topic Modeling
Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling Natthakul Pingclasai Department of Computer Engineering Kasetsart University Bangkok, Thailand Email: b5310547207@ku.ac.th Hideaki
More informationA Topic Modeling Based Solution for Confirming Software Documentation Quality
A Topic Modeling Based Solution for Confirming Software Documentation Quality Nouh Alhindawi 1 Faculty of Sciences and Information Technology, JADARA UNIVERSITY Obaida M. Al-Hazaimeh 2 Department of Information
More informationVersion History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization
Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization Shaowei Wang and David Lo School of Information Systems Singapore Management University, Singapore {shaoweiwang.2010,davidlo}@smu.edu.sg
More informationSOURCE CODE ANALYSIS EXTRACTIVE APPROACH TO GENERATE TEXTUAL SUMMARY
SOURCE CODE ANALYSIS EXTRACTIVE APPROACH TO GENERATE TEXTUAL SUMMARY 1 KAREEM ABBAS DAWOOD, 2 KHAIRONI YATIM SHARIF, 3 KOH TIENG WEI 1,2,3 Department of Software Engineering and Information System Faculty
More informationAdvanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement
Advanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement S.Muthamizharasi 1, J.Selvakumar 2, M.Rajaram 3 PG Scholar, Dept of CSE (PG)-ME (Software Engineering), Sri Ramakrishna
More informationAdams, Bram 21 MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Anbalagan, Prasanth
MSR 2009 Detailed Author Index [Page 1/11] A Adams, Bram 21 MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Anbalagan, Prasanth 171 On Mining Data Across Software
More informationA Textual-based Technique for Smell Detection
A Textual-based Technique for Smell Detection Fabio Palomba1, Annibale Panichella2, Andrea De Lucia1, Rocco Oliveto3, Andy Zaidman2 1 University of Salerno, Italy 2 Delft University of Technology, The
More informationMeasuring the Semantic Similarity of Comments in Bug Reports
Measuring the Semantic Similarity of Comments in Bug Reports Bogdan Dit, Denys Poshyvanyk, Andrian Marcus Department of Computer Science Wayne State University Detroit Michigan 48202 313 577 5408
More informationAURA: A Hybrid Approach to Identify
: A Hybrid to Identify Wei Wu 1, Yann-Gaël Guéhéneuc 1, Giuliano Antoniol 2, and Miryung Kim 3 1 Ptidej Team, DGIGL, École Polytechnique de Montréal, Canada 2 SOCCER Lab, DGIGL, École Polytechnique de
More informationUSING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION BRIAN EDDY
USING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION by BRIAN EDDY JEFF GRAY, COMMITTEE CHAIR NICHOLAS KRAFT JEFFREY CARVER SUSAN VRBSKY
More informationMapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I.
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Mapping Bug Reports to Relevant Files and Automated
More informationExploiting Spatial Code Proximity and Order for Improved Source Code Retrieval for Bug Localization
Purdue University Purdue e-pubs Department of Electrical and Computer Engineering Technical Reports Department of Electrical and Computer Engineering 2013 Exploiting Spatial Code Proximity and Order for
More informationA New Family of Software Anti-Patterns: Linguistic Anti-Patterns
A New Family of Software Anti-Patterns: Linguistic Anti-Patterns Venera Arnaoudova 1,2, Massimiliano Di Penta 3, Giuliano Antoniol 2, Yann-Gaël Guéhéneuc 1 1 Ptidej Team, DGIGL, École Polytechnique de
More informationDURING the process of software development, developers
IEEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. X, NO. X, XX 20XX 1 Expanding Queries for Search Using Semantically Related API Class-names Feng Zhang, Haoran Niu, Iman Keivanloo, Member, IEEE, and Ying
More informationA Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects
A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects Borg, Markus; Runeson, Per; Johansson, Jens; Mäntylä, Mika Published in: [Host publication title missing]
More informationFeature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace
Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace Dapeng Liu, Andrian Marcus, Denys Poshyvanyk, Václav Rajlich Department of Computer Science Wayne State University
More informationBug or Not? Bug Report Classification using N-Gram IDF
Bug or Not? Bug Report Classification using N-Gram IDF Pannavat Terdchanakul 1, Hideaki Hata 1, Passakorn Phannachitta 2, and Kenichi Matsumoto 1 1 Graduate School of Information Science, Nara Institute
More informationLocus: Locating Bugs from Software Changes
Locus: Locating Bugs from Software Changes Ming Wen, Rongxin Wu, Shing-Chi Cheung Department of Computer Science and Engineering The Hong Kong University of Science and Technology, Hong Kong, China {mwenaa,
More informationHow are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects
How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects Yuhao Wu 1(B), Yuki Manabe 2, Daniel M. German 3, and Katsuro Inoue 1 1 Graduate
More informationImproving Information Retrieval. Based Bug Localisation Using. Contextual Heuristics
Improving Information Retrieval Based Bug Localisation Using Contextual Heuristics Tezcan Dilshener M.Sc. A thesis submitted to The Open University in partial fulfilment of the requirements for the degree
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationQuery-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks
Query-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks Laura Moreno 1, Gabriele Bavota 2, Sonia Haiduc 3, Massimiliano Di Penta 4, Rocco Oliveto 5, Barbara Russo 2, Andrian
More informationAutomatic Identification of Important Clones for Refactoring and Tracking
Automatic Identification of Important Clones for Refactoring and Tracking Manishankar Mondal Chanchal K. Roy Kevin A. Schneider Department of Computer Science, University of Saskatchewan, Canada {mshankar.mondal,
More informationFACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES
FACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES Swathi Polamraju and Sricharan Ramagiri Department of Electrical and Computer Engineering Clemson University ABSTRACT: Being motivated by the
More informationEmpirical Evaluation of Diagrams of the Run-Time Structure for Coding Tasks. Nariman Ammar Marwan Abi-Antoun Department of Computer Science
Empirical Evaluation of Diagrams of the Run-Time Structure for Coding Tasks Nariman Ammar Marwan Abi-Antoun Department of Computer Science Diagrams (can?) help developers with code modifications during
More informationImproving Bug Management using Correlations in Crash Reports
Noname manuscript No. (will be inserted by the editor) Improving Bug Management using Correlations in Crash Reports Shaohua Wang Foutse Khomh Ying Zou Received: date / Accepted: date Abstract Nowadays,
More informationAre Object Graphs Extracted Using Abstract Interpretation Significantly Different from the Code?
Are Object Graphs Extracted Using Abstract Interpretation Significantly Different from the Code? Marwan Abi-Antoun Sumukhi Chandrashekar Radu Vanciu Andrew Giang Wayne State University Department of Computer
More informationChapter 8. Evaluating Search Engine
Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can
More informationAutomatic Query Performance Assessment during the Retrieval of Software Artifacts
Automatic Query Performance Assessment during the Retrieval of Software Artifacts Sonia Haiduc Wayne State Univ. Detroit, MI, USA Gabriele Bavota Univ. of Salerno Fisciano (SA), Italy Rocco Oliveto Univ.
More informationIDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE
International Journal of Software Engineering & Applications (IJSEA), Vol.9, No.5, September 2018 IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE Simon Kawuma 1 and
More informationASE 2017, Urbana-Champaign, IL, USA Technical Research /17 c 2017 IEEE
Improved Query Reformulation for Concept Location using CodeRank and Document Structures Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,
More informationTraceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management
Traceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management Samuel Klock, Malcom Gers, Bogdan Dit, Denys Poshyvanyk Department of Computer Science The College of William and Mary Williamsburg,
More informationConfiguring Topic Models for Software Engineering Tasks in TraceLab
Configuring Topic Models for Software Engineering Tasks in TraceLab Bogdan Dit 1, Annibale Panichella 2, Evan Moritz 1, Rocco Oliveto 3, Massimiliano Di Penta 4, Denys Poshyvanyk 1, Andrea De Lucia 2 1
More informationPredicting Source Code Quality with Static Analysis and Machine Learning. Vera Barstad, Morten Goodwin, Terje Gjøsæter
Predicting Source Code Quality with Static Analysis and Machine Learning Vera Barstad, Morten Goodwin, Terje Gjøsæter Faculty of Engineering and Science, University of Agder Serviceboks 509, NO-4898 Grimstad,
More informationToward Automatic Summarization of Arbitrary Java Statements for Novice Programmers
2018 IEEE International Conference on Software Maintenance and Evolution Toward Automatic Summarization of Arbitrary Java Statements for Novice Programmers Mohammed Hassan, Emily Hill Drew University Madison,
More informationIMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES
IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES Meghan Revelle Advisor: David Coppit Department of Computer Science The College of William and Mary Williamsburg, Virginia
More informationEffect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching
Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna
More informationINFORMATION ACCESS VIA VOICE. dissertation is my own or was done in collaboration with my advisory committee. Yapin Zhong
INFORMATION ACCESS VIA VOICE Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. Yapin Zhong
More informationCloCom: Mining Existing Source Code for Automatic Comment Generation
CloCom: Mining Existing Source Code for Automatic Comment Generation Abstract Code comments are an integral part of software development. They improve program comprehension and software maintainability.
More informationStudying and Detecting Log-Related Issues
Noname manuscript No. (will be inserted by the editor) Studying and Detecting Log-Related Issues Mehran Hassani Weiyi Shang Emad Shihab Nikolaos Tsantalis Received: date / Accepted: date Abstract Logs
More informationAssisting Code Search with Automatic Query Reformulation for Bug Localization
2013 10th IEEE Working Conference on Mining Software Repositories Assisting Code Search with Automatic Query Reformulation for Bug Localization Bunyamin Sisman, Avinash C. Kak Purdue University West Lafayette,
More informationAutomatic Bug Assignment Using Information Extraction Methods
Automatic Bug Assignment Using Information Extraction Methods Ramin Shokripour Zarinah M. Kasirun Sima Zamani John Anvik Faculty of Computer Science & Information Technology University of Malaya Kuala
More informationRearranging the Order of Program Statements for Code Clone Detection
Rearranging the Order of Program Statements for Code Clone Detection Yusuke Sabi, Yoshiki Higo, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, Japan Email: {y-sabi,higo,kusumoto@ist.osaka-u.ac.jp
More informationABSTRACT INTRODUCTION ISSN: OPEN ACCESS ARTICLE.
ISSN: 0976-3104 SPECIAL ISSUE: (EMERGING TECHNOLOGIES IN NETWORKING AND SECURITY (ETNS) Suseela and Devi IMPROVING THE ACCURACY OF IR TECHNIQUES USING TRACEABILITY MB. Suseela, PP. Devi Department Of Computer
More informationLRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier
LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier Wang Ding, Songnian Yu, Shanqing Yu, Wei Wei, and Qianfeng Wang School of Computer Engineering and Science, Shanghai University, 200072
More informationAn Exploratory Study on Interface Similarities in Code Clones
1 st WETSoDA, December 4, 2017 - Nanjing, China An Exploratory Study on Interface Similarities in Code Clones Md Rakib Hossain Misu, Abdus Satter, Kazi Sakib Institute of Information Technology University
More informationConfidence Measures: how much we can trust our speech recognizers
Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition
More informationAUTOMATICALLY GENERATING DESCRIPTIVE SUMMARY COMMENTS FOR C LANGUAGE CODE AND FUNCTIONS
AUTOMATICALLY GENERATING DESCRIPTIVE SUMMARY COMMENTS FOR C LANGUAGE CODE AND FUNCTIONS Kusuma M S 1, K Srinivasa 2 1 M.Tech Student, 2 Assistant Professor Email: kusumams92@gmail.com 1, srinivas.karur@gmail.com
More informationMaintainability Index Variation Among PHP, Java, and Python Open Source Projects
Maintainability Index Variation Among PHP, Java, and Python Open Source Projects Celia Chen 1, Lin Shi 2, Kamonphop Srisopha 1 1 Computer Science Department, USC 2 Laboratory for Internet Software Technologies,
More informationUsing Stereotypes in the Automatic Generation of Natural Language Summaries for C++ Methods
Using Stereotypes in the Automatic Generation of Natural Language Summaries for C++ Methods 1 Department of Computer Science Kent State University Kent, Ohio 44242, USA {nabid, jmaletic}@kent.edu Nahla
More informationUniversity of Santiago de Compostela at CLEF-IP09
University of Santiago de Compostela at CLEF-IP9 José Carlos Toucedo, David E. Losada Grupo de Sistemas Inteligentes Dept. Electrónica y Computación Universidad de Santiago de Compostela, Spain {josecarlos.toucedo,david.losada}@usc.es
More informationSeparating Speech From Noise Challenge
Separating Speech From Noise Challenge We have used the data from the PASCAL CHiME challenge with the goal of training a Support Vector Machine (SVM) to estimate a noise mask that labels time-frames/frequency-bins
More informationEmpirical Study on Impact of Developer Collaboration on Source Code
Empirical Study on Impact of Developer Collaboration on Source Code Akshay Chopra University of Waterloo Waterloo, Ontario a22chopr@uwaterloo.ca Parul Verma University of Waterloo Waterloo, Ontario p7verma@uwaterloo.ca
More informationFiltering Bug Reports for Fix-Time Analysis
Filtering Bug Reports for Fix-Time Analysis Ahmed Lamkanfi, Serge Demeyer LORE - Lab On Reengineering University of Antwerp, Belgium Abstract Several studies have experimented with data mining algorithms
More informationCombining Conceptual and Domain-Based Couplings to Detect Database and Code Dependencies
Combining Conceptual and Domain-Based Couplings to Detect Database and Code Dependencies Malcom Gethers, Amir Aryani, Denys Poshyvanyk College of William and Mary, United States {mgethers,denys}@cs.wm.edu
More informationTopicViewer: Evaluating Remodularizations Using Semantic Clustering
TopicViewer: Evaluating Remodularizations Using Semantic Clustering Gustavo Jansen de S. Santos 1, Katyusco de F. Santos 2, Marco Tulio Valente 1, Dalton D. S. Guerrero 3, Nicolas Anquetil 4 1 Federal
More informationActive Code Search: Incorporating User Feedback to Improve Code Search Relevance
Active Code Search: Incorporating User Feedback to Improve Code Search Relevance Shaowei Wang, David Lo, and Lingxiao Jiang School of Information Systems, Singapore Management University {shaoweiwang.2010,davidlo,lxjiang}@smu.edu.sg
More informationRecovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction
Recovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction Jiří Vinárek 1, Petr Hnětynka 1, Viliam Šimko2, and Petr Kroha 1 1 Charles University in Prague,
More informationDomain Knowledge Driven Program Analysis
Domain Knowledge Driven Program Analysis Daniel Ratiu http://www4.in.tum.de/~ratiu/knowledge_repository.html WSR Bad-Honnef, 4 May 2009 Pressing Issues Program understanding is expensive - over 60% of
More informationA Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks
Noname manuscript No. (will be inserted by the editor) A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks Md Masudur Rahman Saikat Chakraborty Gail
More informationComparative Assessment of Testing and Model Checking Using Program Mutation
Comparative Assessment of Testing and Using Mutation Research Talk Jeremy S. Bradbury, James R. Cordy, Juergen Dingel School of Computing! Queen s University Kingston! Ontario! Canada {bradbury, cordy,
More informationThe Impact of Task Granularity on Co-evolution Analyses
The Impact of Task Granularity on Co-evolution Analyses Keisuke Miura Kyushu University, Japan miura@posl.ait.kyushuu.ac.jp Ahmed E. Hassan Queen s University, Canada ahmed@cs.queensu.ca Shane McIntosh
More informationDigital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results
Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results DFRWS 2007 Department of Information Systems & Technology Management The
More informationWeb Information Retrieval using WordNet
Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT
More informationBetter Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web
Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl
More informationBasic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval
Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert
More information