Can Better Identifier Splitting Techniques Help Feature Location?

Size: px
Start display at page:

Download "Can Better Identifier Splitting Techniques Help Feature Location?"

Transcription

1 Can Better Identifier Splitting Techniques Help Feature Location? Bogdan Dit, Latifa Guerrouj, Denys Poshyvanyk, Giuliano Antoniol SEMERU 19 th IEEE International Conference on Program Comprehension (ICPC 11) Kingston, Ontario, Canada

2 2

3 Textual information embeds domain knowledge 3

4 Textual information embeds domain knowledge About 70% of source code consists of identifiers* * Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software 4 Quality Journal, vol. 14, no. 3, 2006, pp

5 Textual information embeds domain knowledge About 70% of source code consists of identifiers* Identifiers are important source of information for maintenance tasks: traceability link recovery feature location * Deissenboeck, F. and Pizka, M., "Concise and Consistent Naming", Software 5 Quality Journal, vol. 14, no. 3, 2006, pp

6 # of Cumulative Feature Location Papers based on Textual Information 6

7 Related Work on Identifiers Takang et al. (JPL 96) programs with full-word identifiers are more understandable than those with abbreviated ones Lawrie et al. (ICPC 06) full words and recognizable abbreviations lead to better comprehension Binkley et al. (ICPC 09) CamelCase style is easier to recognize than CamelCase style is easier to recognize than underscore 7

8 Related Work on Identifiers Enslen et al. (MSR 09) Samurai: algorithm for splitting identifiers (using tables of identifier frequencies) Guerrouj et al. (JSME 11) TIDIER: algorithm for splitting identifiers (using contextual information) Other related work Deissenboeck and Pizka (SQJ 06), Antoniol et al. (ICSM 07), Haiduc and Marcus (ICPC 08), etc. 8

9 Splitting Identifiers Correctly is Challenging 9

10 Identifier Splitting Algorithms Original Identifier userid setgid print_file2device SSLCertificate MINstring USERID currentsize readadapterobject tolocale imitating DEFMASKBit 10

11 Identifier Splitting Algorithms Original Identifier Camel Case userid user Id setgid set GID print_file2device print file 2 device SSLCertificate SSL Certificate MINstring MI Nstring USERID USERID currentsize currentsize readadapterobject readadapterobject tolocale tolocale imitating imitating DEFMASKBit DEFMASK Bit 11

12 Identifier Splitting Algorithms Original Identifier userid setgid print_file2device SSLCertificate MINstring USERID currentsize Camel Case user Id set GID print file 2 device SSL Certificate MI Nstring USERID currentsize readadapterobject p j tolocale imitating DEFMASKBit Handles underscore and digits readadapterobject Fails at mixed cases tolocale imitating DEFMASK Bit 12 Fails at same case identifiers

13 Identifier Splitting Algorithms Original Identifier Camel Case userid user Id setgid set GID print_file2device print file 2 device SSLCertificate SSL Certificate MINstring MI Nstring USERID USERID currentsize currentsize readadapterobject readadapterobject tolocale tolocale imitating imitating DEFMASKBit DEFMASK Bit 13

14 Identifier Splitting Algorithms Original Identifier Camel Case Samurai userid user Id user Id setgid set GID set GID print_file2device print file 2 device print file 2 device SSLCertificate SSL Certificate SSL Certificate MINstring MI Nstring MIN string USERID USERID USER ID currentsize currentsize current size readadapterobject readadapterobject read adapter object tolocale tolocale tol ocal e imitating imitating imi ta ting DEFMASKBit DEFMASK Bit DEF MASK Bit 14

15 Identifier Splitting Algorithms Original Identifier Camel Case Samurai Splits some cases where CamelCase cannot userid user Id user Id setgid set GID set GID print_file2device print file 2 device print file 2 device SSLCertificate SSL Certificate SSL Certificate MINstring MI Nstring MIN string USERID USERID USER ID currentsize currentsize current size readadapterobject readadapterobject read adapter object tolocale tolocale tol ocal e imitating imitating imi ta ting DEFMASKBit DEFMASK Bit DEF MASK Bit Oversplits 15

16 # of Cumulative Feature Location Papers based on Textual Information 16

17 # of Cumulative Feature Location Papers based on Textual Information Existing feature location techniques use Camel Case splitting 17

18 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition synchronized void print(testresult result, long runtime) throws IOException{ printheader(runtime); printerrors(result); printfailures(result); printfooter(result); } synchronized void print TestResult result long runtime throws IOException printheader runtime printerrors result printfailures result printfooter result User formulate query print TestResult result runtime IOException printheader runtime Generate results printerrors result printfailures result Ranked list printfooter result 18

19 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition print Test Result result run Time IO Exception print itheader run Time print Errors result print Failures result print Footer result print test result result run time io exception print head run time print error result print fail result print foot result print test result... User formulate query Generate results m Ranked list m

20 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition User formulate query Generate results print test result... m m Ranked list 20

21 IR and Dynamic Information FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition User formulate query Generate results Ranked list of executed methods Collect execution trace 21

22 Research Goal Evaluate how advanced splitting techniques impact the performance of feature location techniques 22

23 Information Retrieval FLT Generate corpus Preprocessing Remove non-literals Remove stop words Split identifiers Stemming Indexing Term-by-document matrix Singular Value Decomposition User formulate query Generate results Replace Camel Case with : Samurai Perfect Splitting algorithm (Oracle) Ranked list 23 B etter Worst

24 Extract Identifiers All Identifiers Building the Oracle 24

25 Extract Identifiers All Identifiers Same split? (CamelCase Samurai TIDIER) YES Building the Oracle Concordant Split Identifiers 25

26 Extract Identifiers All Identifiers Same split? (CamelCase Samurai TIDIER) Assume they are correct Building the Oracle YES Manually verified a sample Concordant Split Identifiers 26 Threat to validity

27 Manually Split Identifiers Extract Identifiers Manual Split All Identifiers Same split? (CamelCase Samurai TIDIER) NO Discordant Split Identifiers YES Building the Oracle Concordant Split Identifiers 27

28 Manually Split Identifiers Extract Identifiers Consensus between authors Manual Split All Identifiers Same split? (CamelCase Samurai TIDIER) Checked source NO code Discordant Split Identifiers YES Building the Oracle Concordant Split Identifiers 28

29 Examples: DT, i3, P754, zzz, etc. Identifiers that could not be split Manually Split Identifiers Left unchanged Extract Identifiers Manual Split All Identifiers Same split? (CamelCase Samurai TIDIER) NO Discordant Split Identifiers YES Building the Oracle Concordant Split Identifiers 29

30 Design of the Case Study 30

31 Design of the Case Study RQ: Does a FLT with an advanced splitting algorithm produce better results than the same FLT using the CamelCase splitting algorithm? 31

32 How to Compare two FLTs? 32

33 How to Compare two FLTs? Effectiveness measure for each feature Method IR LSI score M M M M M M M M M Gold set method Effectiveness = 5 33

34 How to Compare two FLTs? Effectiveness measure for each feature Method IR LSI score M M M M M M M M M Gold set method Effectiveness e ess = 2 IRDyn Method LSI score M M M M Method Executed method (from trace) 34

35 Which FLTs are we Comparing? 35

36 Software Systems Rhino 1.6R5 138 classes, 1,870 methods, 32K LOC Eaddy et al. s data* 2 datasets Dataset Size Queries Gold Sets Execution Information Rhino Features 241 Sections of ECMAScript documentation Eaddy et al.* Rhino Bugs 143 Bug title and Eaddy et al.* N/A description (CVS) * Full Execution Traces (from unit tests) 36

37 Software Systems jedit classes, 6.4K methods, 109K LOC 2 datasets t Dataset Size Queries Gold Sets Execution Information jedit Features 64 Feature (or Patch) title and description jedit Bugs 86 Bug title and description SVN SVN Marked Execution Traces Marked Execution Traces Datasets available at: 37

38 Generating the jedit Datasets SVN Commits between v4.2-v4.3 38

39 Generating the jedit Datasets SVN commit message Title + Description = Query

40 Generating the jedit Datasets Changed files Previous Version Compare using Eclipse AST Current Version Modified methods (gold set) 40

41 Presenting the Results 41

42 Presenting the Results Box plot of all effectiveness measure in datasets (e.g., 241 datapoints for Rhino Features ) Average Median 42

43 Rhino Features IR FLTs

44 IR FLTs Similar median and average Rhino Features

45 IR FLTs Rhino Features Similar median and average Rhino Bugs jedit Features jedit Bugs 45

46 IR FLTs Rhino Features Similar median and average Rhino Bugs Datasets with features have better results than datasets with bugs jedit Features jedit Bugs 46

47 IRDyn FLTs N/A Rhino Features Similar median and average Rhino Bugs jedit Features jedit Bugs 47

48 IRDyn FLTs N/A Rhino Features Similar median and average Rhino Bugs Datasets with features have better results than datasets with bugs jedit Features jedit Bugs 48

49 Compare FLTs by Percentages IR Oracle IRCamelCase (Baseline)

50 Compare FLTs by Percentages IR Oracle IRCamelCase (Baseline) /8 2/8 50

51 IR Rhino Features Rhino Bugs jedit Features jedit Bugs 51

52 IR Rhino Features Datasets with features vs. Datasets with bugs Rhino Bugs jedit Features jedit Bugs 52

53 IRDyn N/A Similar trend Rhino Features Rhino Bugs jedit Features jedit Bugs 53

54 Statistical Results Wilcoxon signed-rank test Null hypothesis There is no statistical ttiti significance ifi difference in terms of effectiveness between IR Samurai /IR Oracle and IR CamelCase Alternative hypothesis IR Samurai /IR Oracle has statistically ti ti significantly ifi higher effectiveness than IR CamelCase alpha =

55 IR Rhino Features The only statistical significant result (p=0.05) Rhino Bugs jedit Features jedit B 55

56 Qualitative Results Vocabulary mismatch between queries and code: Name of developers (e.g., Slava, Carlos) Identifiers specific to communication i (e.g., thanks, greetings, annoying) 56

57 Qualitative Results Features are more descriptive than bugs 57

58 Qualitative Results Features are more descriptive than bugs Words join and line are not mentioned 58

59 External Threats to Validity 2 Java applications (different domains) More systems needed Construct Errors may be present in Oracle and gold sets We used data produced by other researchers Internal Subjectivity and bias in building the Oracle Conclusion Non-parametric test: Wilcoxon signed-rank 59

60 Research Questions RQ1 Does IR Samurai outperform IR CamelCase in terms of effectiveness? NO RQ2 Does IR Samurai Dyn outperform IR CamelCase Dyn in terms of effectiveness? NO RQ3 Does IR Oracle outperform IR CamelCase in terms of effectiveness? In some cases (Rhino) RQ4 Does IR Oracle Dyn outperform IR CamelCase Dyn 60 in terms of effectiveness? NO

61 Future Work More systems and datasets Different maintenance tasks Traceability link recovery Consider other splitting algorithms 61

62 Conclusions Advanced splitting technique could improve FLTs We found some empirical evidence Splitting has more impact on IR FLT If execution information is available, it is not necessary to use an advance splitting technique 62

63 Thank you! Questions? William and Mary bdit@cs.wm.edud SEMERU 63

64 References Takang et al. (1996) Takang, A., Grubb, P., and Macredie, R., "The Effects of Comments and Identifier Names on Program Comprehensibility: An Experimental Investigation", Journal of Programming Languages, vol. 4, no. 3, 1996, pp Lawrie et al. (2006) Lawrie, D., Morrell, C., Feild, H., and Binkley, D., "What's in a Name? A Study of Identifiers", in Proc. of IEEE ICPC'06, June , pp Binkley et al. (2009) Binkley, D., Davis, M., Lawrie, D., and Morrell, C., "To CamelCase or Under_score", in Proc. of IEEE ICPC'09, May , pp Enslen et al. (2009) Enslen, E., Hill, E., Pollock, L., and Vijay-Shanker, K., "Mining i Source Code to Automatically Split Identifiers for Software Analysis", in Proc. of IEEE MSR'09, May , pp Guerrouj et al. (2011) Guerrouj, L., Di Penta, M., Antoniol, G., and Guéhéneuc, Y. -G G., "TIDIER: An Identifier Splitting Approach using Speech Recognition Techniques", JSME, vol. to appear,

Integrated Impact Analysis for Managing Software Changes. Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk

Integrated Impact Analysis for Managing Software Changes. Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk Integrated Impact Analysis for Managing Software Changes Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk Change Impact Analysis Software change impact analysis aims at estimating the potentially

More information

An Efficient Approach for Requirement Traceability Integrated With Software Repository

An Efficient Approach for Requirement Traceability Integrated With Software Repository An Efficient Approach for Requirement Traceability Integrated With Software Repository P.M.G.Jegathambal, N.Balaji P.G Student, Tagore Engineering College, Chennai, India 1 Asst. Professor, Tagore Engineering

More information

An Efficient Approach for Requirement Traceability Integrated With Software Repository

An Efficient Approach for Requirement Traceability Integrated With Software Repository IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 4 (Nov. - Dec. 2013), PP 65-71 An Efficient Approach for Requirement Traceability Integrated With Software

More information

Using Information Retrieval to Support Software Evolution

Using Information Retrieval to Support Software Evolution Using Information Retrieval to Support Software Evolution Denys Poshyvanyk Ph.D. Candidate SEVERE Group @ Software is Everywhere Software is pervading every aspect of life Software is difficult to make

More information

Configuring Topic Models for Software Engineering Tasks in TraceLab

Configuring Topic Models for Software Engineering Tasks in TraceLab Configuring Topic Models for Software Engineering Tasks in TraceLab Bogdan Dit Annibale Panichella Evan Moritz Rocco Oliveto Massimiliano Di Penta Denys Poshyvanyk Andrea De Lucia TEFSE 13 San Francisco,

More information

A Heuristic-based Approach to Identify Concepts in Execution Traces

A Heuristic-based Approach to Identify Concepts in Execution Traces A Heuristic-based Approach to Identify Concepts in Execution Traces Fatemeh Asadi * Massimiliano Di Penta ** Giuliano Antoniol * Yann-Gaël Guéhéneuc ** * Ecole Polytechnique de Montréal, Canada ** Dept.

More information

Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms

Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms Annibale Panichella 1, Bogdan Dit 2, Rocco Oliveto 3, Massimiliano Di Penta 4, Denys Poshyvanyk 5, Andrea De Lucia

More information

Conclave: Writing Programs to Understand Programs

Conclave: Writing Programs to Understand Programs Conclave: Writing Programs to Understand Programs Nuno Ramos Carvalho 1, José João Almeida 1, Maria João Varanda Pereira 2, and Pedro Rangel Henriques 1 1 Departamento de Informática/CCTC Universidade

More information

FLAT 3 : Feature Location & Textual Tracing Tool

FLAT 3 : Feature Location & Textual Tracing Tool FLAT 3 : Feature Location & Textual Tracing Tool Trevor Savage, Meghan Revelle, Denys Poshyvanyk SEMERU Group @ William and Mary Addressed Problem The software developer has to maintain large software

More information

QUICKAR: Automatic Query Reformulation for Concept Location using Crowdsourced Knowledge

QUICKAR: Automatic Query Reformulation for Concept Location using Crowdsourced Knowledge QUICKAR: Automatic Query Reformulation for Concept Location using Crowdsourced Knowledge Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,

More information

A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies

A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies Diego Cavalcanti 1, Dalton Guerrero 1, Jorge Figueiredo 1 1 Software Practices Laboratory (SPLab) Federal University of Campina

More information

Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification

Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification Denys Poshyvanyk, Yann-Gaël Guéhéneuc, Andrian Marcus, Giuliano Antoniol, Václav Rajlich 14 th IEEE International

More information

Normalizing Source Code Vocabulary

Normalizing Source Code Vocabulary Normalizing Source Code Vocabulary Dawn Lawrie Dave Binkley Christopher Morrell Loyola University Maryland Baltimore MD 21210-2699, USA {lawrie, binkley}@cs.loyola.edu, chm@loyola.edu Keywords: source

More information

Natural Language-based Software Analyses and Tools for Software Maintenance

Natural Language-based Software Analyses and Tools for Software Maintenance Natural Language-based Software Analyses and Tools for Software Maintenance Lori Pollock 1, K. Vijay-Shanker 1, Emily Hill 2, Giriprasad Sridhara 1, and David Shepherd 3 1 Computer and Information Sciences,

More information

Recovering Traceability Links between Code and Documentation

Recovering Traceability Links between Code and Documentation Recovering Traceability Links between Code and Documentation Paper by: Giuliano Antoniol, Gerardo Canfora, Gerardo Casazza, Andrea De Lucia, and Ettore Merlo Presentation by: Brice Dobry and Geoff Gerfin

More information

Combining Information Retrieval and Relevance Feedback for Concept Location

Combining Information Retrieval and Relevance Feedback for Concept Location Combining Information Retrieval and Relevance Feedback for Concept Location Sonia Haiduc - WSU CS Graduate Seminar - Jan 19, 2010 1 Software changes Software maintenance: 50-90% of the global costs of

More information

On the use of Relevance Feedback in IR-based Concept Location

On the use of Relevance Feedback in IR-based Concept Location On the use of Relevance Feedback in IR-based Concept Location Gregory Gay 1, Sonia Haiduc 2, Andrian Marcus 2, Tim Menzies 1 1 Lane Department of Computer Science, West Virginia University Morgantown,

More information

Enhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation

Enhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation Enhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation Tathagata Dasgupta, Mark Grechanik: U. of Illinois, Chicago Evan Moritz, Bogdan Dit, Denys Poshyvanyk: College

More information

Impact Analysis of Granularity Levels on Feature Location Technique

Impact Analysis of Granularity Levels on Feature Location Technique Impact Analysis of Granularity Levels on Feature Location Technique Chakkrit Tantithamthavorn, Akinori Ihara, Hideaki Hata, and Ken-ichi Matsumoto Software Engineering Laboratory, Graduate School of Information

More information

Query Expansion Based on Crowd Knowledge for Code Search

Query Expansion Based on Crowd Knowledge for Code Search PAGE 1 Query Expansion Based on Crowd Knowledge for Code Search Liming Nie, He Jiang*, Zhilei Ren, Zeyi Sun, Xiaochen Li Abstract As code search is a frequent developer activity in software development

More information

Using Code Ownership to Improve IR-Based Traceability Link Recovery

Using Code Ownership to Improve IR-Based Traceability Link Recovery Using Code Ownership to Improve IR-Based Traceability Link Recovery Diana Diaz 1, Gabriele Bavota 2, Andrian Marcus 3, Rocco Oliveto 4, Silvia Takahashi 1, Andrea De Lucia 5 1 Universidad de Los Andes,

More information

Accepted Manuscript. The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization

Accepted Manuscript. The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization Accepted Manuscript The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization Chakkrit Tantithamthavorn, Surafel Lemma Abebe, Ahmed E. Hassan, Akinori

More information

Query Expansion via Wordnet for Effective Code Search

Query Expansion via Wordnet for Effective Code Search Expansion via Wordnet for Effective Code Search Meili Lu, Xiaobing Sun, Shaowei Wang, David Lo Yucong Duan School of Information Engineering, Yangzhou University, Yangzhou, China School of Information

More information

Query Expansion via Wordnet for Effective Code Search

Query Expansion via Wordnet for Effective Code Search Expansion via Wordnet for Effective Code Search Meili Lu, Xiaobing Sun, Shaowei Wang, David Lo Yucong Duan School of Information Engineering, Yangzhou University, Yangzhou, China School of Information

More information

STRICT: Information Retrieval Based Search Term Identification for Concept Location

STRICT: Information Retrieval Based Search Term Identification for Concept Location STRICT: Information Retrieval Based Search Term Identification for Concept Location Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,

More information

A Text Retrieval Approach to Recover Links among s and Source Code Classes

A Text Retrieval Approach to Recover Links among  s and Source Code Classes 318 A Text Retrieval Approach to Recover Links among E-Mails and Source Code Classes Giuseppe Scanniello and Licio Mazzeo Universitá della Basilicata, Macchia Romana, Viale Dell Ateneo, 85100, Potenza,

More information

2. PREVIOUS APPROACHES

2. PREVIOUS APPROACHES Locating Bugs without Looking Back Tezcan Dilshener Michel Wermelinger Yijun Yu Computing and Communications Department, The Open University, United Kingdom ABSTRACT Bug localisation is a core program

More information

Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace

Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace Dapeng Liu, Andrian Marcus, Denys Poshyvanyk, Václav Rajlich SEVERE Group @ Incremental Change of Software

More information

5/8/09. Tools for Finding Concerns

5/8/09. Tools for Finding Concerns Tools for Finding Concerns Concepts Google Eclipse Search Find Concept What is the purpose of tools like make, ant, and maven? How are they similar to each other? How are they different from each other?

More information

An Empirical Study of Identifier Splitting Techniques

An Empirical Study of Identifier Splitting Techniques Noname manuscript No. (will be inserted by the editor) An Empirical Study of Identifier Splitting Techniques Emily Hill, David Binkley, Dawn Lawrie, Lori Pollock, K. Vijay-Shanker Received: date / Accepted:

More information

A Statement Level Bug Localization Technique using Statement Dependency Graph

A Statement Level Bug Localization Technique using Statement Dependency Graph A Statement Level Bug Localization Technique using Statement Dependency Graph Shanto Rahman, Md. Mostafijur Rahman, Ahmad Tahmid and Kazi Sakib Institute of Information Technology, University of Dhaka,

More information

Search-Based Software Maintenance and Evolution

Search-Based Software Maintenance and Evolution Search-Based Software Maintenance and Evolution Annibale Panichella Advisor: Prof. Andrea De Lucia Dott. Rocco Oliveto Search-Based Software Engineering «The application of meta-heuristic search-based

More information

Amalgamating Source Code Authors, Maintainers, and Change Proneness to Triage Change Requests

Amalgamating Source Code Authors, Maintainers, and Change Proneness to Triage Change Requests Amalgamating Source Code Authors, Maintainers, and Change Proneness to Triage Change Requests Md Kamal Hossen, Huzefa Kagdi Department of Electrical Engineering and Computer Science Wichita State University

More information

Where Should the Bugs Be Fixed?

Where Should the Bugs Be Fixed? Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports Presented by: Chandani Shrestha For CS 6704 class About the Paper and the Authors Publication

More information

Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling

Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling Natthakul Pingclasai Department of Computer Engineering Kasetsart University Bangkok, Thailand Email: b5310547207@ku.ac.th Hideaki

More information

A Topic Modeling Based Solution for Confirming Software Documentation Quality

A Topic Modeling Based Solution for Confirming Software Documentation Quality A Topic Modeling Based Solution for Confirming Software Documentation Quality Nouh Alhindawi 1 Faculty of Sciences and Information Technology, JADARA UNIVERSITY Obaida M. Al-Hazaimeh 2 Department of Information

More information

Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization

Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization Version History, Similar Report, and Structure: Putting Them Together for Improved Bug Localization Shaowei Wang and David Lo School of Information Systems Singapore Management University, Singapore {shaoweiwang.2010,davidlo}@smu.edu.sg

More information

SOURCE CODE ANALYSIS EXTRACTIVE APPROACH TO GENERATE TEXTUAL SUMMARY

SOURCE CODE ANALYSIS EXTRACTIVE APPROACH TO GENERATE TEXTUAL SUMMARY SOURCE CODE ANALYSIS EXTRACTIVE APPROACH TO GENERATE TEXTUAL SUMMARY 1 KAREEM ABBAS DAWOOD, 2 KHAIRONI YATIM SHARIF, 3 KOH TIENG WEI 1,2,3 Department of Software Engineering and Information System Faculty

More information

Advanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement

Advanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement Advanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement S.Muthamizharasi 1, J.Selvakumar 2, M.Rajaram 3 PG Scholar, Dept of CSE (PG)-ME (Software Engineering), Sri Ramakrishna

More information

Adams, Bram 21 MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Anbalagan, Prasanth

Adams, Bram 21 MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Anbalagan, Prasanth MSR 2009 Detailed Author Index [Page 1/11] A Adams, Bram 21 MapReduce as a General Framework to Support Research in Mining Software Repositories (MSR) Anbalagan, Prasanth 171 On Mining Data Across Software

More information

A Textual-based Technique for Smell Detection

A Textual-based Technique for Smell Detection A Textual-based Technique for Smell Detection Fabio Palomba1, Annibale Panichella2, Andrea De Lucia1, Rocco Oliveto3, Andy Zaidman2 1 University of Salerno, Italy 2 Delft University of Technology, The

More information

Measuring the Semantic Similarity of Comments in Bug Reports

Measuring the Semantic Similarity of Comments in Bug Reports Measuring the Semantic Similarity of Comments in Bug Reports Bogdan Dit, Denys Poshyvanyk, Andrian Marcus Department of Computer Science Wayne State University Detroit Michigan 48202 313 577 5408

More information

AURA: A Hybrid Approach to Identify

AURA: A Hybrid Approach to Identify : A Hybrid to Identify Wei Wu 1, Yann-Gaël Guéhéneuc 1, Giuliano Antoniol 2, and Miryung Kim 3 1 Ptidej Team, DGIGL, École Polytechnique de Montréal, Canada 2 SOCCER Lab, DGIGL, École Polytechnique de

More information

USING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION BRIAN EDDY

USING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION BRIAN EDDY USING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION by BRIAN EDDY JEFF GRAY, COMMITTEE CHAIR NICHOLAS KRAFT JEFFREY CARVER SUSAN VRBSKY

More information

Mapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I.

Mapping Bug Reports to Relevant Files and Automated Bug Assigning to the Developer Alphy Jose*, Aby Abahai T ABSTRACT I. International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Mapping Bug Reports to Relevant Files and Automated

More information

Exploiting Spatial Code Proximity and Order for Improved Source Code Retrieval for Bug Localization

Exploiting Spatial Code Proximity and Order for Improved Source Code Retrieval for Bug Localization Purdue University Purdue e-pubs Department of Electrical and Computer Engineering Technical Reports Department of Electrical and Computer Engineering 2013 Exploiting Spatial Code Proximity and Order for

More information

A New Family of Software Anti-Patterns: Linguistic Anti-Patterns

A New Family of Software Anti-Patterns: Linguistic Anti-Patterns A New Family of Software Anti-Patterns: Linguistic Anti-Patterns Venera Arnaoudova 1,2, Massimiliano Di Penta 3, Giuliano Antoniol 2, Yann-Gaël Guéhéneuc 1 1 Ptidej Team, DGIGL, École Polytechnique de

More information

DURING the process of software development, developers

DURING the process of software development, developers IEEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. X, NO. X, XX 20XX 1 Expanding Queries for Search Using Semantically Related API Class-names Feng Zhang, Haoran Niu, Iman Keivanloo, Member, IEEE, and Ying

More information

A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects

A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects A Replicated Study on Duplicate Detection: Using Apache Lucene to Search Among Android Defects Borg, Markus; Runeson, Per; Johansson, Jens; Mäntylä, Mika Published in: [Host publication title missing]

More information

Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace

Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace Dapeng Liu, Andrian Marcus, Denys Poshyvanyk, Václav Rajlich Department of Computer Science Wayne State University

More information

Bug or Not? Bug Report Classification using N-Gram IDF

Bug or Not? Bug Report Classification using N-Gram IDF Bug or Not? Bug Report Classification using N-Gram IDF Pannavat Terdchanakul 1, Hideaki Hata 1, Passakorn Phannachitta 2, and Kenichi Matsumoto 1 1 Graduate School of Information Science, Nara Institute

More information

Locus: Locating Bugs from Software Changes

Locus: Locating Bugs from Software Changes Locus: Locating Bugs from Software Changes Ming Wen, Rongxin Wu, Shing-Chi Cheung Department of Computer Science and Engineering The Hong Kong University of Science and Technology, Hong Kong, China {mwenaa,

More information

How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects

How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects Yuhao Wu 1(B), Yuki Manabe 2, Daniel M. German 3, and Katsuro Inoue 1 1 Graduate

More information

Improving Information Retrieval. Based Bug Localisation Using. Contextual Heuristics

Improving Information Retrieval. Based Bug Localisation Using. Contextual Heuristics Improving Information Retrieval Based Bug Localisation Using Contextual Heuristics Tezcan Dilshener M.Sc. A thesis submitted to The Open University in partial fulfilment of the requirements for the degree

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Query-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks

Query-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks Query-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks Laura Moreno 1, Gabriele Bavota 2, Sonia Haiduc 3, Massimiliano Di Penta 4, Rocco Oliveto 5, Barbara Russo 2, Andrian

More information

Automatic Identification of Important Clones for Refactoring and Tracking

Automatic Identification of Important Clones for Refactoring and Tracking Automatic Identification of Important Clones for Refactoring and Tracking Manishankar Mondal Chanchal K. Roy Kevin A. Schneider Department of Computer Science, University of Saskatchewan, Canada {mshankar.mondal,

More information

FACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES

FACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES FACE DETECTION AND LOCALIZATION USING DATASET OF TINY IMAGES Swathi Polamraju and Sricharan Ramagiri Department of Electrical and Computer Engineering Clemson University ABSTRACT: Being motivated by the

More information

Empirical Evaluation of Diagrams of the Run-Time Structure for Coding Tasks. Nariman Ammar Marwan Abi-Antoun Department of Computer Science

Empirical Evaluation of Diagrams of the Run-Time Structure for Coding Tasks. Nariman Ammar Marwan Abi-Antoun Department of Computer Science Empirical Evaluation of Diagrams of the Run-Time Structure for Coding Tasks Nariman Ammar Marwan Abi-Antoun Department of Computer Science Diagrams (can?) help developers with code modifications during

More information

Improving Bug Management using Correlations in Crash Reports

Improving Bug Management using Correlations in Crash Reports Noname manuscript No. (will be inserted by the editor) Improving Bug Management using Correlations in Crash Reports Shaohua Wang Foutse Khomh Ying Zou Received: date / Accepted: date Abstract Nowadays,

More information

Are Object Graphs Extracted Using Abstract Interpretation Significantly Different from the Code?

Are Object Graphs Extracted Using Abstract Interpretation Significantly Different from the Code? Are Object Graphs Extracted Using Abstract Interpretation Significantly Different from the Code? Marwan Abi-Antoun Sumukhi Chandrashekar Radu Vanciu Andrew Giang Wayne State University Department of Computer

More information

Chapter 8. Evaluating Search Engine

Chapter 8. Evaluating Search Engine Chapter 8 Evaluating Search Engine Evaluation Evaluation is key to building effective and efficient search engines Measurement usually carried out in controlled laboratory experiments Online testing can

More information

Automatic Query Performance Assessment during the Retrieval of Software Artifacts

Automatic Query Performance Assessment during the Retrieval of Software Artifacts Automatic Query Performance Assessment during the Retrieval of Software Artifacts Sonia Haiduc Wayne State Univ. Detroit, MI, USA Gabriele Bavota Univ. of Salerno Fisciano (SA), Italy Rocco Oliveto Univ.

More information

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE International Journal of Software Engineering & Applications (IJSEA), Vol.9, No.5, September 2018 IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE Simon Kawuma 1 and

More information

ASE 2017, Urbana-Champaign, IL, USA Technical Research /17 c 2017 IEEE

ASE 2017, Urbana-Champaign, IL, USA Technical Research /17 c 2017 IEEE Improved Query Reformulation for Concept Location using CodeRank and Document Structures Mohammad Masudur Rahman Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Canada {masud.rahman,

More information

Traceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management

Traceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management Traceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management Samuel Klock, Malcom Gers, Bogdan Dit, Denys Poshyvanyk Department of Computer Science The College of William and Mary Williamsburg,

More information

Configuring Topic Models for Software Engineering Tasks in TraceLab

Configuring Topic Models for Software Engineering Tasks in TraceLab Configuring Topic Models for Software Engineering Tasks in TraceLab Bogdan Dit 1, Annibale Panichella 2, Evan Moritz 1, Rocco Oliveto 3, Massimiliano Di Penta 4, Denys Poshyvanyk 1, Andrea De Lucia 2 1

More information

Predicting Source Code Quality with Static Analysis and Machine Learning. Vera Barstad, Morten Goodwin, Terje Gjøsæter

Predicting Source Code Quality with Static Analysis and Machine Learning. Vera Barstad, Morten Goodwin, Terje Gjøsæter Predicting Source Code Quality with Static Analysis and Machine Learning Vera Barstad, Morten Goodwin, Terje Gjøsæter Faculty of Engineering and Science, University of Agder Serviceboks 509, NO-4898 Grimstad,

More information

Toward Automatic Summarization of Arbitrary Java Statements for Novice Programmers

Toward Automatic Summarization of Arbitrary Java Statements for Novice Programmers 2018 IEEE International Conference on Software Maintenance and Evolution Toward Automatic Summarization of Arbitrary Java Statements for Novice Programmers Mohammed Hassan, Emily Hill Drew University Madison,

More information

IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES

IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES IMPROVING FEATURE LOCATION BY COMBINING DYNAMIC ANALYSIS AND STATIC INFERENCES Meghan Revelle Advisor: David Coppit Department of Computer Science The College of William and Mary Williamsburg, Virginia

More information

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching Wolfgang Tannebaum, Parvaz Madabi and Andreas Rauber Institute of Software Technology and Interactive Systems, Vienna

More information

INFORMATION ACCESS VIA VOICE. dissertation is my own or was done in collaboration with my advisory committee. Yapin Zhong

INFORMATION ACCESS VIA VOICE. dissertation is my own or was done in collaboration with my advisory committee. Yapin Zhong INFORMATION ACCESS VIA VOICE Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. Yapin Zhong

More information

CloCom: Mining Existing Source Code for Automatic Comment Generation

CloCom: Mining Existing Source Code for Automatic Comment Generation CloCom: Mining Existing Source Code for Automatic Comment Generation Abstract Code comments are an integral part of software development. They improve program comprehension and software maintainability.

More information

Studying and Detecting Log-Related Issues

Studying and Detecting Log-Related Issues Noname manuscript No. (will be inserted by the editor) Studying and Detecting Log-Related Issues Mehran Hassani Weiyi Shang Emad Shihab Nikolaos Tsantalis Received: date / Accepted: date Abstract Logs

More information

Assisting Code Search with Automatic Query Reformulation for Bug Localization

Assisting Code Search with Automatic Query Reformulation for Bug Localization 2013 10th IEEE Working Conference on Mining Software Repositories Assisting Code Search with Automatic Query Reformulation for Bug Localization Bunyamin Sisman, Avinash C. Kak Purdue University West Lafayette,

More information

Automatic Bug Assignment Using Information Extraction Methods

Automatic Bug Assignment Using Information Extraction Methods Automatic Bug Assignment Using Information Extraction Methods Ramin Shokripour Zarinah M. Kasirun Sima Zamani John Anvik Faculty of Computer Science & Information Technology University of Malaya Kuala

More information

Rearranging the Order of Program Statements for Code Clone Detection

Rearranging the Order of Program Statements for Code Clone Detection Rearranging the Order of Program Statements for Code Clone Detection Yusuke Sabi, Yoshiki Higo, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, Japan Email: {y-sabi,higo,kusumoto@ist.osaka-u.ac.jp

More information

ABSTRACT INTRODUCTION ISSN: OPEN ACCESS ARTICLE.

ABSTRACT INTRODUCTION ISSN: OPEN ACCESS ARTICLE. ISSN: 0976-3104 SPECIAL ISSUE: (EMERGING TECHNOLOGIES IN NETWORKING AND SECURITY (ETNS) Suseela and Devi IMPROVING THE ACCURACY OF IR TECHNIQUES USING TRACEABILITY MB. Suseela, PP. Devi Department Of Computer

More information

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier

LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier Wang Ding, Songnian Yu, Shanqing Yu, Wei Wei, and Qianfeng Wang School of Computer Engineering and Science, Shanghai University, 200072

More information

An Exploratory Study on Interface Similarities in Code Clones

An Exploratory Study on Interface Similarities in Code Clones 1 st WETSoDA, December 4, 2017 - Nanjing, China An Exploratory Study on Interface Similarities in Code Clones Md Rakib Hossain Misu, Abdus Satter, Kazi Sakib Institute of Information Technology University

More information

Confidence Measures: how much we can trust our speech recognizers

Confidence Measures: how much we can trust our speech recognizers Confidence Measures: how much we can trust our speech recognizers Prof. Hui Jiang Department of Computer Science York University, Toronto, Ontario, Canada Email: hj@cs.yorku.ca Outline Speech recognition

More information

AUTOMATICALLY GENERATING DESCRIPTIVE SUMMARY COMMENTS FOR C LANGUAGE CODE AND FUNCTIONS

AUTOMATICALLY GENERATING DESCRIPTIVE SUMMARY COMMENTS FOR C LANGUAGE CODE AND FUNCTIONS AUTOMATICALLY GENERATING DESCRIPTIVE SUMMARY COMMENTS FOR C LANGUAGE CODE AND FUNCTIONS Kusuma M S 1, K Srinivasa 2 1 M.Tech Student, 2 Assistant Professor Email: kusumams92@gmail.com 1, srinivas.karur@gmail.com

More information

Maintainability Index Variation Among PHP, Java, and Python Open Source Projects

Maintainability Index Variation Among PHP, Java, and Python Open Source Projects Maintainability Index Variation Among PHP, Java, and Python Open Source Projects Celia Chen 1, Lin Shi 2, Kamonphop Srisopha 1 1 Computer Science Department, USC 2 Laboratory for Internet Software Technologies,

More information

Using Stereotypes in the Automatic Generation of Natural Language Summaries for C++ Methods

Using Stereotypes in the Automatic Generation of Natural Language Summaries for C++ Methods Using Stereotypes in the Automatic Generation of Natural Language Summaries for C++ Methods 1 Department of Computer Science Kent State University Kent, Ohio 44242, USA {nabid, jmaletic}@kent.edu Nahla

More information

University of Santiago de Compostela at CLEF-IP09

University of Santiago de Compostela at CLEF-IP09 University of Santiago de Compostela at CLEF-IP9 José Carlos Toucedo, David E. Losada Grupo de Sistemas Inteligentes Dept. Electrónica y Computación Universidad de Santiago de Compostela, Spain {josecarlos.toucedo,david.losada}@usc.es

More information

Separating Speech From Noise Challenge

Separating Speech From Noise Challenge Separating Speech From Noise Challenge We have used the data from the PASCAL CHiME challenge with the goal of training a Support Vector Machine (SVM) to estimate a noise mask that labels time-frames/frequency-bins

More information

Empirical Study on Impact of Developer Collaboration on Source Code

Empirical Study on Impact of Developer Collaboration on Source Code Empirical Study on Impact of Developer Collaboration on Source Code Akshay Chopra University of Waterloo Waterloo, Ontario a22chopr@uwaterloo.ca Parul Verma University of Waterloo Waterloo, Ontario p7verma@uwaterloo.ca

More information

Filtering Bug Reports for Fix-Time Analysis

Filtering Bug Reports for Fix-Time Analysis Filtering Bug Reports for Fix-Time Analysis Ahmed Lamkanfi, Serge Demeyer LORE - Lab On Reengineering University of Antwerp, Belgium Abstract Several studies have experimented with data mining algorithms

More information

Combining Conceptual and Domain-Based Couplings to Detect Database and Code Dependencies

Combining Conceptual and Domain-Based Couplings to Detect Database and Code Dependencies Combining Conceptual and Domain-Based Couplings to Detect Database and Code Dependencies Malcom Gethers, Amir Aryani, Denys Poshyvanyk College of William and Mary, United States {mgethers,denys}@cs.wm.edu

More information

TopicViewer: Evaluating Remodularizations Using Semantic Clustering

TopicViewer: Evaluating Remodularizations Using Semantic Clustering TopicViewer: Evaluating Remodularizations Using Semantic Clustering Gustavo Jansen de S. Santos 1, Katyusco de F. Santos 2, Marco Tulio Valente 1, Dalton D. S. Guerrero 3, Nicolas Anquetil 4 1 Federal

More information

Active Code Search: Incorporating User Feedback to Improve Code Search Relevance

Active Code Search: Incorporating User Feedback to Improve Code Search Relevance Active Code Search: Incorporating User Feedback to Improve Code Search Relevance Shaowei Wang, David Lo, and Lingxiao Jiang School of Information Systems, Singapore Management University {shaoweiwang.2010,davidlo,lxjiang}@smu.edu.sg

More information

Recovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction

Recovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction Recovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction Jiří Vinárek 1, Petr Hnětynka 1, Viliam Šimko2, and Petr Kroha 1 1 Charles University in Prague,

More information

Domain Knowledge Driven Program Analysis

Domain Knowledge Driven Program Analysis Domain Knowledge Driven Program Analysis Daniel Ratiu http://www4.in.tum.de/~ratiu/knowledge_repository.html WSR Bad-Honnef, 4 May 2009 Pressing Issues Program understanding is expensive - over 60% of

More information

A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks

A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks Noname manuscript No. (will be inserted by the editor) A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks Md Masudur Rahman Saikat Chakraborty Gail

More information

Comparative Assessment of Testing and Model Checking Using Program Mutation

Comparative Assessment of Testing and Model Checking Using Program Mutation Comparative Assessment of Testing and Using Mutation Research Talk Jeremy S. Bradbury, James R. Cordy, Juergen Dingel School of Computing! Queen s University Kingston! Ontario! Canada {bradbury, cordy,

More information

The Impact of Task Granularity on Co-evolution Analyses

The Impact of Task Granularity on Co-evolution Analyses The Impact of Task Granularity on Co-evolution Analyses Keisuke Miura Kyushu University, Japan miura@posl.ait.kyushuu.ac.jp Ahmed E. Hassan Queen s University, Canada ahmed@cs.queensu.ca Shane McIntosh

More information

Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results

Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results Digital Forensic Text String Searching: Improving Information Retrieval Effectiveness by Thematically Clustering Search Results DFRWS 2007 Department of Information Systems & Technology Management The

More information

Web Information Retrieval using WordNet

Web Information Retrieval using WordNet Web Information Retrieval using WordNet Jyotsna Gharat Asst. Professor, Xavier Institute of Engineering, Mumbai, India Jayant Gadge Asst. Professor, Thadomal Shahani Engineering College Mumbai, India ABSTRACT

More information

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web

Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Better Contextual Suggestions in ClueWeb12 Using Domain Knowledge Inferred from The Open Web Thaer Samar 1, Alejandro Bellogín 2, and Arjen P. de Vries 1 1 Centrum Wiskunde & Informatica, {samar,arjen}@cwi.nl

More information

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert

More information