Configuring Topic Models for Software Engineering Tasks in TraceLab
|
|
- Anne Bridges
- 5 years ago
- Views:
Transcription
1 Configuring Topic Models for Software Engineering Tasks in TraceLab Bogdan Dit Annibale Panichella Evan Moritz Rocco Oliveto Massimiliano Di Penta Denys Poshyvanyk Andrea De Lucia TEFSE 13 San Francisco, CA
2 Textual Information Information Retrieval Techniques Maintenance Tasks
3 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration
4 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration imported from natural language community ad-hoc configuration Assumption: source code has the same characteristics as natural language
5 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration
6 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration Sub-optimal results
7 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration [Hindle et ICSE 12]: source code is more regular and predictable than natural language Sub-optimal results
8 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration ICSE 2013 Sub-optimal ICSE 2013
9 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration ICSE 2013 Sub-optimal results
10 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration ICSE 2013 Sub-optimal results Near-optimal results
11 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration ICSE 2013 Sub-optimal results Near-optimal results
12 Textual Information Information Retrieval Techniques Maintenance Tasks LDA Require Parameter Calibration Traceability ICSE 2013 Sub-optimal results Near-optimal results
13 What is LDA?
14 Latent Dirichlet Allocation (LDA) Topic model that generates the distribution of latent topics from textual documents
15 Latent Dirichlet Allocation (LDA) Topic model that generates the distribution of latent topics from textual documents LDA logging http connect save GUI export
16 Source Artifacts Target Artifacts Preprocessing Term-by-Document Matrix
17 Source Artifacts Target Artifacts Preprocessing Term-by-Document Matrix LDA
18 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA
19 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Topic by Documents
20 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Topic by Documents Probability that document is related to topic
21 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Topic by Documents
22 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Topic by Documents
23 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Compute document to document similarity Topic by Documents Compute query to document similarity Cluster documents by topics
24 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Topic by Documents
25 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Words to Topic Topic by Documents
26 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Words to Topic Topic by Documents
27 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Words to Topic Topic by Documents Generate topic labels (top 5 words)
28 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA
29 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Configuration # of iterations # of topics α β
30 Source Artifacts Target Artifacts Preprocessing Input Parameters Term-by-Document Matrix LDA Configuration # of iterations # of topics α β Number of Gibbs samplings of LDA model
31 Number of topics Low High
32 Number of topics Low High General topics
33 Number of topics Low High General topics Specific topics
34 α, influences the smoothness of documents to topics distribution Low α High α
35 α, influences the smoothness of documents to topics distribution Low α High α
36 α, influences the smoothness of documents to topics distribution Low α High α
37 α, influences the smoothness of documents to topics distribution Low α High α
38 β, influences the smoothness of topics to words distribution Low β High β
39 β, influences the smoothness of topics to words distribution Low β High β
40 LDA parameters significantly 1,000 different configurations of LDA parameters influence the results Evaluate the Average Precision on EasyClinic
41 LDA parameters significantly 1,000 different configurations of LDA parameters influence the results Evaluate the Average Precision on EasyClinic
42 LDA parameters significantly 1,000 different configurations of LDA parameters influence the results Few configurations produce good results Evaluate the Average Precision on EasyClinic High variability in results LDA-GA
43 LDA GA: automatically calibrate the input parameters of LDA using genetic algorithms
44 LDA GA: automatically calibrate the input parameters of LDA using genetic algorithms Dataset LDA-GA
45 LDA GA: automatically calibrate the input parameters of LDA using genetic algorithms LDA parameters Dataset LDA-GA # of iterations # of topics α β LDA
46 LDA GA: automatically calibrate the input parameters of LDA using genetic algorithms LDA parameters Dataset LDA-GA # of iterations # of topics α β LDA Oracle Traceability Results
47 LDA GA: automatically calibrate the input parameters of LDA using genetic algorithms LDA parameters Dataset LDA-GA # of iterations # of topics α β LDA Dataset dependent Oracle, Task independent Oracle Traceability Results
48 Evaluating the LDA parameter configuration
49 Topic by Documents
50 Topic by Documents Dominant Topics
51 Topic by Documents LDA Model Dominant Topics
52 Topic by Documents LDA Model Dominant Topics
53 Topic by Documents LDA Model Dominant Topics
54 Topic by Documents LDA Model
55 Topic by Documents LDA Model Cohesion (similarity): how related the documents in the same clusters are Separation (dissimilarity): how distinct a cluster is from other clusters
56 Topic by Documents LDA Model Silhouette coefficient
57
58 Cohesive Separated Documents are rearranged more closely in the n- dimensional space
59 We propose a genetic algorithm to find those solutions
60 Choose a set of Random LDA parameters
61 Choose a set of Random LDA parameters
62 Choose a set of Random LDA parameters LDA Determine cohesion and separation
63 Choose a set of Random LDA parameters LDA Determine cohesion and separation Cohesion and Separation (Silhouette)
64 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters
65 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters
66 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters Crossover, mutation
67 Choose a set of Random LDA parameters Crossover LDA Determine cohesion and separation Select best LDA parameters Crossover, mutation
68 Choose a set of Random LDA parameters Crossover LDA Determine cohesion and separation Select best LDA parameters Crossover, mutation
69 Choose a set of Random LDA parameters Crossover LDA Determine cohesion and separation Select best LDA parameters Mutation Crossover, mutation
70 Choose a set of Random LDA parameters Crossover LDA Determine cohesion and separation Select best LDA parameters Mutation Crossover, mutation
71 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters Crossover, mutation
72 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters Crossover, mutation
73 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters Crossover, mutation
74 Choose a set of Random LDA parameters LDA Determine cohesion and separation Select best LDA parameters Return best LDA parameters # of iterations # of topics α β Crossover, mutation
75 Precision/Recall EasyClinic
76 Precision/Recall EasyClinic LDA-GA Exhaustive (best of 1,000 configurations) ad-hoc configuration [ICPC 10]
77 LDA GA in TraceLab R Implementation of LDA-GA
78 LDA GA in TraceLab Genetic algorithm settings LDA-GA settings
79
80 Textual Information Information Retrieval Techniques Maintenance Tasks Require Parameter Calibration ICSE 2013 Sub-optimal results Near-optimal results
81 Textual Information Information Retrieval Techniques Maintenance Tasks Other IR techniques: (LSI, RTM) Require Parameter Calibration Other parameters: (preprocessing) ICSE 2013 Sub-optimal results Near-optimal results
82 Thank you! Questions? GA/
Configuring Topic Models for Software Engineering Tasks in TraceLab
Configuring Topic Models for Software Engineering Tasks in TraceLab Bogdan Dit 1, Annibale Panichella 2, Evan Moritz 1, Rocco Oliveto 3, Massimiliano Di Penta 4, Denys Poshyvanyk 1, Andrea De Lucia 2 1
More informationParameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms
Parameterizing and Assembling IR-based Solutions for SE Tasks using Genetic Algorithms Annibale Panichella 1, Bogdan Dit 2, Rocco Oliveto 3, Massimiliano Di Penta 4, Denys Poshyvanyk 5, Andrea De Lucia
More informationSearch-Based Software Maintenance and Evolution
Search-Based Software Maintenance and Evolution Annibale Panichella Advisor: Prof. Andrea De Lucia Dott. Rocco Oliveto Search-Based Software Engineering «The application of meta-heuristic search-based
More informationTopicViewer: Evaluating Remodularizations Using Semantic Clustering
TopicViewer: Evaluating Remodularizations Using Semantic Clustering Gustavo Jansen de S. Santos 1, Katyusco de F. Santos 2, Marco Tulio Valente 1, Dalton D. S. Guerrero 3, Nicolas Anquetil 4 1 Federal
More informationEnhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation
Enhancing Software Traceability By Automatically Expanding Corpora With Relevant Documentation Tathagata Dasgupta, Mark Grechanik: U. of Illinois, Chicago Evan Moritz, Bogdan Dit, Denys Poshyvanyk: College
More informationJSEA: A Program Comprehension Tool Adopting LDA-based Topic Modeling
JSEA: A Program Comprehension Tool Adopting LDA-based Topic Modeling Tianxia Wang School of Software Engineering Tongji University China Yan Liu School of Software Engineering Tongji University China Abstract
More informationA Heuristic-based Approach to Identify Concepts in Execution Traces
A Heuristic-based Approach to Identify Concepts in Execution Traces Fatemeh Asadi * Massimiliano Di Penta ** Giuliano Antoniol * Yann-Gaël Guéhéneuc ** * Ecole Polytechnique de Montréal, Canada ** Dept.
More informationLiangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison*
Tracking Trends: Incorporating Term Volume into Temporal Topic Models Liangjie Hong*, Dawei Yin*, Jian Guo, Brian D. Davison* Dept. of Computer Science and Engineering, Lehigh University, Bethlehem, PA,
More informationBug or Not? Bug Report Classification using N-Gram IDF
Bug or Not? Bug Report Classification using N-Gram IDF Pannavat Terdchanakul 1, Hideaki Hata 1, Passakorn Phannachitta 2, and Kenichi Matsumoto 1 1 Graduate School of Information Science, Nara Institute
More informationA Topic Modeling Based Solution for Confirming Software Documentation Quality
A Topic Modeling Based Solution for Confirming Software Documentation Quality Nouh Alhindawi 1 Faculty of Sciences and Information Technology, JADARA UNIVERSITY Obaida M. Al-Hazaimeh 2 Department of Information
More informationUSING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION BRIAN EDDY
USING THE STRUCTURAL LOCATION OF TERMS TO IMPROVE THE RESULTS OF TEXT RETRIEVAL BASED APPROACHES TO FEATURE LOCATION by BRIAN EDDY JEFF GRAY, COMMITTEE CHAIR NICHOLAS KRAFT JEFFREY CARVER SUSAN VRBSKY
More informationCan Better Identifier Splitting Techniques Help Feature Location?
Can Better Identifier Splitting Techniques Help Feature Location? Bogdan Dit, Latifa Guerrouj, Denys Poshyvanyk, Giuliano Antoniol SEMERU 19 th IEEE International Conference on Program Comprehension (ICPC
More informationMining Software Repositories with Topic Models
Mining Software Repositories with Topic Models Stephen W. Thomas sthomas@cs.queensu.ca Technical Report 2012-586 School of Computing Queen s University Kingston, Ontario, Canada Completed: December 2010
More informationUsing Code Ownership to Improve IR-Based Traceability Link Recovery
Using Code Ownership to Improve IR-Based Traceability Link Recovery Diana Diaz 1, Gabriele Bavota 2, Andrian Marcus 3, Rocco Oliveto 4, Silvia Takahashi 1, Andrea De Lucia 5 1 Universidad de Los Andes,
More informationTraceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management
Traceclipse: An Eclipse Plug-in for Traceability Link Recovery and Management Samuel Klock, Malcom Gers, Bogdan Dit, Denys Poshyvanyk Department of Computer Science The College of William and Mary Williamsburg,
More informationAccepted Manuscript. The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization
Accepted Manuscript The Impact of IR-based Classifier Configuration on the Performance and the Effort of Method-Level Bug Localization Chakkrit Tantithamthavorn, Surafel Lemma Abebe, Ahmed E. Hassan, Akinori
More informationA BIPARTITE GRAPH MODEL FOR COMPARING CLUSTERING OF SOFTWARE SYSTEM FOR FEATURE LOCATION
International Journal of Computer Engineering & Technology (IJCET) Volume 9, Issue 4, July-Aug 2018, pp. 63 72, Article IJCET_09_04_007 Available online at http://www.iaeme.com/ijcet/issues.asp?jtype=ijcet&vtype=9&itype=4
More information@%'#*+#*+1((1&0(1*#"A*!'#",*.(&/6(/&0-*0"+*.%40"(#6*7"81&40(#1"*(1*./331&(*.18(B0&%*9%806(1&#",* H;2*>0"+#+0(%* C""1*C660+%4#61*DEFG* I0J&#%-%*K0$1(0*
!"#$%&'#()*+%,-#*.(/+#*+#*.0-%&"1* 2#30&(#4%"(1*+#*50(%40(#60*%+*7"81&40(#60* 21((1&0(1*+#*9#6%&60*#"*.6#%":%*50(%40(#6;%
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationResearch on Applications of Data Mining in Electronic Commerce. Xiuping YANG 1, a
International Conference on Education Technology, Management and Humanities Science (ETMHS 2015) Research on Applications of Data Mining in Electronic Commerce Xiuping YANG 1, a 1 Computer Science Department,
More informationQuery-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks
Query-Based Configuration of Text Retrieval Solutions for Software Engineering Tasks Laura Moreno 1, Gabriele Bavota 2, Sonia Haiduc 3, Massimiliano Di Penta 4, Rocco Oliveto 5, Barbara Russo 2, Andrian
More informationA NOVEL APPROACH FOR CLONE GROUP MAPPING BY USING TOPIC MODELING
A NOVEL APPROACH FOR CLONE GROUP MAPPING BY USING TOPIC MODELING Ruixia Zhang, Liping Zhang, Huan Wang and Zhuo Chen Computer and information engineering college, Inner Mongolia normal university, Hohhot,
More informationIntegrated Impact Analysis for Managing Software Changes. Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk
Integrated Impact Analysis for Managing Software Changes Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk Change Impact Analysis Software change impact analysis aims at estimating the potentially
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationCORRELATING FEATURES AND CODE BY DYNAMIC
CORRELATING FEATURES AND CODE BY DYNAMIC AND SEMANTIC ANALYSIS Ren Wu Shanghai Lixin University of Commerce, Shanghai 201620, China ABSTRACT One major problem in maintaining a software system is to understand
More informationarxiv: v1 [cs.se] 10 Mar 2017
XamForumDB: a dataset for studying Q&A about cross-platform mobile applications development. MATIAS MARTINEZ, SYLVAIN LECOMTE UNIVERSITY OF VALENCIENNES, FRANCE FIRST NAME.LAST NAME@UNIV-VALENCIENNES.FR
More informationStatic Test Case Prioritization Using Topic Models
Empirical Software Engineering manuscript No. (will be inserted by the editor) Static Test Case Prioritization Using Topic Models Stephen W. Thomas Hadi Hemmati Ahmed E. Hassan Dorothea Blostein Received:
More informationAn Efficient Approach for Requirement Traceability Integrated With Software Repository
An Efficient Approach for Requirement Traceability Integrated With Software Repository P.M.G.Jegathambal, N.Balaji P.G Student, Tagore Engineering College, Chennai, India 1 Asst. Professor, Tagore Engineering
More informationRecovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction
Recovering Traceability Links between Code and Textual Specification through Automated Domain Model Extraction Jiří Vinárek 1, Petr Hnětynka 1, Viliam Šimko2, and Petr Kroha 1 1 Charles University in Prague,
More informationImproving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms.
Improving interpretability in approximative fuzzy models via multi-objective evolutionary algorithms. Gómez-Skarmeta, A.F. University of Murcia skarmeta@dif.um.es Jiménez, F. University of Murcia fernan@dif.um.es
More informationInternal vs. External Parameters in Fitness Functions
Internal vs. External Parameters in Fitness Functions Pedro A. Diaz-Gomez Computing & Technology Department Cameron University Lawton, Oklahoma 73505, USA pdiaz-go@cameron.edu Dean F. Hougen School of
More informationAdvanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement
Advanced Matching Technique for Trustrace To Improve The Accuracy Of Requirement S.Muthamizharasi 1, J.Selvakumar 2, M.Rajaram 3 PG Scholar, Dept of CSE (PG)-ME (Software Engineering), Sri Ramakrishna
More informationSupervised classification of law area in the legal domain
AFSTUDEERPROJECT BSC KI Supervised classification of law area in the legal domain Author: Mees FRÖBERG (10559949) Supervisors: Evangelos KANOULAS Tjerk DE GREEF June 24, 2016 Abstract Search algorithms
More informationParallelism for LDA Yang Ruan, Changsi An
Parallelism for LDA Yang Ruan, Changsi An (yangruan@indiana.edu, anch@indiana.edu) 1. Overview As parallelism is very important for large scale of data, we want to use different technology to parallelize
More informationMining Web Data. Lijun Zhang
Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems
More informationBehavioral Data Mining. Lecture 18 Clustering
Behavioral Data Mining Lecture 18 Clustering Outline Why? Cluster quality K-means Spectral clustering Generative Models Rationale Given a set {X i } for i = 1,,n, a clustering is a partition of the X i
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationA Textual-based Technique for Smell Detection
A Textual-based Technique for Smell Detection Fabio Palomba1, Annibale Panichella2, Andrea De Lucia1, Rocco Oliveto3, Andy Zaidman2 1 University of Salerno, Italy 2 Delft University of Technology, The
More informationPromoting Ranking Diversity for Biomedical Information Retrieval based on LDA
Promoting Ranking Diversity for Biomedical Information Retrieval based on LDA Yan Chen, Xiaoshi Yin, Zhoujun Li, Xiaohua Hu and Jimmy Huang State Key Laboratory of Software Development Environment, Beihang
More informationNature Methods: doi: /nmeth Supplementary Figure 1
Supplementary Figure 1 Performance analysis under four different clustering scenarios. Performance analysis under four different clustering scenarios. i) Standard Conditions, ii) a sparse data set with
More informationCSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)"
CSCI 599: Applications of Natural Language Processing Information Retrieval Retrieval Models (Part 1)" All slides Addison Wesley, Donald Metzler, and Anton Leuski, 2008, 2012! Retrieval Models" Provide
More informationFeature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web
Feature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web Chenghua Lin, Yulan He, Carlos Pedrinaci, and John Domingue Knowledge Media Institute, The Open University
More informationDATA MINING Introductory and Advanced Topics Part I
DATA MINING Introductory and Advanced Topics Part I Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Companion slides for the text by Dr. M.H.Dunham, Data
More informationAn Efficient Approach for Requirement Traceability Integrated With Software Repository
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 15, Issue 4 (Nov. - Dec. 2013), PP 65-71 An Efficient Approach for Requirement Traceability Integrated With Software
More informationA Improving Software Modularization via Automated Analysis of Latent Topics and Dependencies
A Improving Software Modularization via Automated Analysis of Latent Topics and Dependencies GABRIELE BAVOTA, University of Salerno, Italy MALCOM GETHERS, University of Maryland, Baltimore County, USA
More informationStatic test case prioritization using topic models
DOI 10.1007/s10664-012-9219-7 Static test case prioritization using topic models Stephen W. Thomas Hadi Hemmati Ahmed E. Hassan Dorothea Blostein Springer Science+Business Media, LLC 2012 Editor: Gregg
More informationA Genetic Algorithm for Software Design Migration from Structured to Object Oriented Paradigm
A Genetic Algorithm for Software Design Migration from Structured to Object Oriented Paradigm Md. Selim selim.iitdu@gmail.com Saeed Siddik siddik.saeed@gmail.com Alim Ul Gias alimulgias@gmail.com M. Abdullah-Al-Wadud
More informationRetrieval by Content. Part 3: Text Retrieval Latent Semantic Indexing. Srihari: CSE 626 1
Retrieval by Content art 3: Text Retrieval Latent Semantic Indexing Srihari: CSE 626 1 Latent Semantic Indexing LSI isadvantage of exclusive use of representing a document as a T-dimensional vector of
More informationScalable Object Classification using Range Images
Scalable Object Classification using Range Images Eunyoung Kim and Gerard Medioni Institute for Robotics and Intelligent Systems University of Southern California 1 What is a Range Image? Depth measurement
More informationSpatial Latent Dirichlet Allocation
Spatial Latent Dirichlet Allocation Xiaogang Wang and Eric Grimson Computer Science and Computer Science and Artificial Intelligence Lab Massachusetts Tnstitute of Technology, Cambridge, MA, 02139, USA
More informationDimension Reduction CS534
Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of
More informationClustering using Topic Models
Clustering using Topic Models Compiled by Sujatha Das, Cornelia Caragea Credits for slides: Blei, Allan, Arms, Manning, Rai, Lund, Noble, Page. Clustering Partition unlabeled examples into disjoint subsets
More informationPowered Outer Probabilistic Clustering
Proceedings of the World Congress on Engineering and Computer Science 217 Vol I WCECS 217, October 2-27, 217, San Francisco, USA Powered Outer Probabilistic Clustering Peter Taraba Abstract Clustering
More informationSearching for Configurations in Clone Evaluation A Replication Study
Searching for Configurations in Clone Evaluation A Replication Study Chaiyong Ragkhitwetsagul 1, Matheus Paixao 1, Manal Adham 1 Saheed Busari 1, Jens Krinke 1 and John H. Drake 2 1 University College
More informationClassication of Corporate and Public Text
Classication of Corporate and Public Text Kevin Nguyen December 16, 2011 1 Introduction In this project we try to tackle the problem of classifying a body of text as a corporate message (text usually sent
More informationAn Investigation of Basic Retrieval Models for the Dynamic Domain Task
An Investigation of Basic Retrieval Models for the Dynamic Domain Task Razieh Rahimi and Grace Hui Yang Department of Computer Science, Georgetown University rr1042@georgetown.edu, huiyang@cs.georgetown.edu
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationMultimodal Medical Image Retrieval based on Latent Topic Modeling
Multimodal Medical Image Retrieval based on Latent Topic Modeling Mandikal Vikram 15it217.vikram@nitk.edu.in Suhas BS 15it110.suhas@nitk.edu.in Aditya Anantharaman 15it201.aditya.a@nitk.edu.in Sowmya Kamath
More informationA Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews
A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews Necmiye Genc-Nayebi and Alain Abran Department of Software Engineering and Information Technology, Ecole
More informationBug Triaging: Profile Oriented Developer Recommendation
Bug Triaging: Profile Oriented Developer Recommendation Anjali Sandeep Kumar Singh Department of Computer Science and Engineering, Jaypee Institute of Information Technology Abstract Software bugs are
More informationWhere Should the Bugs Be Fixed?
Where Should the Bugs Be Fixed? More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports Presented by: Chandani Shrestha For CS 6704 class About the Paper and the Authors Publication
More informationOntology based Model and Procedure Creation for Topic Analysis in Chinese Language
Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,
More informationCS5220: Assignment Topic Modeling via Matrix Computation in Julia
CS5220: Assignment Topic Modeling via Matrix Computation in Julia April 7, 2014 1 Introduction Topic modeling is widely used to summarize major themes in large document collections. Topic modeling methods
More informationFeature Subset Selection using Clusters & Informed Search. Team 3
Feature Subset Selection using Clusters & Informed Search Team 3 THE PROBLEM [This text box to be deleted before presentation Here I will be discussing exactly what the prob Is (classification based on
More informationComposite Heuristic Algorithm for Clustering Text Data Sets
Composite Heuristic Algorithm for Clustering Text Data Sets Nikita Nikitinsky, Tamara Sokolova and Ekaterina Pshehotskaya InfoWatch Nikita.Nikitinsky@infowatch.com, Tamara.Sokolova@infowatch.com, Ekaterina.Pshehotskaya@infowatch.com
More informationGenetic Algorithms for Classification and Feature Extraction
Genetic Algorithms for Classification and Feature Extraction Min Pei, Erik D. Goodman, William F. Punch III and Ying Ding, (1995), Genetic Algorithms For Classification and Feature Extraction, Michigan
More informationFeature selection. LING 572 Fei Xia
Feature selection LING 572 Fei Xia 1 Creating attribute-value table x 1 x 2 f 1 f 2 f K y Choose features: Define feature templates Instantiate the feature templates Dimensionality reduction: feature selection
More informationA Multi-Objective Technique for Test Suite Reduction
A Multi-Objective Technique for Test Suite Reduction Alessandro Marchetto, Md. Mahfuzul Islam, Angelo Susi Fondazione Bruno Kessler Trento, Italy {marchetto,mahfuzul,susi}@fbk.eu Giuseppe Scanniello Università
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationMining Features from the Object-Oriented Source Code of a Collection of Software Variants Using Formal Concept Analysis and Latent Semantic Indexing
Mining Features from the Object-Oriented Source Code of a Collection of Software Variants Using Formal Concept Analysis and Latent Semantic Indexing R. AL-msie deen 1, A.-D. Seriai 1, M. Huchard 1, C.
More informationAccurate Developer Recommendation for Bug Resolution
Accurate Developer Recommendation for Bug Resolution Xin Xia, David Lo, Xinyu Wang, and Bo Zhou College of Computer Science and Technology, Zhejiang University, China School of Information Systems, Singapore
More informationSELECTION OF A MULTIVARIATE CALIBRATION METHOD
SELECTION OF A MULTIVARIATE CALIBRATION METHOD 0. Aim of this document Different types of multivariate calibration methods are available. The aim of this document is to help the user select the proper
More informationCitation for published version (APA): He, J. (2011). Exploring topic structure: Coherence, diversity and relatedness
UvA-DARE (Digital Academic Repository) Exploring topic structure: Coherence, diversity and relatedness He, J. Link to publication Citation for published version (APA): He, J. (211). Exploring topic structure:
More informationBruno Martins. 1 st Semester 2012/2013
Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4
More informationEffective Latent Space Graph-based Re-ranking Model with Global Consistency
Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationScalable Multidimensional Hierarchical Bayesian Modeling on Spark
Scalable Multidimensional Hierarchical Bayesian Modeling on Spark Robert Ormandi, Hongxia Yang and Quan Lu Yahoo! Sunnyvale, CA 2015 Click-Through-Rate (CTR) Prediction Estimating the probability of click
More informationBook Recommendation based on Social Information
Book Recommendation based on Social Information Chahinez Benkoussas and Patrice Bellot LSIS Aix-Marseille University chahinez.benkoussas@lsis.org patrice.bellot@lsis.org Abstract : In this paper, we present
More informationUsing Information Retrieval to Support Software Evolution
Using Information Retrieval to Support Software Evolution Denys Poshyvanyk Ph.D. Candidate SEVERE Group @ Software is Everywhere Software is pervading every aspect of life Software is difficult to make
More informationPartitioning Algorithms for Improving Efficiency of Topic Modeling Parallelization
Partitioning Algorithms for Improving Efficiency of Topic Modeling Parallelization Hung Nghiep Tran University of Information Technology VNU-HCMC Vietnam Email: nghiepth@uit.edu.vn Atsuhiro Takasu National
More informationCombining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification
Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification Denys Poshyvanyk, Yann-Gaël Guéhéneuc, Andrian Marcus, Giuliano Antoniol, Václav Rajlich 14 th IEEE International
More informationPredictive Indexing for Fast Search
Predictive Indexing for Fast Search Sharad Goel, John Langford and Alex Strehl Yahoo! Research, New York Modern Massive Data Sets (MMDS) June 25, 2008 Goel, Langford & Strehl (Yahoo! Research) Predictive
More informationClustering. Bruno Martins. 1 st Semester 2012/2013
Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts
More informationClassifying Bug Reports to Bugs and Other Requests Using Topic Modeling
Classifying Bug Reports to Bugs and Other Requests Using Topic Modeling Natthakul Pingclasai Department of Computer Engineering Kasetsart University Bangkok, Thailand Email: b5310547207@ku.ac.th Hideaki
More informationCHAPTER 5 ENERGY MANAGEMENT USING FUZZY GENETIC APPROACH IN WSN
97 CHAPTER 5 ENERGY MANAGEMENT USING FUZZY GENETIC APPROACH IN WSN 5.1 INTRODUCTION Fuzzy systems have been applied to the area of routing in ad hoc networks, aiming to obtain more adaptive and flexible
More informationLDA for Big Data - Outline
LDA FOR BIG DATA 1 LDA for Big Data - Outline Quick review of LDA model clustering words-in-context Parallel LDA ~= IPM Fast sampling tricks for LDA Sparsified sampler Alias table Fenwick trees LDA for
More informationText Analytics (Text Mining)
CSE 6242 / CX 4242 Apr 1, 2014 Text Analytics (Text Mining) Concepts and Algorithms Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer,
More informationEffective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization
Effective Tweet Contextualization with Hashtags Performance Prediction and Multi-Document Summarization Romain Deveaud 1 and Florian Boudin 2 1 LIA - University of Avignon romain.deveaud@univ-avignon.fr
More informationClassification Using Genetic Programming. Patrick Kellogg General Assembly Data Science Course (8/23/15-11/12/15)
Classification Using Genetic Programming Patrick Kellogg General Assembly Data Science Course (8/23/15-11/12/15) Iris Data Set Iris Data Set Iris Data Set Iris Data Set Iris Data Set Create a geometrical
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationGrid-Based Genetic Algorithm Approach to Colour Image Segmentation
Grid-Based Genetic Algorithm Approach to Colour Image Segmentation Marco Gallotta Keri Woods Supervised by Audrey Mbogho Image Segmentation Identifying and extracting distinct, homogeneous regions from
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More informationStopping Duplicate Bug Reports before they start with Continuous Querying for Bug Reports
Stopping Duplicate Bug Reports before they start with Continuous Querying for Bug Reports Abram Hindle 1 1 Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, hindle1@ualberta.ca
More informationIntegrating Apache Mesos with Science Gateways via Apache Airavata
Integrating Apache Mesos with Science Gateways via Apache Airavata Organization: Apache Software Foundation Abstract: Science Gateways federate resources from multiple organizations. Most gateways solve
More informationThe k-means Algorithm and Genetic Algorithm
The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective
More informationA Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks
Noname manuscript No. (will be inserted by the editor) A Case Study on the Impact of Similarity Measure on Information Retrieval based Software Engineering Tasks Md Masudur Rahman Saikat Chakraborty Gail
More informationUnsupervised learning
Unsupervised learning Enrique Muñoz Ballester Dipartimento di Informatica via Bramante 65, 26013 Crema (CR), Italy enrique.munoz@unimi.it Enrique Muñoz Ballester 2017 1 Download slides data and scripts:
More informationTime Complexity Analysis of the Genetic Algorithm Clustering Method
Time Complexity Analysis of the Genetic Algorithm Clustering Method Z. M. NOPIAH, M. I. KHAIRIR, S. ABDULLAH, M. N. BAHARIN, and A. ARIFIN Department of Mechanical and Materials Engineering Universiti
More informationDuplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling
Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling Anh Tuan Nguyen, David Lo Tung Thanh Nguyen, Singapore Management Tien N. Nguyen University, Singapore Iowa
More informationRobust PDF Table Locator
Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records
More information