CHAPTER 5 CLUSTERING USING MUST LINK AND CANNOT LINK ALGORITHM
- Ethelbert Tyrone Heath
5.1 INTRODUCTION

In this phase, the prime attribute taken into consideration is the high dimensionality of the document space. The proposed system employs three mechanisms. First, related words in a document are identified using the MLCL (Must Link and Cannot Link) algorithm, and the relevant keywords are grouped into clusters using three main equations. The clusters thus formed are then optimized using Gaussian parameters, which capture the word patterns and the standard deviation of each cluster. Finally, the words are regrouped in accordance with the Gaussian outputs and colligated into clusters with reference to the documents.

5.2 SYSTEM ARCHITECTURE

Figure 5.1 shows the architecture of the entire system. The keywords extracted by the genetic algorithm process are clustered using the Must Link and Cannot Link algorithm, and the resulting clusters are then optimized using Gaussian parameters.
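The three stages above can be sketched as a simple data-flow chain. This is a hypothetical orchestration for illustration only; the stage functions passed in are invented stand-ins, not the thesis implementation.

```python
def pipeline(documents, extract_keywords, mlcl_cluster, gaussian_optimize):
    """Chain the three stages described above; each stage is supplied as a
    callable so the toy stand-ins below can demonstrate the data flow."""
    keywords = extract_keywords(documents)   # stage 1: keyword extraction (genetic algorithm)
    clusters = mlcl_cluster(keywords)        # stage 2: Must Link / Cannot Link clustering
    return gaussian_optimize(clusters)       # stage 3: Gaussian-parameter optimization

# Toy stand-ins, only to show the shape of the data at each stage:
result = pipeline(
    ["doc one", "doc two"],
    lambda docs: [d.split() for d in docs],   # keyword list per document
    lambda kws: [sorted(set(sum(kws, [])))],  # one naive cluster of all keywords
    lambda cls: [c for c in cls if c],        # drop empty clusters
)
print(result)
```

The point of the sketch is only that each stage consumes the previous stage's output; the real extraction, clustering, and optimization steps are described in the sections that follow.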
Figure 5.1 Overall System Architecture

5.3 MUST LINK AND CANNOT LINK ALGORITHM

The extracted keywords are passed to the MLCL algorithm, which identifies and eliminates the terms that do not correlate with the other terms. Each document is considered an individual cluster of key terms. The equations used to calculate the similarity between the key terms of each document are based on the principles of cosine similarity. Equations (5.1) and (5.2) compute the values D_min and D_max, which are compared against the threshold condition of Equation (5.3) to check whether a relationship can be established between the words. The related keywords are then grouped to form a cluster. The Must Link and Cannot Link algorithm
has three different phases for the formation of clusters. Equations (5.1) to (5.3) are given as follows:

    D_{min} = 2 \cdot W_1(t,d) \cdot \log \frac{W_1(t,d)}{\min(W(t,d))}        (5.1)

    D_{max} = \frac{1}{2} \cdot W_2(t,d) \cdot \log \frac{W_2(t,d)}{\max(W(t,d))}        (5.2)

    Val(D_{min}) \geq Val(D_{max})        (5.3)

The decision of whether the documents are to be grouped is taken with respect to their key terms. Expressing the key terms in numerical form boosts the rates of accuracy without making the clustering process complex. The grouping of documents into clusters is based on Equations (5.4) to (5.6), which depend on the weight of each term; the small variances identify the apt words and place them in the accurate clusters. The three equations are as follows:

    E_1 = \log \frac{W_1 \cdot W_2}{Avg(W_1, W_2)}        (5.4)

    E_2 = \frac{Low(W_1, W_2)}{Avg(W_1, W_2)}        (5.5)

    E_3 = \frac{\log(Low(W_1, W_2))}{\log(W_1) + \log(W_2)}        (5.6)
W1 and W2 are the weights of terms in two different documents, D1 and D2, respectively. Equation (5.4) gives a numerical relationship between the terms w1 and w2 of the two documents. Equation (5.5) adopts the lowest weight amongst the two words w1 and w2 to form a relationship between the two documents. Equation (5.6) is a combined representation of E1 and E2, and gives the platform over which the relationship of the two terms can be calibrated. Based on these scores, the inequalities (5.7) and (5.8) are formed:

    E_3 \leq E_2 \leq E_1  or  E_2 \leq E_3 \leq E_1        (5.7)

    E_3 \leq E_1 \leq E_2  or  E_1 \leq E_3 \leq E_2        (5.8)

Three different cases are considered based on these inequalities.

CASE 1: Documents with matching keywords
- The two documents are grouped together.
- The average weight of the two key terms is computed in the final cluster.

CASE 2: Clusters with matching sub-keywords
- Clusters with matching sub-keywords are identified.
- The matched clusters are grouped together.
- The average weight of the two key terms is computed in the final cluster.

CASE 3: No matching terms
- Do nothing.
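The scores E1 to E3 and the case analysis above can be sketched as follows. This is a minimal illustration of the reconstructed Equations (5.4) to (5.8), not the thesis implementation, and the sample weights 0.4 and 0.6 are invented for demonstration.

```python
import math

def clustering_scores(w1, w2):
    """E1-E3 from Equations (5.4)-(5.6); w1 and w2 are the (positive)
    weights of a term in documents D1 and D2."""
    avg = (w1 + w2) / 2.0                                 # Avg(W1, W2)
    low = min(w1, w2)                                     # Low(W1, W2)
    e1 = math.log((w1 * w2) / avg)                        # Equation (5.4)
    e2 = low / avg                                        # Equation (5.5)
    e3 = math.log(low) / (math.log(w1) + math.log(w2))    # Equation (5.6)
    return e1, e2, e3

def merge_decision(e1, e2, e3):
    """Case analysis following the inequalities (5.7) and (5.8)."""
    if e3 <= e2 <= e1 or e2 <= e3 <= e1:        # inequality (5.7) -> Case 1
        return "group documents"
    if e3 <= e1 <= e2 or e1 <= e3 <= e2:        # inequality (5.8) -> Case 2
        return "group matching sub-clusters"
    return "no action"                          # Case 3

e1, e2, e3 = clustering_scores(0.4, 0.6)
print(merge_decision(e1, e2, e3))
```

Note that the logarithms require strictly positive weights below 1 to be interpreted consistently; a production version would guard against zero weights and the degenerate case w1 * w2 = 1.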
5.4 ALGORITHM FOR OPTIMIZATION OF CLUSTERS

    Let C be the total number of clusters
    Let count(i) be the number of words in cluster i
    Let word[x, i] be the x-th word of cluster i
    Let S(x) be a marker for word x, initialised to 0

    for i from 1 to C do
        for j from i+1 to C do
            for x from 1 to count(i) do
                for y from 1 to count(j) do
                    if word[x, i] == word[y, j] then
                        if S(x) == S(y) then
                            mark S(y) to 1
                        end if
                    end if
                end for
            end for
        end for
    end for

    for i from 1 to C do
        flag = 0
        for x from 1 to count(i) do
            get word[x, i]
            if S(x) == 0 then
                break
            else
                flag = 1
            end if
        end for
        if flag == 1 then
            delete cluster i
        end if
    end for
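The optimization step can be sketched in runnable form as follows. This is one interpretation of the pseudocode above, under the assumption that S marks words occurring in more than one cluster and that a cluster consisting entirely of marked words is deleted; it is not the thesis implementation.

```python
def optimize_clusters(clusters):
    """Mark words that occur in more than one cluster (the role of S in the
    pseudocode), then delete any cluster made up entirely of marked words,
    so that each word is left in one cluster only."""
    seen, marked = set(), set()
    for cluster in clusters:
        for word in set(cluster):
            if word in seen:
                marked.add(word)          # duplicate across clusters -> S = 1
            seen.add(word)
    kept = []
    for cluster in clusters:
        if cluster and all(word in marked for word in cluster):
            continue                      # flag == 1: delete the cluster
        kept.append(cluster)
    return kept

print(optimize_clusters([["alpha", "beta"], ["beta"]]))
```

In this sketch the cluster ["beta"] is dropped because its only word also appears elsewhere, while ["alpha", "beta"] survives on the strength of its unique word.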
The algorithm given above is used to optimize the clusters. It shows how the standard deviation and the word pattern can be used to form clusters: terms with a similar standard deviation are grouped into a single cluster, and the output is thereby optimized. Some documents can appear in more than one cluster; in such cases the clusters are optimized so that each document appears in one cluster alone.

5.5 CLUSTER OUTPUT

Scatter Plots of Reuters-21578

The process of testing MLCL clustering revolved around aspects of the document test space. One of the key objectives was to keep up the performance even as the document size increases. For the testing of Reuters, a single category of documents was taken into consideration. As indicated in the test plan, the output had to give a much higher value for the micro measures when compared with the existing method of fuzzy clustering. Since a single category of topic was pushed into the algorithms, the algorithm has to group as many documents as possible into a single cluster, because all the documents are of the same genre. The figures consist of rectangular and oval shaped regions to indicate the distinguishable documents; the total area covered by the shapes gives an account of the state space employed by the phases of MLCL clustering.

20 Documents

When the number of documents was 20, as indicated in Figure 5.1, the state space covered by the novel MLCL clustering method was much less than that of fuzzy logic. The MLCL method formed only two different clusters, in an individual segment in which most of the documents were placed; this satisfied the prime need of precision. The fuzzy algorithm formed three different clusters, each of which had an even distribution of
documents. Nevertheless, this attribute does not satisfy the requirement of one-topic clustering. The novel algorithm stood ahead of the existing one when tested against 20 documents under a single topic, as shown in Figure 5.2.

Figure 5.1 MLCL 20 Documents (Reuters-21578)

Figure 5.2 Fuzzy 20 Documents (Reuters-21578)
50 Documents

Figures 5.3 and 5.4 give an account of how clustering was done when the number of documents increased to 50.

Figure 5.3 MLCL 50 Documents (Reuters-21578)

Figure 5.4 Fuzzy 50 Documents (Reuters-21578)
The novel method adhered to the prime requirement: the size of the state space did increase, but the documents were all clustered together, so the creation of false clusters was prevented. The fuzzy algorithm produced an extra cluster for 50 documents, giving rise to a corresponding decrease in the overall precision and accuracy of the method. Its clusters covered a greater area and compromised on the actual requirement of single clustering.

100 Documents

The test on 100 documents offered an extreme view of how clustering happens in both algorithms, as shown in Figures 5.5 and 5.6. Based on the monotonous approach of clustering in accordance with the genre, the number of documents in the individual cluster tended to favour the MLCL methodology, while the existing technique produced more diverse clusters.

Figure 5.5 MLCL 100 Documents (Reuters-21578)
Figure 5.6 Fuzzy 100 Documents (Reuters-21578)

Scatter Plots of the Brown Corpus

The next phase of testing addresses the other aspect of clustering: work on documents with individual topics. It deals with documents of different topics, which are combined together and tested for the effective formation of clusters. Here, each test comprises five different document topics, and the proposed method works with the intention of forming five distinguished clusters.

20 Documents

Initially, when 20 documents with five different topics were combined, the algorithm had to give a few distinguished clusters. As aimed, the process of MLCL clustering produced a few different clusters within a given state space, as shown in Figures 5.7 and 5.8. The output was similar to that of fuzzy logic, but it met the prime requisite of document space management: the amount of state space covered by fuzzy logic was much more than that of MLCL. In the novel method,
documents formed an even distribution amongst the clusters, and none of the documents was abandoned, whereas in fuzzy logic there are a few clusters with just two documents, which brings down the weightage of the methodology.

Figure 5.7 MLCL 20 Documents (Brown Corpus)

Figure 5.8 Fuzzy 20 Documents (Brown Corpus)
50 Documents

A marked increase in performance can be seen in Figures 5.9 and 5.10 while testing with 50 documents.

Figure 5.9 MLCL 50 Documents (Brown Corpus)

Figure 5.10 Fuzzy 50 Documents (Brown Corpus)
The new MLCL algorithm adhered to the formation of a reasonable number of clusters; the documents were well distributed and distinct. The fuzzy logic method did not work to satisfactory levels: the documents were sparse and were grouped into an inappropriate number of clusters, and the output gives a clear impression of its poor performance, which highlights the newer method of MLCL clustering.

100 Documents

Figures 5.11 and 5.12 give an account of the performance on 100 documents of five different categories. The grouping in MLCL clustering was appealingly different: though the graphical representation on the scatter plot demonstrated a specific color, the plotting is diversified into distinct spaces, showing the possibility of two different clusters. The intermediate areas are framed with other clusters, showing the presence of two different document topics. Apart from this, the output of fuzzy logic was also lower: the documents were sparse and distributed, and only two different clusters were formed from 100 documents.

Figure 5.11 MLCL 100 Documents (Brown Corpus)
Figure 5.12 Fuzzy 100 Documents (Brown Corpus)

5.6 TESTING

The performance of the clusters is evaluated using micro measures: Micro Averaged Precision (MicroP), Micro Averaged Recall (MicroR), Micro Averaged F-Measure (MicroF) and Micro Averaged Accuracy (MicroA).

Micro Averaged Precision (sklearn.metrics.precision_recall_fscore_support.html, wikipedia.org/wiki/f1_score) is defined as the number of true positives (correct results) divided by the number of all returned results, and is given in Equation (5.9). It is the ability of the algorithm not to label as positive a sample that is negative.

    MicroP = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FP_i)}        (5.9)
Micro Averaged Recall [20, 21] is defined as the number of true positives (correct results) divided by the number of results that should have been returned. It is the ability of the algorithm to find all the positive samples. Equation (5.10) is given as follows:

    MicroR = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FN_i)}        (5.10)

Micro Averaged F-Measure [20, 21] is interpreted as the harmonic mean of precision and recall, and is given in Equation (5.11) as follows:

    MicroF = \frac{2 \cdot MicroP \cdot MicroR}{MicroP + MicroR}        (5.11)

Micro Averaged Accuracy [22] is defined as the portion of all decisions that were correct. It is defined in Equation (5.12) as follows:

    MicroA = \frac{\sum_{i=1}^{p} (TP_i + TN_i)}{\sum_{i=1}^{p} (TP_i + FP_i + TN_i + FN_i)}        (5.12)

Two different datasets, Reuters-21578 and the Brown Corpus, were used for the study.

Reuters-21578

Reuters has been widely used for testing clustering algorithms. Reuters-21578 is an experimental data collection that appeared on the Reuters newswire in the year 1987. The dataset is publicly available for download.
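The micro-averaged measures defined in Equations (5.9) to (5.12) can be sketched as a small helper. The per-cluster count lists below are invented for illustration; this is not the evaluation code used in the thesis.

```python
def micro_measures(tp, fp, fn, tn):
    """Micro-averaged measures of Equations (5.9)-(5.12); each argument is
    a list of per-cluster counts (true/false positives and negatives)."""
    TP, FP, FN, TN = sum(tp), sum(fp), sum(fn), sum(tn)
    micro_p = TP / (TP + FP)                               # Equation (5.9)
    micro_r = TP / (TP + FN)                               # Equation (5.10)
    micro_f = 2 * micro_p * micro_r / (micro_p + micro_r)  # Equation (5.11)
    micro_a = (TP + TN) / (TP + FP + TN + FN)              # Equation (5.12)
    return micro_p, micro_r, micro_f, micro_a

# Illustrative counts for two clusters (invented numbers):
p, r, f, a = micro_measures(tp=[8, 2], fp=[1, 1], fn=[0, 2], tn=[5, 5])
print(round(p, 3), round(r, 3), round(f, 3), round(a, 3))
```

Because the counts are summed over all clusters before the ratios are taken, large clusters dominate the micro measures; this is the intended behaviour of micro averaging, as opposed to macro averaging, which averages the per-cluster ratios.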
Table 5.1 MicroR values (Reuters-21578): No. of Documents / MLCL Clustering / Fuzzy Logic

Table 5.2 MicroP values (Reuters-21578): No. of Documents / MLCL Clustering / Fuzzy Logic

Table 5.3 MicroF values (Reuters-21578): No. of Documents / MLCL Clustering / Fuzzy Logic
Table 5.4 MicroA values (Reuters-21578): No. of Documents / MLCL Clustering / Fuzzy Logic

Figure 5.13 MicroR Values for the Reuters Dataset
Figure 5.14 MicroP Values for the Reuters Dataset

Figure 5.15 MicroF Values for the Reuters Dataset
Figure 5.16 MicroA Values for the Reuters Dataset

The values of MicroR, MicroP, MicroF and MicroA in Tables 5.1 to 5.4 show numerically how the results of MLCL clustering stood ahead of fuzzy logic. This concludes the testing of documents in a large data space with a common topic of identification. Figures 5.13 to 5.16 show the diagrammatic representation of the same.

Brown Corpus

The next phase of testing addresses the other aspect of clustering: work on documents with individual topics. It deals with documents of different topics, which are combined together and tested for the effective formation of clusters. Here, each test comprises five different document topics, and the proposed method works with the intention of forming five distinguished clusters.
The Brown Corpus has 500 sample English documents. The documents contain tagged words, which can be used to identify the tense of each word. The text samples are distributed over 15 different genres; each sample starts at a random sentence boundary and continues to the next boundary.

Table 5.5 MicroR values (Brown Corpus): No. of Documents / MLCL Clustering / Fuzzy Logic

Table 5.6 MicroP values (Brown Corpus): No. of Documents / MLCL Clustering / Fuzzy Logic
Table 5.7 MicroF values (Brown Corpus): No. of Documents / MLCL Clustering / Fuzzy Logic

Table 5.8 MicroA values (Brown Corpus): No. of Documents / MLCL Clustering / Fuzzy Logic
Figure 5.17 MicroR Values for the Brown Corpus Dataset

Figure 5.18 MicroP Values for the Brown Corpus Dataset
Figure 5.19 MicroF Values for the Brown Corpus Dataset

Figure 5.20 MicroA Values for the Brown Corpus Dataset
The numerical values in Tables 5.5 to 5.8 show that the new MLCL algorithm outperforms fuzzy logic on the Brown Corpus dataset as well. Figures 5.17 to 5.20 show the diagrammatic representation of the same.

5.7 SUMMARY

In this work, clustering is done in two phases. First, the dimensionality of the text document is decreased by selecting the important keywords. Then the selected keywords are clustered using the MLCL algorithm. The novel method incorporates various computations to find the similarity between the words and the documents: the relationship between the words of a document is calculated using the MLCL algorithm, the similarity measures are used to identify the initial clusters, and the clustering process continues until all the documents are clustered. Finally, the clusters are optimized using Gaussian parameters. The entire process is tested for its effectiveness with two different benchmark datasets. The new MLCL clustering algorithm is compared against fuzzy self-constructing feature clustering, and the novel method was found to outperform the existing algorithm consistently. The next chapter discusses sentence-based clustering, an extended version of the current work that helps to improve the process of text summarization and clustering.
More informationTHE preceding chapters were all devoted to the analysis of images and signals which
Chapter 5 Segmentation of Color, Texture, and Orientation Images THE preceding chapters were all devoted to the analysis of images and signals which take values in IR. It is often necessary, however, to
More informationOnline Pattern Recognition in Multivariate Data Streams using Unsupervised Learning
Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning
More informationPrincipal Component Image Interpretation A Logical and Statistical Approach
Principal Component Image Interpretation A Logical and Statistical Approach Md Shahid Latif M.Tech Student, Department of Remote Sensing, Birla Institute of Technology, Mesra Ranchi, Jharkhand-835215 Abstract
More informationYelp Restaurant Photo Classification
Yelp Restaurant Photo Classification Rajarshi Roy Stanford University rroy@stanford.edu Abstract The Yelp Restaurant Photo Classification challenge is a Kaggle challenge that focuses on the problem predicting
More informationA Robust Wipe Detection Algorithm
A Robust Wipe Detection Algorithm C. W. Ngo, T. C. Pong & R. T. Chin Department of Computer Science The Hong Kong University of Science & Technology Clear Water Bay, Kowloon, Hong Kong Email: fcwngo, tcpong,
More information8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks
8 th Grade Pre Algebra Pacing Guide 1 st Nine Weeks MS Objective CCSS Standard I Can Statements Included in MS Framework + Included in Phase 1 infusion Included in Phase 2 infusion 1a. Define, classify,
More informationClustering will not be satisfactory if:
Clustering will not be satisfactory if: -- in the input space the clusters are not linearly separable; -- the distance measure is not adequate; -- the assumptions limit the shape or the number of the clusters.
More informationTag-based Social Interest Discovery
Tag-based Social Interest Discovery Xin Li / Lei Guo / Yihong (Eric) Zhao Yahoo!Inc 2008 Presented by: Tuan Anh Le (aletuan@vub.ac.be) 1 Outline Introduction Data set collection & Pre-processing Architecture
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationContent-based Dimensionality Reduction for Recommender Systems
Content-based Dimensionality Reduction for Recommender Systems Panagiotis Symeonidis Aristotle University, Department of Informatics, Thessaloniki 54124, Greece symeon@csd.auth.gr Abstract. Recommender
More informationExtracting Rankings for Spatial Keyword Queries from GPS Data
Extracting Rankings for Spatial Keyword Queries from GPS Data Ilkcan Keles Christian S. Jensen Simonas Saltenis Aalborg University Outline Introduction Motivation Problem Definition Proposed Method Overview
More informationDI TRANSFORM. The regressive analyses. identify relationships
July 2, 2015 DI TRANSFORM MVstats TM Algorithm Overview Summary The DI Transform Multivariate Statistics (MVstats TM ) package includes five algorithm options that operate on most types of geologic, geophysical,
More informationEvaluating Segmentation
Evaluating Segmentation David Martin dmartin@cs.bc.edu Computer Science Department Boston College CVPR 2004 Graph-Based Image Segmentation Tutorial 1 How do you know when a segmentation algorithm is good?
More informationKeyword Extraction by KNN considering Similarity among Features
64 Int'l Conf. on Advances in Big Data Analytics ABDA'15 Keyword Extraction by KNN considering Similarity among Features Taeho Jo Department of Computer and Information Engineering, Inha University, Incheon,
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework
More informationImproving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu CS 229 Fall
Improving Positron Emission Tomography Imaging with Machine Learning David Fan-Chung Hsu (fcdh@stanford.edu), CS 229 Fall 2014-15 1. Introduction and Motivation High- resolution Positron Emission Tomography
More informationCHAPTER 4 FUZZY LOGIC, K-MEANS, FUZZY C-MEANS AND BAYESIAN METHODS
CHAPTER 4 FUZZY LOGIC, K-MEANS, FUZZY C-MEANS AND BAYESIAN METHODS 4.1. INTRODUCTION This chapter includes implementation and testing of the student s academic performance evaluation to achieve the objective(s)
More informationExample 1: Give the coordinates of the points on the graph.
Ordered Pairs Often, to get an idea of the behavior of an equation, we will make a picture that represents the solutions to the equation. A graph gives us that picture. The rectangular coordinate plane,
More informationCombinatorial PCA and SVM Methods for Feature Selection in Learning Classifications (Applications to Text Categorization)
Combinatorial PCA and SVM Methods for Feature Selection in Learning Classifications (Applications to Text Categorization) Andrei V. Anghelescu Ilya B. Muchnik Dept. of Computer Science DIMACS Email: angheles@cs.rutgers.edu
More informationPrivacy Preserving Probabilistic Record Linkage
Privacy Preserving Probabilistic Record Linkage Duncan Smith (Duncan.G.Smith@Manchester.ac.uk) Natalie Shlomo (Natalie.Shlomo@Manchester.ac.uk) Social Statistics, School of Social Sciences University of
More informationRefinement of Web Search using Word Sense Disambiguation and Intent Mining
International Journal of Information and Computation Technology. ISSN 974-2239 Volume 4, Number 3 (24), pp. 22-23 International Research Publications House http://www. irphouse.com /ijict.htm Refinement
More informationDigital Image Processing. Prof. P.K. Biswas. Department of Electronics & Electrical Communication Engineering
Digital Image Processing Prof. P.K. Biswas Department of Electronics & Electrical Communication Engineering Indian Institute of Technology, Kharagpur Image Segmentation - III Lecture - 31 Hello, welcome
More informationCIS UDEL Working Notes on ImageCLEF 2015: Compound figure detection task
CIS UDEL Working Notes on ImageCLEF 2015: Compound figure detection task Xiaolong Wang, Xiangying Jiang, Abhishek Kolagunda, Hagit Shatkay and Chandra Kambhamettu Department of Computer and Information
More informationInclusion of Aleatory and Epistemic Uncertainty in Design Optimization
10 th World Congress on Structural and Multidisciplinary Optimization May 19-24, 2013, Orlando, Florida, USA Inclusion of Aleatory and Epistemic Uncertainty in Design Optimization Sirisha Rangavajhala
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 7: Document Clustering December 4th, 2014 Wolf-Tilo Balke and José Pinto Institut für Informationssysteme Technische Universität Braunschweig The Cluster
More informationCSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16)
CSE 7/5337: Information Retrieval and Web Search Document clustering I (IIR 16) Michael Hahsler Southern Methodist University These slides are largely based on the slides by Hinrich Schütze Institute for
More informationA Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2
Chapter 5 A Genetic Algorithm for Graph Matching using Graph Node Characteristics 1 2 Graph Matching has attracted the exploration of applying new computing paradigms because of the large number of applications
More informationCategorical Data in a Designed Experiment Part 2: Sizing with a Binary Response
Categorical Data in a Designed Experiment Part 2: Sizing with a Binary Response Authored by: Francisco Ortiz, PhD Version 2: 19 July 2018 Revised 18 October 2018 The goal of the STAT COE is to assist in
More informationAn Object Oriented Runtime Complexity Metric based on Iterative Decision Points
An Object Oriented Runtime Complexity Metric based on Iterative Amr F. Desouky 1, Letha H. Etzkorn 2 1 Computer Science Department, University of Alabama in Huntsville, Huntsville, AL, USA 2 Computer Science
More informationFACE DETECTION AND RECOGNITION OF DRAWN CHARACTERS HERMAN CHAU
FACE DETECTION AND RECOGNITION OF DRAWN CHARACTERS HERMAN CHAU 1. Introduction Face detection of human beings has garnered a lot of interest and research in recent years. There are quite a few relatively
More informationAn Overview of various methodologies used in Data set Preparation for Data mining Analysis
An Overview of various methodologies used in Data set Preparation for Data mining Analysis Arun P Kuttappan 1, P Saranya 2 1 M. E Student, Dept. of Computer Science and Engineering, Gnanamani College of
More informationAssembly dynamics of microtubules at molecular resolution
Supplementary Information with: Assembly dynamics of microtubules at molecular resolution Jacob W.J. Kerssemakers 1,2, E. Laura Munteanu 1, Liedewij Laan 1, Tim L. Noetzel 2, Marcel E. Janson 1,3, and
More informationDensity Based Clustering using Modified PSO based Neighbor Selection
Density Based Clustering using Modified PSO based Neighbor Selection K. Nafees Ahmed Research Scholar, Dept of Computer Science Jamal Mohamed College (Autonomous), Tiruchirappalli, India nafeesjmc@gmail.com
More informationRecommendation System for Location-based Social Network CS224W Project Report
Recommendation System for Location-based Social Network CS224W Project Report Group 42, Yiying Cheng, Yangru Fang, Yongqing Yuan 1 Introduction With the rapid development of mobile devices and wireless
More information