Fuzzy Clustering of Time-variant and invariant Features: Application to Sepsis Outcome Prediction
Marta C. Ferreira*
* Technical University of Lisbon, Instituto Superior Técnico, Dept. of Mechanical Engineering, CIS/IDMEC LAETA, Av. Rovisco Pais, Lisbon, Portugal

ARTICLE INFO
Keywords: Data Mining, Machine Learning, Clustering, Time Series Analysis, Mixed Data, Septic Shock

ABSTRACT
This dissertation proposes a novel clustering method, based on fuzzy c-means, that is capable of handling information from both time variant and time invariant features. The new method, Mixed Clustering, shows the advantages of successfully aggregating both data components to identify systems in a wide range of application domains, such as Medical, Management or Energy Systems. The flexible formulation of the proposed methodology can adapt to data sets with multivariate time series and to different distance-based similarity measures. In addition to the Euclidean distance, a distance based on the popular Dynamic Time Warping (DTW) method is used for time series similarity search, since it can overcome the temporal misalignment between series that is commonly found in these applications.
The contribution of the Mixed Clustering approach is demonstrated for forecasting and classification problems. The forecasting case applies the method to a meteorological system, predicting temperature and humidity from geographical location. The method's performance as a binary classifier is demonstrated with a medical application, where the goal is to predict the outcome of a patient diagnosed with septic shock from physiological variables measured during a sampling period and from the patient's demographic data, which are constant during the stay in an Intensive Care Unit. The machine learning process is tested under unsupervised and supervised alternatives. The application of the method showed that when the temporal information of the patient is poorer, the demographic information can improve the classifier's performance.

1. Introduction
1.1. Knowledge Data Discovery
Current developments in data warehousing enable the storage of increasingly large data sets, which increases both the amount of information available about any given system and the analytical possibilities it provides. The Knowledge Data Discovery (KDD) process focuses on methodologies for extracting useful knowledge from the information available in databases (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). First, the data relevant to the system under identification (the target data) is selected from the available database. The target data is then pre-processed: the information is cleaned, missing values are handled, and the data is adapted to the requirements of the analysis.
Figure 1-1 KDD Process
The data is then transformed, that is, consolidated into structures appropriate for the data mining method to be applied, in this case the Mixed Clustering, which identifies patterns in the data.
The results obtained from the mined patterns are then interpreted in the original system's field, finally yielding the desired useful knowledge. The focus of this dissertation is the proposal of a new, efficient data mining method based on clustering, designated Mixed Clustering, for databases combining time variant and invariant features. The method is valid for forecasting and classification problems and is applicable to a diverse range of domains, from medical problems and climatic analysis to power management and economic studies.

1.2. Time Series Data Mining
This data mining method searches for patterns and similarities in both data components, time variant and invariant, and combines the extracted information to better characterize the data objects. The mining of time series, and particularly their clustering, attracts the interest of researchers. The complexity of this type of data requires careful examination of the proposed algorithms (Rani & Sikka, 2012). While the time invariant features are easily compared with a common and simple distance function, the Euclidean distance, the time variant features, represented by time series, require a more complex analysis (Rani & Sikka, 2012). Thus, a more sophisticated measure, Dynamic Time Warping (DTW), is implemented for time series similarity search.
Figure 1-2 Euclidean and DTW matching of Time Series
This similarity measure is capable of handling temporally misaligned time series, identifying similar tendencies and patterns even when they are out of phase in time. It has been successfully applied in areas such as handwriting and online signature matching, time series database search, computer vision, surveillance and signal processing (Gaudin & Nicoloyannis, 2006).

1.3. Outline
This work is structured as follows: in section 2, the mixed clustering concept is described and the methodology presented.
In section 3, the use of the method's outputs to solve a forecasting problem is presented and applied to a Meteorological System, followed by a demonstration and discussion of the results. The method's contribution to a classification problem is demonstrated in section 4 and applied to a Medical System, followed by a demonstration and discussion of the results achieved. Finally, in section 5 the results of the different applications are reviewed and compared to previous works on the subject, concluding with a set of suggestions for future work.

2. Clustering
2.1. Concept
Clustering is a data mining technique that aims to group similar data objects, based on the patterns identified, while distinguishing objects with distinct behaviours. The data is divided into clusters so that differences within a group are smaller than differences between groups. This concept is useful in a wide range of applications, from image analysis, wireless sensor network applications and population segmentation to bioinformatics (Liao, 2005).
Often, the information that describes a system is not all represented by the same type of data: there are categorical, numerical and text features, and constant and time-varying features. In such cases, a clustering method capable of reconciling distinct data types becomes necessary. In (Izakian, Witold, & Jamal, 2013), a clustering method to handle spatiotemporal systems is proposed. These systems are characterized not only by temporal features but also by the spatial location at which they were measured. Geography, climatology and epidemiology systems are examples of applications relying on spatiotemporal data for their identification.
The methodology proposed in (Izakian, Witold, & Jamal, 2013) expands the Fuzzy C-Means (FCM) clustering technique (Bezdek, Ehrlich, & Full, 1984) to handle spatiotemporal data by adding a weighting parameter λ that sets the importance given to the temporal component. This parameter greatly benefits the algorithm's flexibility, allowing it to search for the best combination of temporal and spatial contributions.
The aim of this dissertation is to expand this notion of spatiotemporal data to any dataset containing different types of data, constant and time-varying, that may require specific treatment, by generalizing the spatiotemporal clustering methodology to databases with mixed data and multivariate time series. We will show that there are benefits in successfully combining both data components to model systems in a wide range of application domains, such as Medical Care, Finance, Management and Energy Systems.

2.2. Mixed Clustering Methodology
When working with a database with time variant and invariant features, the input data is considered as a concatenation of both data components:
$$ x_i = [\, x_i^s \;\; x_i^t \,], \quad i = 1, \dots, n \tag{2.1} $$
The invariant component, represented by numeric values, is structured as follows:
$$ x_i^s = [\, x_{i,1}^s, \dots, x_{i,r}^s \,] \tag{2.2} $$
where r is the number of invariant features.
The time variant data component, represented by multivariate time series, is structured as a three-dimensional matrix:
$$ x^t = \left[ x_{i,j,k}^t \right] \in \mathbb{R}^{n \times q \times f} \tag{2.3} $$
In this format, each value is defined by three coordinates: i = 1, …, n, indicating the sample number; j = 1, …, q, the sampling point; and k = 1, …, f, the feature.
The clustering method defines a set of prototypes, or centers, for each of the c clusters, each comprised of a variant and an invariant component. The invariant component's prototypes are determined by:
$$ v_l^s = \frac{\sum_{i=1}^{n} u_{l,i}^m \, x_i^s}{\sum_{i=1}^{n} u_{l,i}^m} \tag{2.4} $$
The time-variant prototypes require an expansion to deal with the increased dimensionality of the data. A three-dimensional structure was defined, with dimensions [c × q × f]:
$$ v_{l,k}^t = \frac{\sum_{i=1}^{n} u_{l,i}^m \, x_{i,k}^t}{\sum_{i=1}^{n} u_{l,i}^m} \tag{2.5} $$
where the fuzziness parameter, m, makes the process more fuzzy or more crisp.
The membership degree $u_{l,i}$ is an element of the partition matrix, U, of dimension [c × n], that defines the degree to which each sample belongs to each cluster. Since this is a fuzzy clustering method, the membership of a sample i to a cluster l is a value in the interval [0, 1], subject to
$$ u_{l,i} \in [0,1], \qquad \sum_{l=1}^{c} u_{l,i} = 1, \qquad 0 < \sum_{i=1}^{n} u_{l,i} < n $$
The similarity between a sample and a cluster is then measured by the sample's augmented distance to the cluster's center, given by:
$$ d_\lambda^2(v_l, x_i) = \lVert v_l^s - x_i^s \rVert^2 + \lambda \sum_{k=1}^{f} \delta(v_{l,k}^t, x_{i,k}^t) \tag{2.6} $$
where $\delta(v_{l,k}^t, x_{i,k}^t)$ is the distance between the k-th feature of prototype l and of sample i, calculated by the distance function used, and λ is a parameter that defines the influence given to the time variant features. The optimal value of this parameter is determined by running the clustering process sequentially for different values and choosing the one that yields the best performance.
By adding the distances of all features for each sample, the matrix of distances keeps its dimension [c × n], resulting in a meaningful partition matrix defined, as for a univariate time-series system, by:
$$ u_{l,i} = \frac{1}{\sum_{o=1}^{c} \left( \dfrac{d_\lambda(v_l, x_i)}{d_\lambda(v_o, x_i)} \right)^{2/(m-1)}} \tag{2.7} $$
Since the objective function J depends directly only on the distances and membership degrees, it can also be defined as for a univariate time-series system:
$$ J = \sum_{l=1}^{c} \sum_{i=1}^{n} u_{l,i}^m \, d_\lambda^2(v_l, x_i) \tag{2.8} $$
The clustering process continues until the objective function converges or the maximum number of iterations is reached.

3. Forecasting Problem: Meteorological System
3.1. Modelling
The Alberta Agriculture and Rural Development organization provides current and historical weather data from approximately 340 meteorological stations located across the Canadian province of Alberta, mapped in Figure 3-1. The meteorological variables available include temperature, humidity, precipitation and solar radiation, and are of great interest to users such as epidemiologists seeking to better understand, for instance, the relationships between measures of environmental health and those of animal health. This platform, available at (ARD), is also valuable for environmental or agricultural analysis.
Figure 3-1 Map of the province of Alberta, Canada. Area where the meteorological stations are located
The Alberta province covers areas with different geographical and meteorological profiles, including mountains, valleys, lakes and arid areas.
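As a reference for the experiments that follow, the Mixed Clustering iteration of Section 2.2 (Eqs. 2.4 to 2.8) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function names (`sq_euclidean`, `dtw`, `mixed_fcm`, and so on) are hypothetical, δ is taken to be either the squared Euclidean distance or the accumulated DTW cost, the partition matrix is initialised at random, and the prototypes are always the weighted means of Eqs. 2.4 and 2.5, whichever δ is used.

```python
import numpy as np


def sq_euclidean(a, b):
    """Squared Euclidean distance between two univariate series (assumed form of delta)."""
    return float(np.sum((a - b) ** 2))


def dtw(a, b):
    """Textbook dynamic-programming DTW cost with squared pointwise differences."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])


def mixed_distances(xs, xt, vs, vt, lam, delta):
    """Eq. (2.6): squared augmented distances, returned as a [c, n] matrix."""
    c, n, f = vs.shape[0], xs.shape[0], xt.shape[2]
    d2 = np.zeros((c, n))
    for l in range(c):
        for i in range(n):
            invariant = np.sum((vs[l] - xs[i]) ** 2)
            temporal = sum(delta(vt[l, :, k], xt[i, :, k]) for k in range(f))
            d2[l, i] = invariant + lam * temporal
    return d2


def memberships(d2, m):
    """Eq. (2.7), written in terms of the squared distances."""
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


def mixed_fcm(xs, xt, c, lam, m=2.0, delta=sq_euclidean,
              eps=1e-5, max_iter=100, seed=0):
    """xs: [n, r] invariant features; xt: [n, q, f] multivariate time series."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, xs.shape[0]))
    u /= u.sum(axis=0)  # every sample's memberships sum to 1
    j_prev = np.inf
    for _ in range(max_iter):
        w = u ** m
        norm = w.sum(axis=1)
        vs = (w @ xs) / norm[:, None]                                # Eq. (2.4)
        vt = np.einsum("li,iqf->lqf", w, xt) / norm[:, None, None]  # Eq. (2.5)
        d2 = mixed_distances(xs, xt, vs, vt, lam, delta)
        u = memberships(d2, m)
        j = float(np.sum((u ** m) * d2))                             # Eq. (2.8)
        if abs(j_prev - j) < eps:  # variation of the objective function
            break
        j_prev = j
    return u, vs, vt, j
```

Calling `mixed_fcm(xs, xt, c=5, lam=1.0)` returns the partition matrix and both prototype components; passing `delta=dtw` swaps the similarity measure without changing the rest of the loop. Selecting λ then amounts to repeating the call over a grid of values and keeping the one with the best score on the chosen criterion.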
For these experiments, the daily average temperature and daily average humidity records were considered, taken from 1/1/2009 to 12/31/2009, forming the time variant input features. The time invariant features used consisted of the latitude and longitude coordinates of the station at which they were measured. All stations in which all features were available and had no missing values were considered, resulting in 168 samples. The time series were represented by the Discrete Fourier Transform (DFT). The following parameters were used:
- Fuzziness parameter: m = 2
- Number of samples: n = 249
- Number of time invariant features: r = 2
- Number of time variant features: f = 2
- Time variant feature's length: q =

3.2. Experimental Setup
The proposed Mixed Clustering methodology was applied to the Meteorological System under two distinct criteria. The first, the Reconstruction Criterion (RC), evaluates the cluster validity, while the Prediction Criterion (PC) evaluates the method's forecasting ability.

Reconstruction Criterion
The RC assesses the quality of the clusters constructed by attempting to recreate the original data. Defining $\hat{x}$ as the reconstructed data, its invariant and variant components are respectively defined as
$$ \hat{x}_i^s = \frac{\sum_{l=1}^{c} u_{l,i}^m \, v_l^s}{\sum_{l=1}^{c} u_{l,i}^m} \tag{3.1} $$
$$ \hat{x}_{i,k}^t = \frac{\sum_{l=1}^{c} u_{l,i}^m \, v_{l,k}^t}{\sum_{l=1}^{c} u_{l,i}^m}, \quad k \in [1, f] \tag{3.2} $$
The Average Reconstruction Error (ARE) is calculated as:
$$ \mathrm{ARE}(\lambda) = \frac{1}{n} \left( \frac{1}{r} \sum_{i=1}^{n} \sum_{j=1}^{r} \frac{(\hat{x}_{i,j}^s - x_{i,j}^s)^2}{\sigma_j^2} + \frac{1}{f q} \sum_{i=1}^{n} \sum_{k=1}^{f} \sum_{j=1}^{q} \frac{(\hat{x}_{i,j,k}^t - x_{i,j,k}^t)^2}{\sigma_j^2} \right) \tag{3.3} $$
where $\sigma_j^2$ is the variance of the j-th feature.

Prediction Criterion
The aim of the PC is to predict the temporal component of the data from the available spatial component, minimizing the resulting error by adjusting the temporal influence parameter λ. A partition matrix is estimated from the invariant data and prototypes:
$$ u_{l,i} = \frac{1}{\sum_{o=1}^{c} \left( \dfrac{\lVert v_l^s - x_i^s \rVert}{\lVert v_o^s - x_i^s \rVert} \right)^{2/(m-1)}} \tag{3.4} $$
The Average Prediction Error (APE) is then calculated as:
$$ \mathrm{APE}(\lambda) = \frac{1}{n f q} \sum_{i=1}^{n} \sum_{k=1}^{f} \sum_{j=1}^{q} \frac{(\hat{x}_{i,j,k}^t - x_{i,j,k}^t)^2}{\sigma_j^2} \tag{3.5} $$
The stopping criteria for the clustering algorithm in this experiment were the following:
- Minimal variation of the objective function: $\Delta J < \varepsilon = 10^{-5}$
- Maximum number of iterations: maxit = 100

3.3. Results and Discussion
Reconstruction Criterion
The RC was applied to each time variant feature, humidity or temperature, individually, and to the combination of both in a multivariate approach. Each case used between 2 and 5 clusters, with the Euclidean distance and the DTW as similarity measures. It was observed that the multivariate alternative did not improve the quality of the clusters created, according to this criterion, and that the best results were obtained for the temperature feature, with 5 clusters and the Euclidean distance.
Figure 3-2 shows a plot of the analysed stations according to their geographical location, coloured according to the cluster in which they have the highest membership degree, under the best RC conditions. Four stations in different regions are highlighted.
Figure 3-2 Geographical Distribution under best RC conditions, c=5
It is clear that the method was capable of recognizing and distinguishing the areas with the most different climatic profiles.

Prediction Criterion
The PC was applied under the same experimental conditions as the RC: multivariate and univariate time series were considered, and the Euclidean distance and the DTW were used as similarity measures, for a number of clusters between 2 and 10.
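The Prediction Criterion of Section 3.2 can be sketched as follows. This is an illustrative sketch, not the code used in the experiments: the function names are hypothetical, the predicted temporal component is assumed to be obtained by plugging the memberships of Eq. 3.4 into the reconstruction formula of Eq. 3.2, and the variance term of Eq. 3.5 is passed in as an array (`var`) that broadcasts against the [n, q, f] data.

```python
import numpy as np


def memberships_from_invariant(xs, vs, m=2.0):
    """Eq. (3.4): memberships computed from the invariant component only.

    xs: [n, r] invariant features; vs: [c, r] invariant prototypes.
    """
    d2 = ((vs[:, None, :] - xs[None, :, :]) ** 2).sum(axis=2)  # [c, n]
    d2 = np.maximum(d2, 1e-12)  # guard against division by zero
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


def reconstruct_temporal(u, vt, m=2.0):
    """Eq. (3.2): membership-weighted average of the temporal prototypes.

    u: [c, n] partition matrix; vt: [c, q, f] temporal prototypes.
    """
    w = u ** m
    num = np.einsum("li,lqf->iqf", w, vt)  # [n, q, f]
    return num / w.sum(axis=0)[:, None, None]


def average_prediction_error(xt, xt_hat, var):
    """Eq. (3.5): squared error normalised by the variance, averaged over n, q and f."""
    return float(np.mean((xt_hat - xt) ** 2 / var))
```

For a test station, only the coordinates are needed: `memberships_from_invariant` gives its membership in each cluster, and `reconstruct_temporal` turns those memberships into a forecast of the full series.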
The best PC result was obtained using the multivariate approach, with the Euclidean distance and 8 clusters. These conditions were used to forecast the temperature and humidity. The samples were separated into training and testing sets:
$$ x_{train} = [\, x_{train}^s \;\; x_{train}^t \,] \tag{3.6} $$
and
$$ x_{test} = [\, x_{test}^s \;\; x_{test}^t \,] \tag{3.7} $$
The procedure followed is described in Figure 3-3.
Figure 3-3 Workflow representing process for temporal forecasting of test set
In this experiment, around 70% of the samples were used as the train set, ntrain = 117, while the rest were used as the test set. The forecasting results for the humidity and temperature of one exemplary test sample, under the best conditions, are shown in Figure 3-4 and Figure 3-5, respectively.
Figure 3-4 Humidity Predicting under best PC conditions
Figure 3-5 Temperature Predicting under best PC conditions
In the forecasting problem, the DTW did not improve on the Euclidean distance as a similarity measure. The multivariate approach achieved the best forecasts of temperature and humidity during 2009 at the selected stations.

4. Classification Problem: Medical System
An analogy was drawn with the spatiotemporal concept, where the geographical location becomes, in medical applications, a patient's demography: age, weight, height and sex, among other possibilities. In this equivalence, the temporal component comprises all time-varying features that characterize the system, such as heart rate, blood pressure and body temperature, measured over a period of time and represented as time series.
4.1. Modelling
Septic shock is a medical emergency that can occur as a reaction of the immune system to, for example, an operation. It is estimated to affect about 12% of patients in an Intensive Care Unit (ICU) and has a high death rate, which is reported to depend on the patient's age and overall health.
The database used, MEDAN, comprises several physiological features of patients diagnosed with abdominal septic shock, uniformly sampled during the whole period the patient spent in the ICU (Paetz, 2003). This database was pre-processed by (Marques, Moutinho, Vieira, & Sousa, 2011), who analysed the most determinant features for outcome prediction, creating a sub-dataset of patients with measurements of 12 of the available features. This data underwent further processing, resulting in a data set with 100 samples, each comprised of:
- 2 time invariant features: the patient's age and weight, each represented by a numeric value;
- 12 time variant features representing physiological variables by time series with a sampling time of 24 hours, over the last 10 days of the patient's stay in an Intensive Care Unit;
- 1 outcome represented by a binary value, where 0 represents the patient's survival and 1 the patient's death.

4.2. Experimental Setup
The concept of classification based on clustering assumes that similar objects will share outcomes, and uses this knowledge to predict an object's classification. The classification approach proposed in this work is based on this concept and defines an object as belonging to a cluster if its membership degree is higher than a certain threshold. It then assumes that objects grouped together must share the same outcome. Thus, this concept is only valid for binary classifiers using two clusters, c = 2.
To evaluate the method's ability to predict an object's outcome, a 5-fold cross-validation was performed. At each fold, the train set is clustered to determine the optimal λ and the resulting clustering output v.
The membership degrees of each test set sample are then determined from its distance to each cluster prototype, and the predicted outcome is assigned according to the highest membership degree.
The experiments described in this section share the following experimental conditions:
Clustering Conditions:
- Minimal variation of the objective function: $\Delta J < \varepsilon = 10^{-8}$
- Maximum number of iterations: maxit = 500
- Fuzziness parameter: m = 2
Classification Conditions:
- 5-fold cross-validation
- Class Distribution: 44%/56%
The Mixed Clustering methodology was applied under two learning approaches: unsupervised and supervised. The first partitions the data without knowledge of its outcome, while the second uses labelled samples for training, following the steps:
i. Unsupervised Clustering of the Train set to determine λ;
ii. Supervised Clustering of the Train set using λ to obtain prototypes v;
iii. Unsupervised Classification of the Test set using v.
The criteria used to evaluate the quality of the outcome prediction are frequently used in health care problems (Lavrač, 1999):
- Accuracy: measures the number of correct classifications out of all samples classified;
- Sensitivity: accounts for the number of correct positive classifications, out of all positive samples;
- Specificity: accounts for the number of correct negative classifications, out of all negative samples.

4.3. Results and Discussion
The experiments performed with the Mixed Clustering include the use of data representations in the time domain (raw data) and in the frequency domain (DFT), and of the Euclidean distance and the DTW as similarity measures. In addition to the mixed clustering, an alternative clustering using only the time variant features, designated Temporal Clustering, was tested to assess the actual benefit of combining both information components.
A Forward Feature Selection method was used to assess the quality of each time variant feature under all combinations of the conditions described. It was observed that the superiority of a similarity measure or time series representation method depended on the feature. The benefit of the mixed clustering over the temporal clustering was also not universal across features. When the time variant features were rich enough by themselves, the addition of the patient's demography misled the algorithm, leading to weaker results. However, when the temporal feature was weaker, it benefited from the mixed clustering approach.
The best overall Unsupervised Mixed Clustering result was obtained using the Euclidean distance with the DFT and one time variant feature, no. 6, representing the Central Venous Pressure. Figure 4-1 shows the differences between the temporal and mixed alternatives, under unsupervised learning, for the best feature and for an example of a weaker temporal feature that benefited from the mixed clustering approach, feature 8: pH.
Figure 4-1 Unsupervised Mixed and Temporal Clustering Accuracy for features 6 and 8
While the addition of the patient's demography did not increase the performance of feature 6, the weaker feature 8 needed the additional information that came with it. Figure 4-2 shows the equivalent results for the supervised learning alternative.
Figure 4-2 Supervised Mixed and Temporal Clustering Accuracy for features 6 and 8
The best result under Supervised clustering was also achieved for feature 6, using the DTW and DFT.
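The accuracy, sensitivity and specificity figures discussed in this section follow from the highest-membership rule of Section 4.2. The sketch below illustrates one way to compute them; it is not the authors' code, and its function names are hypothetical. In particular, the text does not state how a cluster is tied to an outcome in the unsupervised case, so the majority vote over training labels used in `cluster_to_class` is an assumption.

```python
import numpy as np


def cluster_to_class(u_train, y_train):
    """Assumed mapping: each cluster takes the majority outcome of its training members."""
    hard = u_train.argmax(axis=0)
    mapping = {}
    for l in range(u_train.shape[0]):
        members = y_train[hard == l]
        mapping[l] = int(members.mean() >= 0.5) if members.size else 0
    return mapping


def predict(u_test, mapping):
    """Assign each test sample the outcome of its highest-membership cluster."""
    return np.array([mapping[int(l)] for l in u_test.argmax(axis=0)])


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity, with 1 (death) as the positive class."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity
```

Within each cross-validation fold, `u_train` would come from clustering the train set and `u_test` from the test samples' distances to the resulting prototypes; the three metrics are then averaged over the five folds.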
It is also shown that, for these features, the supervised clustering alternative improved on the results of the unsupervised alternative. This effect was not observed for all features. Overall, however, supervised learning increased the performance of the features that were also the best under the unsupervised alternative, suggesting that the features most related to the outcome benefit from its inclusion in the learning process.

5. Conclusions and Future Work
A new, expanded clustering algorithm was formulated to mine databases represented by both time variant and invariant features, combining the information extracted to further characterize a given system. The results of the data mining and pattern recognition process were applied to machine learning purposes, where distinct methodologies were proposed to solve Forecasting and Classification problems, the former with a Meteorological System and the latter with a Medical application, demonstrating the method's wide applicability.
Different measures were implemented for similarity search between time series: the commonly used Euclidean distance and the increasingly popular Dynamic Time Warping. The benefit of the joint clustering of different types of data was also demonstrated by comparing it to the clustering of individual data types.
Table 5-1 shows the best results obtained in previous work on the same database. It should be noted that these results are not directly comparable, since the studies performed different processing on the input data and the methods used are different. The authors of (Cismondi, et al., 2012) used multi-criteria Feature Selection with Fuzzy Models (FM) and Neural Networks (NN) to predict the patient's outcome. While the FM constructed produced the best ACC, the Mixed Clustering produced comparable results using four times fewer features, two of which were numerical values, significantly easier to measure and process.

Table 5-1 Best Mixed Clustering and best previous work result
Reference | Method | No. features | ACC (%) | Sens. (%) | Spec. (%)
(Cismondi, et al., 2012) | NN, Max. Sens. | | | |
(Cismondi, et al., 2012) | NN, Max. Spec. | | | |
(Cismondi, et al., 2012) | NN, Parallel | | | |
(Cismondi, et al., 2012) | FM, Max. Sens. | | | |
(Cismondi, et al., 2012) | FM, Max. Spec. | | | |
(Cismondi, et al., 2012) | FM, Parallel | | | |
Mixed Clustering | Unsupervised | 3* | | |
Mixed Clustering | Supervised | 3* | | |
* The Mixed Clustering used two constant features, the patient's age and weight, combined with one time variant feature.

In addition, the Mixed Clustering method has the highest sensitivity, or true positive rate, which is crucial since the positive class represents a deceased patient.
As future work, it would be interesting to expand the clustering possibilities to any number of partitions and to databases with any number of classes. Since the DTW method is able to compare time series of different lengths, extending the method to form prototypes of variable length would expand the applicability of the mixed clustering method to databases with time series of different lengths.
Also, a reformulation of the method should include the possibility of using a different similarity measure for each feature, as well as a different influence for each, through the implementation of separate temporal influence parameters λ_k, where k = 1, 2, …, f.
One of the great advantages of data mining and soft computing techniques is their ability to treat any problem specific to a given field as a generalized system. Even so, the final step in the KDD approach is the interpretation of the results, bringing the problem back to its field and enabling practical conclusions. Thus, the medical application demonstrated would benefit from further analysis of the best features that resulted from the feature selection algorithms, possibly raising awareness of the importance of a feature in the medical community. In this context, a feature sensitivity study could also be performed on the time variant and invariant features, pre-assessing the quality of the knowledge they contain. The causes of septic shock are not yet fully understood; however, some risk factors have been studied (Fink, Abraham, Vincent, & Kochanek, 2005) and could be inserted in the Mixed Clustering method as time invariant features.
Finally, the validation of the mixed clustering methodology requires its application to problems from different domains and fields, such as Financial, Power Consumption or Surveillance applications. The use of benchmark databases can demonstrate its value against different techniques. However, due to the specific characteristics of the mixed clustering's inputs, there is a shortage of available databases (Keogh & Kasetty, 2003).

References
ARD. (n.d.). Current and Historical Alberta Weather Station Data Viewer. Retrieved May 2014.
Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences, 10.
Cismondi, F., Horn, A. L., Fialho, A. S., Vieira, S. M., Reti, S. R., Sousa, J. M., et al. (2012). Multistage Modeling Using Fuzzy Multi-criteria Feature Selection to Improve Survival Prediction of ICU Septic Shock Patients. Expert Systems with Applications, 39.
Devijver, P. A., & Kittler, J. (1982). Pattern Recognition: A Statistical Approach. Prentice-Hall.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17.
Fink, M., Abraham, E., Vincent, J., & Kochanek, P. M. (2005). Septic Shock. In Textbook of Critical Care (5th ed.). Saunders Elsevier.
Gaudin, R., & Nicoloyannis, N. (2006). An Adaptable Time Warping Distance for Time Series Learning. 5th International Conference on Machine Learning and Applications (ICMLA 06). Orlando, USA.
Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques (2nd ed.). Morgan Kaufmann Publishers.
Izakian, H., Witold, P., & Jamal, I. (2013, October). Clustering Spatiotemporal Data: An Augmented Fuzzy C-Means. IEEE Transactions on Fuzzy Systems, 21.
Keogh, E., & Kasetty, S. (2003, October). On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration. Data Mining and Knowledge Discovery, 7.
Lavrač, N. (1999). Artificial Intelligence in Medicine: Machine Learning for Data Mining in Medicine (Vol. 1620).
Liao, T. W. (2005, November). Clustering of time series data - a survey. Pattern Recognition.
Marques, F. J., Moutinho, A., Vieira, S. M., & Sousa, J. M. (2011). Preprocessing of Clinical Databases to improve classification accuracy of patient diagnosis. World Congress.
Paetz, J. (2003). Knowledge-based approach to septic shock patient data using a neural network with trapezoidal activation functions. Artificial Intelligence in Medicine, 28.
Rani, S., & Sikka, G. (2012). Recent Techniques of Clustering of Time Series Data: A Survey. International Journal of Computer Applications, 52(15).
More informationEE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR
EE 589 INTRODUCTION TO ARTIFICIAL NETWORK REPORT OF THE TERM PROJECT REAL TIME ODOR RECOGNATION SYSTEM FATMA ÖZYURT SANCAR 1.Introductıon. 2.Multi Layer Perception.. 3.Fuzzy C-Means Clustering.. 4.Real
More informationCustomer Clustering using RFM analysis
Customer Clustering using RFM analysis VASILIS AGGELIS WINBANK PIRAEUS BANK Athens GREECE AggelisV@winbank.gr DIMITRIS CHRISTODOULAKIS Computer Engineering and Informatics Department University of Patras
More informationSK International Journal of Multidisciplinary Research Hub Research Article / Survey Paper / Case Study Published By: SK Publisher
ISSN: 2394 3122 (Online) Volume 2, Issue 1, January 2015 Research Article / Survey Paper / Case Study Published By: SK Publisher P. Elamathi 1 M.Phil. Full Time Research Scholar Vivekanandha College of
More informationClustering & Classification (chapter 15)
Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical
More informationClustering Analysis based on Data Mining Applications Xuedong Fan
Applied Mechanics and Materials Online: 203-02-3 ISSN: 662-7482, Vols. 303-306, pp 026-029 doi:0.4028/www.scientific.net/amm.303-306.026 203 Trans Tech Publications, Switzerland Clustering Analysis based
More informationData Mining. Introduction. Piotr Paszek. (Piotr Paszek) Data Mining DM KDD 1 / 44
Data Mining Piotr Paszek piotr.paszek@us.edu.pl Introduction (Piotr Paszek) Data Mining DM KDD 1 / 44 Plan of the lecture 1 Data Mining (DM) 2 Knowledge Discovery in Databases (KDD) 3 CRISP-DM 4 DM software
More informationCollaborative Rough Clustering
Collaborative Rough Clustering Sushmita Mitra, Haider Banka, and Witold Pedrycz Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India {sushmita, hbanka r}@isical.ac.in Dept. of Electrical
More informationHybrid Fuzzy C-Means Clustering Technique for Gene Expression Data
Hybrid Fuzzy C-Means Clustering Technique for Gene Expression Data 1 P. Valarmathie, 2 Dr MV Srinath, 3 Dr T. Ravichandran, 4 K. Dinakaran 1 Dept. of Computer Science and Engineering, Dr. MGR University,
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationDynamic Clustering of Data with Modified K-Means Algorithm
2012 International Conference on Information and Computer Networks (ICICN 2012) IPCSIT vol. 27 (2012) (2012) IACSIT Press, Singapore Dynamic Clustering of Data with Modified K-Means Algorithm Ahamed Shafeeq
More informationClimate Precipitation Prediction by Neural Network
Journal of Mathematics and System Science 5 (205) 207-23 doi: 0.7265/259-529/205.05.005 D DAVID PUBLISHING Juliana Aparecida Anochi, Haroldo Fraga de Campos Velho 2. Applied Computing Graduate Program,
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationCHAPTER 4 FUZZY LOGIC, K-MEANS, FUZZY C-MEANS AND BAYESIAN METHODS
CHAPTER 4 FUZZY LOGIC, K-MEANS, FUZZY C-MEANS AND BAYESIAN METHODS 4.1. INTRODUCTION This chapter includes implementation and testing of the student s academic performance evaluation to achieve the objective(s)
More informationIntroduction to Data Mining
Introduction to JULY 2011 Afsaneh Yazdani What motivated? Wide availability of huge amounts of data and the imminent need for turning such data into useful information and knowledge What motivated? Data
More informationSemi-Supervised Clustering with Partial Background Information
Semi-Supervised Clustering with Partial Background Information Jing Gao Pang-Ning Tan Haibin Cheng Abstract Incorporating background knowledge into unsupervised clustering algorithms has been the subject
More informationKnowledge Discovery. URL - Spring 2018 CS - MIA 1/22
Knowledge Discovery Javier Béjar cbea URL - Spring 2018 CS - MIA 1/22 Knowledge Discovery (KDD) Knowledge Discovery in Databases (KDD) Practical application of the methodologies from machine learning/statistics