CHAPTER 5

ENHANCED FUZZY ROUGHSET BASED FEATURE SELECTION TECHNIQUE USING DIFFERENTIAL EVOLUTION

5.1 Data Reduction
    5.1.1 Dimensionality Reduction
5.2 Feature Transformation
    5.2.1 Principal Component Analysis
    5.2.2 Kernel Principal Component Analysis
5.3 Feature Selection
    5.3.1 Fuzzy Roughset Feature Selection
    5.3.2 Differential Evolution
5.4 Proposed FRFS with Differential Evolution
5.5 Experimental Results
    5.5.1 Feature Transformation
    5.5.2 Feature Selection
5.6 Chapter Summary

5.1 DATA REDUCTION

Data reduction is an important technique that plays a major role in the context of data mining. It helps to amalgamate or aggregate the required information in high dimensional datasets into useful and manageable information chunks. Data reduction techniques reduce a large dataset to a smaller one while preserving the integrity between the original and reduced datasets. In the classification process, a classifier model that learns from the reduced dataset can produce better results than a classifier model that learns from the original dataset. The main advantage of using data reduction in classification is that it makes the learning process faster and more accurate. The four data reduction strategies, graphically represented in Figure 5.1, are:

Data Compression
Numerosity Reduction
Dimensionality Reduction
Concept Hierarchy Generation

Data compression (Campos, 2000) reduces the size of an original file by assigning shorter bit codes to frequently occurring data values in the file. There are two kinds of data compression algorithms, lossy and lossless compression, and they are often used in image processing, signal processing and time series analysis. In numerosity reduction (Han and Kamber, 2006), the original dataset is replaced by an alternative, smaller data representation. These techniques are based on parametric or nonparametric models that can estimate the actual dataset; parametric techniques store only the model parameters instead of the original dataset. Sampling is a typical numerosity reduction technique used for mining patterns.

Dimensionality reduction is a data reduction technique that helps to detect and remove redundant, irrelevant and weakly relevant attributes in the dataset. As the dimension of a dataset increases, the data becomes sparser in its space. So, the main objective of dimensionality reduction is to find an optimal subset of attributes that improves the accuracy of the classification algorithm and removes the sparsity of the dataset. Some of the encoding mechanisms used to reduce the dimension of a dataset are PCA, LDA, SVD, KPCA, etc. Concept hierarchy generation (Han and Kamber, 2006) reduces the original dataset by replacing low level concepts with higher level concepts. Though a few details are lost by this generalization, it gives meaningful patterns that are easier to interpret.

Figure 5.1 Taxonomy of Data Reduction techniques

Data reduction techniques can be applied to various real time applications together with data mining techniques to predict results accurately. The datasets taken for experimentation may contain redundant, irrelevant or weakly relevant features which spoil the classification principle of support vector machines and increase the computational complexity of the classification algorithm through unnecessary calculations. This curse of dimensionality is a main impediment in data mining and machine learning algorithms. So, dimensionality reduction techniques are a prerequisite in this research work to enhance the results, provide better visualization and minimize time as well as memory. The different dimensionality reduction techniques used in this research work are discussed in the following section.

5.1.1 Dimensionality Reduction

Dimensionality reduction is used to find a feasible subset of features that is adequate to describe the actual dataset. It iteratively identifies and removes irrelevant information to produce the feasible subset of features. In the literature, different sophisticated dimensionality reduction techniques have been developed to overcome three emerging problems in the classification process: classifier complexity, model accuracy and comprehensibility of induced concepts. Dimensionality reduction techniques (Jensen, 2005) fall into three categories, given below:

Feature Transformation
Feature Selection
Feature Extraction

Feature transformation is a dimensionality reduction technique that projects the original high dimensional dataset into a lower dimensional space using algebraic functions and finds a feasible solution in continuous space. Feature selection algorithms are used to find an optimal subset of feature vectors according to a given objective function in a discrete space.

Feature selection improves the learning accuracy of the classification process by removing irrelevant features. Feature extraction (Ripley, 1996) is a powerful dimensionality reduction technique that is generally used to estimate and construct linear combinations of continuous features in the dataset which have good discriminatory power between the class labels. Feature selection and feature transformation are the two dimensionality reduction techniques required in this research work to reduce the complexity of high dimensional datasets and improve the performance of the SVM classifier.

5.2 FEATURE TRANSFORMATION

In dimensionality reduction, feature transformation maps high dimensional datasets to a lower dimensional space such that locality and geometric structure are preserved. Feature transformation techniques are divided into two categories: linear transformation and nonlinear transformation. Linear dimensionality reduction determines the structure of a given dataset and its internal relationships using Euclidean distance. Linear techniques like Principal Component Analysis (PCA), singular value decomposition and factor analysis are based on second-order statistics and use a covariance matrix for the transformation. Nonlinear dimensionality reduction recovers useful and meaningful sub-manifolds from high dimensional datasets, and helps to understand and visualize the recovered sub-manifolds of complex real time datasets. Kernel Principal Component Analysis (KPCA), diffusion maps and Sammon's mapping are some of the nonlinear dimensionality reduction techniques. They are comparatively simple and easy to code, as they involve classical matrix calculations.

Nowadays most real time problems are nonlinear with high dimensional datasets and cannot be solved by existing linear dimensionality reduction techniques, so nonlinear dimensionality reduction techniques have been introduced. At the same time, linear dimensionality reduction techniques are not obsolete, because many applications still use them to solve high dimensional problems. From the literature study, it is identified that PCA and KPCA are the well-known feature transformation methods that can be used in this research framework to improve classifier performance. In this section, PCA and KPCA are discussed.

5.2.1 Principal Component Analysis (Lei and Govindaraju, 2005)

PCA is a linear dimensionality reduction technique based on unsupervised learning. It transforms the original high dimensional dataset into a new, low dimensional space. In the new space, it maximizes the variance of the projected feature values and minimizes the reconstruction error. PCA transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called principal components. It is a statistical technique that determines the key variables in a dataset that explain the differences between the observed feature values. This technique simplifies the analysis and visualization of high dimensional datasets without loss of information. After centering the data for each feature vector, the principal components are ascertained by the eigenvalue decomposition of a correlation or covariance matrix, or by singular value decomposition of the data matrix. Here, PCA based on the covariance matrix is chosen for dimensionality reduction because the variance of the feature values is very high compared to the correlation matrix. For heterogeneous feature values, PCA based on the correlation matrix is preferred; similarly, singular value decomposition can be used for numerical feature values to improve accuracy. Eigenvectors are often used to assess how much a principal component represents the data: if the eigenvalue of a principal component is higher, then the component is more representative of the data.

The goal of principal component analysis is to compute a meaningful basis that expresses a noisy dataset in a consistent way. PCA based on covariance is explained as follows.

PCA based on Covariance

Let χ = {x_1, x_2, x_3, ..., x_N} be the training dataset, where each x_i is a training vector, N is the size of the training set and d is the dimensionality of the input vectors. Using linear PCA, the maximum dimension of the projected subspace is min(N, d). Let X = [x_1 x_2 ... x_N] denote the matrix containing the training vectors. PCA finds the eigenvectors of the covariance matrix by solving the equation

XX^T e = λe                                                        (5.1)

where e is an eigenvector and λ an eigenvalue. Using the Karhunen-Loeve method (Kirby and Sirovich, 1987), pre-multiplying both sides by X^T, equation (5.1) can be written as

Kα = λα                                                            (5.2)

where K = X^T X and α = X^T e. K is referred to as the inner product matrix of the training samples, since K_ij = (x_i · x_j). This is a standard eigenvalue problem which can be solved for α and λ. From (5.2), e = Xα (after normalization) is obtained. The projections on the first q eigenvectors (corresponding to the largest q eigenvalues) constitute the feature vector. For a test vector x, the principal component y corresponding to eigenvector e is given by

y = e^T x = (Xα)^T x = α^T X^T x = Σ_i α_i (x_i · x)               (5.3)

where x_i · x denotes the inner product of vectors x_i and x. Most modern methods for nonlinear dimensionality reduction find their theoretical and algorithmic roots in PCA, which is known for its robustness. Because PCA computes the mathematically optimal linear projection, it is sensitive to outliers in the data, which produce large reconstruction errors. Hence it is common practice to remove outliers before performing PCA.
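To make the above procedure concrete, the following is a minimal Python/NumPy sketch of covariance-based PCA in the spirit of equations (5.1)-(5.3). The function and variable names (pca_covariance, n_components) are illustrative and are not taken from the thesis implementation.

import numpy as np

def pca_covariance(X, n_components):
    """Covariance-based PCA; X has shape (N, d) with one training vector per row."""
    mean = X.mean(axis=0)
    X_centered = X - mean                          # center each feature
    cov = X_centered.T @ X_centered / len(X)       # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigendecomposition, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order], mean

def pca_transform(x, components, mean):
    """Project a vector x onto the leading principal components (cf. equation 5.3)."""
    return (x - mean) @ components

# Usage (illustrative): project a data matrix X onto its first two principal components.
# components, variances, mean = pca_covariance(X, n_components=2)
# y = pca_transform(X[0], components, mean)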

5.2.2 Kernel Principal Component Analysis (Schölkopf et al. 1999)

KPCA is a nonlinear dimensionality reduction technique that is used for nonlinear feature transformation, denoising, statistical estimation, visualization, classification and prediction in real time problems. Nonlinearity is introduced using the kernel trick, which is the core concept of SVMs. In the kernel trick, data samples are nonlinearly mapped into a higher dimensional reproducing kernel Hilbert space (RKHS), called the feature space F, by

Φ : R^d → F,  x ↦ Φ(x)                                            (5.4)

The dimension of the feature space F can be very large, even infinite. In order to avoid complex calculations in the feature space F, kernel functions are used. By defining a kernel function in the feature space F, optimization problems are transformed into dual optimization problems in R^n. Hence, the computational complexity depends largely on the number of data samples used in this method. In the case of SVM, there are many optimization techniques, such as chunking and sequential minimal optimization, which produce sparse solutions. Thus, a limited number of data samples, called support vectors, are stored for further calculations. When a new input x is evaluated, the kernel function is evaluated between x and the support vectors.

KPCA using Kernel Gram Matrix

Let x_1, ..., x_n be d-dimensional data samples. The correlation operator of the transformed samples Φ(x_1), ..., Φ(x_n) is estimated by

R_Φ = (1/n) Σ_{i=1}^{n} Φ(x_i) ⊗ Φ(x_i)                           (5.5)

where a ⊗ b denotes the operator that satisfies (a ⊗ b)c = ⟨c, b⟩ a for all c. The data are then projected onto the r-dimensional eigenspace spanned by U = [u_1 ... u_r]^T, and the projection operator is P_KPCA = U^T U, where {u_i}_{i=1}^{r} is a set of eigenvectors of R_Φ.

Since the correlation operator R_Φ : F → F is huge, a trick is used to obtain the eigenvectors u_i. Let S : R^n → F be the operator

S = [Φ(x_1) ... Φ(x_n)] = Σ_{i=1}^{n} Φ(x_i) e_i^T                 (5.6)

where {e_i}_{i=1}^{n} is the standard basis of R^n, and let A* denote the adjoint of an operator A. Since R_Φ = (1/n) SS*, the eigenvalue decomposition satisfies

P_KPCA = S ( Σ_{i=1}^{r} (1/λ_i) v_i v_i^T ) S*                    (5.7)

where ^T denotes the transpose of a vector or a matrix, and λ_i and v_i are the eigenvalues and eigenvectors of the kernel Gram matrix K_x, defined by (K_x)_ij = k(x_i, x_j). For an input vector x, its projection norm is given by

||P_KPCA Φ(x)||^2 = Σ_{i=1}^{r} (1/λ_i) (v_i^T S* Φ(x))^2          (5.8)

where S* Φ(x) = [k(x, x_1), ..., k(x, x_n)]^T ∈ R^n is called the empirical kernel map of x. Consequently, KPCA requires the eigenvalue decomposition of K_x ∈ R^{n x n} in the learning phase, and n evaluations of the kernel function in the test phase.

KPCA as a natural extension of PCA

KPCA is the solution of the following optimization problem:

min_{X : F → F} Σ_{i=1}^{n} ||Φ(x_i) − X Φ(x_i)||^2   subject to  rank(X) ≤ r,  N(X) ⊇ R(S)^⊥     (5.9)

where R(S) denotes the range (image) of the operator S and N(X) denotes the null space (kernel) of the operator X. From the above description, the null space and range constraints can be ignored in an input space R^d, since the data samples span R^d when the number of samples is adequate. But in an infinite or very high dimensional feature space F, the number of data samples is much smaller than the number of dimensions. So, using an appropriate kernel function for nonlinear dimensionality reduction, the classification performance is improved when compared to linear transformation and multivariate analysis.
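The computation in equations (5.5)-(5.8) can be sketched in a few lines of NumPy. The sketch below assumes a Gaussian kernel and, for brevity, omits the centering of the Gram matrix that many KPCA implementations apply; the function and parameter names are illustrative and not those of the thesis implementation.

import numpy as np

def gaussian_kernel(a, b, c):
    """k(a, b) = exp(-c * ||a - b||^2), the Gaussian kernel used in Section 5.5."""
    return np.exp(-c * np.sum((a - b) ** 2))

def kpca_fit(X, r, c):
    """Eigendecompose the kernel Gram matrix K_x (learning phase, cf. eq. 5.7)."""
    n = len(X)
    K = np.array([[gaussian_kernel(X[i], X[j], c) for j in range(n)] for i in range(n)])
    eigvals, eigvecs = np.linalg.eigh(K)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:r]         # keep the r largest components
    return eigvals[order], eigvecs[:, order]

def kpca_projection_norm(x, X, eigvals, eigvecs, c):
    """Squared projection norm of a new point x (test phase, cf. eq. 5.8)."""
    emp = np.array([gaussian_kernel(x, xi, c) for xi in X])   # empirical kernel map S* Phi(x)
    coords = eigvecs.T @ emp                                  # v_i^T S* Phi(x)
    return np.sum(coords ** 2 / eigvals)

# Usage (illustrative), with the kernel parameter c = 1 / (2 * sigma**2) of Section 5.5.1:
# lam, V = kpca_fit(X, r=X.shape[1], c=1.0)
# norm = kpca_projection_norm(X[0], X, lam, V, c=1.0)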

5.3 FEATURE SELECTION

Feature selection aims to find an optimal feature subset for a given problem domain such that the accuracy obtained from the original dataset is retained. Unlike other dimensionality reduction techniques, it preserves the original meaning of the features after reduction and improves the performance of a classifier. The usefulness of a selected feature is established by its redundancy and relevancy. If a selected feature of a dataset is predictive of its decision variable, then it is said to be relevant. Similarly, if selected features are highly correlated with other features, then they are called redundant features. From these criteria, it follows that a good feature subset contains feature vectors that are correlated with the decision variables and uncorrelated with each other. In dimensionality reduction, feature selection techniques are broadly classified into three types (Jensen, 2005). They are:

Filter approach
Wrapper approach
Embedded approach

Feature selection algorithms that perform the selection process separately, without the involvement of any learning algorithm, are called filter approaches. In this approach, irrelevant features are filtered out before using an induction algorithm. This technique can be applied to most real world problems since it is not tied to a particular induction process. Feature selection algorithms that are coupled with the learning algorithm to select the subset of features are called wrapper approaches. In this method, the selection process is based on the approximate accuracy obtained from the induction process.

The embedded approach is similar to the wrapper approach, but the selection process is built into the classifier model; the search takes place in the combined space of hypotheses and feature subsets. Though wrapper and embedded feature selection algorithms perform better and produce improved results, they are computationally expensive to execute and split the dataset into a large number of feature vectors. This disadvantage is due to the use of the learning algorithm in the evaluation of feature subsets. Also, when dealing with high dimensional datasets, wrapper and embedded methods encounter a major problem, namely an infeasible search for a subset of the given dataset. So, it is better to use filter based feature selection approaches to improve the accuracy and reduce the complexity of the proposed classification framework.

5.3.1 Fuzzy Rough Set based Feature Selection (Chen et al. 2012)

From previous work, rough set theory has been proved to be a successful filter based feature selection technique that performs well in data reduction and can be applied to many real time problems. The three main aspects of rough set theory are as follows:

Hidden facts in the dataset are analyzed
No additional information about the data is required
Minimal knowledge is represented

In real time applications, there are many cases where the feature values are crisp and real valued, and most traditional feature selection algorithms fail to perform well on them. To overcome this issue, the actual dataset is discretized before constructing a new dataset using crisp values. However, the degree of membership of the feature values to the discretized values is not examined, and this leads to an inadequacy. So, there is a clear need for feature selection techniques that can reduce datasets with real valued and crisp attributes.

Fuzzy set theory and the concept of fuzzification have emerged to provide an effective solution for real valued features. Fuzzy sets allow feature values to belong to more than one class label with different degrees of membership, and model the vagueness in the dataset; that is, they enable reasoning about the dataset under uncertainty. To handle both the vagueness and the indiscernibility in feature values, fuzzy and rough set theory are combined to remove uncertainty in datasets. Fuzzy rough set theory is an extended version of crisp rough set theory. It takes degrees of membership within the range [0, 1], which gives higher flexibility compared to crisp rough sets, which deal only with zero or full set membership. A fuzzy rough set is described by two fuzzy sets: the lower and the upper approximation.

FRQUICKREDUCT(C, D)
  C, the set of all conditional features; D, the set of decision features
  R ← {}; γ_best ← 0; γ_prev ← 0
  do
    T ← R
    γ_prev ← γ_best
    for all x ∈ (C − R)
      if γ_{R∪{x}}(D) > γ_T(D)
        T ← R ∪ {x}
        γ_best ← γ_T(D)
    R ← T
  until γ_best == γ_prev
  return R

Figure 5.2 Fuzzy Rough QUICKREDUCT algorithm
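A minimal Python sketch of the greedy loop in Figure 5.2 is given below. It assumes that a fuzzy rough dependency function gamma(subset, decision) is available, for example one built from a fuzzy similarity relation and the fuzzy lower approximation; the names are illustrative and are not taken from the thesis code.

def fr_quickreduct(conditional_features, gamma, decision):
    """Greedy fuzzy rough QUICKREDUCT following Figure 5.2.

    conditional_features: iterable of feature identifiers (the set C)
    gamma: callable gamma(subset, decision) -> dependency degree in [0, 1]
    decision: the decision feature D
    """
    reduct = set()
    best = prev = 0.0
    while True:
        trial = set(reduct)
        prev = best
        for x in set(conditional_features) - reduct:
            candidate = reduct | {x}
            if gamma(candidate, decision) > gamma(trial, decision):
                trial = candidate
                best = gamma(trial, decision)
        reduct = trial
        if best == prev:          # dependency no longer increases: stop
            return reduct

# Usage (illustrative): features are column indices, gamma is user supplied.
# reduct = fr_quickreduct(range(n_features), gamma=my_fuzzy_rough_dependency, decision=y)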

Fuzzy Rough Feature Selection (FRFS) can effectively reduce discrete and real valued noisy attributes without any user supplied information. In addition, this technique applies to both classification and regression problems with continuous or nominal input values. The information required to partition the fuzzy sets for the feature vectors is obtained automatically from the dataset. The search mechanism can also be replaced by other mechanisms such as swarm intelligence, ant colony optimization and others. In FRFS, FRQuickreduct is the basic algorithm developed to find a minimal subset of feature vectors; it is represented in Figure 5.2. It uses the fuzzy rough dependency function γ to select feature values and add them to the reduct candidate. If adding any feature value to the reduct candidate fails to increase the degree of dependency, then the FRQuickreduct algorithm stops at that iteration. The FRQuickreduct algorithm evaluates reduct candidates over the possible subsets of feature values, but it lacks comprehensiveness. It starts the iteration with an empty set and adds feature values one by one, subject to the constraint that the fuzzy rough dependency should increase or else reach its maximum value for the actual dataset. Thus, the dependency of each feature value is ascertained using the FRQuickreduct algorithm and a feasible candidate is chosen. However, this algorithm is not guaranteed to find a minimal subset of feature values. Sometimes the dependency function used to discriminate between reduct candidates leads to a non-minimal subset of features, so it is not feasible to predict the combination of feature values that produces an optimal reduct based on the dependency function alone. Though the obtained result is close to a minimal subset, it must still be reduced further to achieve good results. As discussed earlier, the FRQuickreduct algorithm can be improved by introducing a new search mechanism that optimizes the result, which is discussed in the next section.

5.3.2 Differential Evolution (DE)

A potential remedy for the FRQuickreduct algorithm is to combine it with an optimization technique that selects the most relevant and optimal feature subset, so that an accurate and robust model can be constructed for the classification process. From the state of the art, differential evolution (Storn and Price, 1997) has outperformed particle swarm optimization and evolutionary algorithms, and the DE algorithm is regarded as one of the best techniques for real world multiobjective optimization problems over continuous domains. This population-based stochastic technique is simple, robust and converges quickly to an optimum value. Additionally, DE has a small number of parameters, and the same parameter settings can be used across different domains. Like other optimization techniques, DE has four main steps: initialization, mutation, recombination and selection. First, the population vectors are generated randomly. For each target vector in the population, a mutant vector is produced by selecting two random vectors from the population, taking their weighted difference and adding the result to a third random vector. The mutant vector is then crossed with the original (target) vector, and the result of this operation is known as the trial vector. The corresponding position in the population is then occupied either by the trial vector or by the original target vector, depending on the fitness function, so as to achieve higher accuracy. In the differential evolution algorithm, mutation is considered the most important step; it has a simple coding mechanism and is easy to implement.
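The four DE steps can be summarized in a short NumPy sketch. This is a minimal illustration of classical DE/rand/1/bin, assuming a user supplied fitness function to be maximized; the parameter names (F, CR, pop_size) are conventional DE notation rather than settings taken from the thesis.

import numpy as np

def differential_evolution(fitness, bounds, pop_size=20, F=0.5, CR=0.9, generations=100):
    """Classical DE/rand/1/bin maximizing `fitness` over the box `bounds` (d x 2 array)."""
    rng = np.random.default_rng(0)
    d = len(bounds)
    low, high = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(low, high, size=(pop_size, d))           # initialization
    scores = np.array([fitness(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), low, high)        # mutation: weighted difference
            cross = rng.random(d) < CR
            cross[rng.integers(d)] = True                       # ensure at least one crossed gene
            trial = np.where(cross, mutant, pop[i])             # recombination
            trial_score = fitness(trial)
            if trial_score > scores[i]:                         # selection (maximization)
                pop[i], scores[i] = trial, trial_score
    best = np.argmax(scores)
    return pop[best], scores[best]

# Usage (illustrative): maximize a simple function over [-5, 5]^2.
# x_best, f_best = differential_evolution(lambda x: -np.sum(x**2), np.array([[-5, 5], [-5, 5]]))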

5.4 PROPOSED FRFS with Differential Evolution

To improve the performance and overcome the disadvantages of the Fuzzy Rough Set Feature Selection technique, the FRQuickreduct algorithm is modified with the differential evolution based optimization technique, giving FRFSDE. The pseudocode for FRFSDE is given in Figure 5.3. This technique enables fast convergence towards an optimal feature subset of the original dataset.

FRFSDE QUICKREDUCT(C, D)
  C, the set of all conditional features; D, the set of decision features
  Randomly select the position and velocity of the particles
  Initialize the population
  Evaluate the objective and fitness values
  Find the optimal feature subset as the global best
  Repeat
    Create a new feature subset
    Apply the greedy selection strategy
    Evaluate the fitness and probability values
    If the feature subset dominates the feature set, then the feature subset replaces the set
    If the feature set dominates the subset, then the feature subset is discarded
    Otherwise, the feature subset is added to the population
    Determine the best feature subset
    Memorize the best optimum feature subset
  Until the stopping criterion is satisfied

Figure 5.3 Fuzzy Roughset Feature Selection with Differential Evolution

In the FRFSDE algorithm, the conditional features and the decision feature are taken as input. A candidate is generated randomly for each feature vector in the parent population, within lower and upper approximation bounds. Candidates are generated randomly by the particles in the population. Simultaneously, an empty feature subset is created, and a greedy selection strategy is used with the fuzzy rough dependency function. If the parent dominates the candidate, then the candidate is discarded; if the candidate dominates the parent, then the parent is replaced by the candidate; otherwise, the candidate is added to the population. The objective function of the candidate is evaluated using the fitness function. This is repeated until an optimal feature subset of the original dataset is derived.
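As one illustration of how DE and the fuzzy rough dependency can be coupled, the sketch below encodes a candidate as a real-valued vector that is thresholded into a feature mask and scored by the dependency function. This is only a plausible encoding, assuming the differential_evolution and gamma helpers sketched earlier; it is not the thesis implementation.

import numpy as np

def frfsde_fitness(candidate, gamma, decision, threshold=0.5):
    """Map a real-valued DE candidate to a feature subset and score it.

    A feature is selected when its component exceeds `threshold`; the score is the
    fuzzy rough dependency of the selected subset, minus a small penalty per feature
    so that smaller subsets are preferred when dependency values tie.
    """
    subset = {i for i, v in enumerate(candidate) if v > threshold}
    if not subset:
        return 0.0
    return gamma(subset, decision) - 0.001 * len(subset)

# Usage (illustrative), reusing the DE sketch from Section 5.3.2:
# bounds = np.array([[0.0, 1.0]] * n_features)
# best_vec, best_score = differential_evolution(
#     lambda v: frfsde_fitness(v, my_fuzzy_rough_dependency, y), bounds)
# selected = [i for i, v in enumerate(best_vec) if v > 0.5]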

5.5 EXPERIMENTAL RESULTS

Support vector machines do not support automated internal detection of relevant features, and hence it is necessary to apply dimensionality reduction techniques that reduce the feature set of the original dataset. In feature transformation, PCA and KPCA are used to transform the original features into new feature values. In feature selection, FRFS and the proposed FRFSDE are used to select an optimal subset of features. On the benchmark datasets, the performance of the feature transformation techniques decreased slightly due to uncertainty in the feature labels and high computational complexity. To overcome the shortcomings of the feature transformation techniques and improve the performance of the adopted classifier, feature selection techniques have been reviewed and implemented.

Table 5.1 Brief Summary of Benchmark Datasets taken for Feature Transformation

Data Sets      Size    Attribute
Iris
Liver
Heart
Wine
Abalone
Pentagon

A brief summary of the datasets used in the feature transformation experiments is given in Table 5.1. The feature selection techniques are experimented with all the benchmark and synthetic datasets detailed in Chapter 2. The performance of the feature selection and feature transformation techniques is compared using different metrics.

5.5.1 Feature Transformation

PCA for Synthetic and UCI Data Sets

In this section, Pentagon, Iris, Liver and Heart are the four datasets used to validate the methods. To evaluate the method, three visualization tools are used: scatter plots, Pareto analysis and 3D plots (Hussain et al. 2011), as depicted below. A scatter plot (Pozdnoukhov et al. 2009) is often employed to identify potential associations between two variables. The Pareto chart (Deb and Saxena, 2005; Wright and Manic 2010; Hussain et al. 2011) is basically a descending bar graph that shows the frequencies of occurrences or relative sizes of either:

the various categories of all problems encountered, in order to determine which of the existing problems occur most frequently, or
the various causes of a particular problem, in order to determine which of the causes of that problem arise most frequently.

Figures 5.4 (a, b, c), 5.5 (a, b, c), 5.6 (a, b, c) and 5.7 (a, b, c) show the principal component analysis of the Pentagon, Iris, Liver and Heart datasets with a scatter plot, Pareto chart and 3D plot. The PCA technique shows its feasibility by reducing the dimension of the dataset. Figure 5.4 (a) shows the scatter plot with principal component 1 on the X axis and principal component 2 on the Y axis, and illustrates the divergence of the data when using PCA on the Pentagon dataset. Figure 5.4 (b) illustrates the Pareto chart for the synthetic dataset with the variance and the principal components as its dimensions. Figure 5.4 (c) shows the 3D view of the principal components. Figures 5.5, 5.6 and 5.7 show the performance of PCA on the Iris, Liver and Heart datasets.
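The scatter plot and Pareto chart used in these figures can be reproduced with a short matplotlib sketch. The sketch below assumes a data matrix X and reuses the illustrative pca_covariance helper from Section 5.2.1; it is not the plotting code used for the figures.

import numpy as np
import matplotlib.pyplot as plt

def plot_pca_views(X, title="PCA"):
    """Scatter plot of the first two principal components and a Pareto chart of variance."""
    comps, var, mu = pca_covariance(X, n_components=X.shape[1])   # helper from Section 5.2.1
    scores = (X - mu) @ comps
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(scores[:, 0], scores[:, 1], s=10)
    ax1.set_xlabel("Principal component 1")
    ax1.set_ylabel("Principal component 2")
    ax1.set_title(f"{title}: scatter plot")
    ratio = var / var.sum()                                        # explained variance ratio
    ax2.bar(range(1, len(ratio) + 1), ratio)                       # descending bars (Pareto chart)
    ax2.plot(range(1, len(ratio) + 1), np.cumsum(ratio), marker="o")
    ax2.set_xlabel("Principal component")
    ax2.set_ylabel("Variance explained")
    ax2.set_title(f"{title}: Pareto chart")
    plt.tight_layout()
    plt.show()

# Usage (illustrative): plot_pca_views(X_iris, title="Iris")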

Figure 5.4 (a) Scatter plot for Pentagon dataset
Figure 5.4 (b) Pareto Chart for Pentagon dataset
Figure 5.4 (c) 3D Plot for Pentagon dataset
Figure 5.5 (a) Scatter plot for Iris dataset

Figure 5.5 (b) Pareto Chart for Iris dataset
Figure 5.5 (c) 3D Plot for Iris dataset
Figure 5.6 (a) Scatter plot for Liver dataset
Figure 5.6 (b) Pareto Chart for Liver dataset

Figure 5.6 (c) 3D Plot for Liver dataset
Figure 5.7 (a) Scatter plot for Heart dataset
Figure 5.7 (b) Pareto Chart for Heart dataset
Figure 5.7 (c) 3D Plot for Heart dataset

KPCA for Synthetic and UCI Data Sets

In this section, four datasets, i.e. Pentagon, Iris, Wine and Abalone, are used to validate the results. Figures 5.8, 5.9, 5.10 and 5.11 show KPCA for the four datasets, with the data distribution and the contour lines of different projection norms for the chosen parameter values (Huang et al. 2009).

Here, the Gaussian kernel is used, with the rank r set to the input dimension of the dataset and the kernel parameter c set to 1/(2σ²), where σ² is the variance of the elements. Since the number of samples on the left side is smaller than that on the right side, the contour lines are biased towards the right side. This indicates that KPCA outperforms PCA. From the experimental results, it is proposed that KPCA can be used on high dimensional datasets before SVM classification, since KPCA is an extension of PCA that uses a kernel function to give a better transformation for kernel based classifiers. PCA is the root of all dimensionality reduction techniques, and it can also be used for lower dimensional data before classification. Even though KPCA performs better, the removed feature labels cannot be identified in feature transformation techniques, because PCA and KPCA transform the original dataset into derived feature values at a high computational cost. This is the major drawback of feature transformation, and it can be overcome by feature selection techniques.

Figure 5.8 Kernel PCA for Pentagon dataset

Figure 5.9 Kernel PCA for Iris dataset
Figure 5.10 Kernel PCA for Wine dataset

Figure 5.11 Kernel PCA for Abalone dataset

5.5.2 Feature Selection

In this section, two feature selection techniques, FRFS and FRFSDE, are implemented for binary and multiclass datasets. Generally, the goodness of a feature selection technique is defined by information theoretic measures. To make these techniques more reliable and to compare their performance, four important information theoretic measures are considered. They are as follows:

i. Fuzzy Entropy

The fuzzy entropy of an attribute subset R is defined by the equation

E(R) = Σ_{F ∈ U/R} ( |F| / Σ_{Y ∈ U/R} |Y| ) H(F)                  (5.10)

where R is an attribute subset, F ranges over the fuzzy subsets in the partition U/R, H(F) is the fuzzy entropy of the fuzzy subset F, D is the set of classes and U is the nonempty finite set of objects.

ii. Mutual Information

The mutual information between two random variables X and Y is given by the equation

I(X; Y) = H(X) − H(X|Y)                                            (5.11)

where H(X) is the entropy and H(X|Y) is the conditional entropy.

iii. Information Gain

The information gain IG(S, A) is given by the equation

IG(S, A) = Entropy(S) − Σ_{v ∈ values(A)} ( |S_v| / |S| ) Entropy(S_v)       (5.12)

where A is an attribute, S is the set of samples and S_v is the subset of S for which A takes the value v.

iv. Conditional Entropy

The conditional entropy H(Y|X) is given by the equation

H(Y|X) = Σ_{x ∈ X} p(x) H(Y|X = x)                                 (5.13)

where X and Y are two random variables and p(x) is the probability of X = x.

Entropy is an information theoretic measure of the uncertainty of a random variable. Thus, the feature values with the lowest entropy are well suited for the classification process, because these feature values are the most informative. As such, feature values with minimum fuzzy entropy and conditional entropy are considered informative feature subsets. Here, the averages of the entropy and mutual information of the selected feature values are computed for comparison.
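A minimal Python sketch of the crisp measures in equations (5.11)-(5.13), computed from discrete feature and class arrays, is given below; the fuzzy entropy of equation (5.10) additionally requires the fuzzy partition U/R and is omitted here. The helper names are illustrative.

import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a discrete array of labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(labels, feature):
    """H(Y|X) = sum_x p(x) H(Y|X=x), cf. eq. (5.13)."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    total = 0.0
    for x in np.unique(feature):
        mask = feature == x
        total += mask.mean() * entropy(labels[mask])
    return total

def mutual_information(labels, feature):
    """Mutual information I(X;Y) = H(Y) - H(Y|X), equivalent to eq. (5.11) by symmetry."""
    return entropy(labels) - conditional_entropy(labels, feature)

def information_gain(labels, feature):
    """IG(S,A) for a single discrete attribute A, cf. eq. (5.12)."""
    return entropy(labels) - conditional_entropy(labels, feature)

# Usage (illustrative): score each selected feature column against the class labels.
# scores = [mutual_information(y, X[:, j]) for j in selected_features]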

Similarly, for information gain and mutual information, the values for the selected feature subsets should be maximal. Mutual information measures how much information these subsets contribute to making a correct classification; if the mutual information of a selected feature subset reaches its maximum level, the subset is a perfect indicator of the class labels. Information gain selects the features that have the highest discrimination and helps in finding an optimal subset of features. The performance of FRFS and FRFSDE is compared using the above information theoretic measures, and the results are depicted in Table 5.2.

Table 5.2 Performance of FRFS and FRFSDE techniques for Binary and Multiclass datasets

Datasets         FRFS                            FRFSDE
                 FME    FMMI   MIG    CENT       FME    FMMI   MIG    CENT
Iris
Liver
Heart
Diabetes
Breast Cancer
Hepatitis
Ripley
Glass
E-Coli
Wine
Balance Scale
Lenses
Pentagon

From Table 5.2, it is inferred that the proposed FRFSDE outperforms FRFS by increasing the information gain and mutual information and decreasing the fuzzy and conditional entropy. Thus the obtained feature subsets are optimal and feasible subsets that can be used further in the classification process. Table 5.3 lists the datasets, the number of instances, the number of features, and the subsets of features selected using FRFS and FRFSDE.

Table 5.3 Brief summary and size reduction of Binary and Multiclass datasets using FRFS and FRFSDE

Datasets        No. of     No. of    FRFS            FRFS          FRFSDE        FRFSDE
                Instances  Features  Order           No. of Feat.  Order         No. of Feat.
Iris                                 {3,4,1}         3             {3,4}         2
Liver                                {5,4,2,1,3}     5             {5,4,2}       3
Heart                                {5,1,4,13}      4             {1,4,5}       3
Diabetes                             {2,3,8,13}      5             {3,13,8}      3
Breast Cancer                        {1,2,5,8}       4             {2,5,8}       2
Hepatitis                            {17,14,12,16}   4             {12,14,16}    3
Ripley                               {2,1}           2             {2,1}         2
Glass                                {4,3,6,7,8}     5             {4,3,6}       3
E-Coli                               {6,1,3,2}       4             {1,2,6}       3
Wine                                 {1,2,7,13}      4             {7,13}        2
Balance Scale                        {1,2,3,4}       4             {1,2}         2
Lenses          24         4         {4,3,1}         3             {4,3}         2
Pentagon        99         2         {1,2}           2             {1,2}         2

From Table 5.3, it is deduced that an optimal subset of features is derived using the proposed FRFSDE technique when compared to the FRFS technique. Thus, it is suggested that the proposed FRFSDE technique be used in the research framework to improve the classifier performance and reduce the dimensionality.

5.6 CHAPTER SUMMARY

This chapter presented the dimensionality reduction techniques in detail and the need for them from different perspectives. In the classification process, there may be feature values that are common to more than one class and contribute little information to the results. Removing the features that carry minimal information can improve the consistency and performance of a classifier; it also avoids the curse of dimensionality and overfitting. Feature values with higher information content increase the performance of the classification process and at the same time reduce the size of the classifier model, which leads to efficient memory usage. Reducing the feature values may seem instinctively wrong, but the experiments show that the results are either retained or improved. Here, the modified Fuzzy Rough Set Feature Selection with Differential Evolution (FRFSDE) outperforms the existing fuzzy rough set based feature selection. It proves to be a feasible technique and can be used in the research framework to improve performance.


More information

Kernel Principal Component Analysis: Applications and Implementation

Kernel Principal Component Analysis: Applications and Implementation Kernel Principal Component Analysis: Applications and Daniel Olsson Royal Institute of Technology Stockholm, Sweden Examiner: Prof. Ulf Jönsson Supervisor: Prof. Pando Georgiev Master s Thesis Presentation

More information

Cover Page. The handle holds various files of this Leiden University dissertation.

Cover Page. The handle   holds various files of this Leiden University dissertation. Cover Page The handle http://hdl.handle.net/1887/22055 holds various files of this Leiden University dissertation. Author: Koch, Patrick Title: Efficient tuning in supervised machine learning Issue Date:

More information

Data mining with sparse grids using simplicial basis functions

Data mining with sparse grids using simplicial basis functions Data mining with sparse grids using simplicial basis functions Jochen Garcke and Michael Griebel Institut für Angewandte Mathematik Universität Bonn Part of the work was supported within the project 03GRM6BN

More information

Decision Tree CE-717 : Machine Learning Sharif University of Technology

Decision Tree CE-717 : Machine Learning Sharif University of Technology Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete

More information

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014

More information

CS 195-5: Machine Learning Problem Set 5

CS 195-5: Machine Learning Problem Set 5 CS 195-5: Machine Learning Problem Set 5 Douglas Lanman dlanman@brown.edu 26 November 26 1 Clustering and Vector Quantization Problem 1 Part 1: In this problem we will apply Vector Quantization (VQ) to

More information

Features: representation, normalization, selection. Chapter e-9

Features: representation, normalization, selection. Chapter e-9 Features: representation, normalization, selection Chapter e-9 1 Features Distinguish between instances (e.g. an image that you need to classify), and the features you create for an instance. Features

More information

BENCHMARKING ATTRIBUTE SELECTION TECHNIQUES FOR MICROARRAY DATA

BENCHMARKING ATTRIBUTE SELECTION TECHNIQUES FOR MICROARRAY DATA BENCHMARKING ATTRIBUTE SELECTION TECHNIQUES FOR MICROARRAY DATA S. DeepaLakshmi 1 and T. Velmurugan 2 1 Bharathiar University, Coimbatore, India 2 Department of Computer Science, D. G. Vaishnav College,

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

Supervised Learning: K-Nearest Neighbors and Decision Trees

Supervised Learning: K-Nearest Neighbors and Decision Trees Supervised Learning: K-Nearest Neighbors and Decision Trees Piyush Rai CS5350/6350: Machine Learning August 25, 2011 (CS5350/6350) K-NN and DT August 25, 2011 1 / 20 Supervised Learning Given training

More information

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction

CSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction CSE 258 Lecture 5 Web Mining and Recommender Systems Dimensionality Reduction This week How can we build low dimensional representations of high dimensional data? e.g. how might we (compactly!) represent

More information

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..

More information

May 1, CODY, Error Backpropagation, Bischop 5.3, and Support Vector Machines (SVM) Bishop Ch 7. May 3, Class HW SVM, PCA, and K-means, Bishop Ch

May 1, CODY, Error Backpropagation, Bischop 5.3, and Support Vector Machines (SVM) Bishop Ch 7. May 3, Class HW SVM, PCA, and K-means, Bishop Ch May 1, CODY, Error Backpropagation, Bischop 5.3, and Support Vector Machines (SVM) Bishop Ch 7. May 3, Class HW SVM, PCA, and K-means, Bishop Ch 12.1, 9.1 May 8, CODY Machine Learning for finding oil,

More information

Nonlinear projections. Motivation. High-dimensional. data are. Perceptron) ) or RBFN. Multi-Layer. Example: : MLP (Multi(

Nonlinear projections. Motivation. High-dimensional. data are. Perceptron) ) or RBFN. Multi-Layer. Example: : MLP (Multi( Nonlinear projections Université catholique de Louvain (Belgium) Machine Learning Group http://www.dice.ucl ucl.ac.be/.ac.be/mlg/ 1 Motivation High-dimensional data are difficult to represent difficult

More information

A Hybrid Feature Selection Algorithm Based on Information Gain and Sequential Forward Floating Search

A Hybrid Feature Selection Algorithm Based on Information Gain and Sequential Forward Floating Search A Hybrid Feature Selection Algorithm Based on Information Gain and Sequential Forward Floating Search Jianli Ding, Liyang Fu School of Computer Science and Technology Civil Aviation University of China

More information

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality

Data Preprocessing. Why Data Preprocessing? MIT-652 Data Mining Applications. Chapter 3: Data Preprocessing. Multi-Dimensional Measure of Data Quality Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data e.g., occupation = noisy: containing

More information

Forward Feature Selection Using Residual Mutual Information

Forward Feature Selection Using Residual Mutual Information Forward Feature Selection Using Residual Mutual Information Erik Schaffernicht, Christoph Möller, Klaus Debes and Horst-Michael Gross Ilmenau University of Technology - Neuroinformatics and Cognitive Robotics

More information

Note Set 4: Finite Mixture Models and the EM Algorithm

Note Set 4: Finite Mixture Models and the EM Algorithm Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1

SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin. April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 SVM in Analysis of Cross-Sectional Epidemiological Data Dmitriy Fradkin April 4, 2005 Dmitriy Fradkin, Rutgers University Page 1 Overview The goals of analyzing cross-sectional data Standard methods used

More information

FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION

FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION FEATURE GENERATION USING GENETIC PROGRAMMING BASED ON FISHER CRITERION Hong Guo, Qing Zhang and Asoke K. Nandi Signal Processing and Communications Group, Department of Electrical Engineering and Electronics,

More information

Advanced Machine Learning Practical 1: Manifold Learning (PCA and Kernel PCA)

Advanced Machine Learning Practical 1: Manifold Learning (PCA and Kernel PCA) Advanced Machine Learning Practical : Manifold Learning (PCA and Kernel PCA) Professor: Aude Billard Assistants: Nadia Figueroa, Ilaria Lauzana and Brice Platerrier E-mails: aude.billard@epfl.ch, nadia.figueroafernandez@epfl.ch

More information

Modelling and Visualization of High Dimensional Data. Sample Examination Paper

Modelling and Visualization of High Dimensional Data. Sample Examination Paper Duration not specified UNIVERSITY OF MANCHESTER SCHOOL OF COMPUTER SCIENCE Modelling and Visualization of High Dimensional Data Sample Examination Paper Examination date not specified Time: Examination

More information

Deep Generative Models Variational Autoencoders

Deep Generative Models Variational Autoencoders Deep Generative Models Variational Autoencoders Sudeshna Sarkar 5 April 2017 Generative Nets Generative models that represent probability distributions over multiple variables in some way. Directed Generative

More information

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

More information

Convexization in Markov Chain Monte Carlo

Convexization in Markov Chain Monte Carlo in Markov Chain Monte Carlo 1 IBM T. J. Watson Yorktown Heights, NY 2 Department of Aerospace Engineering Technion, Israel August 23, 2011 Problem Statement MCMC processes in general are governed by non

More information

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017

CPSC 340: Machine Learning and Data Mining. Kernel Trick Fall 2017 CPSC 340: Machine Learning and Data Mining Kernel Trick Fall 2017 Admin Assignment 3: Due Friday. Midterm: Can view your exam during instructor office hours or after class this week. Digression: the other

More information

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis

Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis Best First and Greedy Search Based CFS and Naïve Bayes Algorithms for Hepatitis Diagnosis CHAPTER 3 BEST FIRST AND GREEDY SEARCH BASED CFS AND NAÏVE BAYES ALGORITHMS FOR HEPATITIS DIAGNOSIS 3.1 Introduction

More information

Feature Selection Using Principal Feature Analysis

Feature Selection Using Principal Feature Analysis Feature Selection Using Principal Feature Analysis Ira Cohen Qi Tian Xiang Sean Zhou Thomas S. Huang Beckman Institute for Advanced Science and Technology University of Illinois at Urbana-Champaign Urbana,

More information

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 15 Table of contents 1 Introduction 2 Data preprocessing

More information

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394

Data Mining. Data preprocessing. Hamid Beigy. Sharif University of Technology. Fall 1394 Data Mining Data preprocessing Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1394 1 / 15 Table of contents 1 Introduction 2 Data preprocessing

More information

INF 4300 Classification III Anne Solberg The agenda today:

INF 4300 Classification III Anne Solberg The agenda today: INF 4300 Classification III Anne Solberg 28.10.15 The agenda today: More on estimating classifier accuracy Curse of dimensionality and simple feature selection knn-classification K-means clustering 28.10.15

More information

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES

CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES CHAPTER 6 HYBRID AI BASED IMAGE CLASSIFICATION TECHNIQUES 6.1 INTRODUCTION The exploration of applications of ANN for image classification has yielded satisfactory results. But, the scope for improving

More information

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017 CPSC 340: Machine Learning and Data Mining Multi-Dimensional Scaling Fall 2017 Assignment 4: Admin 1 late day for tonight, 2 late days for Wednesday. Assignment 5: Due Monday of next week. Final: Details

More information

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction

CSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction CSE 255 Lecture 5 Data Mining and Predictive Analytics Dimensionality Reduction Course outline Week 4: I ll cover homework 1, and get started on Recommender Systems Week 5: I ll cover homework 2 (at the

More information

Sparse and large-scale learning with heterogeneous data

Sparse and large-scale learning with heterogeneous data Sparse and large-scale learning with heterogeneous data February 15, 2007 Gert Lanckriet (gert@ece.ucsd.edu) IEEE-SDCIS In this talk Statistical machine learning Techniques: roots in classical statistics

More information

A Stochastic Optimization Approach for Unsupervised Kernel Regression

A Stochastic Optimization Approach for Unsupervised Kernel Regression A Stochastic Optimization Approach for Unsupervised Kernel Regression Oliver Kramer Institute of Structural Mechanics Bauhaus-University Weimar oliver.kramer@uni-weimar.de Fabian Gieseke Institute of Structural

More information