Learning Manifolds in Forensic Data

Frédéric Ratle 1, Anne-Laure Terrettaz-Zufferey 2, Mikhail Kanevski 1, Pierre Esseiva 2, and Olivier Ribaux 2

1 Institut de Géomatique et d'Analyse du Risque, Faculté des Géosciences et de l'Environnement, Université de Lausanne, Amphipôle, CH-1015, Switzerland. frederic.ratle@unil.ch
2 Institut de Police Scientifique et de Criminologie, Ecole des Sciences Criminelles, Université de Lausanne, Batochime, CH-1015, Switzerland

Abstract. Chemical data related to illicit cocaine seizures is analyzed using linear and nonlinear dimensionality reduction methods. The goal is to find relevant features that could guide the data analysis process in chemical drug profiling, a recent field in the crime mapping community. The data has been collected using gas chromatography analysis. Several methods are tested: PCA, kernel PCA, isomap, spatio-temporal isomap and locally linear embedding. ST-isomap is used to detect a potential time-dependent nonlinear manifold, the data being sequential. Results show that the presence of a simple nonlinear manifold in the data is very likely and that this manifold cannot be detected by a linear PCA. The presence of temporal regularities is also observed with ST-isomap. Kernel PCA and isomap perform better than the other methods, and kernel PCA is more robust than isomap when random perturbations are introduced into the dataset.

1 Introduction

Chemical profiling of illicit drugs has become an important field in crime mapping in recent years. While traditional crime mapping research has focused on criminal events, i.e., the analysis of spatial and temporal events with traditional statistical methods, the analysis of the chemical composition of drug samples can reveal important information about the evolution and dynamics of the illicit drug market. As described in [2], many types of substances can be found in a cocaine sample seized from a street dealer. Among those are the main constituents of the drug itself, but also chemical residues of the fabrication process and cutting agents used to dilute the final product. Each of these can provide information about a certain stage of drug processing, from the growth conditions of the original plant to street distribution. This study focuses on the main constituents of cocaine, which are enumerated in Section 3.

This work was supported by the Swiss National Science Foundation (grant no. ). S. Kollias et al. (Eds.): ICANN 2006, Part II, LNCS 4132, © Springer-Verlag Berlin Heidelberg 2006.

2 Related Work

A preliminary study was made by the same authors in [1], where heroin data was used. PCA, clustering and classification algorithms (MLP, PNN, RBF networks and k-nearest neighbors) were successfully applied. However, heroin data has fewer variables (6 main constituents), which makes it more amenable to reduction to a few features. A thorough review of the field of chemical drug profiling can be found in Guéniat and Esseiva [2]. In this book, the authors tested several statistical methods for heroin and cocaine profiling. Among other methods, they mainly used similarity measures between samples to determine the main data classes. A methodology based on the squared cosine function as an intercorrelation measure is explained in further detail in Esseiva et al. [3]. Also, principal component analysis (PCA) and soft independent modelling of class analogies (SIMCA) were applied for dimensionality reduction and supervised classification. A radial basis function network trained on the processed data showed encouraging results. The classes used for classification were based solely on indices of chemical similarity found between data points. This methodology was further developed by the same authors in [4].

Another type of data was studied by Madden and Ryder [5]: Raman spectroscopy obtained from solid mixtures containing cocaine. The goal was to predict, based on the Raman spectrum, the cocaine concentration in a solid using k-nearest neighbors, neural networks and partial least squares. They also used a genetic algorithm to perform feature selection. However, their study was constrained by a very limited number of experimental samples, even though results were good. Moreover, their experimental method of sample analysis is fundamentally different from the one used in this study (gas chromatography). Similarly, Raman spectroscopy data was studied in [6] using support vector machines with RBF and polynomial kernels, KNN, the C4.5 decision tree and a naive Bayes classifier. The goal of the classification algorithm was to discriminate samples containing acetaminophen (used as a cutting agent) from those that do not. The RBF-kernel SVM outperformed all the other algorithms on a dataset of 217 samples using 22-fold cross-validation.

3 The Data

The data has 13 initial features, i.e., the 13 main chemical components of cocaine, measured by peak areas on the spectrum obtained for each sample:

1. Cocaine
2. Tropacocaine
3. Benzoic acid
4. Norcocaine
5. Ecgonine

6. Ecgonine methyl ester
7. N-formylcocaine
8. Trans-cinnamic acid
9. Anhydroecgonine
10. Anhydroecgonine methyl ester
11. Benzoylecgonine
12. Cis-cinnamoylecgonine methyl ester
13. Trans-cinnamoylecgonine methyl ester

Time is also implicitly considered in ST-isomap. Five dimensionality reduction algorithms are used: standard principal component analysis, kernel PCA [7], locally linear embedding (LLE) [8], isomap [9] and spatio-temporal isomap [10]. The latter is used to detect any relationship in the temporal evolution of the drug's chemical composition, given that the analyses were ordered sequentially by date of seizure for that experiment.

Every sample has been normalized by dividing each variable by the total area of the peaks of the chromatogram for that sample, every peak being associated with one chemical substance. This normalization is common practice in chemometrics and aims at accounting for variation in the purity of samples, i.e., the concentration of pure cocaine in the sample. 9500 samples were considered. It is worth noting that a dataset of this size is rather unusual, due to the restricted availability of this type of data.

4 Methodology and Results

Due to the size of the dataset (9500 samples, 13 variables), the methods involving the computation of a Gram matrix or distance matrix were repeated several times with random subsets of the data of 50% of its initial size. All the experiments were done in Matlab. The kernel PCA implementation was taken from the pattern classification toolbox by Stork and Yom-Tov [11], which implements algorithms described in Duda et al. [12]. LLE, isomap and ST-isomap implementations were provided by the respective authors of the algorithms.

4.1 Principal Component Analysis

Following normalization and centering of the data, a simple PCA was performed. The eigenvalues seem to increase linearly in absolute value, and a subset of at least six variables is necessary to explain 80% of the data variability. Fig. 1 shows the residual variance vs. the number of components in the subset. Given that the data can be reduced at most to 6 or 7 components, the results obtained with PCA are not convincing and suggest the use of methods for detecting nonlinear structures, i.e., no simple linear structure seems to lie in the high-dimensional space. As an indication, the first two principal components are illustrated in Fig. 2.
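As a minimal illustration of this pipeline (peak-area normalization followed by PCA), the following Python/scikit-learn sketch can be considered. The array `peaks` is a hypothetical placeholder for the raw chromatogram peak areas; the original experiments were run in Matlab, so this is a reconstruction under assumptions, not the authors' code.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical raw data: one row per seizure sample, one column per
# chemical component (13 peak areas from gas chromatography).
rng = np.random.default_rng(0)
peaks = rng.gamma(shape=2.0, scale=1.0, size=(9500, 13))  # placeholder data

# Normalize each sample by the total peak area of its chromatogram,
# as described above (accounts for varying cocaine purity).
X = peaks / peaks.sum(axis=1, keepdims=True)

# PCA on the normalized data (scikit-learn centers it internally).
pca = PCA(n_components=13)
pca.fit(X)

# Residual variance after keeping k components; the paper reports that
# at least six components are needed to explain 80% of the variability.
cum = np.cumsum(pca.explained_variance_ratio_)
for k, c in enumerate(cum, start=1):
    print(f"{k} components: residual variance = {1 - c:.3f}")
```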

Fig. 1. Residual variance vs. number of components

Fig. 2. The two main principal components

4.2 Kernel PCA

Kernel PCA was introduced by Schölkopf et al. [7] and aims at performing a PCA in feature space, where the nonlinear manifold becomes linear, using the kernel trick. KPCA is thus a simple yet very powerful technique for learning nonlinear structures. Using the Gram matrix K, defined by a positive semidefinite kernel (usually linear, polynomial or Gaussian), rather than the empirical covariance matrix, and knowing that, as for PCA, the new variables can be expressed as the product of the eigenvectors of the covariance matrix and the data, the nonlinear projection can be expressed as:

$(V^k \cdot \Phi(x)) = \sum_{i=1}^{N} \alpha_i^k K(x_i, x)$  (1)

where N is the number of data points and $\alpha_i^k$ is the i-th component of the eigenvector $\alpha^k$ of the Gram matrix corresponding to the eigenvector $V^k$ of the covariance matrix in feature space, which never needs to be computed explicitly. The radial basis function kernel provided the best results (among linear, polynomial and Gaussian kernels), using a Gaussian width of 0.1. Fig. 3 shows the two-dimensional manifold obtained with KPCA. Unlike PCA, a coherent structure is recognizable here, and it seems that two nonlinear features reasonably account for the variation in the whole dataset.

Fig. 3. Two-dimensional embedding with kernel PCA

4.3 Locally Linear Embedding

LLE [8] aims at constructing a low-dimensional manifold by building local linear models in the data. Each point is embedded in the lower-dimensional coordinate system as a linear combination of its neighbors:

$\hat{X}_i = \sum_{j \in N_k(X_i)} W_{ij} X_j$  (2)

where $N_k(X_i)$ is the neighborhood of the point $X_i$, of size k. The quality of the resulting projection is measured by the squared difference between the original point and its projection. The main parameter to tune is the number k of neighbors used for the projection. Values from 3 to 50 were tested, and the setting k = 40 provided the best resulting manifold, even though this neighborhood value is unusually large. Fig. 4 shows the three-dimensional embedding obtained. As with KPCA, a structure can be recognized. However, it is not as distinct, which suggests that LLE cannot represent the underlying manifold as easily as KPCA. A sketch of both embeddings follows.
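The sketch below reproduces the two embeddings just described with scikit-learn rather than the Matlab toolboxes the authors used. The placeholder data and, in particular, the mapping from a "Gaussian width" of 0.1 to scikit-learn's gamma parameter are assumptions: if the width is the sigma of exp(-||x - x'||^2 / (2 sigma^2)), then gamma = 1/(2 sigma^2) = 50.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.manifold import LocallyLinearEmbedding

# Placeholder for the normalized 13-variable data (see the earlier sketch);
# a small random subset keeps the example fast.
rng = np.random.default_rng(0)
peaks = rng.gamma(shape=2.0, scale=1.0, size=(500, 13))
X = peaks / peaks.sum(axis=1, keepdims=True)

# Kernel PCA with an RBF kernel; gamma = 50 assumes the width/gamma
# correspondence described above.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=50.0)
Z_kpca = kpca.fit_transform(X)  # two nonlinear features (cf. Fig. 3)

# LLE with the unusually large neighborhood k = 40 reported above.
lle = LocallyLinearEmbedding(n_neighbors=40, n_components=3)
Z_lle = lle.fit_transform(X)    # three-dimensional embedding (cf. Fig. 4)

print(Z_kpca.shape, Z_lle.shape)  # (500, 2) (500, 3)
```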

Fig. 4. Three-dimensional embedding with LLE

4.4 Isomap

Isomap [9] is a global method for dimensionality reduction. It uses the classical linear method of multidimensional scaling (MDS) [13], but with geodesic rather than Euclidean distances. The geodesic distance between two points is the length of the shortest path along the manifold. Indeed, the Euclidean distance does not appropriately estimate the distance between two points lying on a nonlinear manifold. However, it is usually accurate locally, i.e., between neighboring points. Isomap can therefore be summarized as:

1. Determination of every point's nearest neighbors (using Euclidean distances);
2. Construction of a graph connecting every point to its nearest neighbors;

3. Calculation of the shortest path on the graph between every pair of points;
4. Application of multidimensional scaling to the resulting (geodesic) distances.

The application of this algorithm to the chemical variables also provided good results compared to PCA. As for LLE, the number of neighbors k was studied and set to 5. As an indication, Fig. 5 shows the residual variance for subsets of 1 to 10 components; the residual variance with only one component is much lower than for PCA. Fig. 6 illustrates the two-dimensional embedding. From this figure, it appears that the underlying structure is captured better than with LLE, which may suggest that isomap is more effective on this dataset. A minimal sketch of the procedure follows the figure captions.

Fig. 5. Residual variance vs. Isomap dimensionality

Fig. 6. Two-dimensional embedding with isomap
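The four steps above are bundled in scikit-learn's Isomap estimator, used in the hedged sketch below. The placeholder data and the use of the estimator's reconstruction error as a stand-in for the residual-variance curve of Fig. 5 are assumptions, not the paper's exact statistic.

```python
import numpy as np
from sklearn.manifold import Isomap

# Placeholder normalized data (see the PCA sketch above).
rng = np.random.default_rng(0)
peaks = rng.gamma(shape=2.0, scale=1.0, size=(500, 13))
X = peaks / peaks.sum(axis=1, keepdims=True)

# Isomap internally performs the four steps listed above: k-NN search,
# neighborhood graph, all-pairs shortest paths, classical MDS.
for d in range(1, 11):
    iso = Isomap(n_neighbors=5, n_components=d)
    Z = iso.fit_transform(X)
    # reconstruction_error() is only a proxy for the residual variance
    # plotted in Fig. 5.
    print(f"dim {d}: reconstruction error = {iso.reconstruction_error():.4f}")
```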

4.5 Spatio-temporal Isomap

It is well known in the crime research community that time series analysis often yields patterns that reflect police activity rather than underlying criminal behavior. This is especially true in drug profiling research, where police seizures can vary in time independently of criminal activity. On the other hand, for data such as burglaries, time series analysis could prove more effective, since the vast majority of events are actually reported. Methods assuming sequential rather than time-referenced data are therefore perhaps more promising in drug profiling for capturing true underlying patterns rather than sampling patterns. Spatio-temporal isomap [10], presented by Jenkins and Matarić, is an extension of isomap for the analysis of sequential data. Here, the data is of course feature-temporal rather than spatio-temporal. The number of neighbors and the obtained embedding are the same as with isomap. However, the feature-temporal distance matrix, shown in Fig. 7, reveals that regularities are present in the dataset. Given that the samples cover a period of several years, this data could be used from a predictive point of view and could help in understanding the organization of distribution networks. This remains the subject of future study.

Fig. 7. Feature-temporal distance matrix

4.6 Robustness Assessment

Following these results, the robustness of the two best-suited methods (KPCA and isomap) was tested using a method similar to that used in [14]. Indeed, few quantitative criteria exist to assess the quality of dimensionality reduction methods, since the reconstruction of patterns in input space is not straightforward, which limits our ability to measure the accuracy of a given algorithm. The algorithm used follows this outline (a sketch is given after the table):

1. Randomly divide the dataset D into three partitions: F, P1 and P2.
2. Construct embeddings using F ∪ P1 and F ∪ P2.
3. Compute the mean squared difference (MSD) between the two embeddings obtained for F.
4. Repeat the previous steps for a fixed number of iterations.

The embeddings were constructed 15 times for kernel PCA and isomap, and the results are summarized in Table 1. It can be observed that kernel PCA is considerably more stable than isomap here. Isomap, being based on a graph of nearest neighbors, may be more sensitive to random variations in the dataset and could therefore lead to different results with different sets of observations of a given phenomenon.

Table 1. Normalized mean squared difference for KPCA and isomap

Algorithm     MSD    std (MSD)
Kernel PCA
Isomap
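A hedged Python sketch of this stability test follows, using kernel PCA as the embedding. Since the paper does not specify how the two embeddings of F were made comparable, scipy's Procrustes analysis is used here to remove arbitrary rotation, reflection and scale before computing a normalized squared difference; that alignment step and all names are assumptions.

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import KernelPCA

def stability_msd(X, make_embedder, n_iter=15, seed=0):
    """Normalized MSD between embeddings of a fixed subset F obtained
    from two overlapping training sets F+P1 and F+P2 (outline above)."""
    rng = np.random.default_rng(seed)
    msds = []
    for _ in range(n_iter):
        idx = rng.permutation(len(X))
        F, P1, P2 = np.array_split(idx, 3)  # three random partitions
        Z1 = make_embedder().fit_transform(X[np.concatenate([F, P1])])[:len(F)]
        Z2 = make_embedder().fit_transform(X[np.concatenate([F, P2])])[:len(F)]
        # Procrustes alignment makes the two embeddings of F comparable;
        # `disparity` is a normalized squared difference (an assumption,
        # not necessarily the statistic reported in Table 1).
        _, _, disparity = procrustes(Z1, Z2)
        msds.append(disparity)
    return np.mean(msds), np.std(msds)

rng = np.random.default_rng(0)
peaks = rng.gamma(2.0, 1.0, size=(600, 13))          # placeholder data
X = peaks / peaks.sum(axis=1, keepdims=True)
mean_msd, std_msd = stability_msd(
    X, lambda: KernelPCA(n_components=2, kernel="rbf", gamma=50.0))
print(f"KPCA: MSD = {mean_msd:.4f} +/- {std_msd:.4f}")
```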

5 Conclusion

Five methods of dimensionality reduction were applied to the problem of chemical profiling of cocaine. The application of PCA showed that linear methods for feature extraction have serious limits in this field of application. Kernel PCA, isomap, locally linear embedding and ST-isomap demonstrated the presence of simple nonlinear structures that were not detected by conventional PCA. Kernel PCA and isomap gave the best results in terms of an interpretable set of features; however, kernel PCA proved more robust than isomap. Of course, research by experts in drug profiling will still have to confirm the relevance of the obtained results and provide a practical interpretation. Further research will aim at selecting appropriate methods for determining classes on these low-dimensional structures. This clustering task will enable researchers in crime science to determine whether distinct production or distribution networks can be brought to light by analyzing the data clusters obtained from the chemical composition of drug seizures. Regarding sequential data, other methods could also be tested, particularly hidden Markov models.

References

1. F. Ratle, A.L. Terrettaz, M. Kanevski, P. Esseiva, O. Ribaux, Pattern analysis in illicit heroin seizures: a novel application of machine learning algorithms, Proc. of the 14th European Symposium on Artificial Neural Networks, d-side publi., 2006.
2. O. Guéniat, P. Esseiva, Le Profilage de l'Héroïne et de la Cocaïne, Presses polytechniques et universitaires romandes, Lausanne.
3. P. Esseiva, L. Dujourdy, F. Anglada, F. Taroni, P. Margot, A methodology for illicit drug intelligence perspective using large databases, Forensic Science International, 132.
4. P. Esseiva, F. Anglada, L. Dujourdy, F. Taroni, P. Margot, E. Du Pasquier, M. Dawson, C. Roux, P. Doble, Chemical profiling and classification of illicit heroin by principal component analysis, calculation of inter sample correlation and artificial neural networks, Talanta, 67.
5. M.G. Madden, A.G. Ryder, Machine Learning Methods for Quantitative Analysis of Raman Spectroscopy Data, Proceedings of the International Society for Optical Engineering (SPIE 2002), 4876.
6. M.L. O'Connell, T. Howley, A.G. Ryder, M.G. Madden, Classification of a target analyte in solid mixtures using principal component analysis, support vector machines, and Raman spectroscopy, Proceedings of the International Society for Optical Engineering (SPIE 2005).
7. B. Schölkopf, A. Smola, K.R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10, 1998.

8. S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290, 2000.
9. J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290, 2000.
10. O.C. Jenkins, M.J. Matarić, A spatio-temporal extension to isomap nonlinear dimension reduction, Proc. of the 21st International Conference on Machine Learning, 2004.
11. D.G. Stork, E. Yom-Tov, Computer Manual in MATLAB to accompany Pattern Classification, Wiley, Hoboken (NJ), 2004.
12. R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd Edition, Wiley, New York, 2001.
13. J.B. Kruskal, M. Wish, Multidimensional Scaling, SAGE Publications, 1978.
14. Y. Bengio, J.F. Paiement, P. Vincent, O. Delalleau, N. Le Roux, M. Ouimet, Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering, Advances in Neural Information Processing Systems 16, 2004.
