PCOMP

The PCOMP function computes the principal components of an m-column, n-row array, where m is the number of variables and n is the number of observations or samples. The principal components of a multivariate data set may be used to restate the data in terms of derived variables, or may be used to reduce the dimensionality of the data by reducing the number of variables (columns).

This routine is written in the IDL language. Its source code can be found in the file pcomp.pro in the lib subdirectory of the IDL distribution.

Syntax

Result = PCOMP( A [, COEFFICIENTS=variable] [, /COVARIANCE] [, /DOUBLE] [, EIGENVALUES=variable] [, NVARIABLES=value] [, /STANDARDIZE] [, VARIANCES=variable] )

Return Value

The result is an nvariables-column (nvariables ≤ m), n-row array of derived variables.

Arguments

A
An m-column, n-row, single- or double-precision floating-point array.

Keywords

COEFFICIENTS
Use this keyword to specify a named variable that will contain the principal components used to compute the derived variables. The principal components are the coefficients of the derived variables and are returned in an m-column, m-row array. The rows of this array correspond to the coefficients of the derived variables. The coefficients are scaled so that the sums of their squares are equal to the eigenvalue from which they are computed.

COVARIANCE
Set this keyword to compute the principal components using the covariances of the original data. The default is to use the correlations of the original data to compute the principal components.

DOUBLE
Set this keyword to use double-precision for computations and to return a double-precision result. Set DOUBLE=0 to use single-precision for computations and to return a single-precision result. The default is /DOUBLE if A is double precision; otherwise, the default is DOUBLE=0.
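As a quick, hedged illustration (not part of the reference page; the values are the Table 7-1 data used in the Multivariate Analysis section below), a minimal correlation-based call looks like this:

   ; 3 variables (columns), 5 observations (rows).
   data = [[2.0, 1.0, 3.0], $
           [4.0, 2.0, 3.0], $
           [4.0, 1.0, 0.0], $
           [2.0, 3.0, 3.0], $
           [5.0, 1.0, 9.0]]

   ; Keyword variables are initialized nonzero, as the
   ; Multivariate Analysis section below recommends.
   coef = 1 & evals = 1 & vars = 1

   ; Correlation-based principal components (the default).
   result = PCOMP(data, COEFFICIENTS=coef, $
      EIGENVALUES=evals, VARIANCES=vars)

   HELP, result   ; a 3-column, 5-row array of derived variables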

EIGENVALUES
Use this keyword to specify a named variable that will contain a one-column, m-row array of eigenvalues that correspond to the principal components. The eigenvalues are listed in descending order.

NVARIABLES
Use this keyword to specify the number of derived variables. A value of zero, a negative value, or a value in excess of the input array's column dimension results in a complete set (m columns and n rows) of derived variables.

STANDARDIZE
Set this keyword to convert the variables (the columns) of the input array to standardized variables (variables with a mean of zero and a variance of one).

VARIANCES
Use this keyword to specify a named variable that will contain a one-column, m-row array of variances. The variances correspond to the percentage of the total variance for each derived variable.

Examples

   PRO ex_pcomp
     ; Define an array with 4 variables and 20 observations.
     array = [[19.5, 43.1, 29.1, 11.9], $
              [24.7, 49.8, 28.2, 22.8], $
              [30.7, 51.9, 37.0, 18.7], $
              [29.8, 54.3, 31.1, 20.1], $
              [19.1, 42.2, 30.9, 12.9], $
              [25.6, 53.9, 23.7, 21.7], $
              [31.4, 58.5, 27.6, 27.1], $
              [27.9, 52.1, 30.6, 25.4], $
              [22.1, 49.9, 23.2, 21.3], $
              [25.5, 53.5, 24.8, 19.3], $
              [31.1, 56.6, 30.0, 25.4], $
              [30.4, 56.7, 28.3, 27.2], $
              [18.7, 46.5, 23.0, 11.7], $
              [19.7, 44.2, 28.6, 17.8], $
              [14.6, 42.7, 21.3, 12.8], $
              [29.5, 54.4, 30.1, 23.9], $
              [27.7, 55.3, 25.7, 22.6], $
              [30.2, 58.6, 24.6, 25.4], $
              [22.7, 48.2, 27.1, 14.8], $
              [25.2, 51.0, 27.5, 21.1]]

     ; Remove the mean from each variable.
     m = 4    ; number of variables
     n = 20   ; number of observations
     means = TOTAL(array, 2)/n
     array = array - REBIN(means, m, n)

     ; Compute derived variables based upon the principal components.
     result = PCOMP(array, COEFFICIENTS=coefficients, $
        EIGENVALUES=eigenvalues, VARIANCES=variances, /COVARIANCE)

     PRINT, 'Result: '
     PRINT, result, FORMAT='(4(F8.2))'
     PRINT, 'Coefficients: '
     FOR mode=0,3 DO PRINT, mode+1, coefficients[*,mode], $
        FORMAT='("Mode#",I1,4(F10.4))'

     eigenvectors = coefficients/REBIN(eigenvalues, m, m)
     PRINT, 'Eigenvectors: '
     FOR mode=0,3 DO PRINT, mode+1, eigenvectors[*,mode], $
        FORMAT='("Mode#",I1,4(F10.4))'

     array_reconstruct = result ## eigenvectors
     PRINT, 'Reconstruction error: ', $
        TOTAL((array_reconstruct - array)^2)
     PRINT, 'Energy conservation: ', TOTAL(array^2), $
        TOTAL(eigenvalues)*(n-1)

     PRINT, ' Mode   Eigenvalue   PercentVariance'
     FOR mode=0,3 DO PRINT, $
        mode+1, eigenvalues[mode], variances[mode]*100
   END

When the above program is compiled and executed, the following output is produced:

   Result:
    -107.38    13.40    -1.41    -0.03
       3.20     0.70     5.95    -0.02
      32.50    38.66    -3.87     0.01
      40.89    13.79    -4.98    -0.01
    -107.24    19.36     1.77     0.02
      18.43   -17.15    -1.47    -0.00
      99.89    -6.23     0.13     0.02
      45.38     8.11     6.53    -0.01
     -21.31   -18.31     3.75    -0.01
       5.54   -11.17    -4.52     0.02
      83.14     4.97     0.09     0.01
      87.11    -3.16     2.81     0.00
    -101.32   -11.78    -6.12     0.01
     -73.07     6.24     6.61     0.02
    -137.02   -19.10     1.33     0.01
      57.11     6.96     0.84    -0.01
      42.13   -10.07    -2.14     0.01
      83.30   -16.69    -2.72    -0.01
     -54.13     2.56    -4.21    -0.03
       2.84    -1.06     1.62    -0.01

   Coefficients:
   Mode#1    4.8799    5.0568    1.0282    4.7936
   Mode#2    1.0147   -0.9545    3.4885   -0.7743
   Mode#3   -0.6183   -0.9554    0.2690    1.5796
   Mode#4   -0.0900    0.0752    0.0472    0.0022

   Eigenvectors:
   Mode#1    0.0665    0.0689    0.0140    0.0653
   Mode#2    0.0690   -0.0649    0.2372   -0.0526
   Mode#3   -0.1601   -0.2473    0.0697    0.4089
   Mode#4   -5.6290    4.7013    2.9540    0.1372

   Reconstruction error:  1.44876e-010
   Energy conservation:  1748.17  1748.17

    Mode   Eigenvalue    PercentVariance
     1     73.4205       79.7970
     2     14.7099       15.9875
     3      3.86271       4.19818
     4      0.0159915     0.0173803

The first two derived variables account for 96% of the total variance of the original data.

Version History

5.0   Introduced

See Also

CORRELATE, EIGENQL
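Two hedged side notes on the example above (these snippets are illustrative, not part of the original page; they reuse the mean-removed array and the coefficients and eigenvalues variables from ex_pcomp). First, since the first two derived variables account for 96% of the total variance, the NVARIABLES keyword can return just those two columns. Second, the COEFFICIENTS scaling described earlier can be checked directly against the eigenvalues:

   ; Keep only the first two derived variables (columns).
   reduced = PCOMP(array, NVARIABLES=2, /COVARIANCE)
   HELP, reduced   ; a 2-column, 20-row array

   ; The sum of squares of each coefficient row should equal the
   ; corresponding eigenvalue (e.g., Mode#1: 4.8799^2 + 5.0568^2
   ; + 1.0282^2 + 4.7936^2 is approximately 73.4205).
   FOR mode=0,3 DO PRINT, TOTAL(coefficients[*,mode]^2), eigenvalues[mode]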

The output below (the tail of the preceding cluster-analysis example) lists, for each sample, the assigned cluster number followed by that sample's data:

   1   99   79   63   87  249
   3   67   41   36   51  114
   3   67   41   36   51  114
   0   94  191  160  173  124
   2   42  108   37   51   41
   3   67   41   36   51  114
   0   94  191  160  173  124
   1   99   79   63   87  249
   3   67   41   36   51  114

Samples 0 and 7 contain identical data and are assigned to cluster #1. Samples 1, 2, 5, and 8 contain identical data and are assigned to cluster #3. Samples 3 and 6 contain identical data and are assigned to cluster #0. Sample 4 is unique and is assigned to cluster #2.

If this example is run several times, each time computing new cluster weights, it is possible that the cluster number assigned to each grouping of samples may change.

Principal Components Analysis

Principal components analysis is a mathematical technique which describes a multivariate set of data using derived variables. The derived variables are formulated using specific linear combinations of the original variables. The derived variables are uncorrelated and are computed in decreasing order of importance; the first variable accounts for as much as possible of the variation in the original data, the second variable accounts for the second largest portion of the variation, and so on. Principal components analysis attempts to construct a small set of derived variables that summarize the original data, thereby reducing the dimensionality of the original data.

The principal components of a multivariate set of data are computed from the eigenvalues and eigenvectors of either the sample correlation or the sample covariance matrix. If the variables of the multivariate data are measured in widely differing units (large variations in magnitude), it is usually best to use the sample correlation matrix in computing the principal components; this is the default method used in IDL's PCOMP function.

Another alternative is to standardize the variables of the multivariate data prior to computing principal components. Standardizing the variables essentially makes them all equally important by creating new variables that each have a mean of zero and a variance of one. Proceeding in this way allows the principal components to be computed from the sample covariance matrix. IDL's PCOMP function includes COVARIANCE and STANDARDIZE keywords to provide this functionality.

For example, suppose that we wish to restate the following data using its principal components. There are three variables, each consisting of five samples.

Table 7-1: Data for Principal Component Analysis

              Var 1   Var 2   Var 3
   Sample 1    2.0     1.0     3.0
   Sample 2    4.0     2.0     3.0
   Sample 3    4.0     1.0     0.0
   Sample 4    2.0     3.0     3.0
   Sample 5    5.0     1.0     9.0

We compute the principal components (the coefficients of the derived variables) to 2-decimal accuracy and store them by row in a 3-column, 3-row array. Each derived variable zi is then the linear combination zi = ci1 x1 + ci2 x2 + ci3 x3, where cij is the coefficient in row i and column j and x1, x2, x3 are the original variables. (The specific coefficient array and the worked expressions for z1, z2, and z3 appear as images in the original document and are not reproduced here.)

In this example, analysis shows that the derived variable z1 accounts for 57.3% of the total variance of the original data, the derived variable z2 accounts for 28.2% of the total variance, and the derived variable z3 accounts for 14.5% of the total variance.

Example of Derived Variables from Principal Components

The following example constructs an appropriate set of derived variables, based upon the principal components of the original data, which may be used to reduce the dimensionality of the data. The data consist of four variables, each containing twenty samples.

   ; Define an array with 4 variables and 20 samples:
   data = [[19.5, 43.1, 29.1, 11.9], $
           [24.7, 49.8, 28.2, 22.8], $
           [30.7, 51.9, 37.0, 18.7], $
           [29.8, 54.3, 31.1, 20.1], $
           [19.1, 42.2, 30.9, 12.9], $
           [25.6, 53.9, 23.7, 21.7], $
           [31.4, 58.5, 27.6, 27.1], $
           [27.9, 52.1, 30.6, 25.4], $
           [22.1, 49.9, 23.2, 21.3], $
           [25.5, 53.5, 24.8, 19.3], $
           [31.1, 56.6, 30.0, 25.4], $
           [30.4, 56.7, 28.3, 27.2], $
           [18.7, 46.5, 23.0, 11.7], $
           [19.7, 44.2, 28.6, 17.8], $
           [14.6, 42.7, 21.3, 12.8], $
           [29.5, 54.4, 30.1, 23.9], $
           [27.7, 55.3, 25.7, 22.6], $
           [30.2, 58.6, 24.6, 25.4], $
           [22.7, 48.2, 27.1, 14.8], $
           [25.2, 51.0, 27.5, 21.1]]

The variables that will contain the values returned by the COEFFICIENTS, EIGENVALUES, and VARIANCES keywords to the PCOMP routine must be initialized as nonzero values prior to calling PCOMP.

   coef = 1 & eval = 1 & var = 1

   ; Compute the derived variables based upon
   ; the principal components.
   result = PCOMP(data, COEFFICIENTS = coef, $
      EIGENVALUES = eval, VARIANCES = var)

   ; Display the array of derived variables:
   PRINT, result, FORMAT = '(4(f5.1, 2x))'

IDL prints:

    81.4  15.5  -5.5   0.5
   102.7  11.1  -4.1   0.6
   109.9  20.3  -6.2   0.5
   110.5  13.8  -6.3   0.6
    81.8  17.1  -4.9   0.6
   104.8   6.2  -5.4   0.6
   121.3   8.1  -5.2   0.6
   111.3  12.6  -4.0   0.6
    97.0   6.4  -4.4   0.6
   102.5   7.8  -6.1   0.6
   118.5  11.2  -5.3   0.6
   118.9   9.1  -4.7   0.6
    81.5   8.8  -6.3   0.6
    88.0  13.4  -3.9   0.6
    74.3   7.5  -4.8   0.6
   113.4  12.0  -5.1   0.6
   109.7   7.7  -5.6   0.6
   117.5   5.5  -5.7   0.6
    91.4  12.0  -6.1   0.6
   102.5  10.6  -4.9   0.6

Display the percentage of total variance for each derived variable:

   PRINT, var

IDL prints:

   0.712422  0.250319  0.0370950  0.000164269

Display the percentage of variance for the first two derived variables (the first two columns of the resulting array above):

   PRINT, TOTAL(var[0:1])

IDL prints:

   0.962741

This indicates that the first two derived variables (the first two columns of the resulting array) account for 96.3% of the total variance of the original data, and thus could be used to summarize the original data.

Routines for Multivariate Analysis

See Multivariate Analysis (in the functional category "Mathematics" (IDL Reference Guide)) for a brief description of IDL routines for multivariate analysis. Detailed information is available in the IDL Reference Guide.
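A closing sketch, offered as an illustration rather than part of the original page: because the correlation matrix of a data set is the covariance matrix of its standardized variables, the default correlation-based call and the standardize-then-covariance route described in the Principal Components Analysis section above should report the same variance breakdown. Using the data array from the example above:

   ; Route 1: the default, correlation-based analysis.
   var1 = 1
   r1 = PCOMP(data, VARIANCES=var1)

   ; Route 2: standardize each variable to zero mean and unit
   ; variance, then analyze the covariance matrix.
   var2 = 1
   r2 = PCOMP(data, /STANDARDIZE, /COVARIANCE, VARIANCES=var2)

   ; var1 matches the output shown above; var2 should agree.
   PRINT, var1
   PRINT, var2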