Chapter II Multiple Correspondence Analysis (MCA)
1 Chapter II Multiple Correspondence Analysis (MCA) Master MMAS - University of Bordeaux Marie Chavent Chapitre 2 MCA 1/52
2 Introduction How to get information from a categorical data table of individuals × variables? Example : categorical data table where 27 dogs are described on 6 variables
# load("chiens.Rdata")
load("dogs.Rdata")
print(data[1:8,])
## Size Weight Velocity Intelligence Affectivity Aggressivness
## Beauceron S++ W+ V++ I+ Af+ Ag+
## BassetHound S- W- V- I- Af- Ag+
## GermanShepherd S++ W+ V++ I++ Af+ Ag+
## Boxer S+ W+ V+ I+ Af+ Ag+
## Bulldog S- W- V- I+ Af+ Ag-
## BullMastiff S++ W++ V- I++ Af- Ag+
## Poodle S- W- V+ I++ Af+ Ag-
## Chihuahua S- W- V- I- Af+ Ag-
- Which individuals are similar?
- Which variables are linked? Chapitre 2 MCA 2/52
3 By looking at the matrix of the distances between the individuals?
d <- dist(data)
as.matrix(d)[1:5,1:5]
## Beauceron BassetHound GermanShepherd Boxer Bulldog
## Beauceron 0 NA NA NA NA
## BassetHound NA 0 NA NA NA
## GermanShepherd NA NA 0 NA NA
## Boxer NA NA NA 0 NA
## Bulldog NA NA NA NA 0
How do we measure the distance between two rows of categorical data? Chapitre 2 MCA 3/52
4 By looking at the matrix of the χ² statistics of independence between the pairs of variables?
p <- ncol(data) ; chi2 <- matrix(NA,p,p) ; pval <- matrix(NA,p,p)
rownames(pval) <- colnames(pval) <- rownames(chi2) <- colnames(chi2) <- colnames(data)
for (j in 1:p) for (k in 1:p) {
  tab <- table(data[,j],data[,k])
  chi2[j,k] <- chisq.test(tab)$statistic
  pval[j,k] <- chisq.test(tab)$p.value
}
print(chi2,digits=2) # value of the chi2 statistic
## Size Weight Velocity Intelligence Affectivity Aggressivness
print(round(pval,digits=3),digits=2) # p-value of the test of independence
## Size Weight Velocity Intelligence Affectivity Aggressivness
Chapitre 2 MCA 4/52
5 By applying a multivariate statistical method? Multiple Correspondence Analysis (MCA) gives graphical representations of the distances between the individuals, the links between the categorical variables and the levels
library(FactoMineR)
res <- MCA(data,graph=FALSE)
plot(res,choix="ind",invisible = "var", title="",cex=1.5)
plot(res,choix="ind",invisible = "ind", title="",cex=1.5)
plot(res,choix="var",invisible = "ind", title="",cex=1.5)
[Three factor maps on Dim 1 (28.90%) × Dim 2 (23.08%) : the individuals (dogs), the levels, and the variables] Chapitre 2 MCA 5/52
6 MCA is also a method of dimension reduction : it gives a small number of new synthetic numerical variables summarizing the initial variables
Categorical data : the initial categorical variables
## Size Weight Velocity Intelligence Affectivity Aggressivness
## Beauceron S++ W+ V++ I+ Af+ Ag+
(…)
Numerical data : 3 synthetic numerical variables
## Dim 1 Dim 2 Dim 3
(one row of coordinates per dog)
MCA is then also a method to transform categorical data into numerical data Chapitre 2 MCA 6/52
7 Plan 1 Basic notions 2 The MCA algorithm 3 Different implementations of MCA 4 Interpretation of the results Chapitre 2 MCA 7/52
8 1 Basic notions Let us consider a data table where n individuals are described on p categorical variables Let :
- X = (x_ij)_{n×p} denote the original data matrix, with x_ij ∈ M_j and M_j the set of the levels of the j-th variable,
- m_j = card(M_j) denote the number of levels of the j-th variable,
- m = m_1 + ⋯ + m_p denote the total number of levels Chapitre 2 MCA 8/52
9 Example : categorical data with n = 27 individuals, p = 6 variables and m = 16 levels
print(data[1:8,])
## Size Weight Velocity Intelligence Affectivity Aggressivness
## Beauceron S++ W+ V++ I+ Af+ Ag+
## BassetHound S- W- V- I- Af- Ag+
## GermanShepherd S++ W+ V++ I++ Af+ Ag+
## Boxer S+ W+ V+ I+ Af+ Ag+
## Bulldog S- W- V- I+ Af+ Ag-
## BullMastiff S++ W++ V- I++ Af- Ag+
## Poodle S- W- V+ I++ Af+ Ag-
## Chihuahua S- W- V- I- Af+ Ag-
Levels of the variables : S-, S+, S++ (Size), W-, W+, W++ (Weight), etc
Two approaches for recoding the categorical data into numerical data :
- build the disjunctive table where each level is coded as a binary variable,
- build the Burt table (Anglo-Saxon approach) which gathers the contingency tables of all the pairs of variables Chapitre 2 MCA 9/52
10 The disjunctive table K = (k_is)_{n×m} describes the n individuals on the m levels Each column s is the indicator vector of the level s with :
k_is = 1 if individual i has level s, k_is = 0 otherwise
Let n_s denote the number of individuals having level s (the total of column s) Chapitre 2 MCA 10/52
11 Disjunctive table of the m = 16 levels
library(FactoMineR)
K <- tab.disjonctif(data)
print(K[1:4,])
## S- S+ S++ W- W+ W++ V- V+ V++ I- I+ I++ Af- Af+ Ag- Ag+
(0/1 rows for Beauceron, BassetHound, GermanShepherd, Boxer)
Frequencies n_s of the levels :
ns <- apply(K,2,sum)
print(ns)
Relative frequencies n_s/n of the levels :
n <- nrow(K)
print(ns/n)
Chapitre 2 MCA 11/52
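The indicator coding behind `tab.disjonctif` (one 0/1 column per level, column sums equal to the n_s) can be sketched independently of R. The mini data set below is made up for illustration, and Python is used only so the example stays self-contained.

```python
# Hypothetical mini data set: 4 individuals described by 2 categorical variables.
data = [
    {"Size": "S-", "Weight": "W-"},
    {"Size": "S+", "Weight": "W-"},
    {"Size": "S+", "Weight": "W+"},
    {"Size": "S-", "Weight": "W-"},
]

def disjunctive_table(rows):
    """Code each level as a 0/1 indicator column (one column per level)."""
    levels = []  # ordered list of (variable, level) pairs
    for var in rows[0]:
        for lev in sorted({r[var] for r in rows}):
            levels.append((var, lev))
    K = [[1 if r[var] == lev else 0 for (var, lev) in levels] for r in rows]
    return K, levels

K, levels = disjunctive_table(data)
# Each row sums to p (exactly one level per variable); column sums give n_s.
n_s = [sum(col) for col in zip(*K)]
print(levels)
print(n_s)
```

Row sums always equal the number of variables p, which is what makes the centered table lose one dimension per variable later on.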
12 Centered disjunctive table
- The n rows of the matrix K (the disjunctive table) define a cloud of n points in R^m
- Each individual i is weighted by w_i and usually w_i = 1/n
The matrix K of the original recoded data has column means n_s/n The matrix Z of the centered data has entries z_is = k_is − n_s/n, with column means 0 and column variances (n_s/n)(1 − n_s/n)
Verify that var(z_s) = (n_s/n)(1 − n_s/n), where z_s ∈ R^n denotes the s-th column of Z Chapitre 2 MCA 12/52
13 Distance between two individuals
- A weight m_s is associated with each level s in order to give more importance to rare levels : m_s = n/n_s
- The metric M = diag(n/n_s, s = 1, …, m), the diagonal matrix of the weights of the columns, gives :
d²_M(z_i, z_i′) = Σ_{s=1}^m (n/n_s)(z_is − z_i′s)² = Σ_{s=1}^m (n/n_s)(k_is − k_i′s)²
Two individuals are different if they have different levels, with more weight in the distance for rare levels (n_s small) Chapitre 2 MCA 13/52
14 Example :
[first rows of the disjunctive table and the relative frequencies n_s/n of the 16 levels]
Squared distance between the two first dogs :
d²_M(z_1, z_2) = (n/n_{S-})(0 − 1)² + (n/n_{S+})(0 − 0)² + ⋯ + (n/n_{Ag+})(1 − 1)²
Chapitre 2 MCA 14/52
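The weighted distance above is easy to check numerically. The sketch below uses a hypothetical 0/1 disjunctive table (not the dogs data) and Python so that it is self-contained; the weight n/n_s makes disagreements on rare levels count more.

```python
import numpy as np

# Chi2-style distance d²(i, i') = Σ_s (n/n_s)(k_is − k_i's)² on a made-up
# disjunctive table K: 5 individuals, 2 variables, 4 levels in total.
K = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 1, 0]])
n, m = K.shape
n_s = K.sum(axis=0)        # level frequencies
w = n / n_s                # weight n/n_s: rare levels get more weight

def d2(i, j):
    """Squared weighted distance between individuals i and j."""
    return float(np.sum(w * (K[i] - K[j]) ** 2))

print(d2(0, 1))   # the two rows differ on the first variable only
print(d2(0, 3))   # identical rows -> distance 0
```

Note that only the columns where the two rows disagree contribute, and each disagreement contributes twice (one level gained, one lost), each with its own weight.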
15 Inertia of the disjunctive table We have seen in the slides about the basic notions for PCA that :
- centering the data doesn't change the distances between the individuals and hence the inertia,
- the inertia of a data table is the (weighted) sum of the variances of its columns
In the particular case of a disjunctive table K this gives I(K) = I(Z) where Z is the centered disjunctive table and :
I(Z) = Σ_{s=1}^m m_s var(z_s),
where m_s is the weight of the column (the level) s Chapitre 2 MCA 15/52
16 - This gives, when the rows are weighted by 1/n and the columns are weighted by m_s = n/n_s :
I(Z) = Σ_{s=1}^m (1 − n_s/n)
In practice :
- The contribution of a level s to the inertia of Z is all the more important as the level is rare
- Too rare levels are then avoided (by pre-processing for instance) Chapitre 2 MCA 16/52
17 - This also gives :
I(Z) = Σ_{j=1}^p (m_j − 1)
In practice :
- The contribution of a variable j to the inertia of Z is all the more important as its number of levels m_j is high
- Variables with too different numbers of levels are then avoided (by pre-processing for instance) Chapitre 2 MCA 17/52
18 - This also gives :
I(Z) = m − p
Example of the dogs :
# number of variables
ncol(data)
## [1] 6
# number of levels
ncol(K)
## [1] 16
I(Z) = 16 − 6 = 10 Chapitre 2 MCA 18/52
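The identity I(Z) = m − p can be verified numerically: with column weights n/n_s and row weights 1/n, each column contributes 1 − n_s/n, and summing over levels leaves m − p. The disjunctive table below is made up, and Python is used only to keep the check self-contained.

```python
import numpy as np

# Numerical check of I(Z) = Σ_s m_s var(z_s) = m − p on a hypothetical
# disjunctive table: n = 4 individuals, p = 2 variables, m = 5 levels.
K = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1]])
n, m = K.shape
p = 2
n_s = K.sum(axis=0)
Z = K - n_s / n                  # centered disjunctive table
var_z = (Z ** 2).mean(axis=0)    # column variances with row weights 1/n
inertia = float(np.sum((n / n_s) * var_z))
print(inertia)                   # equals m − p = 3
```

Since (n/n_s) var(z_s) = 1 − n_s/n and the n_s sum to n·p across levels, the total is m − p regardless of the actual frequencies.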
19 The correlation ratio The link between a numerical variable y and a categorical variable x is often measured by :
η²(y|x) = var(ȳ_x)/var(y) = [Σ_{s=1}^m (n_s/n)(ȳ_s − ȳ)²] / [(1/n) Σ_{i=1}^n (y_i − ȳ)²]
where m is the number of levels of x and ȳ_s is the mean value of y computed on the individuals having the level s
- This criterion is often named the correlation ratio
- It takes its values in [0, 1]
- It measures the proportion of the variance of the numerical variable y explained by the categorical variable x
In which situation is this criterion equal to 0, equal to 1? Chapitre 2 MCA 19/52
20 Example : The Iris data
## SepalLength SepalWidth PetalLength PetalWidth Species
## setosa
## versicolor
## virginica
Correlation ratios between the variable Species and the 4 numerical variables :
eta2 <- function(x, gpe) {
  moyennes <- tapply(x, gpe, mean)
  effectifs <- tapply(x, gpe, length)
  varinter <- (sum(effectifs * (moyennes - mean(x))^2))
  vartot <- (var(x) * (length(x) - 1))
  res <- varinter/vartot
  return(res)
}
apply(iris[,-5],2,function(x){eta2(x,iris$Species)})
## SepalLength SepalWidth PetalLength PetalWidth
Chapitre 2 MCA 20/52
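A direct transcription of the `eta2` function above (between-group variance over total variance) can be tested on toy numbers; the data below are made up, not the iris measurements, and Python is used to keep the example self-contained.

```python
# Correlation ratio η²(y | x): between-group variance / total variance.
def eta2(x, groups):
    overall = sum(x) / len(x)
    by_group = {}
    for xi, g in zip(x, groups):
        by_group.setdefault(g, []).append(xi)
    between = sum(len(v) * (sum(v) / len(v) - overall) ** 2
                  for v in by_group.values())
    total = sum((xi - overall) ** 2 for xi in x)
    return between / total

x = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
g = ["a", "a", "a", "b", "b", "b"]
print(eta2(x, g))                # close to 1: groups explain almost all variance
print(eta2(x, ["a", "b"] * 3))   # shuffled groups -> much smaller ratio
```

η² = 1 when y is constant inside each level (the group means carry all the variance), and η² = 0 when all group means coincide.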
21 The variable Species explains :
- % of the variance of "Petal Length"
- 40 % of the variance of "Sepal Length"
[Boxplots of Petal Length and Sepal Width by Species (setosa, versicolor, virginica)] Chapitre 2 MCA 21/52
22 Give an interpretation of the graphical outputs below :
res <- PCA(iris,quali.sup = 5,graph=FALSE)
plot(res,choix="ind",habillage=5, title="",label="none",invisible="quali")
plot(res,choix="var",title="",cex=1.5)
[Factor maps on Dim 1 (72.96%) × Dim 2 (22.85%) : the individuals colored by Species, and the correlation circle of the variables]
How is this interpretation coherent with the results of the correlation ratios? Chapitre 2 MCA 22/52
23 Plan 1 Basic notions 2 The MCA algorithm 3 Different implementations of MCA 4 Interpretation of the results Chapitre 2 MCA 23/52
24 2 The MCA algorithm Several algorithms exist to perform Multiple Correspondence Analysis (MCA) and MCA can be defined as :
- Correspondence Analysis (CA) applied to the Burt table (Anglo-Saxon approach) or to the disjunctive table (French approach),
- Principal Component Analysis (PCA) applied to the centered disjunctive table (the approach described in this chapter)
Because the CA method is not studied in this lecture, the MCA algorithm described hereafter is based on the general framework of PCA with metric introduced in Section 4 of Chapter I Chapitre 2 MCA 24/52
25 The MCA algorithm The data table to be analyzed by MCA comprises n individuals described by p categorical variables and it is represented by the n × p categorical matrix X Let m denote the total number of levels of the p categorical variables
Step 1 : the pre-processing step
1 Build the real matrix Z of dimension n × m as follows : each level is coded as a binary variable and the n × m disjunctive table K is constructed ; Z is the centered version of K
2 Build the diagonal matrix N of the weights of the rows of Z The n rows are often weighted by 1/n, such that N = (1/n) I_n
3 Build the diagonal matrix M of the weights of the columns of Z : the m columns (corresponding to the levels of the categorical variables) are weighted by n/n_s, where n_s, s = 1, …, m denotes the number of individuals that belong to the s-th level Chapitre 2 MCA 25/52
26 The metric
M = diag(n/n_1, …, n/n_m) (1)
indicates that the distance between two rows of Z is a weighted Euclidean distance in the spirit of the χ² distance used in CA This distance gives more importance to rare levels The total inertia of Z with this distance and the weights 1/n is equal to m − p Chapitre 2 MCA 26/52
27 Step 2 : the factor coordinates processing step
1 The Generalized Singular Value Decomposition (GSVD) of Z with metrics N and M gives the decomposition :
Z = UΛVᵗ (2)
where
- Λ = diag(√λ_1, …, √λ_r) is the r × r diagonal matrix of the singular values of ZMZᵗN and ZᵗNZM, and r denotes the rank of Z, which is here at most r = min(n − 1, m − p) ;
- U is the n × r matrix of the first r eigenvectors of ZMZᵗN such that UᵗNU = I_r, with I_r the identity matrix of size r ;
- V is the m × r matrix of the first r eigenvectors of ZᵗNZM such that VᵗMV = I_r Chapitre 2 MCA 27/52
28 2 The matrix F of dimension n × r of the factor coordinates of the individuals is defined by :
F = ZMV, (3)
and we deduce from (2) that :
F = UΛ (4)
The columns f_α of F have mean 0 and variance λ_α ; they are the principal components and var(f_α) = λ_α The columns u_α = f_α/√λ_α of U are the standardized principal components Chapitre 2 MCA 28/52
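Step 2 can be sketched with an ordinary SVD: with N = (1/n)I and M diagonal, the GSVD of Z is the plain SVD of N^{1/2} Z M^{1/2}, rescaled back. The disjunctive table below is made up (not the dogs data), and Python/numpy is used only to keep the check self-contained.

```python
import numpy as np

# GSVD of Z with metrics N = (1/n)I and M = diag(n/n_s), via the ordinary
# SVD of N^{1/2} Z M^{1/2}, on a hypothetical disjunctive table.
K = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
n, m = K.shape
n_s = K.sum(axis=0)
Z = K - n_s / n
Nh = np.full(n, 1 / np.sqrt(n))                 # N^{1/2} (diagonal)
Mh = np.sqrt(n / n_s)                           # M^{1/2} (diagonal)

Ut, sv, Vt = np.linalg.svd(Z * Nh[:, None] * Mh[None, :], full_matrices=False)
r = int(np.sum(sv > 1e-10))                     # rank of Z
U = Ut[:, :r] / Nh[:, None]                     # satisfies Uᵗ N U = I_r
V = Vt.T[:, :r] / Mh[:, None]                   # satisfies Vᵗ M V = I_r
F = Z @ np.diag(n / n_s) @ V                    # factor coordinates F = Z M V

lam = sv[:r] ** 2                               # eigenvalues λ_α
# Columns of F are centered, with var(f_α) = λ_α under row weights 1/n:
print(np.allclose((F ** 2).mean(axis=0), lam))
```

Here r comes out as min(n − 1, m − p) = 3: each variable's centered indicator block loses one dimension because its columns sum to zero.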
29 The matrix F
res <- MCA(data,graph=FALSE)
F <- res$ind$coord
F[,1:2]
## Dim 1 Dim 2
(coordinates of the 27 dogs)
Individuals plotted according to the two first PCs
plot(res,choix="ind",invisible="var", cex=1.5,title="")
res$eig[1:2,1]
[Factor map of the dogs on Dim 1 (28.90%) × Dim 2 (23.08%)] Chapitre 2 MCA 29/52
30 3 The matrix A of dimension m × r of the factor coordinates of the levels is defined by :
A = MZᵗNU, (5)
and we deduce from (2) that :
A = MVΛ (6)
Each coordinate a_sα (element of A) is the mean value of the (standardized) factor coordinates of the individuals that belong to level s :
a_sα = (1/n_s) Σ_{i : k_is = 1} f_iα/√λ_α
This relation is called the barycentric property This property is fundamental for the interpretation of the graphical outputs in MCA Chapitre 2 MCA 30/52
31 The matrix A
A <- res$var$coord
A[,1:2]
## Dim 1 Dim 2
(coordinates of the 16 levels S-, …, Ag+)
Plot of the levels according to their factor coordinates on dim 1-2
plot(res,choix="ind",invisible="ind", cex=1.5,title="")
[Factor map of the levels on Dim 1 (28.90%) × Dim 2 (23.08%)]
The coordinates of the level W++ are the mean of the standardized coordinates of the dogs that belong to W++
rownames(data)[which(data$Weight=="W++")]
## [1] "BullMastiff" "GermanMastiff" "Mastiff" "SaintBernard" "Newfoundland"
Chapitre 2 MCA 31/52
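The barycentric property can be checked numerically: each row of A = MZᵗNU equals the mean of the standardized factor coordinates over the individuals having that level. The disjunctive table below is made up, and Python/numpy keeps the check self-contained (same GSVD-via-SVD trick as in Step 2).

```python
import numpy as np

# Check: rows of A = M Zᵗ N U are barycenters of standardized PCs f_α/√λ_α.
# K is a hypothetical disjunctive table (2 variables, 2 levels each).
K = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 1, 0]], dtype=float)
n, m = K.shape
n_s = K.sum(axis=0)
Z = K - n_s / n
Ut, sv, Vt = np.linalg.svd(Z / np.sqrt(n) * np.sqrt(n / n_s), full_matrices=False)
r = int(np.sum(sv > 1e-10))
lam = sv[:r] ** 2                                 # eigenvalues λ_α
V = Vt.T[:, :r] / np.sqrt(n / n_s)[:, None]
F = Z * (n / n_s) @ V                             # factor coordinates
U = F / np.sqrt(lam)                              # standardized PCs f_α/√λ_α

A = np.diag(n / n_s) @ Z.T @ U / n                # A = M Zᵗ N U  (N = I/n)
bary = np.array([U[K[:, s] == 1].mean(axis=0) for s in range(m)])
print(np.allclose(A, bary))                       # levels sit at the barycenters
```

The identity holds because the columns of U are centered, so the centering term in Z drops out of MZᵗNU, leaving exactly the group means.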
32 Is it possible to plot both individuals and levels on the same map? It is possible to plot the levels at the barycenter of the individuals by using the barycentric property
a_sα = (1/n_s) Σ_{i : k_is = 1} f_iα/√λ_α
In that case two dimensions are chosen and :
- the individuals are plotted according to their standardized principal components f_α/√λ_α,
- the levels are plotted according to their factor coordinate vectors a_α Chapitre 2 MCA 32/52
33 Example of the dogs data : Levels at the barycenter of the individuals
[Map of the dogs (standardized PCs) with the 16 levels plotted at the barycenters of the corresponding groups of dogs]
For instance the level W++ is plotted at the barycenter of the dogs that belong to W++
rownames(data)[which(data$Weight=="W++")]
## [1] "BullMastiff" "GermanMastiff" "Mastiff" "SaintBernard" "Newfoundland"
Chapitre 2 MCA 33/52
34 However this simultaneous representation of the levels at the barycenter of the individuals is not the standard output of software implementing MCA, where the so-called quasi-barycentric property is usually used The quasi-barycentric property is simply the barycentric property written as follows :
a_sα = (1/√λ_α) ( (1/n_s) Σ_{i : k_is = 1} f_iα )
This reads : each coordinate a_sα is the mean value of the factor coordinates of the individuals that belong to level s, up to the multiplier coefficient 1/√λ_α Chapitre 2 MCA 34/52
35 It is then possible to plot the levels at the quasi-barycenter of the individuals :
- the individuals are plotted according to their principal components f_α,
- the levels are plotted according to their factor coordinate vectors a_α
The representation of the levels at the quasi-barycenter of the individuals :
- is the simultaneous representation usually implemented in software,
- must be interpreted as follows : the cloud of the levels is the dilatation (by 1/√λ_α in each dimension) of the cloud of the gravity centers of the individuals Chapitre 2 MCA 35/52
36 Example of the dogs data : Levels at the quasi-barycenter of the individuals
[Map of the dogs (PCs) with the 16 levels plotted at the quasi-barycenters]
For instance the level W++ is plotted at the barycenter of the dogs that belong to W++ dilated by 1/√λ_1 = 2.076 on the first dimension
res$eig[1:2,1]
apply(F[which(data$Weight=="W++"),1:2],2,mean)/sqrt(res$eig[1:2,1])
## Dim 1 Dim 2
Chapitre 2 MCA 36/52
37 Step 3 : the squared loadings processing step The contribution c_jα of the variable x_j (j-th column of X) to the variance of the principal component f_α is defined by :
c_jα = Σ_{s ∈ M_j} (n_s/n) a²_sα (7)
The matrix C = (c_jα) of dimension p × r is called the squared loadings matrix, to draw an analogy with squared loadings in PCA
Each element c_jα is equal to the correlation ratio between x_j and f_α : c_jα = η²(f_α | x_j) Chapitre 2 MCA 37/52
38 The matrix C
C <- res$var$eta2
C[,1:2]
## Dim 1 Dim 2
(one row per variable : Size, Weight, Velocity, Intelligence, Affectivity, Aggressivness)
Variables plotted according to their squared loadings
plot(res,choix="var", cex=1.5,title="")
[Map of the 6 variables on Dim 1 (28.90%) × Dim 2 (23.08%)]
Chapitre 2 MCA 38/52
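The identity c_jα = η²(f_α | x_j) from (7) can also be checked numerically. The data below are made up (not the dogs), and Python/numpy is used so the check is self-contained; the GSVD is again done through a plain SVD.

```python
import numpy as np

# Check that the squared loading c_jα = Σ_{s∈Mj} (n_s/n) a_sα² equals the
# correlation ratio η²(f_α | x_j), on a hypothetical data set.
X = np.array([["a", "x"],
              ["b", "y"],
              ["a", "y"],
              ["b", "x"],
              ["a", "x"]])                       # n = 5, p = 2 variables
cols = [(j, lev) for j in range(X.shape[1]) for lev in sorted(set(X[:, j]))]
K = np.array([[1.0 if X[i, j] == lev else 0.0 for (j, lev) in cols]
              for i in range(X.shape[0])])       # disjunctive table
n, m = K.shape
n_s = K.sum(axis=0)
Z = K - n_s / n
Ut, sv, Vt = np.linalg.svd(Z / np.sqrt(n) * np.sqrt(n / n_s), full_matrices=False)
r = int(np.sum(sv > 1e-10))
lam = sv[:r] ** 2
V = Vt.T[:, :r] / np.sqrt(n / n_s)[:, None]
F = Z * (n / n_s) @ V                            # principal components f_α
A = np.diag(n / n_s) @ Z.T @ (F / np.sqrt(lam)) / n   # levels: A = M Zᵗ N U

C = np.array([[sum((n_s[s] / n) * A[s, a] ** 2
                   for s in range(m) if cols[s][0] == j)
               for a in range(r)] for j in range(X.shape[1])])
eta = np.array([[sum((X[:, j] == lev).sum() / n * F[X[:, j] == lev, a].mean() ** 2
                     for lev in set(X[:, j])) / lam[a]
                 for a in range(r)] for j in range(X.shape[1])])
print(np.allclose(C, eta))
```

Both quantities reduce to the between-group variance of f_α over the levels of x_j, divided by var(f_α) = λ_α, which is why they agree exactly.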
39 Plan 1 Basic notions 2 The MCA algorithm 3 Different implementations of MCA 4 Interpretation of the results Chapitre 2 MCA 39/52
40 3 Different implementations of MCA
1 Implement MCA as a CA of the Burt table : the Anglo-Saxon approach
- CA is called simple Correspondence Analysis
- In French, CA is AFC = Analyse Factorielle des Correspondances
- CA analyses a simple contingency table obtained by crossing two categorical variables
- CA is a two-step procedure with first a PCA of the matrix of the row-profiles of the contingency table and then a PCA of the matrix of the column-profiles These PCAs use specific weights on the rows and columns and hence specific metrics
- Applying CA to the Burt table is then applying a single PCA (with specific metrics) to the matrix of the row-profiles of the Burt table Indeed the column-profiles are identical to the row-profiles in the Burt table
Drawback : this algorithm gives the results (factor coordinates) for the levels but not for the individuals Implemented in the procedure CORRESP of the SAS software Chapitre 2 MCA 40/52
41 The Burt table is a symmetric table of size m × m which gathers the contingency tables of all the pairs of variables
B = KᵗK
where :
- b_ss′ = Σ_{i=1}^n k_is k_is′ is the number of individuals having both levels s and s′
- b_ss = n_s is the number of individuals having level s Chapitre 2 MCA 41/52
42 Example : Burt table of the m = 16 levels
K <- tab.disjonctif(data)
B <- t(K)%*%K
print(B)
## S- S+ S++ W- W+ W++ V- V+ V++ I- I+ I++ Af- Af+ Ag- Ag+
(one row of counts per level)
Chapitre 2 MCA 42/52
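The structure of B = KᵗK is easy to verify on a toy table: the diagonal holds the level frequencies n_s and each off-diagonal block is a two-way contingency table. The disjunctive table below is made up, and Python/numpy keeps the check self-contained.

```python
import numpy as np

# Burt table B = Kᵗ K from a hypothetical disjunctive table K
# (4 individuals, 2 variables, 4 levels in total).
K = np.array([[1, 0, 1, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
B = K.T @ K
n_s = K.sum(axis=0)
print(B)
# B is symmetric and its diagonal gives the level frequencies n_s:
print((B == B.T).all(), (np.diag(B) == n_s).all())
```

For instance B[0, 2] counts the individuals having both the first and the third level, i.e. one cell of the contingency table crossing the two variables.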
43 2 Implement MCA as a CA of the disjunctive table : the standard approach
- The disjunctive table is used as a contingency table
- Applying CA to the disjunctive table is then a two-step procedure with first a PCA of the matrix of the row-profiles (of the individuals) and then a PCA of the matrix of the column-profiles (of the levels)
Advantage : this algorithm gives directly the results (factor coordinates) for the levels and for the individuals Implemented in the function MCA of the R package FactoMineR Chapitre 2 MCA 43/52
44 3 Perform a PCA of the disjunctive table : the single PCA approach
- This PCA uses specific weights for the columns (the levels) and hence a specific distance between two rows (individuals)
- Compared to the standard approach :
- the factor coordinates of the levels are the same
- the factor coordinates of the individuals are multiplied by √p
- the total inertia is multiplied by p and is equal to m − p
Advantage : it is not necessary to know the CA method to understand this algorithm Implemented in the function PCAmix of the R package PCAmixdata Chapitre 2 MCA 44/52
45 Plan 1 Basic notions 2 The MCA algorithm 3 Different implementations of MCA 4 Interpretation of the results Chapitre 2 MCA 45/52
46 4 Interpretation of the results Quality of the dimension reduction The quality of the q first principal components is measured by the proportion of the inertia that they explain
Inertia of the data : I(Z) = I(F) = λ_1 + ⋯ + λ_r = m − p
Proportion of inertia explained by the α-th principal component :
λ_α / (λ_1 + ⋯ + λ_r)
In MCA, the percentages of inertia explained by the axes are "small" by construction Some authors have proposed corrections of the eigenvalues in MCA (Greenacre, 1993) Chapitre 2 MCA 46/52
47 Original data (p = 6 and m = 16)
## Size Weight Velocity Intelligence
## Beauceron S++ W+ V++ I+
## BassetHound S- W- V- I-
## GermanShepherd S++ W+ V++ I++
## Boxer S+ W+ V+ I+
## Bulldog S- W- V- I+
Reduction to the 3 first PCs
## Dim 1 Dim 2 Dim 3
(coordinates of the same dogs)
What is the quality of this reduction?
## Eigenvalue Proportion Cumulative
(table of the 10 eigenvalues, dim 1 to dim 10)
- r = 10 nonzero eigenvalues because r = min(n − 1, m − p) = 10,
- The sum of the eigenvalues is m − p = 10 (total inertia),
- % of the inertia is explained by the 3 first PCs Chapitre 2 MCA 47/52
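The proportion and cumulative columns of such an eigenvalue table are just λ_α/Σλ and its running sum. The eigenvalues below are made up (they are not the dogs' eigenvalues), and Python/numpy keeps the computation self-contained.

```python
import numpy as np

# Explained-inertia proportions from a hypothetical eigenvalue vector.
lam = np.array([0.48, 0.38, 0.30, 0.20, 0.14])   # made-up λ_α, total inertia 1.5
prop = lam / lam.sum()                           # λ_α / (λ_1 + ... + λ_r)
cumul = np.cumsum(prop)                          # cumulative proportions
print(np.round(prop, 3))
print(np.round(cumul, 3))
```

The cumulative value at rank q is the quality of the reduction to the q first PCs; it always reaches 1 at rank r.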
48 Contribution of the individuals and of the levels
- The relative contribution of an individual i to the variance of an axis α is :
(1/n) f²_iα / λ_α
The individuals far from the center of the factor map are those who contribute the most They can be a source of instability and can be removed or used as illustrative
- The relative contribution of a level s to the variance of an axis α is :
(n_s/n) a²_sα / λ_α
The levels far from the center of the factor map are not necessarily those which contribute the most Chapitre 2 MCA 48/52
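Both contribution formulas can be sanity-checked: over one axis, the individual contributions and the level contributions each sum to 1. The disjunctive table below is made up, and Python/numpy is used (same GSVD-via-SVD construction as before) to keep the check self-contained.

```python
import numpy as np

# Check that f_iα²/(n λ_α) sums to 1 over individuals and that
# (n_s/n) a_sα²/λ_α sums to 1 over levels, for each axis α.
K = np.array([[1, 0, 0, 1, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 1]], dtype=float)     # n = 6, p = 2, m = 5
n, m = K.shape
n_s = K.sum(axis=0)
Z = K - n_s / n
Ut, sv, Vt = np.linalg.svd(Z / np.sqrt(n) * np.sqrt(n / n_s), full_matrices=False)
r = int(np.sum(sv > 1e-10))
lam = sv[:r] ** 2
V = Vt.T[:, :r] / np.sqrt(n / n_s)[:, None]
F = Z * (n / n_s) @ V                            # factor coordinates
A = np.diag(n / n_s) @ Z.T @ (F / np.sqrt(lam)) / n   # level coordinates

ctr_ind = F ** 2 / (n * lam)                     # (n, r) individual contributions
ctr_lev = (n_s / n)[:, None] * A ** 2 / lam      # (m, r) level contributions
print(np.allclose(ctr_ind.sum(axis=0), 1.0))
print(np.allclose(ctr_lev.sum(axis=0), 1.0))
```

The first identity is var(f_α)/λ_α = 1; the second follows from Σ_s (n_s/n) a_sα² = λ_α, i.e. the level contributions of all variables add up to the axis variance.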
49 The 5 individuals which contribute the most [factor map : Mastiff, Pekingese, Chihuahua, Dalmatien, Labrador] The 5 levels which contribute the most [factor map on Dim 1 (28.90%) × Dim 2 (23.08%)] Chapitre 2 MCA 49/52
50 Contribution of the variables The absolute contribution of a categorical variable j to the variance of an axis α is the sum of the contributions of its levels :
Σ_{s ∈ M_j} (n_s/n) a²_sα = η²(f_α | x_j)
The correlation ratios are unsigned measures of association used to plot the categorical variables on a map
[Map of the 6 variables and map of the individuals with the levels, on Dim 1 (28.90%) × Dim 2 (23.08%)] Chapitre 2 MCA 50/52
51 Quality of the projection of the individuals and of the levels The quality of the projection of the individuals or of the levels is measured, as in PCA, by the so-called squared cosine
- If two individuals are well projected, their distance on the factor map is not far from their true distance, knowing that in MCA the distance between two individuals is small if they have the same levels
- If two levels are well projected, their distance on the factor map can be interpreted using the barycentric property :
- two levels of two different variables are close if they are owned by the same individuals
- two levels of a same variable are close if the two associated groups of individuals are close
- Take care of the dispersion of the individuals associated with each level before interpreting the proximity between two levels Chapitre 2 MCA 51/52
52 The 10 individuals best projected [factor map : Mastiff, GermanMastiff, BassetHound, Chihuahua, Pekingese, Bulldog, Teckel, Dalmatien, Labrador, BittanySpaniel] Levels having a cos² > 0.5 [factor map on Dim 1 (28.90%) × Dim 2 (23.08%)] Chapitre 2 MCA 52/52
Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis Julie Josse, Marie Chavent, Benoit Liquet, François Husson To cite this version: Julie Josse, Marie Chavent, Benoit Liquet,
More informationLecture 3: Camera Calibration, DLT, SVD
Computer Vision Lecture 3 23--28 Lecture 3: Camera Calibration, DL, SVD he Inner Parameters In this section we will introduce the inner parameters of the cameras Recall from the camera equations λx = P
More informationarulescba: Classification for Factor and Transactional Data Sets Using Association Rules
arulescba: Classification for Factor and Transactional Data Sets Using Association Rules Ian Johnson Southern Methodist University Abstract This paper presents an R package, arulescba, which uses association
More informationIn this tutorial, we show how to implement this approach and how to interpret the results with Tanagra.
Subject Implementing the Principal Component Analysis (PCA) with TANAGRA. The PCA belongs to the factor analysis approaches. It is used to discover the underlying structure of a set of variables. It reduces
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationGraphing Bivariate Relationships
Graphing Bivariate Relationships Overview To fully explore the relationship between two variables both summary statistics and visualizations are important. For this assignment you will describe the relationship
More informationPart I. Graphical exploratory data analysis. Graphical summaries of data. Graphical summaries of data
Week 3 Based in part on slides from textbook, slides of Susan Holmes Part I Graphical exploratory data analysis October 10, 2012 1 / 1 2 / 1 Graphical summaries of data Graphical summaries of data Exploratory
More informationCluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6
Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,
More informationWorkload Characterization Techniques
Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/
More informationInstance-Based Representations. k-nearest Neighbor. k-nearest Neighbor. k-nearest Neighbor. exemplars + distance measure. Challenges.
Instance-Based Representations exemplars + distance measure Challenges. algorithm: IB1 classify based on majority class of k nearest neighbors learned structure is not explicitly represented choosing k
More informationCSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.
CSE 547: Machine Learning for Big Data Spring 2019 Problem Set 2 Please read the homework submission policies. 1 Principal Component Analysis and Reconstruction (25 points) Let s do PCA and reconstruct
More informationModified-MCA Based Feature Selection Model for Preprocessing Step of Classification
Modified- Based Feature Selection Model for Preprocessing Step of Classification Myo Khaing and Nang Saing Moon Kham, Member IACSIT Abstract Feature subset selection is a technique for reducing the attribute
More informationMultiresponse Sparse Regression with Application to Multidimensional Scaling
Multiresponse Sparse Regression with Application to Multidimensional Scaling Timo Similä and Jarkko Tikka Helsinki University of Technology, Laboratory of Computer and Information Science P.O. Box 54,
More informationWeek 7 Picturing Network. Vahe and Bethany
Week 7 Picturing Network Vahe and Bethany Freeman (2005) - Graphic Techniques for Exploring Social Network Data The two main goals of analyzing social network data are identification of cohesive groups
More informationmmpf: Monte-Carlo Methods for Prediction Functions by Zachary M. Jones
CONTRIBUTED RESEARCH ARTICLE 1 mmpf: Monte-Carlo Methods for Prediction Functions by Zachary M. Jones Abstract Machine learning methods can often learn high-dimensional functions which generalize well
More informationDimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationSTAT 1291: Data Science
STAT 1291: Data Science Lecture 18 - Statistical modeling II: Machine learning Sungkyu Jung Where are we? data visualization data wrangling professional ethics statistical foundation Statistical modeling:
More informationDATA VISUALIZATION WITH GGPLOT2. Coordinates
DATA VISUALIZATION WITH GGPLOT2 Coordinates Coordinates Layer Controls plot dimensions coord_ coord_cartesian() Zooming in scale_x_continuous(limits =...) xlim() coord_cartesian(xlim =...) Original Plot
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki Wagner Meira Jr. Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA Department
More informationClojure & Incanter. Introduction to Datasets & Charts. Data Sorcery with. David Edgar Liebke
Data Sorcery with Clojure & Incanter Introduction to Datasets & Charts National Capital Area Clojure Meetup 18 February 2010 David Edgar Liebke liebke@incanter.org Outline Overview What is Incanter? Getting
More informationMATH5745 Multivariate Methods Lecture 13
MATH5745 Multivariate Methods Lecture 13 April 24, 2018 MATH5745 Multivariate Methods Lecture 13 April 24, 2018 1 / 33 Cluster analysis. Example: Fisher iris data Fisher (1936) 1 iris data consists of
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationAN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS
AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS H.S Behera Department of Computer Science and Engineering, Veer Surendra Sai University
More informationUnsupervised Learning
Unsupervised Learning Fabio G. Cozman - fgcozman@usp.br November 16, 2018 What can we do? We just have a dataset with features (no labels, no response). We want to understand the data... no easy to define
More informationFitting Classification and Regression Trees Using Statgraphics and R. Presented by Dr. Neil W. Polhemus
Fitting Classification and Regression Trees Using Statgraphics and R Presented by Dr. Neil W. Polhemus Classification and Regression Trees Machine learning methods used to construct predictive models from
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3
Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include
More informationLaTeX packages for R and Advanced knitr
LaTeX packages for R and Advanced knitr Iowa State University April 9, 2014 More ways to combine R and LaTeX Additional knitr options for formatting R output: \Sexpr{}, results='asis' xtable - formats
More informationFinding Clusters 1 / 60
Finding Clusters Types of Clustering Approaches: Linkage Based, e.g. Hierarchical Clustering Clustering by Partitioning, e.g. k-means Density Based Clustering, e.g. DBScan Grid Based Clustering 1 / 60
More informationAn Introduction to Cluster Analysis. Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs
An Introduction to Cluster Analysis Zhaoxia Yu Department of Statistics Vice Chair of Undergraduate Affairs zhaoxia@ics.uci.edu 1 What can you say about the figure? signal C 0.0 0.5 1.0 1500 subjects Two
More informationChapter 60 The STEPDISC Procedure. Chapter Table of Contents
Chapter 60 Chapter Table of Contents OVERVIEW...3155 GETTING STARTED...3156 SYNTAX...3163 PROC STEPDISC Statement...3163 BYStatement...3166 CLASSStatement...3167 FREQStatement...3167 VARStatement...3167
More informationModalities Additive coding Disjunctive coding z 2/z a b 0 c d Table 1: Coding of modalities Table 2: table Conti
Topological Map for Binary Data Mustapha. LEBBAH a,fouad. BADRAN b, Sylvie. THIRIA a;b a- CEDERIC, Conservatoire National des Arts et Métiers, 292 rue Saint Martin, 75003 Paris, France b- Laboratoire LODYC,
More informationData Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.
More informationIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)
More informationThe Curse of Dimensionality
The Curse of Dimensionality ACAS 2002 p1/66 Curse of Dimensionality The basic idea of the curse of dimensionality is that high dimensional data is difficult to work with for several reasons: Adding more
More informationIntroduction to Machine Learning Prof. Anirban Santara Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Introduction to Machine Learning Prof. Anirban Santara Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 14 Python Exercise on knn and PCA Hello everyone,
More informationData Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining
Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?
More informationPackage PCADSC. April 19, 2017
Type Package Package PCADSC April 19, 2017 Title Tools for Principal Component Analysis-Based Data Structure Comparisons Version 0.8.0 A suite of non-parametric, visual tools for assessing differences
More informationQuick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018
Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Contents Introduction... 1 Start DIONE... 2 Load Data... 3 Missing Values... 5 Explore Data... 6 One Variable... 6 Two Variables... 7 All
More information6 Subscripting. 6.1 Basics of Subscripting. 6.2 Numeric Subscripts. 6.3 Character Subscripts
6 Subscripting 6.1 Basics of Subscripting For objects that contain more than one element (vectors, matrices, arrays, data frames, and lists), subscripting is used to access some or all of those elements.
More informationExploratory Multivariate Analysis by Example Using R
Computer Science and Data Analysis Series Exploratory Multivariate Analysis by Example Using R Franc;ois Husson Sebastien Le Jerome Pages 0 ~~~,~~!~~"' Boca Raton London New York Contents P reface xi 1
More informationKTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn
KTH ROYAL INSTITUTE OF TECHNOLOGY Lecture 14 Machine Learning. K-means, knn Contents K-means clustering K-Nearest Neighbour Power Systems Analysis An automated learning approach Understanding states in
More informationnetzen - a software tool for the analysis and visualization of network data about
Architect and main contributor: Dr. Carlos D. Correa Other contributors: Tarik Crnovrsanin and Yu-Hsuan Chan PI: Dr. Kwan-Liu Ma Visualization and Interface Design Innovation (ViDi) research group Computer
More informationApplication of Fuzzy Logic Akira Imada Brest State Technical University
A slide show of our Lecture Note Application of Fuzzy Logic Akira Imada Brest State Technical University Last modified on 29 October 2016 (Contemporary Intelligent Information Techniques) 2 I. Fuzzy Basic
More information2 Second Derivatives. As we have seen, a function f (x, y) of two variables has four different partial derivatives: f xx. f yx. f x y.
2 Second Derivatives As we have seen, a function f (x, y) of two variables has four different partial derivatives: (x, y), (x, y), f yx (x, y), (x, y) It is convenient to gather all four of these into
More informationIterated Consensus Clustering: A Technique We Can All Agree On
Iterated Consensus Clustering: A Technique We Can All Agree On Mindy Hong, Robert Pearce, Kevin Valakuzhy, Carl Meyer, Shaina Race Abstract Cluster Analysis is a field of Data Mining used to extract underlying
More informationvector space retrieval many slides courtesy James Amherst
vector space retrieval many slides courtesy James Allan@umass Amherst 1 what is a retrieval model? Model is an idealization or abstraction of an actual process Mathematical models are used to study the
More informationPCOMP http://127.0.0.1:55825/help/topic/com.rsi.idl.doc.core/pcomp... IDL API Reference Guides > IDL Reference Guide > Part I: IDL Command Reference > Routines: P PCOMP Syntax Return Value Arguments Keywords
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization
More informationMultivariate analyses in ecology. Cluster (part 2) Ordination (part 1 & 2)
Multivariate analyses in ecology Cluster (part 2) Ordination (part 1 & 2) 1 Exercise 9B - solut 2 Exercise 9B - solut 3 Exercise 9B - solut 4 Exercise 9B - solut 5 Multivariate analyses in ecology Cluster
More informationThe STEPDISC Procedure
SAS/STAT 9.2 User s Guide The STEPDISC Procedure (Book Excerpt) This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation for the complete manual is as follows:
More informationRecommender System. What is it? How to build it? Challenges. R package: recommenderlab
Recommender System What is it? How to build it? Challenges R package: recommenderlab 1 What is a recommender system Wiki definition: A recommender system or a recommendation system (sometimes replacing
More informationClassification with Diffuse or Incomplete Information
Classification with Diffuse or Incomplete Information AMAURY CABALLERO, KANG YEN Florida International University Abstract. In many different fields like finance, business, pattern recognition, communication
More informationMachine Learning with MATLAB --classification
Machine Learning with MATLAB --classification Stanley Liang, PhD York University Classification the definition In machine learning and statistics, classification is the problem of identifying to which
More informationDiscriminate Analysis
Discriminate Analysis Outline Introduction Linear Discriminant Analysis Examples 1 Introduction What is Discriminant Analysis? Statistical technique to classify objects into mutually exclusive and exhaustive
More informationClustering analysis of gene expression data
Clustering analysis of gene expression data Chapter 11 in Jonathan Pevsner, Bioinformatics and Functional Genomics, 3 rd edition (Chapter 9 in 2 nd edition) Human T cell expression data The matrix contains
More informationDimensionality Reduction, including by Feature Selection.
Dimensionality Reduction, including by Feature Selection www.cs.wisc.edu/~dpage/cs760 Goals for the lecture you should understand the following concepts filtering-based feature selection information gain
More informationApplying the Possibilistic C-Means Algorithm in Kernel-Induced Spaces
1 Applying the Possibilistic C-Means Algorithm in Kernel-Induced Spaces Maurizio Filippone, Francesco Masulli, and Stefano Rovetta M. Filippone is with the Department of Computer Science of the University
More informationPackage catdap. R topics documented: March 20, 2018
Version 1.3.4 Title Categorical Data Analysis Program Package Author The Institute of Statistical Mathematics Package catdap March 20, 2018 Maintainer Masami Saga Depends R (>=
More informationMachine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.7. Decision Trees Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 /
More informationIn stochastic gradient descent implementations, the fixed learning rate η is often replaced by an adaptive learning rate that decreases over time,
Chapter 2 Although stochastic gradient descent can be considered as an approximation of gradient descent, it typically reaches convergence much faster because of the more frequent weight updates. Since
More informationLecture Topic Projects
Lecture Topic Projects 1 Intro, schedule, and logistics 2 Applications of visual analytics, basic tasks, data types 3 Introduction to D3, basic vis techniques for non-spatial data Project #1 out 4 Data
More informationThe ca Package. October 29, 2007
Version 0.21 Date 2007-07-25 The ca Package October 29, 2007 Title Simple, Multiple and Joint Correspondence Analysis Author Michael Greenacre , Oleg Nenadic
More informationData Exploration and Preparation Data Mining and Text Mining (UIC Politecnico di Milano)
Data Exploration and Preparation Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining, : Concepts and Techniques", The Morgan Kaufmann
More informationThe Use of Biplot Analysis and Euclidean Distance with Procrustes Measure for Outliers Detection
Volume-8, Issue-1 February 2018 International Journal of Engineering and Management Research Page Number: 194-200 The Use of Biplot Analysis and Euclidean Distance with Procrustes Measure for Outliers
More informationHands on Datamining & Machine Learning with Weka
Step1: Click the Experimenter button to launch the Weka Experimenter. The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze
More informationPackage TExPosition. R topics documented: January 31, 2019
Package TExPosition January 31, 2019 Type Package Title Two-Table ExPosition Version 2.6.10.1 Date 2013-12-09 Author Derek Beaton, Jenny Rieck, Cherise R. Chin Fatt, Herve Abdi Maintainer Derek Beaton
More informationVersion 2.4 of Idiogrid
Version 2.4 of Idiogrid Structural and Visual Modifications 1. Tab delimited grids in Grid Data window. The most immediately obvious change to this newest version of Idiogrid will be the tab sheets that
More informationCSE 252B: Computer Vision II
CSE 252B: Computer Vision II Lecturer: Serge Belongie Scribe: Haowei Liu LECTURE 16 Structure from Motion from Tracked Points 16.1. Introduction In the last lecture we learned how to track point features
More informationUniversity of Florida CISE department Gator Engineering. Visualization
Visualization Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida What is visualization? Visualization is the process of converting data (information) in to
More information