Package FPDclustering

August 23, 2017

Type Package
Title PD-Clustering and Factor PD-Clustering
Version 1.2
Date 2017-08-23
Author Cristina Tortora and Paul D. McNicholas
Maintainer Cristina Tortora <grikris1@gmail.com>
Description Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Factor PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a clustering that optimizes the PD-clustering criterion. It works on high-dimensional data sets.
Depends ThreeWay
License GPL (>= 2)
NeedsCompilation no
Repository CRAN
Date/Publication 2017-08-23 18:08:58 UTC

R topics documented:
asymmetric20
asymmetric3
FPDC
outliers
PDclust
Silh
TuckerFactors

asymmetric20    Asymmetric data set shape=20

Description
Each cluster has been generated according to a multivariate asymmetric Gaussian distribution, with shape=20, covariance matrix equal to the identity matrix and randomly generated centres.

Usage
    data(asymmetric20)

Format
A data frame with 800 observations on 101 variables. The first variable is the membership label.

Source
Generated with R using the package sn (The skew-normal and skew-t distributions), function rsn.

Examples
    data(asymmetric20)
    plot(asymmetric20[,2:3])

asymmetric3    Asymmetric data set shape=3

Description
Each cluster has been generated according to a multivariate asymmetric Gaussian distribution, with shape=3, covariance matrix equal to the identity matrix and randomly generated centres.

Usage
    data(asymmetric3)

Format
A data frame with 800 observations on 101 variables. The first variable is the membership label.

Source
Generated with R using the package sn (The skew-normal and skew-t distributions), function rsn.

Examples
    data(asymmetric3)
    plot(asymmetric3[,2:3])

FPDC    Factor probabilistic distance clustering

Description
An implementation of FPDC, a probabilistic factor clustering algorithm that involves a linear transformation of variables and a clustering that optimizes the PD-clustering criterion.

Usage
    FPDC(data = NULL, k = 2, nf = 2, nu = 2)

Arguments
data    A matrix or data frame such that rows correspond to observations and columns correspond to variables.
k       A numerical parameter giving the number of clusters.
nf      A numerical parameter giving the number of factors for variables.
nu      A numerical parameter giving the number of factors for units.

Value
A list with components
label          A vector of integers indicating the cluster membership for each unit.
centers        A matrix of cluster centers.
probability    A matrix of probability of each point belonging to each cluster.
JDF            The value of the joint distance function.
iter           The number of iterations.
explained      The explained variability.

Author(s)
Cristina Tortora and Paul D. McNicholas

References
Tortora, C., Gettler Summa, M., Marino, M., and Palumbo, F. Factor probabilistic distance clustering (FPDC): a new clustering method for high dimensional data sets. Advances in Data Analysis and Classification, to appear, 2016.
Tortora, C., Gettler Summa, M., and Palumbo, F. Factor PD-clustering. In Lausen et al., editor, Algorithms from and for Nature and Life, Studies in Classification, Data Analysis, and Knowledge Organization, DOI 10.1007/978-3-319-00035-0_11, 115-123, 2013.
Tortora, C. Non-hierarchical clustering methods on factorial subspaces, 2012.

See Also
PDclust

Examples
    # Asymmetric data set clustering example (with shape=3).
    data('asymmetric3')
    x <- asymmetric3[,-1]
    fpdas3 <- FPDC(x, 4, 3, 3)
    table(asymmetric3[,1], fpdas3$label)
    Silh(fpdas3$probability)

    # Asymmetric data set clustering example (with shape=20).
    data('asymmetric20')
    x <- asymmetric20[,-1]
    fpdas20 <- FPDC(x, 4, 3, 3)
    table(asymmetric20[,1], fpdas20$label)
    Silh(fpdas20$probability)

    # Clustering example with outliers.
    data('outliers')
    x <- outliers[,-1]
    fpdout <- FPDC(x, 4, 5, 4)
    table(outliers[,1], fpdout$label)
    Silh(fpdout$probability)
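Note (illustrative sketch, not package code). The PD-clustering criterion that FPDC optimizes rests on the constraint that, for each unit, probability times distance is the same for every cluster. The base-R sketch below shows only the membership step for fixed, known centres. It assumes Euclidean distances, and the function name pd_probabilities and the toy data are hypothetical. It is not the implementation used by FPDC or PDclust, which also update the centres iteratively.

```r
# Minimal sketch of the PD-clustering membership step (assumption: Euclidean
# distances and fixed centres; pd_probabilities is a hypothetical helper).
pd_probabilities <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  # d[i, j] = Euclidean distance from unit i to centre j
  d <- sapply(seq_len(k), function(j) sqrt(rowSums(sweep(x, 2, centers[j, ])^2)))
  d <- pmax(d, .Machine$double.eps)  # guard against a unit sitting on a centre
  inv <- 1 / d
  # Solving p[i, j] * d[i, j] = constant with rows summing to 1 gives
  # p[i, j] proportional to 1 / d[i, j]
  p <- inv / rowSums(inv)
  list(distance = d, probability = p)
}

# Toy data: two Gaussian groups in two dimensions
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 2), ncol = 2),
           matrix(rnorm(40, mean = -2), ncol = 2))
centers <- rbind(c(2, 2), c(-2, -2))

res <- pd_probabilities(x, centers)
pd <- res$probability * res$distance  # each row holds one repeated constant
label <- max.col(res$probability)     # assign each unit to its most probable cluster
# Summing the per-unit constant gives the joint distance function as defined
# by Ben-Israel and Iyigun (2008)
jdf <- sum(pd[, 1])
```

Each row of pd contains the same value in every column, which is the constraint stated in the package description.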

outliers    Data set with outliers

Description
Each cluster has been generated according to a multivariate Gaussian distribution, with randomly generated centers c. For each cluster, 20% uniformly distributed outliers have been generated at a distance between max(x-c) and max(x-c)+5 from the center.

Usage
    data(outliers)

Format
A data frame with 960 observations on 101 variables. The first variable corresponds to the membership.

Source
Generated with R.

Examples
    data(outliers)
    plot(outliers[,2:3])

PDclust    Probabilistic Distance Clustering

Description
Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant.

Usage
    PDclust(data = NULL, k = 2)

Arguments
data    A matrix or data frame such that rows correspond to observations and columns correspond to variables.
k       A numerical parameter giving the number of clusters.

Value
A list with components
label          A vector of integers indicating the cluster membership for each unit.
centers        A matrix of cluster centers.
probability    A matrix of probability of each point belonging to each cluster.
JDF            The value of the joint distance function.
iter           The number of iterations.

Author(s)
Cristina Tortora and Paul D. McNicholas

References
Ben-Israel A., Iyigun C. Probabilistic D-Clustering. Journal of Classification, 25(1), 5-26, 2008.

Examples
    # Normally generated clusters
    c1 = c(+2,+2,2,2)
    c2 = c(-2,-2,-2,-2)
    c3 = c(-3,3,-3,3)
    n = 200
    x1 = cbind(rnorm(n, c1[1]), rnorm(n, c1[2]), rnorm(n, c1[3]), rnorm(n, c1[4]))
    x2 = cbind(rnorm(n, c2[1]), rnorm(n, c2[2]), rnorm(n, c2[3]), rnorm(n, c2[4]))
    x3 = cbind(rnorm(n, c3[1]), rnorm(n, c3[2]), rnorm(n, c3[3]), rnorm(n, c3[4]))
    x = rbind(x1, x2, x3)
    pdn = PDclust(x, 3)
    plot(x[,1:2], col = pdn$label)
    plot(x[,3:4], col = pdn$label)

Silh    Probabilistic silhouette plot

Description
Graphical tool to see how well each point belongs to its cluster.

Usage
    Silh(p)

Arguments
p    A matrix of probabilities such that rows correspond to observations and columns correspond to clusters.

Details
The probabilistic silhouettes are an adaptation of the ones proposed by Menardi (2011), according to the following formula:

$$ dbs_i = \frac{\log\left(p_{i m_k} / p_{i m_1}\right)}{\max_i \log\left(p_{i m_k} / p_{i m_1}\right)} $$

where $m_k$ is such that $x_i$ belongs to cluster $k$ and $m_1$ is such that $p_{i m_1}$ is maximum for $m$ different from $m_k$.

Value
Probabilistic silhouette plot

Author(s)
Cristina Tortora

References
Menardi G. Density-based Silhouette diagnostics for clustering methods. Statistics and Computing, 21, 295-308, 2011.

Examples
    # Asymmetric data set silhouette example (with shape=3).
    data('asymmetric3')
    x <- asymmetric3[,-1]
    fpdas3 <- FPDC(x, 4, 3, 3)
    Silh(fpdas3$probability)

    # Asymmetric data set silhouette example (with shape=20).
    data('asymmetric20')
    x <- asymmetric20[,-1]
    fpdas20 <- FPDC(x, 4, 3, 3)
    Silh(fpdas20$probability)

    # Silhouette example with outliers.
    data('outliers')
    x <- outliers[,-1]
    fpdout <- FPDC(x, 4, 4, 3)
    Silh(fpdout$probability)
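Note (illustrative sketch, not package code). The silhouette formula in the Details of Silh can be evaluated by hand from any probability matrix. The base-R sketch below assumes that each unit is assigned to its most probable cluster, so that the log ratio is never negative. The function name prob_silhouette and the small matrix p are hypothetical. Silh itself draws a plot, and its internal computation may differ.

```r
# Minimal sketch of the probabilistic silhouette values (assumption: units are
# assigned to the cluster with the highest probability; prob_silhouette is a
# hypothetical helper, not a function of the package).
prob_silhouette <- function(p) {
  p <- as.matrix(p)
  label <- max.col(p, ties.method = "first")  # m_k: cluster of each unit
  idx <- cbind(seq_len(nrow(p)), label)
  p_own <- p[idx]                             # p_{i m_k}
  p_masked <- p
  p_masked[idx] <- -Inf                       # exclude the unit's own cluster
  p_next <- apply(p_masked, 1, max)           # p_{i m_1}: best other cluster
  log_ratio <- log(p_own / p_next)
  # Scale by the largest log ratio; this is undefined if all ratios equal 1
  list(label = label, dbs = log_ratio / max(log_ratio))
}

# Hypothetical membership probabilities for three units and three clusters
p <- rbind(c(0.70, 0.20, 0.10),
           c(0.10, 0.80, 0.10),
           c(0.40, 0.35, 0.25))
s <- prob_silhouette(p)
s$dbs  # values near 1: clearly assigned units; values near 0: ambiguous units
```

Under this assignment rule the values lie between 0 and 1, and the third unit, whose two largest probabilities are close, gets the smallest value.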

TuckerFactors    Choice of the number of Tucker3 factors

Description
An empirical way of choosing the number of factors. The algorithm returns a graph and a table showing how the explained variability changes with the number of factors.

Usage
    TuckerFactors(data = NULL, nc = 2)

Arguments
data    A matrix or data frame such that rows correspond to observations and columns correspond to variables.
nc      A numerical parameter giving the number of clusters.

Value
A table containing the explained variability for varying numbers of factors for units (columns) and for variables (rows), and a plot.

Author(s)
Cristina Tortora

References
Kiers H., Kinderen A. A fast method for choosing the numbers of components in Tucker3 analysis. British Journal of Mathematical and Statistical Psychology, 56(1), 119-125, 2003.
Kroonenberg P. Applied Multiway Data Analysis. Ebooks Corporation, Hoboken, New Jersey, 2008.
Tortora, C., Gettler Summa, M., and Palumbo, F. Factor PD-clustering. In Lausen et al., editor, Algorithms from and for Nature and Life, Studies in Classification, Data Analysis, and Knowledge Organization, DOI 10.1007/978-3-319-00035-0_11, 115-123, 2013.

See Also
T3

Examples
    # Asymmetric data set example (with shape=3).
    data('asymmetric3')
    xp = TuckerFactors(asymmetric3[,-1], nc = 4)

    # Asymmetric data set example (with shape=20).
    data('asymmetric20')
    xp = TuckerFactors(asymmetric20[,-1], nc = 4)

Index
asymmetric20, asymmetric3, FPDC, outliers, PDclust, Silh, T3, TuckerFactors