Package Funclustering

Similar documents
Package FisherEM. February 19, 2015

Package mixphm. July 23, 2015

Package glassomix. May 30, 2013

Package mrm. December 27, 2016

Package Numero. November 24, 2018

Package flexcwm. May 20, 2018

Package logspline. February 3, 2016

Package cwm. R topics documented: February 19, 2015

Package mixsqp. November 14, 2018

Package Rambo. February 19, 2015

Package ordinalclust

Package FMsmsnReg. March 30, 2016

Package sspline. R topics documented: February 20, 2015

Package PedCNV. February 19, 2015

Package TVsMiss. April 5, 2018

Package Rankcluster. September 27, 2013

Package SAM. March 12, 2019

Package longclust. March 18, 2018

Package clustmixtype

MSA220 - Statistical Learning for Big Data

Clustering Lecture 5: Mixture Model

Package pendvine. R topics documented: July 9, Type Package

Clustering and Visualisation of Data

Package penrvine. R topics documented: May 21, Type Package

Package intcensroc. May 2, 2018

Package sensory. R topics documented: February 23, Type Package

Linear Methods for Regression and Shrinkage Methods

Package MeanShift. R topics documented: August 29, 2016

Expectation Maximization (EM) and Gaussian Mixture Models

Package ipft. January 4, 2018

Package rankdist. April 8, 2018

Package cgh. R topics documented: February 19, 2015

Package ICSOutlier. February 3, 2018

Package DPBBM. September 29, 2016

Package MixSim. April 29, 2017

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Package robustgam. February 20, 2015

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Package SSLASSO. August 28, 2018

Package ClustGeo. R topics documented: July 14, Type Package

Note Set 4: Finite Mixture Models and the EM Algorithm

Package MPCI. October 25, 2015

Package PUlasso. April 7, 2018

Package StVAR. February 11, 2017

SGN (4 cr) Chapter 11

Package Rankcluster. July 29, 2016

Package OptimaRegion

Package CINID. February 19, 2015

What is machine learning?

Package clustvarsel. April 9, 2018

Package orthodr. R topics documented: March 26, Type Package

Package treethresh. R topics documented: June 30, Version Date

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package ICsurv. February 19, 2015

Package geofd. R topics documented: October 5, Version 1.0

Package MIICD. May 27, 2017

Package tclust. May 24, 2018

Package rknn. June 9, 2015

Package rvinecopulib

Package SeleMix. R topics documented: November 22, 2016

Package freeknotsplines

Package edrgraphicaltools

Using the DATAMINE Program

Introduction to Machine Learning CMU-10701

Package mlegp. April 15, 2018

Package robets. March 6, Type Package

Package VarReg. April 24, 2018

Package msgps. February 20, 2015

Package CMPControl. February 19, 2015

Package ECctmc. May 1, 2018

Package pcapa. September 16, 2016

Package hbm. February 20, 2015

Package GLDreg. February 28, 2017

Package UPMASK. April 3, 2017

Package kamila. August 29, 2016

Package capushe. R topics documented: April 19, Type Package

Package prettygraphs

Package pampe. R topics documented: November 7, 2015

Package robustgam. January 7, 2013

Package milr. June 8, 2017

Package autovarcore. June 4, 2018

Clustering: Classic Methods and Modern Views

Package rmumps. September 19, 2017

Package intccr. September 12, 2017

Package pencopula. R topics documented: August 31, Type Package

Package MSwM. R topics documented: February 19, Type Package Title Fitting Markov Switching Models Version 1.

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Package mtsdi. January 23, 2018

Package fso. February 19, 2015

Package radiomics. March 30, 2018

Package anfis. February 19, 2015

The supclust Package

Robust Kernel Methods in Clustering and Dimensionality Reduction Problems

Package svmpath. R topics documented: August 30, Title The SVM Path Algorithm Date Version Author Trevor Hastie

Package robustfa. February 15, 2013

Package SoftClustering

Package PrivateLR. March 20, 2018

Package svapls. February 20, 2015

Package esabcv. May 29, 2015

Transcription:

Type Package Title A package for functional data clustering. Version 1.0.1 Date 2013-12-20 Package Funclustering February 19, 2015 Author Mohamed Soueidatt <mohamed.soueidatt@inria.fr>, <msoueidatt@gmail.com>*. Copyright Mohamed Soueidatt Maintainer Vincent KUBICKI <vincent.kubicki@inria.fr> This packages proposes a model-based clustering algorithm for multivariate functional data. The parametric mixture model, based on the assumption of normality of the principal components resulting from a multivariate functional PCA, is estimated by an EM-like algorithm. The main advantage of the proposed algorithm is its ability to take into account the dependence among curves. License GPL (>= 2) Depends fda (>= 2.2.6), R (>= 2.15.1), methods Imports Rcpp (>= 0.10.3) LinkingTo Rcpp, RcppEigen SystemRequirements GNU make Collate 'cpp_data.r' 'funclust.r' 'input.r' 'output.r' 'plot.r' 'mfpca.r' 'harmcut.r' 'mfpcaplot.r' NeedsCompilation yes Repository CRAN Date/Publication 2014-01-15 11:52:11 R topics documented: Funclustering-package................................... 2 cppmultidata........................................ 3 cppunidata......................................... 4 funclust........................................... 4 1

2 Funclustering-package harmscut.......................................... 6 Input-class.......................................... 7 mfpca............................................ 7 mfpcaplot.......................................... 8 Output-class......................................... 9 plotfd............................................ 10 plotoc............................................ 11 Index 12 Funclustering-package A package for functional data clustering. Details This packages proposes a model-based clustering algorithm for multivariate functional data. The parametric mixture model, based on the assumption of normality of the principal components resulting from a multivariate functional PCA, is estimated by an EM-like algorithm. The main advantage of the proposed algorithm is its ability to take into account the dependence among curves. This package include a function named funclust which allow to perform functional data clustering. This package allows also to perform a functional PCA (function mfpca) for univariate or multivariate functional data, with possibility to define different weights for each observation. The output of funclust and mfpca are a list, so one can use summary() to display the results. We can also plot the original curves with function plotoc. The function plotfd plots the curves after interpolation or smoothing. See the help of theses functions for more details. References J.Jacques and C.Preda (2013), Funclust: a curves clustering method using functional random variable density approximation, Neurocomputing, 112, 164-171. J.Jacques and C.Preda (2013), Model-based clustering of multivariate functional data, Computational Statistics and Data Analysis, in press DOI 10.1016/j.csda.2012.12.004. Examples # Multivariate # --------- CanadianWeather (data from the fda package) -------- CWtime<- 1:365 CWrange<-c(1,365) CWbasis <- create.fourier.basis(cwrange, nbasis=65) harmaccellfd <- vec2lfd(c(0,(2*pi/365)^2,0), rangeval=cwrange) # -- Build the curves -- temperature=canadianweather$dailyav[,,"temperature.c"] CWfd1 <- smooth.basispar( CWtime, CanadianWeather$dailyAv[,,"Temperature.C"],CWbasis,

cppmultidata 3 Lfdobj=harmaccelLfd, lambda=1e-2)$fd precipitation=canadianweather$dailyav[,,"precipitation.mm"] CWfd2 <- smooth.basispar( CWtime, CanadianWeather$dailyAv[,,"Precipitation.mm"],CWbasis, Lfdobj=harmaccelLfd, lambda=1e-2)$fd # -- the multivariate functional data object -- CWfd=list(CWfd1,CWfd2) # -- clustering in two class -- res=funclust(cwfd,k=2) summary(res) cppmultidata The C++ code, of this package take to run from the data two main information. the coefficient in the basis expansion of the functional data, and the inner product between theses basis. cppmultidata made this task in the multivariate case. The C++ code, of this package take to run from the data two main information. the coefficient in the basis expansion of the functional data, and the inner product between theses basis. cppmultidata made this task in the multivariate case. cppmultidata(mfd) mfd a list containing all dimension of the data Value fddata a list that containing the concatenation of the transpose of the coefficients matrix and a block diagonal matrix where each block is the inner product between the basis functions.

4 funclust cppunidata The C++ code, of this package take to run from the data two main information. the coefficient in the basis expansion of the functional data, and the inner product between theses basis. cppunidata made this task in the univariate case. The C++ code, of this package take to run from the data two main information. the coefficient in the basis expansion of the functional data, and the inner product between theses basis. cppunidata made this task in the univariate case. cppunidata(fd) fd the functional data Value fddata a list that containing the transpose of the coefficients matrix and the inner product between these basis functions. funclust funclust, clustering multivariate functional data This function run clustering algorithm for multivariate functional data. funclust(fd, K, thd = 0.05, increasedimension = FALSE, hard = FALSE, fixeddimension = integer(0), nbinit = 5, nbiterinit = 20, nbiteration = 200, epsilon = 1e-05) fd K thd in the univariate case fd is an object from a class fd of fda package. Otherwise, in the multivariate case, fd is a list of fd objects (fd=list(fd1,fd2,..)). the number of clusters. the threshold in the Cattell scree test used to select the numbers of principal components retained for the approximation of the probability density of the functional data.

funclust 5 increasedimension A logical parameter. If FALSE, the numbers of principal components are selected at each step of the algorithm according to the Cattell scree test. If TRUE, only an increase of the numbers of principal components are allowed. hard Details Value A logical parameter. If TRUE, the algorithm is randomly initialized with "hard" cluster membership (each curves is randomly assigned to one of the clusters). if FALSE, "soft" cluster membership are used (the cluster membership probabilities are randomly choosen in the K-simplex). fixeddimension A vector of size K which contains the numbers of principal components in the case of fixed numbers of principal components. epsilon nbinit nbiterinit nbiteration The stoping criterion for the EM-like algorithm: the algorithm is stopped if the difference between two successive loglikelihood is less than epsilon. The number of small-em used to determine the intialization of the main EM-like algorithm. The maximum number of iterations for each small-em. The maximum number of iteration in the main EM-like algorithm. There is multiple kind of running the function funclust. The first one is to run the function with fixed dimensions among all iteration of the algorithm. (parameter fixeddimenstion). fixeddimension must be an integer vector of size K (number of cluster). If the user gives a not integer value in fixeddimension, then this value will be convert automatically to an integer one for examplethe, the algorithm will run with a dimension 2 instead of 2.2 given by user. The second one is to run it with a dimensions variying according to the results of the scree test (parameter increasedimension). If increasedimension = true, then the dimensions will be constraint to only increase between to consecutuves iterations of the algorithm. else the values of the dimensions will be the results of the scree test. a list containing: tik: conditional probabilities of cluster membership for each observation cls: the estimated partition, proportions: the mixing proportions, loglikelihood: the final value of the approximated likelihood, logliktotal: the values of the approximated likelihood along the EM-like algorithm (not necessarily increasing), aic, bic, icl: model selection criteria, dimensions: number of principal components, used in the functional data probability density approximation, selected for each clusters, dimtotal: values of the number of principal components along the EM-like algorithm, V: principal components variances per cluster Examples data(growth) data=cbind(matrix(growth$hgtm,31,39),matrix(growth$hgtf,31,54)); t=growth$age; splines <- create.bspline.basis(rangeval=c(1, max(t)), nbasis = 20,norder=4); fd <- Data2fd(data, argvals=t, basisobj=splines); # with varying dimensions (according to the results of the scree test) res=funclust(fd,k=2)

6 harmscut summary(res) # with fixed dimensions res=funclust(fd,k=2,fixeddimension=c(2,2)) # Multivariate (deactivated by default to comply with CRAN policies) # --------- CanadianWeather (data from the package fda) -------- # CWtime<- 1:365 # CWrange<-c(1,365) # CWbasis <- create.fourier.basis(cwrange, nbasis=65) # harmaccellfd <- vec2lfd(c(0,(2*pi/365)^2,0), rangeval=cwrange) # -- Build the curves --- # temp=canadianweather$dailyav[,,"temperature.c"] # CWfd1 <- smooth.basispar( # CWtime, CanadianWeather$dailyAv[,,"Temperature.C"],CWbasis, # Lfdobj=harmaccelLfd, lambda=1e-2)$fd # precip=canadianweather$dailyav[,,"precipitation.mm"] # CWfd2 <- smooth.basispar( # CWtime, CanadianWeather$dailyAv[,,"Precipitation.mm"],CWbasis, # Lfdobj=harmaccelLfd, lambda=1e-2)$fd # CWfd=list(CWfd1,CWfd2) # res=funclust(cwfd,k=2) harmscut Separates the matrices of the coefficients of harmonics This function is used in mfpca to cut the harmonic coefficient matrix. In fact mfpca call a c++ mfpca which return (among others) the matrix of the coefficients of the harmonics in the basis expantion. But in the multivariate case the coefficient matrix for all dimension are store in the same matrix, so we use this function to store each dimension in the specific matrix. harmscut(harms, nbasis) harms nbasis the coefficients matrix, see the description for more details vector containing the number of basis for each dimension Value a list of length dimension of the data which contain the harmonic coefficients matrix for each dimension

Input-class 7 Input-class Constructor of Input class Details This class contains the input paramaters need to run the algorithm. coefs In the univariate case this will be the transpose of the coefficients matrix. In the multivarite case this matrix will be the concatenation of the coefs matrix for each dimension of the multivariate functinal object. basisprod In the univariate case this is the matrix of the inner product between the basis functions.in the multivariate case the m_basisprod member will be a block diagonal matrix and each block will be the matrix of the inner product between the basis functions of each dimension of the data. K the number of clusters. thd the threshold in the scree test to select the dimensions of the curves. increasedimension A logicla paramater, if true the dimensions will be constraint to increase after each iteration. A false mean that the dimensions will take their values according to the scree test. hard A logical parameter, if true we initialize randomly the model with "hard" weights A logical parameter, if true we initialize randomly the model with hard weights (weights of each curves taking 0 or 1 according to the class membership of the curves). if false we initialize randomly with "soft" weights (weights of each curves taking a probabilities according to the class membership of the curves). fixeddimension A vector of size "K" which contains the dimensions in the case of running algorithm with fixed dimensions. epsilon The stoping criterion, we stop run the algorithm if the difference between two successive loglikelihood is less than epsilon. nbinit The number of initialization to be achieve, befor running the long algorithm. nbiterinit The maximum number of iterations in each initialization. nbiteration The maximum number of iteration in the long algorithm. mfpca Multivariate functional pca This function will run a weighted functional pca in the two cases of uni, and multivariate cases. If the observations (the curves) are given with weights, set up the parameter tik.

8 mfpcaplot mfpca(fd, nharm, tik = numeric(0)) fd nharm tik in the univariate case fd is an object from a class fd. Otherwise in the multivariate case fd is a list of fd object (fd=list(fd1,fd2,..)). number of harmonics or principal component to be retain. the weights of the functional pca which corresponds to the weights of the curves. If don t given, then we will run a classic functional pca (without weighting the curves). Value When univarite functional data, the function are returning an object of calss "pca.fd", When multivariate a list of "pca.fd" object by dimension. The "pca.fd" class contains the folowing parameter: harmonics: functional data object storing the eigen function values: the eigenvalues varprop: the normalized eigenvalues (eigenvalues divide by their sum) scores: the scores matrix meanfd: the mean of the functional data object Examples data(growth) data=cbind(matrix(growth$hgtm,31,39),matrix(growth$hgtf,31,54)); t=growth$age; splines <- create.bspline.basis(rangeval=c(1, max(t)), nbasis = 20,norder=4); fd <- Data2fd(data, argvals=t, basisobj=splines); pca=mfpca(fd,nharm=2) summary(pca) mfpcaplot Plot multivariate functional pca This function plots the functional pca. mfpcaplot(pca, grid = c()) pca grid is the result of mfpca. In the univariate case mfpcaplot use the package fda and will be similar to it s function "plot.pca.fd". In multivariate functional pca, we will make a graphic window for each dimension. specify how to divide the graphics window. grid=c(n,m) divided the widow in to n lines and m columns. If user don t specify grid then he must enter <Enter> to pass to the next graphic.

Output-class 9 Examples # Multivariate # --------- CanadianWeather (data from the package fda) -------- CWtime<- 1:365 CWrange<-c(1,365) CWbasis <- create.fourier.basis(cwrange, nbasis=65) harmaccellfd <- vec2lfd(c(0,(2*pi/365)^2,0), rangeval=cwrange) # -- Build the curves --- temp=canadianweather$dailyav[,,"temperature.c"] CWfd1 <- smooth.basispar( CWtime, CanadianWeather$dailyAv[,,"Temperature.C"],CWbasis, Lfdobj=harmaccelLfd, lambda=1e-2)$fd precip=canadianweather$dailyav[,,"precipitation.mm"] CWfd2 <- smooth.basispar( CWtime, CanadianWeather$dailyAv[,,"Precipitation.mm"],CWbasis, Lfdobj=harmaccelLfd, lambda=1e-2)$fd CWfd=list(CWfd1,CWfd2) pca=mfpca(cwfd,nharm=4) mfpcaplot(pca,grid=c(2,2)) Output-class Constructor of Output class Details This class contains the parameters in the output after running classification. tik a matrix of size (number of Curves) x (K), each column contains the weights of the curves in the corresponding class. cls a vector of size number of curves, containing the index of the class for each curve. proportion a matrix of size 1xnbClust (number of clusters), containing the estimated mixture proportions. loglikelihood the estimated log-likelihood. aic the value of AIC criterion. bic the value of BIC criterion. icl the value of ICL criterion. dimensions a vector of size nbclust of the dimensions of the specifique dimensions of the functional data in each class. dimtotal a matrix of size nbclust x nbruniteration, where nbruniteration is the number of iterations before the algorithm converge. Each column of the dimtotal matrix contain the dimensions on the coresponding iteration.

10 plotfd V principal components variances per cluster empty logical parameter, and empty=true if we have an empty class plotfd plot a functional data object This function plots a functional data object (after smoothing or interpolation). If you want to color the curves according to a cluster s membership, please specify the parmetere col. Note: this function works only for univariate functional data. plotfd(fd, col = c(1:nrow(fd$coefs)), xlab = "time", ylab = "value", main = "Functional data curves") fd col xlab ylab main a functional data object the color vector. label of the horizontal axis label of the vertical axis the title of the graphic Examples data(growth) data=cbind(matrix(growth$hgtm,31,39),matrix(growth$hgtf,31,54)); t=growth$age; splines <- create.bspline.basis(rangeval=c(1, max(t)), nbasis = 20,norder=4); fd <- Data2fd(data, argvals=t, basisobj=splines); cls=c(rep(1,39),rep(2,54)) #there is 39 boys and 54 girls in this data plotfd(fd,col=cls)

plotoc 11 plotoc plot Original Curves This function plots the observed curves, before any action of smoothing or interpolation. plotoc(time = 1:nrow(curves), curves, xlab = "time", ylab = "value", main = "Original curves") time curves xlab ylab main a vector containing the observation time for the curves. If absent the time param will be set at the vector 1:nrow(curves) the observations matrix. Each column of this matrix corresponds to one observed curve, and contains the value of the curve at discrete time points. label of the horizontal axis label of the vertical axis the title of the graphic Examples data(growth) curves=matrix(data=cbind(growth$hgtm,growth$hgtf),ncol=93) time=growth$age plotoc(time,curves)

Index cppmultidata, 3 cppunidata, 4 funclust, 4 Funclustering (Funclustering-package), 2 Funclustering-package, 2 harmscut, 6 Input-class, 7 mfpca, 7 mfpcaplot, 8 Output-class, 9 plotfd, 10 plotoc, 11 12