Package PCADSC. April 19, 2017

Similar documents
Package datasets.load

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package ECctmc. May 1, 2018

Package spark. July 21, 2017

Package fastdummies. January 8, 2018

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package canvasxpress

Package bisect. April 16, 2018

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package balance. October 12, 2018

Package omu. August 2, 2018

Package sure. September 19, 2017

Package frequencyconnectedness

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package estprod. May 2, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package editdata. October 7, 2017

Package validara. October 19, 2017

Package modmarg. R topics documented:

Package sigqc. June 13, 2018

Package raker. October 10, 2017

Package sfc. August 29, 2016

Package jstree. October 24, 2017

Package svalues. July 15, 2018

Package gtrendsr. October 19, 2017

Package strat. November 23, 2016

Package repec. August 31, 2018

Package kirby21.base

Package mgc. April 13, 2018

Package splithalf. March 17, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package gppm. July 5, 2018

Package messaging. May 27, 2018

Package textrank. December 18, 2017

Package gtrendsr. August 4, 2018

Package reval. May 26, 2015

Package projector. February 27, 2018

Package tidylpa. March 28, 2018

Package IATScore. January 10, 2018

Package ecoseries. R topics documented: September 27, 2017

Package vesselr. R topics documented: April 3, Type Package

Package tidyimpute. March 5, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package dyncorr. R topics documented: December 10, Title Dynamic Correlation Package Version Date

Package zebu. R topics documented: October 24, 2017

Package vinereg. August 10, 2018

Package pwrrasch. R topics documented: September 28, Type Package

Package lcc. November 16, 2018

Package epitab. July 4, 2018

Package qualmap. R topics documented: September 12, Type Package

Package ClusterBootstrap

Package pcr. November 20, 2017

Package dalmatian. January 29, 2018

Package TVsMiss. April 5, 2018

Package nodbi. August 1, 2018

Package preprosim. July 26, 2016

Package kdtools. April 26, 2018

Package ArCo. November 5, 2017

Package fastqcr. April 11, 2017

Package autocogs. September 22, Title Automatic Cognostic Summaries Version 0.1.1

Package vip. June 15, 2018

Package censusr. R topics documented: June 14, Type Package Title Collect Data from the Census API Version 0.0.

Package fbroc. August 29, 2016

Package fitbitscraper

Package crossword.r. January 19, 2018

Package Cubist. December 2, 2017

Package mdftracks. February 6, 2017

Package docxtools. July 6, 2018

Package CorporaCoCo. R topics documented: November 23, 2017

Package slickr. March 6, 2018

Package rcv. August 11, 2017

Package barcoder. October 26, 2018

Package areaplot. October 18, 2017

Package readxl. April 18, 2017

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0

Package kerasformula

Package skynet. December 12, 2018

Package clipr. June 23, 2018

Package crossrun. October 8, 2018

Package QCAtools. January 3, 2017

Package sankey. R topics documented: October 22, 2017

Package OLScurve. August 29, 2016

Package bikedata. April 27, 2018

Package blandr. July 29, 2017

Package infer. July 11, Type Package Title Tidy Statistical Inference Version 0.3.0

Package scmap. March 29, 2019

Package regexselect. R topics documented: September 22, Version Date Title Regular Expressions in 'shiny' Select Lists

Package fst. December 18, 2017

Package SEMrushR. November 3, 2018

Package rbraries. April 18, 2018

Package arulescba. April 24, 2018

Package quickreg. R topics documented:

Package available. November 17, 2017

Package arc. April 18, 2018

Package GetITRData. October 22, 2017

Package rereg. May 30, 2018

Package icesadvice. December 7, 2018

Package catenary. May 4, 2018

Package reghelper. April 8, 2017

Transcription:

Type Package Package PCADSC April 19, 2017 Title Tools for Principal Component Analysis-Based Data Structure Comparisons Version 0.8.0 A suite of non-parametric, visual tools for assessing differences in data structures for two datasets that contain different observations of the same variables. These tools are all based on Principal Component Analysis (PCA) and thus effectively address differences in the structures of the covariance matrices of the two datasets. The PCASDC tools consist of easy-to-use, intuitive plots that each focus on different aspects of the PCA decompositions. The cumulative eigenvalue (CE) plot describes differences in the variance components (eigenvalues) of the deconstructed covariance matrices. The angle plot presents the information loss when moving from the PCA decomposition of one dataset to the PCA decomposition of the other. The chroma plot describes the loading patterns of the two datasets, thereby presenting the relative weighting and importance of the variables from the original dataset. Depends R (>= 3.2.2) License GPL-2 Encoding UTF-8 LazyData true Imports reshape2, methods, pander, ggplot2, Matrix RoxygenNote 5.0.1 URL https://github.com/annepetersen1/pcadsc BugReports https://github.com/annepetersen1/pcadsc/issues NeedsCompilation no Author Anne H. Petersen [aut, cre], Bo Markussen [aut] Maintainer Anne H. Petersen <ahpe@sund.ku.dk> Repository CRAN Date/Publication 2017-04-19 10:07:43 UTC 1

2 angleplot R topics documented: angleplot.......................................... 2 CEPlot............................................ 3 chromaplot......................................... 4 doangle........................................... 6 doce............................................ 7 dochroma.......................................... 8 PCADSC.......................................... 9 Index 12 angleplot Angle plot Produce an angle plot from a full or partial PCADSC object, as obtained from a call to PCADSC. In either case, this PCADSC object must have a non-null anleinfo slot (see examples). The angle plot compares the eigenvalue- and loading patterns from PCA performed on two datasets that consist of different observations of the same variables. angleplot(x) x A PCADSC or angleinfo object, as produced by PCADSC or doangle, respectively. PCADSC, doangle #make a full PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group")

CEPlot 3 #make a partial PCADSC object from iris and fill out angleinfo in the next call irispcadsc2 <- PCADSC(iris, "group", doangle = FALSE) irispcadsc2 <- doangle(irispcadsc2) #make an angle plot angleplot(irispcadsc) angleplot(irispcadsc2) #Only do angle information for a faster run-time irispcadsc_fast <- PCADSC(iris, "group", doce = FALSE, dochroma = FALSE) angleplot(irispcadsc_fast) CEPlot Cumulative eigenvalue plot Produce a cumulative eigenvalue (CE) plot from a full or partial PCADSC object, as obtained from a call to PCADSC. In either case, this PCADSC object must have a non-null CEInfo slot (see examples). The CE plot compares the eigenvalues obtained from PCA performed separately and jointly on two datasets that consist of different observations of the same variables. CEPlot(x, ndraw = NULL) x ndraw x A PCADSC or angleinfo object, as produced by PCADSC or doangle, respectively. A positive integer. The number of simulated cumulative eigenvalue curves that should be added to the plot. Details In the x-coordinates, cumulative differences in eigenvalues are shown, while the y-coordinates are the cumulative sum of the joint eigenvalues. The plot is annotated with Kolmogorov-Smirnov and Cramer-von Mises tests evaluated by permutation tests, testing the null hypothesis of no difference in eigenvalues. The plot also features a number of cumulative simulated cumulative eigenvalue curves as dashed lines. Moreover, a shaded area presents pointwise 95 % confidence bands for the cumulative difference, also obtained using the permutation test. PCADSC, doce

4 chromaplot #make a PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group") #make a partial PCADSC object from iris and fill out CEInfo in the next call irispcadsc2 <- PCADSC(iris, "group", doce = FALSE) irispcadsc2 <- doce(irispcadsc2) #make a CE plot CEPlot(irisPCADSC) CEPlot(irisPCADSC2) #Only do CE information and use less resamplings for a faster runtime irispcadsc_fast <- PCADSC(iris, "group", doangle = FALSE, dochroma = FALSE, B = 1000) CEPlot(irisPCADSC_fast) chromaplot Chroma plot Produce a chroma plot from a full or partial PCADSC object, as obtained from a call to PCADSC. In either case, this PCADSC object must have a non-null chromainfo slot (see examples). The chroma plot compares the loading patterns from PCA conducted on two datasets consisting of different observations of the same variables. chromaplot(x, varlabels = NULL, cvco = 1, splitlabels = NULL, varannotation = "cum", usecomps = NULL) x Either a PCADSC object or a chromainfo object, as produced by PCADSC and dochroma.

chromaplot 5 varlabels cvco splitlabels varannotation usecomps A vector of character string labels for the variables used in pcadscobj. If non- NULL, these labels appear in the plot instead of the variable names. A numeric in the interval [0, 1] where the default, 1, corresponds to no cut-off value. If a value smaller than 1, only the first n components are plotted, where n is the the lowest possible number, such that the cumulative variance contribution of the first n components exceeds cvco for both datasets. Note that setting covco will overrule the argument usecomps. Labels for the two categories of the splitting variable used to create the PCADSC object, x, given as a named list (see examples). These labels will appear in the headers of the two PCADSC plots. If NULL (the default), the original levels of the splitting variable are used. If "cum" (the default), cummulated explained variance percentages are annotated to the right of the bars for each component. If "raw", the non-cummulated percentages of explained variance are added instead. If NULL, no annotation is added. Note that "cum" is only allowed if usecomps is non-null. A vector of integers with the indexes of the principal component that should be included in the plot. Details The plot consists of one display for each of the two datasets. The two displays both consist of a number of vertical bars. Each vertical bar represents a principal component and the width of each colored section (chroma) within the bar corresponds to the normalized PCA loading vector of that component. The bars can be annotated with the (cumulative) variance contributions of the components (see varannotation). PCADSC, dochroma #make a PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group") #make a partial PCADSC object from iris and fill out chromainfo in the next call irispcadsc2 <- PCADSC(iris, "group", dochroma = FALSE) irispcadsc2 <- dochroma(irispcadsc2)

6 doangle #make a chroma plot chromaplot(irispcadsc) chromaplot(irispcadsc) #Change the labels of the splitting variable chromaplot(irispcadsc, splitlabels = list("non-setosa" = "Not Setosa", "setosa" = "Setosa")) #Only plot components 1 and 4 and remove annotated variances chromaplot(irispcadsc, usecomps = c(1,4), varannotation = "no") #Only plot the first components responsible for explaining 80 percent variance chromaplot(irispcadsc, cvco = 0.8) #Change variable labels chromaplot(irispcadsc, varlabels = c("sepal length", "Sepal width", "Petal length", "Petal width")) #Only do chroma information in order to get a faster runtime: irispcadsc_fast <- PCADSC(iris, "group", doce = FALSE, doangle = FALSE) chromaplot(irispcadsc_fast) doangle Compute angle information Computes the information that is needed in order to make an angleplot from a PCADSC or pcares object. Typically, this function is called on a partial PCADSC object in order to add angleinfo (see examples). doangle(x) x Either a PCADSC or a pcares object. angleplot, PCADSC

doce 7 #make a partial PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group", doangle = FALSE) #No angleinfo available irispcadsc$angleinfo #Add and show angleinfo irispcadsc <- doangle(irispcadsc) irispcadsc$angleinfo #Make a partial PCADSC object and only add angle information for a #faster runtime irispcadsc_fast <- PCADSC(iris, "group", doangle = FALSE, dochroma = FALSE, doce = FALSE) irispcadsc_fast <- doangle(irispcadsc_fast) irispcadsc_fast$angleinfo doce Compute cumulative eigenvalue information Computes the information that is needed in order to make a CEPlot from a PCADSC or pcares object. Typically, this function is called on a partial PCADSC object in order to add CEInfo (see examples). doce(x,...) x Either a PCADSC or a pcares object.... If doce is called on a pcares object, the full dataset must also be supplied (as data), as well as the number of resampling steps (B).

8 dochroma CEPlot, PCADSC #make a partial PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group", doce = FALSE) #No CEInfo available irispcadsc$ceinfo #Add and show CEInfo irispcadsc <- doce(irispcadsc) irispcadsc$ceinfo #Make a partial PCADSC object and only add CE information with no #bootstrapping (and thus no test) irispcadsc_fast <- PCADSC(iris, "group", doangle = FALSE, dochroma = FALSE, doce = FALSE) irispcadsc_fast <- doce(irispcadsc_fast, B = 100) irispcadsc_fast$ceinfo dochroma Compute chroma information Computes the information that is needed in order to make a chromaplot from a PCADSC or pcares object. Typically, this function is called on a partial PCADSC object in order to add chromainfo (see examples). dochroma(x)

PCADSC 9 x Either a PCADSC or a pcares object. chromaplot, PCADSC #make a partial PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group", dochroma = FALSE) #No chromainfo available irispcadsc$chromainfo #Add and show chromainfo irispcadsc <- dochroma(irispcadsc) irispcadsc$chromainfo #Make a partial PCADSC object and only add chroma information for a #faster runtime irispcadsc_fast <- PCADSC(iris, "group", doangle = FALSE, dochroma = FALSE, doce = FALSE) irispcadsc_fast <- dochroma(irispcadsc_fast) irispcadsc_fast$chromainfo PCADSC Compute the elements used for PCADSC Principal Component Analysis-based Data Structure Comparison tools that prepare a dataset for various diagnostic plots for comparing data structures. More specifically, PCADSC performs PCA on two subsets of a dataset in order to compare the structures of these datasets, e.g. to assess whether they can be analyzed pooled or not. The results of the PCAs are then manipulated in various ways and stored for easy plotting using the three PCADSC plotting tools, the CEPlot, the angleplot and the chromaplot.

10 PCADSC PCADSC(data, splitby, vars = NULL, doce = TRUE, doangle = TRUE, dochroma = TRUE, B = 10000) data splitby vars doce doangle dochroma B A dataset, either a data.frame or a matrix with variables in columns and observations in rows. Note that tibbles and data.tables are accepted as input, but they are instantly converted to data.frames. Future releases might include specific implementation for these data representations. The name of a grouping variable with two levels defining the two groups within the dataset whose data structures we wish to compare. The variable names in data to include in the PCADSC. If NULL (the default), all variables except for splitby are used. Logical. Should the cumulative eigenvalue plot information be computed? Logical. Should the angle plot information be computed? Logical. Should the chroma plot information be computed? A positive integer. The number of resampling steps performed in the cumulative eigenvalue step, if relevant. Details Value PCADSC presents a suite of non-parametric, visual tools for comparing the strucutures of two subsets of a dataset. These tools are all based on PCA (principal component analysis) and thus they can be interpreted as comparisons of the covariance matrices of the two (sub)datasets. PCADSC performs PCA using singular value decomposition for increased numerical precision. Before performing PCA on the full dataset and the two subsets, all variables within each such dataset are standardized. An object of class PCADSC, which is a named list with the following entries: pcares The results of the PCAs performed on the first subset, the second subset and the full subset and also information about the data splitting. CEInfo The information needed for making a cumulative eigenvalue plot (see CEPlot). angleinfo The information needed for making an angle plot (see angleplot). chromainfo The information needed for making a chroma plot (see chromaplot). data The original (full) dataset. splitby The name of the variable that splits the dataset in two. vars The names of the variables in the dataset that should be used for PCA. B The number of resamplings performed for the CEInfo. doce, doangle, dochroma, CEPlot, angleplot, chromaplot

PCADSC 11 #Make a full PCADSC object, splitting the data by "group" irispcadsc <- PCADSC(iris, "group") #The three plotting functions can now be called on irispcadsc: CEPlot(irisPCADSC) angleplot(irispcadsc) chromaplot(irispcadsc) #Make a partial PCADSC object with no angle plot information and add #angle plot information afterwards: irispcadsc2 <- PCADSC(iris, "group", doangle = FALSE) irispcadsc2 <- doangle(irispcadsc) #Make a partial PCADSC obejct with no plotting (angle/ce/chroma) #information: irispcadsc_minimal <- PCADSC(iris, "group", doangle = FALSE, doce = FALSE, dochroma = FALSE)

Index angleplot, 2, 6, 9, 10 CEPlot, 3, 7 10 chromaplot, 4, 8 10 data.frame, 10 data.table, 10 doangle, 2, 3, 6, 10 doce, 3, 7, 10 dochroma, 4, 5, 8, 10 PCADSC, 2 6, 8, 9, 9 tibble, 10 12