Package clusternomics


Type Package
Title Integrative Clustering for Heterogeneous Biomedical Datasets
Version 0.1.1
Author Evelina Gabasova
Maintainer Evelina Gabasova <egabasova@gmail.com>
Description Integrative context-dependent clustering for heterogeneous biomedical datasets. Identifies local clustering structures in related datasets, as well as global clusters that exist across the datasets.
License MIT + file LICENSE
LazyData TRUE
Imports magrittr, plyr, MASS
Suggests knitr, rmarkdown, testthat, mclust, gplots, ggplot2
URL https://github.com/evelinag/clusternomics
BugReports https://github.com/evelinag/clusternomics/issues
RoxygenNote 5.0.1
VignetteBuilder knitr
NeedsCompilation no
Repository CRAN
Date/Publication 2017-03-14 15:13:04

R topics documented:
clusterSizes
coclusteringMatrix
contextCluster
empiricalBayesPrior
generatePrior
generateTestData_1D
generateTestData_2D
numberOfClusters

clusterSizes    Estimate sizes of clusters from global cluster assignments.

Description

Estimate the sizes of the clusters from the global cluster assignments.

Usage

clusterSizes(assignments)

Arguments

assignments    Matrix of cluster assignments, where each row corresponds to the cluster assignments sampled in one MCMC iteration.

Value

Sizes of the individual clusters in each MCMC iteration.

Examples

# Generate a simple test dataset
groupCounts <- c(50, 10, 40, 60)
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global = 10, context = c(3, 3))
# 2. Run inference
# The number of iterations is kept small for demonstration purposes;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal', verbose = TRUE)

# Extract only the sampled global assignments
samples <- results$samples
clusters <- plyr::laply(seq_along(samples), function(i) samples[[i]]$Global)
clusterSizes(clusters)
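The computation behind this function is package-independent: for each MCMC iteration (one row of the assignment matrix), count how many data points fall into each occupied cluster. A minimal, hypothetical sketch in Python (the function name and data are illustrative, not part of clusternomics):

```python
# Hypothetical sketch (not clusternomics code): per-iteration cluster sizes
# from a matrix of sampled global assignments, one row per MCMC draw.
from collections import Counter

def cluster_sizes(assignments):
    """Return one {cluster label: size} dict per MCMC iteration."""
    return [dict(Counter(row)) for row in assignments]

assignments = [
    [0, 0, 1, 1, 2],  # iteration 1
    [0, 1, 1, 1, 1],  # iteration 2
]
print(cluster_sizes(assignments))
# → [{0: 2, 1: 2, 2: 1}, {0: 1, 1: 4}]
```

Tracking these per-iteration sizes across the chain shows how stable the occupied clusters are over the posterior samples.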

coclusteringMatrix    Compute the posterior co-clustering matrix from global cluster assignments.

Description

Compute the posterior co-clustering matrix from the global cluster assignments.

Usage

coclusteringMatrix(assignments)

Arguments

assignments    Matrix of cluster assignments, where each row corresponds to the cluster assignments sampled in one MCMC iteration.

Value

Posterior co-clustering matrix, where element [i,j] represents the posterior probability that data points i and j belong to the same cluster.

Examples

# Generate a simple test dataset
groupCounts <- c(50, 10, 40, 60)
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global = 10, context = c(3, 3))
# 2. Run inference
# The number of iterations is kept small for demonstration purposes;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal', verbose = TRUE)

# Extract only the sampled global assignments
samples <- results$samples
clusters <- plyr::laply(seq_along(samples), function(i) samples[[i]]$Global)
coclusteringMatrix(clusters)
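The posterior probability in element [i,j] is estimated as the fraction of MCMC samples in which points i and j share a cluster. A hypothetical, language-agnostic sketch of that estimator (illustrative code, not the package implementation):

```python
# Hypothetical sketch (not clusternomics code): posterior co-clustering
# matrix, where entry [i][j] is the fraction of MCMC samples in which
# data points i and j were assigned to the same global cluster.
def coclustering_matrix(assignments):
    n = len(assignments[0])
    counts = [[0] * n for _ in range(n)]
    for row in assignments:
        for i in range(n):
            for j in range(n):
                if row[i] == row[j]:
                    counts[i][j] += 1
    n_samples = len(assignments)
    return [[c / n_samples for c in r] for r in counts]

assignments = [
    [0, 0, 1],  # iteration 1: points 0 and 1 share a cluster
    [0, 1, 1],  # iteration 2: points 1 and 2 share a cluster
]
print(coclustering_matrix(assignments))
# → [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
```

The resulting matrix is symmetric with a unit diagonal, which makes it a natural input for heatmaps or hierarchical clustering of the data points.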

contextCluster    Clusternomics: context-dependent clustering

Description

This function fits the context-dependent clustering model to the data using Gibbs sampling. It allows the user to specify a different number of clusters on the global level, as well as on the local level.

Usage

contextCluster(datasets, clusterCounts, dataDistributions = "diagNormal",
    prior = NULL, maxIter = 1000, burnin = NULL, lag = 3, verbose = FALSE)

Arguments

datasets    List of data matrices, where each matrix represents a context-specific dataset. Each data matrix has size N times M, where N is the number of data points and M is the dimensionality of the data. The full list of matrices has length C. The number of data points N must be the same for all data matrices.

clusterCounts    Number of clusters on the global level and in each context. List with the following structure: clusterCounts = list(global = global, context = context), where global is the number of global clusters and context is a list of length C giving the number of clusters in the individual contexts (datasets), with context[c] the number of clusters in dataset c.

dataDistributions    Distribution of the data in each dataset. Can be either a list of length C, where dataDistributions[c] is the distribution of dataset c, or a single string when all datasets have the same distribution. The only currently implemented distribution is 'diagNormal', a multivariate normal distribution with a diagonal covariance matrix.

prior    Prior distribution. If NULL, the prior is estimated from the datasets. The 'diagNormal' distribution uses the Normal-Gamma distribution as a prior for each dimension.

maxIter    Number of iterations of the Gibbs sampling algorithm.

burnin    Number of burn-in iterations that will be discarded. If not specified, the algorithm discards the first half of the maxIter samples.

lag    Used for thinning the samples.

verbose    Print progress; FALSE by default.

Value

Returns a list containing the sequence of MCMC states and the log likelihoods of the individual states.

samples    List of assignments sampled from the posterior; each state samples[[i]] is a data frame with the local and global assignments for each data point.

logliks    Log likelihoods during the MCMC iterations.

DIC    Deviance information criterion to help select the number of clusters. Lower values of DIC correspond to better-fitting models.

Examples

# Example with simulated data (see the vignette for details)
# Number of elements in each cluster
groupCounts <- c(50, 10, 40, 60)
# Centers of clusters
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global = 10, context = c(3, 3))
# 2. Run inference
# The number of iterations is kept small for demonstration purposes;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal', verbose = TRUE)

# Extract results from the samples
# Final state:
state <- results$samples[[length(results$samples)]]
# 1) Assignment to global clusters
globalAssgn <- state$Global
# 2) Context-specific assignments: assignment in a specific dataset (context)
contextAssgn <- state[, "Context 1"]
# Assess the fit of the model with DIC
results$DIC


empiricalBayesPrior    Fit an empirical Bayes prior to the data

Description

Fit an empirical Bayes prior to the data.

Usage

empiricalBayesPrior(datasets, distributions = "diagNormal",
    globalConcentration = 0.1, localConcentration = 0.1, type = "fitRate")

Arguments

datasets    List of data matrices, where each matrix represents a context-specific dataset. Each data matrix has size N times M, where N is the number of data points and M is the dimensionality of the data. The full list of matrices has length C. The number of data points N must be the same for all data matrices.

distributions    Distribution of the data in each dataset. Can be either a list of length C, where distributions[c] is the distribution of dataset c, or a single string when all datasets have the same distribution. The only currently implemented distribution is 'diagNormal', a multivariate normal distribution with a diagonal covariance matrix.

globalConcentration    Prior concentration parameter for the global clusters. Small values of this parameter give larger prior probability to a smaller number of clusters.

localConcentration    Prior concentration parameter for the local context-specific clusters. Small values of this parameter give larger prior probability to a smaller number of clusters.

type    Type of prior that is fitted to the data. The algorithm can fit either the rate of the prior covariance matrix or the full covariance matrix to the data.

Value

Returns the prior object that can be used as an input for the contextCluster function.

Examples

# Example with simulated data (see the vignette for details)
nContexts <- 2
# Number of elements in each cluster
groupCounts <- c(50, 10, 40, 60)
# Centers of clusters
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Generate the prior
fullDataDistributions <- rep('diagNormal', nContexts)
prior <- empiricalBayesPrior(datasets, fullDataDistributions, 0.01, 0.1, 'fitRate')

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global = 10, context = c(3, 3))
# 2. Run inference
# The number of iterations is kept small for demonstration purposes;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal', prior = prior, verbose = TRUE)
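The claim that small concentration parameters favour fewer occupied clusters can be checked with a short simulation. The sketch below is a hypothetical illustration (not clusternomics code): it draws mixture weights from a symmetric Dirichlet distribution via normalized Gamma draws and counts how many components receive non-negligible weight, for a small versus a large concentration parameter:

```python
# Hypothetical illustration (not clusternomics code): smaller symmetric
# Dirichlet concentration parameters put more prior mass on weight vectors
# that occupy fewer clusters. Dirichlet(alpha, ..., alpha) samples are
# obtained by normalizing independent Gamma(alpha, 1) draws.
import random

def expected_occupied(concentration, k=10, n_draws=2000, threshold=0.05, seed=0):
    """Average number of mixture components with weight above `threshold`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        gammas = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
        s = sum(gammas)
        weights = [g / s for g in gammas]
        total += sum(w > threshold for w in weights)
    return total / n_draws

# A small concentration puts prior mass on few occupied clusters,
# a large one spreads the weight across nearly all of them:
print(expected_occupied(0.1) < expected_occupied(5.0))
# → True
```

This is exactly the intuition behind choosing globalConcentration and localConcentration: small values act as a penalty on the effective number of clusters the model uses.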

generatePrior    Generate a basic prior distribution for the datasets.

Description

Creates a basic prior distribution for the clustering model, assuming a unit prior covariance matrix for the clusters in each dataset.

Usage

generatePrior(datasets, distributions = "diagNormal",
    globalConcentration = 0.1, localConcentration = 0.1)

Arguments

datasets    List of data matrices, where each matrix represents a context-specific dataset. Each data matrix has size N times M, where N is the number of data points and M is the dimensionality of the data. The full list of matrices has length C. The number of data points N must be the same for all data matrices.

distributions    Distribution of the data in each dataset. Can be either a list of length C, where distributions[c] is the distribution of dataset c, or a single string when all datasets have the same distribution. The only currently implemented distribution is 'diagNormal', a multivariate normal distribution with a diagonal covariance matrix.

globalConcentration    Prior concentration parameter for the global clusters. Small values of this parameter give larger prior probability to a smaller number of clusters.

localConcentration    Prior concentration parameter for the local context-specific clusters. Small values of this parameter give larger prior probability to a smaller number of clusters.

Value

Returns the prior object that can be used as an input for the contextCluster function.

Examples

# Example with simulated data (see the vignette for details)
nContexts <- 2
# Number of elements in each cluster
groupCounts <- c(50, 10, 40, 60)
# Centers of clusters
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Generate the prior
fullDataDistributions <- rep('diagNormal', nContexts)

prior <- generatePrior(datasets, fullDataDistributions, 0.01, 0.1)

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global = 10, context = c(3, 3))
# 2. Run inference
# The number of iterations is kept small for demonstration purposes;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal', prior = prior, verbose = TRUE)


generateTestData_1D    Generate a simulated 1D dataset for testing

Description

Generates a simple 1D dataset with two contexts, where the data are generated from Gaussian distributions. The generated output contains two datasets, each containing 4 global clusters that originate from two local clusters in each context.

Usage

generateTestData_1D(groupCounts, means)

Arguments

groupCounts    Number of data samples in each global cluster. Assumed to be a vector of four elements: c(c11, c21, c12, c22), where cij is the number of samples coming from cluster i in context 1 and cluster j in context 2.

means    Means of the simulated clusters. Assumed to be a vector of two elements: c(m1, m2), where m1 is the mean of the first cluster in both contexts and m2 is the mean of the second cluster in both contexts.

Value

Returns the simulated datasets together with the true assignments.

data    List of datasets for each context. This can be used as an input for the contextCluster function.

groups    True cluster assignments that were used to generate the data.

Examples

groupCounts <- c(50, 10, 40, 60)
means <- c(-1.5, 1.5)
testData <- generateTestData_1D(groupCounts, means)
# Use the dataset as an input for the contextCluster function for testing


generateTestData_2D    Generate a simulated 2D dataset for testing

Description

Generates a simple 2D dataset with two contexts, where the data are generated from Gaussian distributions. The generated output contains two datasets, each containing 4 global clusters that originate from two local clusters in each context.

Usage

generateTestData_2D(groupCounts, means, variances = NULL)

Arguments

groupCounts    Number of data samples in each global cluster. Assumed to be a vector of four elements: c(c11, c21, c12, c22), where cij is the number of samples coming from cluster i in context 1 and cluster j in context 2.

means    Means of the simulated clusters. Assumed to be a vector of two elements: c(m1, m2), where m1 is the mean of the first cluster in both contexts and m2 is the mean of the second cluster in both contexts. Because the data are two-dimensional, the mean is assumed to be the same in both dimensions.

variances    Optionally, a different variance can be specified for each of the clusters. The variance is assumed to be a vector of two elements: c(v1, v2), where v1 is the variance of the first cluster in both contexts and v2 is the variance of the second cluster in both contexts. Because the data are two-dimensional, the variance is diagonal and the same in both dimensions.

Value

Returns the simulated datasets together with the true assignments.

data    List of datasets for each context. This can be used as an input for the contextCluster function.

groups    True cluster assignments that were used to generate the data.

Examples

groupCounts <- c(50, 10, 40, 60)
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
# Use the dataset as an input for the contextCluster function for testing


numberOfClusters    Estimate the number of clusters from global cluster assignments.

Description

Estimate the number of clusters from the global cluster assignments.

Usage

numberOfClusters(assignments)

Arguments

assignments    Matrix of cluster assignments, where each row corresponds to the cluster assignments sampled in one MCMC iteration.

Value

Number of unique clusters in each MCMC iteration.

Examples

# Generate a simple test dataset
groupCounts <- c(50, 10, 40, 60)
means <- c(-1.5, 1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data

# Fit the model
# 1. Specify the number of clusters
clusterCounts <- list(global = 10, context = c(3, 3))
# 2. Run inference
# The number of iterations is kept small for demonstration purposes;
# use a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal', verbose = TRUE)

# Extract only the sampled global assignments
samples <- results$samples

clusters <- plyr::laply(seq_along(samples), function(i) samples[[i]]$Global)
numberOfClusters(clusters)
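The quantity estimated here is simply the number of distinct occupied clusters per MCMC iteration. As a hypothetical, package-independent sketch (illustrative code, not the clusternomics implementation):

```python
# Hypothetical sketch (not clusternomics code): count the distinct occupied
# global clusters in each MCMC iteration (each row of the assignment matrix).
def number_of_clusters(assignments):
    """Return the number of unique cluster labels per MCMC iteration."""
    return [len(set(row)) for row in assignments]

assignments = [
    [0, 0, 1, 1, 2],  # 3 occupied clusters
    [0, 1, 1, 1, 1],  # 2 occupied clusters
]
print(number_of_clusters(assignments))
# → [3, 2]
```

The distribution of these counts across iterations indicates how many of the requested clusters the model actually uses, which complements the DIC when selecting cluster counts.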

Index

clusterSizes
coclusteringMatrix
contextCluster
empiricalBayesPrior
generatePrior
generateTestData_1D
generateTestData_2D
numberOfClusters