Package SIMLR. December 1, 2017

Version 1.4.0
Date 2017-10-21
Title SIMLR: Single-cell Interpretation via Multi-kernel LeaRning
Maintainer Daniele Ramazzotti <daniele.ramazzotti@yahoo.com>
Depends R (>= 3.4)
Imports parallel, Matrix, stats, methods, Rcpp, pracma, RcppAnnoy, RSpectra
Suggests BiocGenerics, BiocStyle, testthat, knitr, igraph
Description Single-cell RNA-seq technologies enable high-throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical to identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization. SIMLR is capable of separating known subpopulations more accurately in single-cell data sets than existing dimension reduction methods. Additionally, SIMLR demonstrates high sensitivity and accuracy on high-throughput peripheral blood mononuclear cells (PBMC) data sets generated by the GemCode single-cell technology from 10x Genomics.
Encoding UTF-8
LazyData TRUE
License file LICENSE
URL https://github.com/batzogloulabsu/simlr
BugReports https://github.com/batzogloulabsu/simlr
biocViews Clustering, GeneExpression, Sequencing, SingleCell
RoxygenNote 6.0.1
LinkingTo Rcpp
NeedsCompilation yes
VignetteBuilder knitr
Author Bo Wang [aut], Daniele Ramazzotti [aut, cre], Luca De Sano [aut], Junjie Zhu [ctb], Emma Pierson [ctb], Serafim Batzoglou [ctb]

R topics documented:

    BuettnerFlorian
    SIMLR
    SIMLR_Estimate_Number_of_Clusters
    SIMLR_Feature_Ranking
    SIMLR_Large_Scale
    ZeiselAmit

BuettnerFlorian    test dataset for SIMLR

Description

    Example dataset to test SIMLR, from the work by Buettner, Florian, et al.

Usage

    data(BuettnerFlorian)

Format

    Gene expression measurements of individual cells, as a list of 6 elements:

    in_          input dataset, an (m x n) matrix of gene expression measurements of individual cells
    n_clust      number of clusters (number of distinct true labels)
    true_labs    ground-truth cluster assignments for the n_clust clusters
    seed         seed used to compute the results for the example
    results      result by SIMLR for the inputs defined as described
    nmi          normalized mutual information, a measure of the inferred clusters compared to the true labels

Source

    Buettner, Florian, et al. "Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells." Nature Biotechnology 33.2 (2015): 155-160.
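The nmi element of the dataset compares inferred clusters against the true labels. This manual does not state how that value is computed, so the following is only a minimal base R sketch of one common definition of normalized mutual information. The function name nmi_sketch is hypothetical, and the normalization 2 * I(a, b) / (H(a) + H(b)) is an assumption, not necessarily the variant the package authors used.

```r
# Normalized mutual information between two labelings of the same cells.
# Assumption: normalized by the mean entropy, i.e. 2 * I(a, b) / (H(a) + H(b)).
# nmi_sketch is a hypothetical helper and is not part of the SIMLR package.
nmi_sketch <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) > 0)
  joint <- table(a, b) / length(a)   # joint distribution of label pairs
  pa <- rowSums(joint)               # marginal distribution of a
  pb <- colSums(joint)               # marginal distribution of b
  entropy <- function(p) {
    p <- p[p > 0]
    -sum(p * log(p))
  }
  ha <- entropy(pa)
  hb <- entropy(pb)
  if (ha + hb == 0) return(1)        # both labelings put every cell in one cluster
  expected <- outer(pa, pb)          # joint distribution under independence
  nz <- joint > 0                    # skip empty cells to avoid log(0)
  mi <- sum(joint[nz] * log(joint[nz] / expected[nz]))
  2 * mi / (ha + hb)
}

truth   <- c(1, 1, 1, 2, 2, 2)
relabel <- c("b", "b", "b", "a", "a", "a")  # same partition, different cluster names
nmi_sketch(truth, relabel)
```

Because the score depends only on how cells are grouped, renaming the clusters leaves it unchanged. This matters for clustering output, where cluster numbers are arbitrary.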

SIMLR    SIMLR

Description

    Perform the SIMLR clustering algorithm.

Usage

    SIMLR(X, c, no.dim = NA, k = 10, if.impute = FALSE, normalize = FALSE, cores.ratio = 1)

Arguments

    X              an (m x n) data matrix of gene expression measurements of individual cells, or an object of class SCESet
    c              number of clusters to be estimated over X
    no.dim         number of dimensions
    k              tuning parameter
    if.impute      should I transpose the input data?
    normalize      should I normalize the input data?
    cores.ratio    ratio of the number of cores to be used when computing the multi-kernel

Value

    Clusters the cells based on SIMLR and their similarities. Returns a list of 8 elements describing the clusters obtained by SIMLR, of which y are the resulting clusters:

    y                 results of the k-means clusterings
    S                 similarities computed by SIMLR
    F                 results from network diffusion
    ydata             data referring to the results by k-means
    alphak            clustering coefficients
    execution.time    execution time of the present run
    converge          iterative convergence values by t-SNE
    LF                parameters of the clustering

Examples

    SIMLR(X = BuettnerFlorian$in_, c = BuettnerFlorian$n_clust, cores.ratio = 0)

SIMLR_Estimate_Number_of_Clusters    SIMLR Estimate Number of Clusters

Description

    Estimate the number of clusters by means of two heuristics, as discussed in the SIMLR paper.

Usage

    SIMLR_Estimate_Number_of_Clusters(X, NUMC = 2:5, cores.ratio = 1)

Arguments

    X              an (m x n) data matrix of gene expression measurements of individual cells
    NUMC           vector of number of clusters to be considered
    cores.ratio    ratio of the number of cores to be used when computing the multi-kernel

Value

    A list of 2 elements, K1 and K2, with an estimation of the best clusters (the lower the values, the better), as discussed in the original SIMLR paper.

Examples

    SIMLR_Estimate_Number_of_Clusters(BuettnerFlorian$in_, NUMC = 2:5, cores.ratio = 0)

SIMLR_Feature_Ranking    SIMLR Feature Ranking

Description

    Perform the SIMLR feature ranking algorithm. This takes as input the original input data and the corresponding similarity matrix computed by SIMLR.

Usage

    SIMLR_Feature_Ranking(A, X)

Arguments

    A    an (n x n) similarity matrix by SIMLR
    X    an (m x n) data matrix of gene expression measurements of individual cells

Value

    A list of 2 elements: pvalues and ranking ordering over the n covariates, as estimated by the method.

Examples

    SIMLR_Feature_Ranking(A = BuettnerFlorian$results$S, X = BuettnerFlorian$in_)
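For SIMLR_Estimate_Number_of_Clusters, the returned K1 and K2 are described as "the lower the better". A minimal sketch of reading off a cluster count from such a result follows. It assumes (this manual does not confirm it) that K1 and K2 are numeric vectors with one score per entry of NUMC, in the same order. The scores below are placeholder numbers invented for illustration and are not SIMLR output.

```r
# Candidate cluster counts, as would be passed in NUMC.
NUMC <- 2:5

# Placeholder scores standing in for the K1 and K2 elements of the returned list.
# Assumption: one score per candidate in NUMC, in the same order.
# These values are made up for illustration and are NOT SIMLR output.
estimates <- list(K1 = c(0.41, 0.18, 0.27, 0.35),
                  K2 = c(0.52, 0.30, 0.22, 0.44))

# Lower is better, so pick the candidate at the minimum of each heuristic.
best_k1 <- NUMC[which.min(estimates$K1)]
best_k2 <- NUMC[which.min(estimates$K2)]
```

The two heuristics need not agree, as in this toy case, so it may be worth running SIMLR with both candidates and comparing the results.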

SIMLR_Large_Scale    SIMLR Large Scale

Description

    Perform the SIMLR clustering algorithm for large scale datasets.

Usage

    SIMLR_Large_Scale(X, c, k = 10, kk = 100, if.impute = FALSE, normalize = FALSE)

Arguments

    X            an (m x n) data matrix of gene expression measurements of individual cells, or an object of class SCESet
    c            number of clusters to be estimated over X
    k            tuning parameter
    kk           number of principal components to be assessed in the PCA
    if.impute    should I transpose the input data?
    normalize    should I normalize the input data?

Value

    Clusters the cells based on SIMLR Large Scale and their similarities. Returns a list of 8 elements describing the clusters obtained by SIMLR, of which y are the resulting clusters:

    y                 results of the k-means clusterings
    S0                similarities computed by SIMLR
    F                 results from the large scale iterative procedure
    ydata             data referring to the results by k-means
    alphak            clustering coefficients
    val               distances from the k-nearest neighbour search
    ind               indices from the k-nearest neighbour search
    execution.time    execution time of the present run

Examples

    resized = ZeiselAmit$in_[, 1:340]
    ## Not run:
    SIMLR_Large_Scale(X = resized, c = ZeiselAmit$n_clust, k = 5, kk = 5)
    ## End(Not run)

ZeiselAmit    test dataset for SIMLR large scale

Description

    Example dataset to test SIMLR large scale, a reduced version from the work by Zeisel, Amit, et al.

Usage

    data(ZeiselAmit)

Format

    Gene expression measurements of individual cells, as a list of 6 elements:

    in_          input dataset, an (m x n) matrix of gene expression measurements of individual cells
    n_clust      number of clusters (number of distinct true labels)
    true_labs    ground-truth cluster assignments for the n_clust clusters
    seed         seed used to compute the results for the example
    results      result by SIMLR for the inputs defined as described
    nmi          normalized mutual information, a measure of the inferred clusters compared to the true labels

Source

    Zeisel, Amit, et al. "Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq." Science 347.6226 (2015): 1138-1142.

Index

    BuettnerFlorian
    SIMLR
    SIMLR_Estimate_Number_of_Clusters
    SIMLR_Feature_Ranking
    SIMLR_Large_Scale
    ZeiselAmit