Package fusionclust. September 19, 2017

Similar documents
Package SEMrushR. November 3, 2018

Package cattonum. R topics documented: May 2, Type Package Version Title Encode Categorical Features

Package bisect. April 16, 2018

Package ECctmc. May 1, 2018

Package fastdummies. January 8, 2018

Package ggimage. R topics documented: November 1, Title Use Image in 'ggplot2' Version 0.0.7

Package sigqc. June 13, 2018

Package WordR. September 7, 2017

Package validara. October 19, 2017

Package ggimage. R topics documented: December 5, Title Use Image in 'ggplot2' Version 0.1.0

Package canvasxpress

Package pwrab. R topics documented: June 6, Type Package Title Power Analysis for AB Testing Version 0.1.0

Package ADMM. May 29, 2018

Package datasets.load

Package wrswor. R topics documented: February 2, Type Package

Package mixsqp. November 14, 2018

Package crossrun. October 8, 2018

Package facerec. May 14, 2018

Package balance. October 12, 2018

Package sfdct. August 29, 2017

Package vinereg. August 10, 2018

Package TrafficBDE. March 1, 2018

Package gtrendsr. October 19, 2017

Package calpassapi. August 25, 2018

Package semver. January 6, 2017

Package mgc. April 13, 2018

Package geojsonsf. R topics documented: January 11, Type Package Title GeoJSON to Simple Feature Converter Version 1.3.

Package coga. May 8, 2018

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package qualmap. R topics documented: September 12, Type Package

Package TVsMiss. April 5, 2018

Package rmi. R topics documented: August 2, Title Mutual Information Estimators Version Author Isaac Michaud [cre, aut]

Package quadprogxt. February 4, 2018

Package splithalf. March 17, 2018

Package fitbitscraper

Package preprosim. July 26, 2016

Package shinyfeedback

Package deductive. June 2, 2017

Package BASiNET. October 2, 2018

Package jdx. R topics documented: January 9, Type Package Title 'Java' Data Exchange for 'R' and 'rjava'

Package knitrprogressbar

Package optimus. March 24, 2017

Package spelling. December 18, 2017

Package rsppfp. November 20, 2018

Package SteinerNet. August 19, 2018

Package Numero. November 24, 2018

Package ramcmc. May 29, 2018

Package kdtools. April 26, 2018

Package repec. August 31, 2018

Package SIMLR. December 1, 2017

Package apastyle. March 29, 2017

Package ecoseries. R topics documented: September 27, 2017

Package milr. June 8, 2017

Package regexselect. R topics documented: September 22, Version Date Title Regular Expressions in 'shiny' Select Lists

Package stapler. November 27, 2017

Package liftr. R topics documented: May 14, Type Package

Package catenary. May 4, 2018

Package projector. February 27, 2018

Package rprojroot. January 3, Title Finding Files in Project Subdirectories Version 1.3-2

Package editdata. October 7, 2017

Package tropicalsparse

Package DCEM. September 30, 2018

Package geneslope. October 26, 2016

Package IATScore. January 10, 2018

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0

Package strat. November 23, 2016

Package ASICS. January 23, 2018

Package kirby21.base

Package edfreader. R topics documented: May 21, 2017

Package messaging. May 27, 2018

Package profvis. R topics documented:

Package redlistr. May 11, 2018

Package effectr. January 17, 2018

Package fitur. March 11, 2018

Package widyr. August 14, 2017

Package ClustGeo. R topics documented: July 14, Type Package

Package PDN. November 4, 2017

Package bigreadr. R topics documented: August 13, Version Date Title Read Large Text Files

Package ScottKnottESD

Package fingertipsr. May 25, Type Package Version Title Fingertips Data for Public Health

Package censusr. R topics documented: June 14, Type Package Title Collect Data from the Census API Version 0.0.

Package sgmcmc. September 26, Type Package

Package gtrendsr. August 4, 2018

Package slickr. March 6, 2018

Package kerasformula

Package robotstxt. November 12, 2017

Package modmarg. R topics documented:

Package mmpa. March 22, 2017

Package samplesizecmh

Package clustvarsel. April 9, 2018

Package weco. May 4, 2018

Package snakecase. R topics documented: March 25, Version Date Title Convert Strings into any Case

Package zoomgrid. January 3, 2019

Package epitab. July 4, 2018

Package hyphenatr. August 29, 2016

Package EvolutionaryGames

Package clustering.sc.dp

Package raker. October 10, 2017

Package vip. June 15, 2018

Package omu. August 2, 2018

Transcription:

Package fusionclust September 19, 2017 Title Clustering and Feature Screening using L1 Fusion Penalty Version 1.0.0 Provides the Big Merge Tracker and COSCI algorithms for convex clustering and feature screening using L1 fusion penalty. See Radchenko, P. and Mukherjee, G. (2017) <doi:10.1111/rssb.12226> and T.Banerjee et al. (2017) <doi:10.1016/j.jmva.2017.08.001> for more details. Depends R (>= 3.4.1) License GPL (>= 2) Encoding UTF-8 LazyData true Imports bbmle, stats, graphics RoxygenNote 6.0.1 URL https://github.com/trambakbanerjee/fusionclust Suggests knitr, rmarkdown VignetteBuilder knitr NeedsCompilation no Author Trambak Banerjee [aut, cre], Gourab Mukherjee [aut], Peter Radchenko [aut] Maintainer Trambak Banerjee <trambakb@usc.edu> Repository CRAN Date/Publication 2017-09-19 08:21:58 UTC R topics documented: bmt............................................. 2 cosci_is........................................... 3 cosci_is_select....................................... 4 nclust............................................ 5 Index 7 1

2 bmt bmt Big Merge Tracker Solves an L1 relaxed univariate clustering criterion and returns a sequence of λ values where the clusters merge bmt(x, alpha = 0.1, small.perturbation = 10^(-6)) x observation vector alpha merging threshold. Default is 0.1 small.perturbation a small positive number to remove ties. Default is 10^(-6) solves a convex relaxation of the univariate clustering criterion given by equation (2) in the referenced paper and generates a sequence of cluster merges and corresponding λ values. See algorithm 1 in the referenced paper for more details. 1. path - number of clusters on the big merge path 2. lambda.path - sequence of lambda where clusters merge 3. index - cluster index at the point where clusters merge 4. merge - merge points 5. split - split points 6. prob - merging proportion 7. boundaries - cluster boundaries 1. P. Radchenko, G. Mukherjee, Convex clustering via l1 fusion penalization, J. Roy. Statist, Soc. Ser. B (Statistical Methodology) (2017) doi:10.1111/rssb.12226. nclust

cosci_is 3 x<- c(rnorm(1000,-2,1), rnorm(1000,2,1)) out<- bmt(x) cosci_is Rank the p features in an n by p design matrix Ranks the p features in an n by p design matrix where n represents the sample size and p is the number of features. cosci_is(dat, min.alpha, small.perturbation = 10^(-6)) dat n by p data matrix min.alpha the smallest threshold (typically set to 0) small.perturbation a small positive number to remove ties. Default value is 10^(-6) Uses the univariate merging algorithm bmt and produces a score for each feature that reflects its relative importance for clustering. a p vector of scores 1. Banerjee, T., Mukherjee, G. and Radchenko P., Feature Screening in Large Scale Cluster Analysis, Journal of Multivariate Analysis, Volume 161, 2017, Pages 191-212 2. P. Radchenko, G. Mukherjee, Convex clustering via l1 fusion penalization, J. Roy. Statist, Soc. Ser. B (Statistical Methodology) (2017) doi:10.1111/rssb.12226. bmt,cosci_is_select

4 cosci_is_select noise<-matrix(rnorm(49000),nrow=1000,ncol=49) signal<-c(rnorm(500,-1.5,1),rnorm(500,1.5,1)) x<-cbind(signal,noise) scores<- cosci_is(x,0) cosci_is_select Use a data driven approach to select the features Once you have the feature scores from cosci_is, you can select the features 1. based on a pre-defined threshold, 2. using table A.10 in the paper[1] to determine an appropriate threshold or, 3. using a data driven approach described in the references to select the features and obtain an implicit threshold value. cosci_is_select implements option 3. cosci_is_select(score, gamma) score gamma a p vector of scores what proportion of the p features is noise? If your sample size n is smaller than 100, setting gamma = 0.85 is recommended. Otherwise set gamma = 0.9 Converts the problem of screening out features with lower scores into a problem in large scale multiple testing and uses the procedure described in reference [2] to determine the signal features. a vector of selected features

nclust 5 1. Banerjee, T., Mukherjee, G. and Radchenko P., Feature Screening in Large Scale Cluster Analysis, Journal of Multivariate Analysis, Volume 161, 2017, Pages 191-212 2. T. Cai, W. Sun, W., Optimal screening and discovery of sparse signals with applications to multistage high throughput studies, J. Roy.Statist. Soc. Ser. B (Statistical Methodology) 79, no. 1 (2017) 197-223 cosci_is noise<-matrix(rnorm(49000),nrow=1000,ncol=49) signal<-c(rnorm(500,-1.5,1),rnorm(500,1.5,1)) x<-cbind(signal,noise) scores<- cosci_is(x,0) features<-cosci_is_select(scores,0.9) nclust No.of clusters Estimates the number of clusters from the bmt run nclust(bmt_output, prob_threshold = 0.5) bmt_output output from the bmt run prob_threshold probability threshold. Default is 0.5. Do not change it unless you know what you are doing. See the referenced paper

6 nclust Estimates the number of clusters as the number of big merges + 1. The probability threshold is an adjustment that renders this estimation process more robust to sampling fluctuations. If the sum of the sample frequencies for the two merging clusters in the last big merge is less than 50 percent, we do not report any merges and thus are left with just 1 cluster. See the referenced paper for more details. The number of clusters bmt 1. P. Radchenko, G. Mukherjee, Convex clustering via l1 fusion penalization, J. Roy. Statist, Soc. Ser. B (Statistical Methodology) (2017) doi:10.1111/rssb.12226. x<- c(rnorm(1000,-2,1), rnorm(1000,2,1)) out<- bmt(x) k<- nclust(out)

Index bmt, 2, 3, 5, 6 cosci_is, 3, 4, 5 cosci_is_select, 3, 4 nclust, 2, 5 7