Package nlnet. April 8, 2018

Similar documents
Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package gee. June 29, 2015

Package EFS. R topics documented:

Package PropClust. September 15, 2018

Package missforest. February 15, 2013

Package comphclust. May 4, 2017

Package TVsMiss. April 5, 2018

Package DTRlearn. April 6, 2018

Lecture 25: Review I

Package ScottKnottESD

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Package sigqc. June 13, 2018

Package hiernet. March 18, 2018

Package orqa. R topics documented: February 20, Type Package

Package corclass. R topics documented: January 20, 2016

Package NetCluster. R topics documented: February 19, Type Package Version 0.2 Date Title Clustering for networks

Package apastyle. March 29, 2017

Package MOST. November 9, 2017

Package TANDEM. R topics documented: June 15, Type Package

Package glmmml. R topics documented: March 25, Encoding UTF-8 Version Date Title Generalized Linear Models with Clustering

Package svapls. February 20, 2015

Package RCA. R topics documented: February 29, 2016

Package Modeler. May 18, 2018

Package multdyn. R topics documented: August 24, Version 1.6 Date Title Multiregression Dynamic Models

Package ssa. July 24, 2016

Package PTE. October 10, 2017

Correlation Motif Vignette

Package optimus. March 24, 2017

Package intccr. September 12, 2017

Package mrm. December 27, 2016

Package comphclust. February 15, 2013

Package SSLASSO. August 28, 2018

Package cgh. R topics documented: February 19, 2015

Package GAMBoost. February 19, 2015

Package clustvarsel. April 9, 2018

Package lga. R topics documented: February 20, 2015

Package PRISMA. May 27, 2018

10701 Machine Learning. Clustering

Package flam. April 6, 2018

Package SoftClustering

Package GWRM. R topics documented: July 31, Type Package

Linear Methods for Regression and Shrinkage Methods

Package samplesizelogisticcasecontrol

Exploratory data analysis for microarrays

Package nonlinearicp

Package glinternet. June 15, 2018

Package cycle. March 30, 2019

Package PDN. November 4, 2017

Package gsscopu. R topics documented: July 2, Version Date

Package survivalmpl. December 11, 2017

Package rocc. February 20, 2015

Package IntNMF. R topics documented: July 19, 2018

Package G1DBN. R topics documented: February 19, Version Date

Package EBglmnet. January 30, 2016

CARTWARE Documentation

Package pomdp. January 3, 2019

Package cwm. R topics documented: February 19, 2015

Package DiffCorr. August 29, 2016

Package sensory. R topics documented: February 23, Type Package

Package treedater. R topics documented: May 4, Type Package

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Package subtype. February 20, 2015

Chapter 1. Using the Cluster Analysis. Background Information

Random Forest A. Fornaser

Package mtsdi. January 23, 2018

Package treethresh. R topics documented: June 30, Version Date

Package esabcv. May 29, 2015

Preface to the Second Edition. Preface to the First Edition. 1 Introduction 1

Supplementary text S6 Comparison studies on simulated data

Package CALIBERrfimpute

Package mgc. April 13, 2018

Lecture 27: Review. Reading: All chapters in ISLR. STATS 202: Data mining and analysis. December 6, 2017

Chapter 6: Linear Model Selection and Regularization

Package SCORPIUS. June 29, 2018

Package repfdr. September 28, 2017

Package nsprcomp. August 29, 2016

Package mcclust.ext. May 15, 2015

Package listdtr. November 29, 2016

Package ADMM. May 29, 2018

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

Package clusterrepro

Package texteffect. November 27, 2017

9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

Machine Learning and Data Mining. Clustering (1): Basics. Kalev Kask

Lecture 7: Linear Regression (continued)

Package Numero. November 24, 2018

Metabolomic Data Analysis with MetaboAnalyst

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Package ldbod. May 26, 2017

Package ipft. January 4, 2018

Package StructFDR. April 13, 2017

10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2

Package ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2.

Package MTLR. March 9, 2019

Package balance. October 12, 2018

Package LVGP. November 14, 2018

Package bdots. March 12, 2018

Package catenary. May 4, 2018

Package kernelfactory

Package PCGSE. March 31, 2017

Transcription:

Type Package Package nlnet April 8, 2018 Title Nonlinear Network Reconstruction, Clustering, and Variable Selection Based on DCOL (Distance Based on Conditional Ordered List) Version 1.2 Date 2018-04-07 Author Tianwei Yu, Haodong Liu Maintainer Tianwei Yu<tianwei.yu@emory.edu> It includes four methods: DCOL-based K-profiles clustering, nonlinear network reconstruction, nonlinear hierarchical clustering, and variable selection for generalized additive model. License GPL (>= 2) Imports ROCR, TSP, igraph, fdrtool, coin, methods, graphics, stats, earth, randomforest, e1071 NeedsCompilation no Repository CRAN Date/Publication 2018-04-08 14:43:21 UTC R topics documented: data.gen........................................... 2 KPC............................................. 3 nlhc............................................. 4 nlnet............................................. 6 nvsd............................................. 8 stage.forward........................................ 9 Index 11 1

2 data.gen data.gen Simulated Data Generation Generating gene matrix as a example of input. data.gen(n.genes=100, n.samples=100, n.grps=10, aver.grp.size=10, n.fun.types=6, epsilon=0.1, n.depend=0) Details n.genes n.samples n.grps aver.grp.size n.fun.types epsilon the number of rows of the matrix. the number of columns of the matrix. the number of hidden clusters. averge number of genes in a cluster. number of function types to use. noise level. n.depend data generation dependence structure. can be 0, 1, 2. The data generation scheme is described in detail in IEEE ACM Trans. Comput. Biol. Bioinform. 10(4):1080-85. return the data including gene and clustering. data grps the gene matrix the predicted clustering Tianwei Yu<tyu8@emory.edu> ##generating a gene matrix with 100 genes, some in 5 clusters, and 100 samples per gene. output<-data.gen(n.genes=100, n.samples=10, n.grps=5) ##get the gene matrix from the source of data. matrix<-output$data ##get the hiden clusters from the source of data. grps<-output$grp

KPC 3 KPC implementation of K-Profiles Clustering implementation of K-Profiles Clustering KPC(dataset, ncluster, maxiter = 100, p.max = 0.2, p.min = 0.05) dataset ncluster maxiter p.max p.min the data matrix with genes in the row and samples in the column the number of clusters K the maximum number of iterations the starting p-value cutoff to exclude noise genes the final p-value cutoff to exclude noise genes Return a list about gene cluster and the list of value p cluster p.list gene cluster a list of value p Tianwei Yu <tianwei.yu@emory.edu> References http://www.hindawi.com/journals/bmri/aa/918954/ See Also data.gen ## generating the data matrix & hiden clusters as a sample input<-data.gen(n.genes=40, n.grps=4) ## now input includes data matrix and hiden clusters, so get the matrix as input. input<-input$data ## set ncluster value to 4

4 nlhc kpc<-kpc(input,ncluster=4) ##get the hiden cluster result from "KPC" cluster<-kpc$cluster ##get the list of p p<-kpc$p.list nlhc Non-Linear Hierarchical Clustering The non-linear hierarchical clustering based on DCOL nlhc(array, hamil.method = "nn", concorde.path = NA, use.normal.approx = FALSE, normalization = "standardize", combine.linear = TRUE, use.traditional.hclust = FALSE, method.traditional.hclust = "average") Details array hamil.method the data matrix with no missing values the method to find the hamiltonian path. concorde.path If using the Concorde TSP Solver, the local directory of the solver use.normal.approx whether to use the normal approximation for the null hypothesis. normalization the normalization method for the array. combine.linear whether linear association should be found by correlation to combine with nonlinear association found by DCOL. use.traditional.hclust whether traditional agglomerative clustering should be used. method.traditional.hclust the method to pass on to hclust() if traditional method is chosen. Hamil.method: It is passed onto the function tsp of library TSP. To use linkern method, the user needs to install concord as instructed in TSP. use.normal.approx: If TRUE, normal approximation is used for every feature, AND all covariances are assumed to be zero. If FALSE, generates permutation based null distribution - mean vector and a variance-covariance matrix. normalization: There are three choices - "standardize" means removing the mean of each row and make the standard deviation one; "normal_score" means normal score transformation; "none" means do nothing. In that case we still assume some normalization has been done by the user such that each row has approximately mean 0 and sd 1. combine.linear: The two pieces of information is combined at the start to initiate the distance matrix.

nlhc 5 Returns a hclust object same as the output of hclust(). Reference: help(hclust) merge height order labels call dist.method height.0 an n-1 by 2 matrix. Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. a set of n-1 real values, the value of the criterion associated with the clusterig method for the particular agglomeration a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches. labels for each of the objects being clustered the call which produced the result the distance that has been used to create d original calculation of merging height Tianwei Yu <tianwei.yu@emory.edu> References http://www.ncbi.nlm.nih.gov/pubmed/24334400 See Also data.gen ## generating the data matrix & hiden clusters as a sample input<-data.gen(n.genes=40, n.grps=4) ## now input includes data matrix and hiden clusters, so get the matrix as input. input<-input$data nlhc.data<-nlhc(input) plot(nlhc.data) ##get the merge from the input. merge<-nlhc.data$merge

6 nlnet nlnet Non-Linear Network reconstruction from expression matrix Non-Linear Network reconstruction method nlnet(input, min.fdr.cutoff=0.05,max.fdr.cutoff=0.2, conn.proportion=0.007, gene.fdr.plot=false, min.module.size=0, gene.community.method="multilevel", use.normal.approx=false, normalization="standardize", plot.method="communitygraph") input the data matrix with no missing values. min.fdr.cutoff the minimun allowable value of the local false discovery cutoff in establishing links between genes. max.fdr.cutoff the maximun allowable value of the local false discovery cutoff in establishing links between genes. conn.proportion the target proportion of connections between all pairs of genes, if allowed by the fdr cutoff limits. gene.fdr.plot whether plot a figure with estimated densities, distribution functions, and (local) false discovery rates. min.module.size the min number of genes together as a module. gene.community.method the method for community detection. use.normal.approx whether to use the normal approximation for the null hypothesis. normalization plot.method the normalization method for the array. the method for graph and community ploting. Details gene.community.method: It provides three kinds of community detection method: "mutilevel", "label.propagation" and "leading.eigenvector". use.normal.approx: If TRUE, normal approximation is used for every feature, AND all covariances are assumed to be zero. If FALSE, generates permutation based null distribution - mean vector and a variance-covariance matrix. normalization: There are three choices: "standardize" means removing the mean of each row and make the standard deviation one; "normal_score" means normal score transformation; "none"

nlnet 7 means do nothing. In that case we still assume some normalization has been done by the user such that each row has approximately mean 0 and sd 1. plot.method: It provides three kinds of ploting method: "none" means ploting no graph, "communitygraph" means ploting community with graph, "graph" means ploting graph, "membership" means ploting membership of the community it returns a graph and the community membership of the graph. algorithm graph community The algorithm name for community detection An igraph object including edges : Numeric vector defining the edges, the first edge points from the first element to the second, the second edge from the third to the fourth, etc. Numeric vector, one value for each vertex, the membership vector of the community structure. Haodong Liu <liuhaodong0828@gmail.com> References https://www.ncbi.nlm.nih.gov/pubmed/27380516 See Also data.gen ## generating the data matrix & hiden clusters as a sample input<-data.gen(n.genes=40, n.grps=4) ## now input includes data matrix and hiden clusters, so get the matrix as input. input<-input$data ##change the ploting method result<-nlnet(input,plot.method="graph") ## get the result and see it values graph<-result$graph ##a igraph object. comm<-result$community ##community of the graph ## use different community detection method #nlnet(input,gene.community.method="label.propagation") ## change the fdr pro to control connections of genes ## adjust the modularity size #nlnet(input,conn.proportion=0.005,min.module.size=10)

8 nvsd nvsd Nonlinear Variable Selection based on DCOL This is a nonlinear variable selection procedure for generalized additive models. It s based on DCOL, using forward stagewise selection. In addition, a cross-validation is conducted to tune the stopping alpha level and finalize the variable selection. nvsd(x, y, fold = 10, step.size = 0.01, stop.alpha = 0.05, stop.var.count = 20, max.model.var.count = 10, roughening.method = "DCOL", do.plot = F, pred.method = "MARS") X y fold step.size stop.alpha The predictor matrix. Each row is a gene (predictor), each column is a sample. Notice the dimensionality is different than most other packages, where each column is a predictor. This is to conform to other functions in this package that handles gene expression type of data. The numerical outcome vector. The fold of cross-validation. The step size of the roughening process. The alpha level (significance of the current selected predictor) to stop the iterations. stop.var.count The maximum number of predictors to select in the forward stagewise selection. Once this number is reached, the iteration stops. max.model.var.count The maximum number of predictors to select. Notice this can be smaller than the stop.var.count. Stop.var.count can be set more liniently, and this parameter controls the final maximum model size. roughening.method The method for roughening. The choices are "DCOL" or "spline". do.plot pred.method Whether to plot the points change in each step. The prediction method for the cross validation variable selection. As forward stagewise procedure doesn t do prediction, a method has to be borrowed from existing packages. The choices include "MARS", "RF", and "SVM". Details Please refer to the reference for details.

stage.forward 9 A list object is returned. The components include the following. selected.pred all.pred The selected predictors (row number). The selected predictors by the forward stagewise selection. The $selected.pred is a subset of this. Tianwei Yu<tianwei.yu@emory.edu> References See Also https://arxiv.org/abs/1601.05285 stage.forward X<-matrix(rnorm(2000),ncol=20) y<-sin(x[,1])+x[,2]^2+x[,3] nvsd(t(x),y,stop.alpha=0.001,step.size=0.05) stage.forward Nonlinear Forward stagewise regression using DCOL The subroutine conducts forward stagewise regression using DCOL. Either DCOL roughening or spline roughening is conducted. stage.forward(x, y, step.size = 0.01, stop.alpha = 0.01, stop.var.count = 20, roughening.method = "DCOL", tol = 1e-08, spline.df = 5, dcol.sel.only = FALSE, do.plot = F) X y step.size The predictor matrix. Each row is a gene (predictor), each column is a sample. Notice the dimensionality is different than most other packages, where each column is a predictor. This is to conform to other functions in this package that handles gene expression type of data. The numerical outcome vector. The step size of the roughening process.

10 stage.forward stop.alpha The alpha level (significance of the current selected predictor) to stop the iterations. stop.var.count The maximum number of predictors to select. Once this number is reached, the iteration stops. roughening.method The method for roughening. The choices are "DCOL" or "spline". tol Details spline.df dcol.sel.only do.plot The tolerance level of sum of squared changes in the residuals. The degree of freedom for the spline. TRUE or FALSE. If FALSE, the selection of predictors will consider both linear and nonlinear association significance. Whether to plot the points change in each step. Please refer to the reference manuscript for details. A list object is returned. The components include the following. found.pred ssx.rec $sel.rec $p.rec The selected predictors (row number). The magnitude of variance explained using the current predictor at each step. The selected predictor at each step. The p-value of the association between the current residual and the selected predictor at each step. Tianwei Yu<tianwei.yu@emory.edu> References See Also https://arxiv.org/abs/1601.05285 nvsd X<-matrix(rnorm(2000),ncol=20) y<-sin(x[,1])+x[,2]^2+x[,3] stage.forward(t(x),y,stop.alpha=0.001,step.size=0.05)

Index Topic nonparametric nvsd, 8 stage.forward, 9 Topic variable selection nvsd, 8 stage.forward, 9 data.gen, 2, 3, 5, 7 KPC, 3 nlhc, 4 nlnet, 6 nvsd, 8 stage.forward, 9 11