Package lvm4net. R topics documented: August 29, Title Latent Variable Models for Networks

Similar documents
Latent Variable Models for the Analysis, Visualization and Prediction of Network and Nodal Attribute Data. Isabella Gollini.

Package multigraph. R topics documented: January 24, 2017

Package munfold. R topics documented: February 8, Type Package. Title Metric Unfolding. Version Date Author Martin Elff

Package OLScurve. August 29, 2016

Package multigraph. R topics documented: November 1, 2017

Package gppm. July 5, 2018

Package manet. September 19, 2017

Package sizemat. October 19, 2018

Package clustering.sc.dp

Package mixphm. July 23, 2015

Package Bergm. R topics documented: September 25, Type Package

Package sure. September 19, 2017

Package fbroc. August 29, 2016

Package mlegp. April 15, 2018

Package RNAseqNet. May 16, 2017

Package hergm. R topics documented: January 10, Version Date

Package sgmcmc. September 26, Type Package

Package pwrrasch. R topics documented: September 28, Type Package

Package cwm. R topics documented: February 19, 2015

Package MeanShift. R topics documented: August 29, 2016

Package ClustGeo. R topics documented: July 14, Type Package

Package EMC. February 19, 2015

Package bdots. March 12, 2018

Package LVGP. November 14, 2018

Package pairsd3. R topics documented: August 29, Title D3 Scatterplot Matrices Version 0.1.0

Package manet. August 23, 2018

Package pathdiagram. R topics documented: August 29, 2016

Package icesadvice. December 7, 2018

Package Grid2Polygons

Package MIICD. May 27, 2017

Package intccr. September 12, 2017

Package dalmatian. January 29, 2018

Package simr. April 30, 2018

Package libstabler. June 1, 2017

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Package datasets.load

Package tclust. May 24, 2018

Package PCADSC. April 19, 2017

Package SSLASSO. August 28, 2018

Package RCA. R topics documented: February 29, 2016

Package tsbugs. February 20, 2015

Package esabcv. May 29, 2015

Package sankey. R topics documented: October 22, 2017

Package texteffect. November 27, 2017

Package kofnga. November 24, 2015

Package sspse. August 26, 2018

Package ConvergenceClubs

Package optimus. March 24, 2017

Package ECctmc. May 1, 2018

Package beast. March 16, 2018

Package clusternomics

Package snowboot. R topics documented: October 19, Type Package Title Bootstrap Methods for Network Inference Version 1.0.

Package diffusr. May 17, 2018

Package preprosim. July 26, 2016

Package SmoothHazard

Package rollply. R topics documented: August 29, Title Moving-Window Add-on for 'plyr' Version 0.5.0

Package XMRF. June 25, 2015

Package gains. September 12, 2017

Package FWDselect. December 19, 2015

Package ebdbnet. February 15, 2013

Package sglr. February 20, 2015

Package balance. October 12, 2018

Package longclust. March 18, 2018

Package tpr. R topics documented: February 20, Type Package. Title Temporal Process Regression. Version

Package SparseFactorAnalysis

Package ClusterSignificance

Package bdpopt. March 30, 2016

Package crossrun. October 8, 2018

Package mrm. December 27, 2016

Package stapler. November 27, 2017

Package nplr. August 1, 2015

Package strat. November 23, 2016

Package smoothr. April 4, 2018

Package packcircles. April 28, 2018

Package CuCubes. December 9, 2016

Package acebayes. R topics documented: November 21, Type Package

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package Rambo. February 19, 2015

Package r.jive. R topics documented: April 12, Type Package

Package mfbvar. December 28, Type Package Title Mixed-Frequency Bayesian VAR Models Version Date

Package basictrendline

Package MatrixCorrelation

Package bisect. April 16, 2018

Package wskm. July 8, 2015

Package MixSim. April 29, 2017

Package corclass. R topics documented: January 20, 2016

Package Binarize. February 6, 2017

Package mpm. February 20, 2015

Package sspline. R topics documented: February 20, 2015

Package nonmem2r. April 5, 2018

Package Daim. February 15, 2013

Package RobustGaSP. R topics documented: September 11, Type Package

Package geogrid. August 19, 2018

DDP ANOVA Model. May 9, ddpanova-package... 1 ddpanova... 2 ddpsurvival... 5 post.pred... 7 post.sim DDP ANOVA with univariate response

Package lhs. R topics documented: January 4, 2018

Package ParetoPosStable

The bcp Package. February 8, 2009

Package CorporaCoCo. R topics documented: November 23, 2017

Package PTE. October 10, 2017

Package areaplot. October 18, 2017

Transcription:

Title Latent Variable Models for Networks Package lvm4net August 29, 2016 Latent variable models for network data using fast inferential procedures. Version 0.2 Depends R (>= 3.1.1), MASS, ergm, network, ellipse Imports igraph License GPL (>= 2) LazyData true URL http://github.com/igollini/lvm4net BugReports http://github.com/igollini/lvm4net/issues Author Isabella Gollini [aut, cre] Maintainer Isabella Gollini <igollini.stats@gmail.com> NeedsCompilation no Repository CRAN Date/Publication 2015-04-11 01:00:42 R topics documented: lvm4net-package...................................... 2 boxroc............................................ 2 goflsm............................................ 4 lsjm............................................. 5 lsm.............................................. 6 plot.gofobj.......................................... 8 plot.lsjm........................................... 8 plot.lsm........................................... 10 plot............................................ 11 PPIgen............................................ 12 PPInet............................................ 13 PPIphy............................................ 13 print.gofobj......................................... 14 1

2 boxroc rotxto........................................... 15 simulatelsm........................................ 15 Index 17 lvm4net-package Latent Variable Models for Networks lvm4net provides a range of tools for latent variable models for network data. Most of the models are implemented using a fast variational inference approach. Latent space models for binary networks: the function lsm implements the latent space model (LSM) introduced by Hoff et al. (2002) using a variational inference and squared Euclidian distance; the function lsjm implements latent space joint model (LSJM) for multiplex networks introduced by Gollini and Murphy (2014). These models assume that each node of a network has a latent position in a latent space: the closer two nodes are in the latent space, the more likely they are connected. Functions for binary bipartite networks will be added soon. References Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. Hoff, P., Raftery, A., and Handcock, M. (2002), "Latent Space Approaches to Social Network Analysis", Journal of the American Statistical Association, 97, 1090 1098. boxroc Boxplot and ROC Curves Function to display boxplots and ROC curves to show model fit in terms of in-sample link prediction. boxroc(, EZ, xit, BOXPLOT = FALSE, ROC = FALSE, Lroc = 100, labelsplot = NULL, powdist = 2, cexrocleg = 0.8, colroc = seq(2, Ndata + 1), ltyroc = seq(2, Ndata + 1), lwdroc = 2,...)

boxroc 3 EZ Value xit BOXPLOT ROC (N x N) binary adjacency matrix, or list containing the adjacency matrices. (N x D) matrix (or list of matrices) containing the posterior means of the latent positions vector of posterior means of the parameter α logical; if TRUE draws the boxplot. Default BOXPLOT = FALSE logical; if TRUE draws the ROC curve. Default ROC = FALSE Lroc number of intervals in the ROC curve. Default Lroc = 100 labelsplot powdist main title for the boxplot. Default labelsplot = NULL vector of power of the distance default powdist = 2, squared Euclidean distance, the alternative is 1, for the Euclidean distance cexrocleg cex for the ROC curve. Default cexrocleg =.8 colroc col for the ROC curve. Default colroc = seq(2, Ndata + 1) ltyroc lty for the ROC curve. Default ltyroc = seq(2, Ndata + 1) lwdroc lwd for the ROC curve. Default lwdroc = 2... to be passed to methods, such as graphical parameters (see par). The area under the ROC curve (AUC) and the selected plots. The closer the AUC takes values to 1 the better the fit. References Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. See Also lsm, lsjm N <- 20 <- network(n, directed = FALSE)[,] modlsm <- lsm(, D = 2) bp <- boxroc(, EZ = modlsm$lsmez, xit = modlsm$xit, Lroc = 150, ROC = TRUE, BOXPLOT = TRUE) print(bp)

4 goflsm goflsm Goodness-of-Fit diagnostics for LSM model This function produces goodness-of-fit diagnostics for LSM model. goflsm(object,, sim = NULL, nsim = 100, seed, directed = NULL, stats = NULL, doplot = TRUE, parm = TRUE) object object of class 'lsm' (N x N) binary adjacency matrix sim list containing simulated (N x N) adjacency marices. Default sim = NULL nsim number of simulations. Default nsim = 100 seed for simulations directed if the network is directed or not. Default directed = NULL stats statistics used. Default stats = NULL doplot draw boxplot. Default doplot = TRUE parm do all the plots in one window. Default parm = TRUE See Also lsm, simulatelsm, plot.gofobj, print.gofobj <- network(15, directed = FALSE)[,] modlsm <- lsm(, D = 2) mygof <- goflsm(modlsm, = )

lsjm 5 lsjm Latent Space Joint Model Function to joint modelling of multiple network views using the Latent Space Jont Model (LSJM) Gollini and Murphy (2014). The LSJM merges the information given by the multiple network views by assuming that the probability of a node being connected with other nodes in each view is explained by a unique latent variable. lsjm(, D, sigma = 1, xi = rep(0, length()), psi2 = rep(2, length()), Niter = 500, tol = 0.1^2, preit = 20, randomz = FALSE) D Value sigma list containing a (N x N) binary adjacency matrix for each network view. integer dimension of the latent space (D x D) variance/covariance matrix of the prior distribution for the latent positions. Default sigma = 1 xi vector of means of the prior distributions of α. Default xi = 0 psi2 vector of variances of the prior distributions of α. Default psi2 = 2 Niter maximum number of iterations. Default Niter = 500 tol desired tolerance. Default tol = 0.1^2 preit Preliminary number of iterations default preit = 20 randomz List containing: logical; If randomz = TRUE random initialization for the latent positions is used. If randomz = FALSE and D = 2 or 3 the latent positions are initialized using the Fruchterman-Reingold method and multidimensional scaling is used for D = 1 or D > 3. Default randomz = FALSE EZ (N x D) matrix containing the posterior means of the latent positions VZ (D x D) matrix containing the posterior variance of the latent positions lsmez list contatining a (N x D) matrix for each network view containing the posterior means of the latent positions under each model in the latent space. lsmvz list contatining a (D x D) matrix for each network view containing the posterior variance of the latent positions under each model in the latent space. xit vector of means of the posterior distributions of α psi2t vector of variances of the posterior distributions of α Ell expected log-likelihood

6 lsm References Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. ## Simulate Undirected Network N <- 20 Ndata <- 2 <- list() [[1]] <- network(n, directed = FALSE)[,] ### create a new view that is similar to the original for(nd in 2:Ndata){ [[nd]] <- [[nd - 1]] - sample(c(-1, 0, 1), N * N, replace = TRUE, prob = c(.05,.85,.1)) [[nd]] <- 1 * ([[nd]] > 0 ) diag([[nd]]) <- 0 } par(mfrow = c(1, 2)) z <- plot([[1]], verbose = TRUE, main = 'Network 1') plot([[2]], EZ = z, main = 'Network 2') par(mfrow = c(1, 1)) modlsjm <- lsjm(, D = 2) plot(modlsjm,, drawcb = TRUE) plot(modlsjm,, drawcb = TRUE, plotztilde = TRUE) lsm Latent Space Model Latent space models (LSM) are a well known family of latent variable models for network data introduced by Hoff et al. (2002) under the basic assumption that each node has an unknown position in a D-dimensional Euclidean latent space: generally the smaller the distance between two nodes in the latent space, the greater the probability of them being connected. Unfortunately, the posterior distribution of the LSM cannot be computed analytically. For this reason we propose a variational inferential approach which proves to be less computationally intensive than the MCMC procedure proposed in Hoff et al. (2002) (implemented in the latentnet package) and can therefore easily handle large networks. Salter-Townshend and Murphy (2013) applied variational methods to fit the LSM with the Euclidean distance in the VBLPCM package. In this package, a distance model with squared Euclidean distance is used. We follow the notation of Gollini and Murphy (2014). lsm(, D, sigma = 1, xi = 0, psi2 = 2, Niter = 100, Miniter = 10, tol = 0.1^2, randomz = FALSE, nstart = 1)

lsm 7 D Value sigma (N x N) binary adjacency matrix integer dimension of the latent space (D x D) variance/covariance matrix of the prior distribution for the latent positions. Default sigma = 1 xi mean of the prior distribution of α. Default xi = 0 psi2 variance of the prior distribution of α. Default psi2 = 2 Niter maximum number of iterations. Default Niter = 100 Miniter minimum number of iterations. Default Miniter = 10 tol desired tolerance. Default tol = 0.1^2 randomz nstart List containing: References logical; If randomz = TRUE random initialization for the latent positions is used. If randomz = FALSE and D = 2 or 3 the latent positions are initialized using the Fruchterman-Reingold method and multidimensional scaling is used for D = 1 or D > 3. Default randomz = FALSE number of starts lsmez (N x D) matrix containing the posterior means of the latent positions lsmvz (D x D) matrix containing the posterior variance of the latent positions xit mean of the posterior distribution of α psi2t variance of the posterior distribution of α Ell expected log-likelihood Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. Hoff, P., Raftery, A., and Handcock, M. (2002), "Latent Space Approaches to Social Network Analysis", Journal of the American Statistical Association, 97, 1090 1098. See Also plot.lsm ### Simulate Undirected Network N <- 20 <- network(n, directed = FALSE)[,] modlsm <- lsm(, D = 2) plot(modlsm, )

8 plot.lsjm plot.gofobj Plot GoF object Function to plot an object of class 'gofobj' ## S3 method for class 'gofobj' plot(x, parm = TRUE,...) x object of class "gofobj" parm do all in one plots... other <- network(20, directed = FALSE)[,] modlsm <- lsm(, D = 2) mygof <- goflsm(modlsm, =, doplot = FALSE) plot(mygof) plot.lsjm Two dimensional plot of Latent Space Joint Model output Function to plot an object of class 'lsjm' ## S3 method for class 'lsjm' plot(x,, drawcb = FALSE, dimz = c(1, 2), plotztilde = FALSE, colpl = 1, colell = rgb(0.6, 0.6, 0.6, alpha = 0.1), LEVEL = 0.95, pchplot = 20, pchell = 19, pchpl = 19, cexpl = 1.1, mainztilde = NULL, arrowhead = FALSE, curve = NULL, xlim = NULL, ylim = NULL, main = NULL,...)

plot.lsjm 9 x drawcb object of class 'lsjm' list containing a (N x N) binary adjacency matrix for each network view. logical if drawcb = TRUE draw confidence bounds dimz dimensions of the latent variable to be plotted. Default dimz = c(1, 2) plotztilde colpl colell if TRUE do the plot for the last step of LSM col for the points representing the nodes. Default colpl = NULL col for the ellipses. Default rgb(.6,.6,.6, alpha=.1) LEVEL levels of confidence bounds shown when plotting the ellipses. Default LEVEL =.95 pchplot Default pchplot = 20 pchell pch for the ellipses. Default pchell = 19 pchpl pch for the points representing the nodes. Default pchpl = 19 cexpl cex for the points representing the nodes. Default cexpl = 1.1 mainztilde arrowhead title for single network plots TRUE do the plot for the last step of LSM logical, if the arrowed are to be plotted. Default arrowhead = FALSE curve curvature of edges. Default curve = 0 xlim ylim main range for x range for y main title... to be passed to methods, such as graphical parameters (see par). ## Simulate Undirected Network N <- 20 Ndata <- 2 <- list() [[1]] <- network(n, directed = FALSE)[,] ### create a new view that is similar to the original for(nd in 2:Ndata){ [[nd]] <- [[nd - 1]] - sample(c(-1, 0, 1), N * N, replace = TRUE, prob = c(.05,.85,.1)) [[nd]] <- 1 * ([[nd]] > 0 ) diag([[nd]]) <- 0 } par(mfrow = c(1, 2)) z <- plot([[1]], verbose = TRUE, main = 'Network 1') plot([[2]], EZ = z, main = 'Network 2') par(mfrow = c(1, 1)) modlsjm <- lsjm(, D = 2) plot(modlsjm,, drawcb = TRUE) plot(modlsjm,, drawcb = TRUE, plotztilde = TRUE)

10 plot.lsm plot.lsm Two dimensional plot of the Latent Space Model output Function to plot an object of class 'lsm' ## S3 method for class 'lsm' plot(x,, drawcb = FALSE, dimz = c(1, 2), colpl = 1, colell = rgb(0.6, 0.6, 0.6, alpha = 0.1), LEVEL = 0.95, pchplot = 20, pchell = 19, pchpl = 19, cexpl = 1.1, arrowhead = FALSE, curve = NULL, xlim = NULL, ylim = NULL,...) x object of class 'lsm' (N x N) binary adjacency matrix drawcb draw confidence bounds dimz dimensions of the latent variable to be plotted. Default dimz = c(1, 2) colpl col for the points representing the nodes. Default colpl = NULL colell col for the ellipses. Default rgb(.6,.6,.6, alpha=.1) LEVEL levels of confidence bounds shown when plotting the ellipses. Default LEVEL =.95 pchplot Default pchplot = 20 pchell pch for the ellipses. Default pchell = 19 pchpl pch for the points representing the nodes. Default pchpl = 19 cexpl cex for the points representing the nodes. Default cexpl = 1.1 arrowhead logical, if the arrowed are to be plotted. Default arrowhead = FALSE curve curvature of edges. Default curve = 0 xlim range for x ylim range for y... to be passed to methods, such as graphical parameters (see par). N <- 20 <- network(n, directed = FALSE)[,] modlsm <- lsm(, D = 2) plot(modlsm, ) # Plot with 95% CB plot(modlsm,, drawcb = TRUE) # Plot with 99% CB plot(modlsm,, drawcb = TRUE, LEVEL =.99)

plot 11 plot Plot the adjacency matrix of the network Function to plot the adjacency matrix of the network. plot(, Ndata = NULL, EZ = NULL, VZ = NULL, dimz = c(1, 2), labels = NULL, colpl = 1, colell = rgb(0.6, 0.6, 0.6, alpha = 0.1), LEVEL = 0.95, pchplot = 20, pchell = 19, pchpl = 19, cexpl = 1.1, arrowhead = FALSE, curve = NULL, lwdline = 0.3, xlim = NULL, ylim = NULL, verbose = FALSE,...) Ndata EZ VZ list, or matrix containing a (N x N) binary adjacency matrix for each network view. number of network views posterior mean latent positions posterior variance latent positions, if specified draw ellipse dimz dimensions of Z to be plotted, default dimz = c(1, 2) labels colpl colell text to be added in the plot representing the labels of each node. Default labels = NULL, no labels are shown col for the points representing the nodes. Default colpl = NULL col for the ellipses. Default rgb(.6,.6,.6, alpha=.1) LEVEL levels of confidence bounds shown when plotting the ellipses. Default LEVEL =.95 pchplot Default pchplot = 20 pchell pch for the ellipses. Default pchell = 19 pchpl pch for the points representing the nodes. Default pchpl = 19 cexpl cex for the points representing the nodes. Default cexpl = 1.1 arrowhead logical, if the arrowed are to be plotted. Default arrowhead = FALSE curve curvature of edges. Default curve = 0 lwdline lwd of edges. Default lwdline =.3 xlim ylim verbose range for x range for y if verbose = TRUE save the nodal positions... to be passed to methods, such as graphical parameters (see par).

12 PPIgen N <- 20 <- network(n, directed = FALSE)[,] plot() # Store the positions of nodes used to plot, in order to redraw the plot using # the same positions z <- plot(, verbose = TRUE) plot(, EZ = z) PPIgen PPI genetic interactions The dataset contains a network formed by genetic protein-protein interactions (PPI) between 67 Saccharomyces cerevisiae proteins. The network is formed of 294 links. The data were downloaded from the Biological General Repository for Interaction Datasets (BioGRID) database http: //thebiogrid.org/ Format Binary adjacency matrix Details Binary adjacency matrix containing genetic interactions between 67 proteins. References Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. See Also PPIphy

PPInet 13 PPInet PPI genetic and physical interactions data Format Details Source The dataset contains two undirected networks formed by genetic and physical protein-protein interactions (PPI) between 67 Saccharomyces cerevisiae proteins. The genetic interactions network is formed of 294 links, and the physical interactions network is formed of 190 links. The data were downloaded from the Biological General Repository for Interaction Datasets (BioGRID) database http://thebiogrid.org/ Two binary adjacency matrices PPIgen Binary adjacency matrix containing genetic interactions between 67 proteins. PPIphy Binary adjacency matrix containing physical interactions between 67 proteins. http://thebiogrid.org/ References Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. See Also PPIgen, PPIphy PPIphy PPI physical interactions Format The dataset contains a network formed by physical protein-protein interactions (PPI) between 67 Saccharomyces cerevisiae proteins. The network is formed of 190 links. The data were downloaded from the Biological General Repository for Interaction Datasets (BioGRID) database http: //thebiogrid.org/ Binary adjacency matrix

14 print.gofobj Details Binary adjacency matrix containing physical interactions between 67 proteins. References Gollini, I., and Murphy, T. B. (2014), "Joint Modelling of Multiple Network Views", Journal of Computational and Graphical Statistics http://arxiv.org/abs/1301.3759. See Also PPIgen print.gofobj Print GoF object Function to print an object of class 'gofobj' ## S3 method for class 'gofobj' print(x,...) x object of class 'gofobj'... other <- network(20, directed = FALSE)[,] modlsm <- lsm(, D = 2) mygof <- goflsm(modlsm, =, doplot = FALSE) print(mygof)

rotxto 15 rotxto Rotate X to match Function to rotate X to match via singular value decomposition rotxto(x, ) X matrix to be rotated objective matrix Value rotated object Xrot, and the rotation matrix R simulatelsm Simulate from LSM model Function to simulate networks from the LSM model simulatelsm(object, = NULL, nsim = 100, seed, directed = NULL) object object of class 'lsm' (N x N) binary adjacency matrix nsim number of simulations. Default nsim = 100 seed for simulations directed if the network is directed or not. Default directed = NULL

16 simulatelsm n <- 20 <- network(n, directed = FALSE)[,] modlsm <- lsm(, D = 2) sim <- simulatelsm(modlsm, =, nsim = 8) # store EZ, to keep the nodes in the same positions # and compare the networks EZ <- modlsm$lsmez par(mfrow = c(3,3)) plot(, EZ = EZ, main = "Original Data") for(i in 1:8) plot(sim[[i]], EZ = EZ, main = paste("simulation", i)) par(mfrow = c(1,1))

Index Topic datasets PPIgen, 12 PPInet, 13 PPIphy, 13 boxroc, 2 goflsm, 4 lsjm, 2, 3, 5 lsm, 2 4, 6 lvm4net (lvm4net-package), 2 lvm4net-package, 2 par, 3, 9 11 plot.gofobj, 4, 8 plot.lsjm, 8 plot.lsm, 7, 10 plot, 11 PPIgen, 12, 13, 14 PPInet, 13 PPIphy, 12, 13, 13 print.gofobj, 4, 14 rotxto, 15 simulatelsm, 4, 15 17