Package GEM. R topics documented: January 31, Type Package

Similar documents
Package filematrix. R topics documented: February 27, Type Package

Package GWAF. March 12, 2015

Package geneslope. October 26, 2016

Package LEA. April 23, 2016

Package MultiMeta. February 19, 2015

Package SimGbyE. July 20, 2009

Package lodgwas. R topics documented: November 30, Type Package

Package dmrseq. September 14, 2018

Package coloc. February 24, 2018

Package ENmix. September 17, 2015

Package DMRcate. May 1, 2018

Package snpstatswriter

Package LEA. December 24, 2017

Package DRIMSeq. December 19, 2018

Package MEAL. August 16, 2018

Step-by-Step Guide to Basic Genetic Analysis

Package methylgsa. March 7, 2019

Package DMCHMM. January 19, 2019

Package methyvim. April 5, 2019

II.Matrix. Creates matrix, takes a vector argument and turns it into a matrix matrix(data, nrow, ncol, byrow = F)

Package detectruns. February 6, 2018

Package PedCNV. February 19, 2015

Package LGRF. September 13, 2015

Package MethylSeekR. January 26, 2019

MAGA: Meta-Analysis of Gene-level Associations

Package ccmap. June 17, 2018

FVGWAS- 3.0 Manual. 1. Schematic overview of FVGWAS

Package compartmap. November 10, Type Package

Package conumee. February 17, 2018

Package EMDomics. November 11, 2018

Package ClusterSignificance

SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data

Package RTNduals. R topics documented: March 7, Type Package

Package SMAT. January 29, 2013

Package semisup. March 10, Version Title Semi-Supervised Mixture Model

Package bisect. April 16, 2018

Step-by-Step Guide to Relatedness and Association Mapping Contents

Package TANDEM. R topics documented: June 15, Type Package

Package OmicKriging. August 29, 2016

Package REGENT. R topics documented: August 19, 2015

Package SIMLR. December 1, 2017

Package EBglmnet. January 30, 2016

Package sigqc. June 13, 2018

Package matter. April 16, 2019

Package EMLRT. August 7, 2014

Package rqt. November 21, 2017

Step-by-Step Guide to Advanced Genetic Analysis

Package anota2seq. January 30, 2018

Package slam. February 15, 2013

Package seqcat. March 25, 2019

Package gggenes. R topics documented: November 7, Title Draw Gene Arrow Maps in 'ggplot2' Version 0.3.2

Package RTCGAToolbox

Package FunciSNP. November 16, 2018

Package RcisTarget. July 17, 2018

Package RLMM. March 7, 2019

Cover Page. The handle holds various files of this Leiden University dissertation.

Package slam. December 1, 2016

GemTools Documentation

The fgwas software. Version 1.0. Pennsylvannia State University

methylmnm Tutorial Yan Zhou, Bo Zhang, Nan Lin, BaoXue Zhang and Ting Wang January 14, 2013

Network Based Models For Analysis of SNPs Yalta Opt

Package MOJOV. R topics documented: February 19, 2015

Package MIRA. October 16, 2018

Package SingleCellExperiment

Release Notes. JMP Genomics. Version 4.0

Package gpart. November 19, 2018

Package RobustSNP. January 1, 2011

Package RSNPset. December 14, 2017

Package globalgsa. February 19, 2015

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li

Package icnv. R topics documented: March 8, Title Integrated Copy Number Variation detection Version Author Zilu Zhou, Nancy Zhang

Package ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2.

Package quickreg. R topics documented:

Package BGData. May 11, 2017

Package haplor. November 1, 2017

Importing and Merging Data Tutorial

Package dkdna. June 1, Description Compute diffusion kernels on DNA polymorphisms, including SNP and bi-allelic genotypes.

Package canvasxpress

Package BDMMAcorrect

LEA: An R Package for Landscape and Ecological Association Studies

Package cnvgsa. R topics documented: January 4, Type Package

Package TMixClust. July 13, 2018

Package cooccurnet. January 27, 2017

Overview. Background. Locating quantitative trait loci (QTL)

The R.huge Package. September 1, 2007

A short manual for LFMM (command-line version)

2 binary_coding. Index 21. Code genotypes as binary. binary_coding(genotype_warnings2na, genotype_table)

Development of linkage map using Mapmaker/Exp3.0

Package spark. July 21, 2017

Package svalues. July 15, 2018

Package glmnetutils. August 1, 2017

Package Hapi. July 28, 2018

Polymorphism and Variant Analysis Lab

Package BMTME. February 12, 2019

Package GARS. March 17, 2019

Package SNPediaR. April 9, 2019

Package RGalaxy. R topics documented: January 19, 2018

Package gpur. October 18, 2017

Package allehap. August 19, 2017

Package aop. December 5, 2016

Transcription:

Type Package Package GEM January 31, 2018 Title GEM: fast association study for the interplay of Gene, Environment and Methylation Version 1.5.0 Date 2015-12-05 Author Hong Pan, Joanna D Holbrook, Neerja Karnani, Chee-Keong Kwoh Maintainer Hong Pan <pan_hong@sics.a-star.edu.sg> Tools for analyzing EWAS, methqtl and GxE genome widely. License Artistic-2.0 biocviews MethylSeq, MethylationArray, GenomeWideAssociation, Regression, DNAMethylation, SNP, GeneExpression, GUI Depends R (>= 3.3) VignetteBuilder knitr Suggests knitr, RUnit, testthat, BiocGenerics Imports tcltk, ggplot2, methods, stats, grdevices, graphics, utils LazyData TRUE RoxygenNote 5.0.1 NeedsCompilation no R topics documented: GEM-package........................................ 2 GEM_Emodel........................................ 2 GEM_Gmodel........................................ 4 GEM_GUI......................................... 5 GEM_GWASmodel..................................... 6 GEM_GxEmodel...................................... 7 SlicedData-class....................................... 8 Index 12 1

2 GEM_Emodel GEM-package GEM: Fast association study for the interplay of Gene, Environment and Methylation The GEM package provides a highly efficient R tool suite for performing epigenome wide association studies (EWAS). GEM provides three major functions named GEM_Emodel, GEM_Gmodel and GEM_GxEmodel to study the interplay of Gene, Environment and Methylation (GEM). Within GEM, the existing "Matrix eqtl" package is utilized and extended to study methylation quantitative trait loci (methqtl) and the interaction of genotype and environment (GxE) to determine DNA methylation variation, using matrix based iterative correlation and memory-efficient data analysis. GEM can facilitate reliable genome-wide methqtl and GxE analysis on a standard laptop computer within minutes. Author(s) Hong Pan References https://github.com/fastgem/gem See Also GEM_GUI ## Launch GEM GUI #GEM_GUI() # remove the hash symbol for running ## Checking the vignettes for more details if(interactive()) browsevignettes(package = 'GEM') GEM_Emodel GEM_Emodel Analysis GEM_Emodel is to find the association between methylation and environmental factor genome widely. Usage GEM_Emodel(env_file_name, covariate_file_name, methylation_file_name, Emodel_pv, output_file_name, qqplot_file_name, saveplot = TRUE)

GEM_Emodel 3 Arguments env_file_name Text file with rows representing environment factor and columns representing samples, such as the example data file "env.txt". covariate_file_name Text file with rows representing covariate factors and columns representing samples, such as the example data file "cov.txt". methylation_file_name Text file with rows representing methylation profiles for CpGs, and columns representing samples, such as the example data file "methylation.txt". Emodel_pv The pvalue cut off. Associations with significances at Emodel_pv level or below are saved to output_file_name, with corresponding estimate of effect size (slope coefficient), test statistics and p-value. Default value is 1.0. output_file_name The result file with each row presenting a CpG and its association with environment, which contains CpGID, estimate of effect size (slope coefficient), test statistics, pvalue and FDR at each column. qqplot_file_name Image file name to present the QQ-plot for all p-value distribution. saveplot If save the plot. Details GEM_Emodel finds the association between methylation and environment genome-wide by performing matrix based iterative correlation and memory-efficient data analysis instead of millions of linear regressions (N = number_of_cpgs). The methylation data are the measurements for CpG probes, for example, 450,000 CpGs from Illumina Infinium HumanMethylation450 Array. The environmental factor can be a particular phenotype or environment factor from,for example, birth outcomes, maternal conditions or disease traits. The output of GEM_Emodel for particular environmental factor is a list of CpGs that are potential epigenetic biomarkers. GEM_Emodel runs linear regression like lm (M ~ E + covt), where M is a matrix with methylation data, E is a matrix with environment factor and covt is a matrix with covariates, and all read from the formatted text data file. Value save results automatically DATADIR = system.file('extdata',package='gem') RESULTDIR = getwd() env_file = paste(datadir, "env.txt", sep =.Platform$file.sep) covariate_file = paste(datadir, "cov.txt", sep =.Platform$file.sep) methylation_file = paste(datadir, "methylation.txt", sep =.Platform$file.sep) Emodel_pv = 1 output_file = paste(resultdir, "Result_Emodel.txt", sep =.Platform$file.sep) qqplot_file = paste(resultdir, "QQplot_Emodel.jpg", sep =.Platform$file.sep) GEM_Emodel(env_file, covariate_file, methylation_file, Emodel_pv, output_file, qqplot_file)

4 GEM_Gmodel GEM_Gmodel GEM_Gmodel Analysis GEM_Gmodel creates a methqtl genome-wide map. Usage GEM_Gmodel(snp_file_name, covariate_file_name, methylation_file_name, Gmodel_pv, output_file_name) Arguments snp_file_name Text file with rows representing genotype encoded as 1,2,3 or any three distinct values for major allele homozygote (AA), heterozygote (AB) and minor allele homozygote (BB) and columns representing samples, such as the example data file "snp.txt". covariate_file_name Text file with rows representing covariate factors and columns representing samples, such as the example data file "cov.txt". methylation_file_name Text file with rows representing methylation profiles for CpGs, and columns representing samples, such as the example data file "methylation.txt". Gmodel_pv The pvalue cut off. Associations with significances at Gmodel_pv level or below are saved to output_file_name, with corresponding estimate of effect size (slope coefficient), test statistics and p-value. Default value is 5.0E-08. output_file_name The result file with each row presenting a CpG and its association with SNP, which contains CpGID, SNPID, estimate of effect size (slope coefficient), test statistics, pvalue and FDR at each column. Details GEM_Gmodel creates a methqtl genome-wide map by performing matrix based iterative correlation and memory-efficient data analysis instead of millions of linear regressions (N = number_of_cpgs x number_of_snps) between methylation and genotyping. Polymorphisms close to CpGs in the same chromosome (cis-) or different chromosome (trans-) often form methylation quantitative trait loci (methqtls) with CpGs. In GEM_Gmodel, MethQTLs can be discovered by correlating single nucleotide polymorphism (SNP) data with CpG methylation from the same samples, by linear regression lm (M ~ G + covt), where M is a matrix with methylation data, G is a matrix with genotype data and covt is a matrix with covariates, and all read from the formatted text data file. The methylation data are the measurements for CpG probes, for example, 450,000 CpGs from Illumina Infinium HumanMethylation450 Array. The genotype data are encoded as 1,2,3 or any three distinct values for major allele homozygote (AA), heterozygote (AB) and minor allele homozygote (BB). The linear regression is adjusted by covariates read from covariate data file. The output of GEM_Gmodel is a list of CpG-SNP pairs, where the SNP is the best fit to explain the particular CpG. The significant association between CpG-SNP pair suggests the methylation driven by genotyping variants, which is so called methylation quantitative trait loci (methqtl).

GEM_GUI 5 Value save results automatically DATADIR = system.file('extdata',package='gem') RESULTDIR = getwd() snp_file = paste(datadir, "snp.txt", sep =.Platform$file.sep) covariate_file = paste(datadir, "cov.txt", sep =.Platform$file.sep) methylation_file = paste(datadir, "methylation.txt", sep =.Platform$file.sep) Gmodel_pv = 1e-04 output_file = paste(resultdir, "Result_Gmodel.txt", sep =.Platform$file.sep) GEM_Gmodel(snp_file, covariate_file, methylation_file, Gmodel_pv, output_file) GEM_GUI Graphical User Interface (GUI) for GEM The user friendly GUI for runing GEM package easily and quickly Usage GEM_GUI() Details The GEM package provides a highly efficient R tool suite for performing epigenome wide association studies (EWAS). GEM provides three major functions named GEM_Emodel, GEM_Gmodel and GEM_GxEmodel to study the interplay of Gene, Environment and Methylation (GEM). Within GEM, the pre-existing "Matrix eqtl" package is utilized and extended to study methylation quantitative trait loci (methqtl) and the interaction of genotype and environment (GxE) to determine DNA methylation variation, using matrix based iterative correlation and memory-efficient data analysis. GEM can facilitate reliable genome-wide methqtl and GxE analysis on a standard laptop computer within minutes. Value GEM model analysis results See Also GEM-package interactive() #GEM_GUI() ## remove the hash symbol to run

6 GEM_GWASmodel GEM_GWASmodel GEM_GWASmodel GEM_GWASmodel performs genome wide association study (GWAS). Usage GEM_GWASmodel(env_file_name, snp_file_name, covariate_file_name, GWASmodel_pv, output_file_name, qqplot_file_name) Arguments env_file_name snp_file_name Text file with rows representing environment factor and columns representing samples, such as the example data file "env.txt". Text file with rows representing genotype encoded as 1,2,3 or any three distinct values for major allele homozygote (AA), heterozygote (AB) and minor allele homozygote (BB) and columns representing samples, such as the example data file "snp.txt". covariate_file_name Text file with rows representing covariate factors, and columns representing samples, such as the example data file "cov.txt". GWASmodel_pv The pvalue cut off. Associations with significances at GWASmodel_pv level or below are saved to output_file_name, with corresponding estimate of effect size (slope coefficient), test statistics and p-value. Default value is 5.0E-08. output_file_name The result file with each row presenting a SNP and its association with environment, which contains SNPID, estimate of effect size (slope coefficient), test statistics, pvalue and FDR at each column. qqplot_file_name Output QQ plot for all pvalues. Details GEM_GWASmodel finds the association between genetic variants and environment genome-wide by performing matrix based iterative correlation and memory-efficient data analysis instead of millions of linear regressions (N = number_of_snps). The environmental factor can be a particular phenotype or environment factor from,for example, birth outcomes, maternal conditions or disease traits. The genotype data are encoded as 1,2,3 or any three distinct values for major allele homozygote (AA), heterozygote (AB) and minor allele homozygote (BB). The linear regression is adjusted by covariates read from covariate data file. The output of GEM_GWASmodel is a list of SNPs and their association with environment. GEM_GWASmodel runs linear regression like lm (E ~ G + covt), where G is a matrix with genotype data, E is a matrix with environment factor and covt is a matrix with covariates, and all read from the formatted text data file. Value save results automatically

GEM_GxEmodel 7 DATADIR = system.file('extdata',package='gem') RESULTDIR = getwd() snp_file = paste(datadir, "snp.txt", sep =.Platform$file.sep) covariate_file = paste(datadir, "cov.txt", sep =.Platform$file.sep) env_file = paste(datadir, "env.txt", sep =.Platform$file.sep) GWASmodel_pv = 1e-5 output_file = paste(resultdir, "Result_GxEmodel.txt", sep =.Platform$file.sep) qqplot_file = paste(resultdir, "Result_GxEmodel.jpeg", sep =.Platform$file.sep) GEM_GWASmodel(env_file, snp_file, covariate_file, GWASmodel_pv, output_file, qqplot_file) GEM_GxEmodel GEM_GxEmodel GEM_GxEmodel tests the ability of the interaction of gene and environmental factor to predict DNA methylation level. Usage GEM_GxEmodel(snp_file_name, covariate_file_name, methylation_file_name, GxEmodel_pv, output_file_name, topkplot = 10, saveplot = TRUE) Arguments snp_file_name Text file with rows representing genotype encoded as 1,2,3 or any three distinct values for major allele homozygote (AA), heterozygote (AB) and minor allele homozygote (BB) and columns representing samples, such as the example data file "snp.txt". covariate_file_name Text file with rows representing covariate factors and the envirnoment value, and the environment value should be put in the last row, and columns representing samples, such as the example data file "gxe.txt". methylation_file_name Text file with rows representing methylation profiles for CpGs, and columns representing samples, such as the example data file "methylation.txt". GxEmodel_pv The pvalue cut off. Associations with significances at GxEmodel_pv level or below are saved to output_file_name, with corresponding estimate of effect size (slope coefficient), test statistics and p-value. Default value is 5.0E-08. output_file_name The result file with each row presenting a CpG and its association with SNPx- Env, which contains CpGID, SNPID, estimate of effect size (slope coefficient), test statistics, pvalue and FDR at each column. topkplot saveplot The top number of topkplot CpG-SNP-Env triplets will be presented into charts to demonstrate how environment values segregated by SNP groups can explain methylation. if save the plot.

8 SlicedData-class Details GEM_GxEmodel explores how the genotype can work in interaction with environment (GxE) to influence specific DNA methylation level, by performing matrix based iterative correlation and memory-efficient data analysis among methylation, genotyping and environment. This has greatly released the computational burden for GxE study from billions of linear regression (N = number_of_cpgs x number_of_snps x number_of_environment) and made it possible to be accomplished in an efficient way. The linear regression is lm (M ~ G x E + covt), where M is a matrix with methylation data, G is a matrix with genotype data, E is environment data and covt is covariate matrix. E values is combined into covariate file as the last row, and all read from the formatted text data file. The output of GEM_GxEmodel is a list of CpG-SNP-Env triplets, where the environment factor segregated in genotype group fits to explain the particular CpG. The significant association suggests the association between methylation and environment can be better explained by segregation in genotype groups (GxE). Value save results automatically DATADIR = system.file('extdata',package='gem') RESULTDIR = getwd() snp_file = paste(datadir, "snp.txt", sep =.Platform$file.sep) covariate_file = paste(datadir, "gxe.txt", sep =.Platform$file.sep) methylation_file = paste(datadir, "methylation.txt", sep =.Platform$file.sep) GxEmodel_pv = 1e-4 output_file = paste(resultdir, "Result_GxEmodel.txt", sep =.Platform$file.sep) GEM_GxEmodel(snp_file, covariate_file, methylation_file, GxEmodel_pv, output_file) SlicedData-class Class SlicedData for storing large matrices This class is created for fast and memory efficient manipulations with large datasets presented in matrix form. It is used to load, store, and manipulate large datasets, e.g. genotype and gene expression matrices. When a dataset is loaded, it is sliced in blocks of 1,000 rows (default size). This allows imputing, standardizing, and performing other operations with the data with minimal memory overhead. Usage # x[[i]] indexing allows easy access to individual slices. # It is equivalent to x$getslice(i) and x$setslice(i,value) x[[i]] ## S4 replacement method for signature 'SlicedData' x[[i]] <- value # The following commands work as if x was a simple matrix object

SlicedData-class 9 nrow(x) ncol(x) dim(x) rownames(x) colnames(x) ## S4 replacement method for signature 'SlicedData' rownames(x) <- value ## S4 replacement method for signature 'SlicedData' colnames(x) <- value # SlicedData object can be easily transformed into a matrix # preserving row and column names as.matrix(x) # length(x) can be used in place of x$nslices() # to get the number of slices in the object length(x) Arguments x i value SlicedData object. Number of a slice. New content for the slice / new row or column names. Value check the resutls Extends SlicedData is a reference classes (envrefclass). Its methods can change the values of the fields of the class. Fields dataenv: environment. Stores the slices of the data matrix. The slices should be accessed via getslice() and setslice() methods. nslices1: numeric. Number of slices. For internal use. The value should be access via nslices() method. rownameslices: list. Slices of row names. columnnames: character. Column names. filedelimiter: character. Delimiter separating values in the input file. fileskipcolumns: numeric. Number of columns with row labels in the input file. fileskiprows: numeric. Number of rows with column labels in the input file. fileslicesize: numeric. Maximum number of rows in a slice. fileomitcharacters: character. Missing value (NaN) representation in the input file.

10 SlicedData-class Methods initialize(mat): Create the object from a matrix. nslices(): Returns the number of slices. ncols(): Returns the number of columns in the matrix. nrows(): Returns the number of rows in the matrix. Clear(): Clears the object. Removes the data slices and row and column names. Clone(): Makes a copy of the object. Changes to the copy do not affect the source object. CreateFromMatrix(mat): Creates SlicedData object from a matrix. LoadFile(filename, skiprows = NULL, skipcolumns = NULL, slicesize = NULL, omitcharacters = NULL, Loads data matrix from a file. filename should be a character string. The remaining parameters specify the file format and have the same meaning as file* fields. Additional rownamescolumn parameter specifies which of the columns of row labels to use as row names. SaveFile(filename): Saves the data to a file. filename should be a character string. getslice(sl): Retrieves sl-th slice of the matrix. setslice(sl, value): Set sl-th slice of the matrix. ColumnSubsample(subset): Reorders/subsets the columns according to subset. Acts as M = M[,subset] for a matrix M. RowReorder(ordr): Reorders rows according to ordr. Acts as M = M[ordr, ] for a matrix M. RowMatrixMultiply(multiplier): Multiply each row by the multiplier. Acts as M = M %*% multiplier for a matrix M. CombineInOneSlice(): Combines all slices into one. The whole matrix can then be obtained via $getslice(1). IsCombined(): Returns TRUE if the number of slices is 1 or 0. ResliceCombined(sliceSize = -1): Cuts the data into slices of slicesize rows. If slicesize is not defined, the value of fileslicesize field is used. GetAllRowNames(): Returns all row names in one vector. RowStandardizeCentered(): Set the mean of each row to zero and the sum of squares to one. SetNanRowMean(): Impute rows with row mean. Rows full of NaN values are imputed with zeros. RowRemoveZeroEps(): Removes rows of zeros and those that are nearly zero. FindRow(rowname): Finds row by name. Returns a pair of slice number an row number within the slice. If no row is found, the function returns NULL. rowmeans(x, na.rm = FALSE, dims = 1L): Returns a vector of row means. Works as rowmeans but requires dims to be equal to 1L. rowsums(x, na.rm = FALSE, dims = 1L): Returns a vector of row sums. Works as rowsums but requires dims to be equal to 1L. colmeans(x, na.rm = FALSE, dims = 1L): Returns a vector of column means. Works as colmeans but requires dims to be equal to 1L. colsums(x, na.rm = FALSE, dims = 1L): Returns a vector of column sums. Works as colsums but requires dims to be equal to 1L. Author(s) Andrey Shabalin <ashabalin@vcu.edu>

SlicedData-class 11 References The package website: http://www.bios.unc.edu/research/genomic_software/matrix_eqtl/ # Create a SlicedData variable

Index Topic classes SlicedData-class, 8 [[,SlicedData-method [[<-,SlicedData-method as.matrix,sliceddata-method colmeans, 10 colmeans,sliceddata-method colnames,sliceddata-method colnames<-,sliceddata-method colsums, 10 colsums,sliceddata-method nrow,sliceddata-method rowmeans, 10 rowmeans,sliceddata-method rownames,sliceddata-method rownames<-,sliceddata-method rowsums, 10 rowsums,sliceddata-method show,sliceddata-method SlicedData, 9 SlicedData SlicedData-class, 8 summary.sliceddata dim,sliceddata-method envrefclass, 9 GEM-package, 2 GEM_Emodel, 2, 2, 5 GEM_Gmodel, 2, 4, 5 GEM_GUI, 2, 5 GEM_GWASmodel, 6 GEM_GxEmodel, 2, 5, 7 length,sliceddata-method matrix, 10 NCOL,SlicedData-method ncol,sliceddata-method NROW,SlicedData-method 12