Package LGRF. September 13, 2015

Similar documents
SKAT Package. Seunggeun (Shawn) Lee. July 21, 2017

Package globalgsa. February 19, 2015

Package dkdna. June 1, Description Compute diffusion kernels on DNA polymorphisms, including SNP and bi-allelic genotypes.

Package gee. June 29, 2015

Package allehap. August 19, 2017

Package GWAF. March 12, 2015

Package SMAT. January 29, 2013

Package RobustSNP. January 1, 2011

Package PedCNV. February 19, 2015

Package lodgwas. R topics documented: November 30, Type Package

PreMeta GENERAL INFORMATION SYNOPSIS

PreMeta GENERAL INFORMATION SYNOPSIS

Package JGEE. November 18, 2015

Package FREGAT. April 21, 2017

Package MultiMeta. February 19, 2015

Package EMLRT. August 7, 2014

Package GEM. R topics documented: January 31, Type Package

Package assortnet. January 18, 2016

Package EBglmnet. January 30, 2016

Package coloc. February 24, 2018

Package ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2.

Package MOJOV. R topics documented: February 19, 2015

Package snplist. December 11, 2017

Package rqt. November 21, 2017

Package datasets.load

Package OmicKriging. August 29, 2016

Package snpstatswriter

Package clusterrepro

Package gpdtest. February 19, 2015

GMDR User Manual Version 1.0

Package semisup. March 10, Version Title Semi-Supervised Mixture Model

Step-by-Step Guide to Advanced Genetic Analysis

Package RobustRankAggreg

Package rocc. February 20, 2015

Package misclassglm. September 3, 2016

Package pnmtrem. February 20, Index 9

Package SimGbyE. July 20, 2009

Genetic type 1 Error Calculator (GEC)

SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data

Package CuCubes. December 9, 2016

Package PTE. October 10, 2017

GWAS Exercises 3 - GWAS with a Quantiative Trait

Package REGENT. R topics documented: August 19, 2015

Package PCGSE. March 31, 2017

MAGA: Meta-Analysis of Gene-level Associations

Step-by-Step Guide to Basic Genetic Analysis

Bioconductor s tspair package

Polymorphism and Variant Analysis Lab

Package svapls. February 20, 2015

Package rpst. June 6, 2017

GMDR User Manual. GMDR software Beta 0.9. Updated March 2011

Package longclust. March 18, 2018

Package Combine. R topics documented: September 4, Type Package Title Game-Theoretic Probability Combination Version 1.

Package sigqc. June 13, 2018

Package mpmi. November 20, 2016

Package RSNPset. December 14, 2017

bimm vignette Matti Pirinen & Christian Benner University of Helsinki November 15, 2016

Package logicfs. R topics documented:

Package poolfstat. September 14, 2018

Package geneslope. October 26, 2016

Package kernelfactory

QUICKTEST user guide

Package CVR. March 22, 2017

Package strat. November 23, 2016

Package Eagle. January 31, 2019

Data Currently Available (And How to Access It) Chance Hohensee Data Training September 9, 2016

Package DPBBM. September 29, 2016

Package TVsMiss. April 5, 2018

Package hsiccca. February 20, 2015

Package TANOVA. R topics documented: February 19, Version Date

Package subtype. February 20, 2015

Package genelistpie. February 19, 2015

Package cosinor. February 19, 2015

Package gibbs.met. February 19, 2015

Package ssd. February 20, Index 5

Package inversion. R topics documented: July 18, Type Package. Title Inversions in genotype data. Version

GMMAT: Generalized linear Mixed Model Association Tests Version 0.7

Package kirby21.base

The fgwas Package. Version 1.0. Pennsylvannia State University

Package CONOR. August 29, 2013

Package ECctmc. May 1, 2018

Package DSBayes. February 19, 2015

Package survclip. November 22, 2017

Package fastdummies. January 8, 2018

Package nlnet. April 8, 2018

Package sensory. R topics documented: February 23, Type Package

Package ranger. November 10, 2015

Package missforest. February 15, 2013

Package longitudinal

Package esabcv. May 29, 2015

Package NB. R topics documented: February 19, Type Package

Package RaPKod. February 5, 2018

Package BMTME. February 12, 2019

Package TANDEM. R topics documented: June 15, Type Package

Package stockr. June 14, 2018

Package triogxe. April 3, 2013

Package LEA. April 23, 2016

Package haplor. November 1, 2017

Package endogenous. October 29, 2016

Package eulerian. February 19, Index 5

Transcription:

Type Package Package LGRF September 13, 2015 Title Set-Based Tests for Genetic Association in Longitudinal Studies Version 1.0 Date 2015-08-20 Author Zihuai He Maintainer Zihuai He <zihuai@umich.edu> Functions for the longitudinal genetic random field method (He et al., 2015, <doi:10.1111/biom.12310>) to test the association between a longitudinally measured quantitative outcome and a set of genetic variants in a gene/region. License GPL-3 Depends CompQuadForm, SKAT, geepack NeedsCompilation no Repository CRAN Date/Publication 2015-09-13 09:22:48 R topics documented: IBS_pseudo......................................... 2 LGRF.example....................................... 2 LGRF.SSD.All....................................... 3 LGRF.SSD.OneSet_SetIndex................................ 5 null.lgrf.......................................... 7 test.lgrf.......................................... 8 test.minp.......................................... 10 Index 12 1

2 LGRF.example IBS_pseudo Generate IBS pseudo variables If users want to calculate the IBS similarity, this function creates the IBS pseudo variables. This is in order to calculate the IBS similarity in an efficient way. IBS_pseudo(x) Arguments x An n by q matrix of genetic variants. Value It returns an n by 3p matrix of pseudo variables for efficiently calculating IBS similarity. library(lgrf) # Load data example # Z: genotype matrix, n by q matrix data(lgrf.example) Z<-LGRF.example$Z A<-IBS_pseudo(Z) # Then the IBS matrix can be calculated by K.IBS<-AA^T. LGRF.example Data example for LGRF The dataset contains outcome variable Y, covariate X, time and genotype data Z. The first column in time is the subject ID and the second column is the measured exam. Y, X and time are all in long form. Z is a genotype matrix where each row corresponds to one subject. data(lgrf.example)

LGRF.SSD.All 3 data(lgrf.example) LGRF.SSD.All LGRF tests for multiple regions/genes using SSD format files Test the association between an outcome variable and multiple regions/genes using SSD format files. LGRF.SSD.All(SSD.INFO, result.null, Gsub.id=NULL, intergxt=false, similarity= GR, impute.method= fixed, MinP.compare=FALSE,...) Arguments Value SSD.INFO result.null Gsub.id intergxt similarity impute.method MinP.compare SSD format information file, output of function Open_SSD". The sets are defined by this file. Output of function null.lgrf". The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. Whether to incorperate the gene-time interaction effect. Incorperating this effect can improve power if there is any gene-time interaction, but has slight power loss otherwise. The default is FALSE. *Please note that the second column of time should be included as a covairate when intergxt is TRUE. Choose the similarity measurement for the genetic variants. Can be either "GR" for genetic relationship or "IBS" for identity by state. The default is "GR" for better computational efficiency. Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. Whether to compare with the GEE based minimum p-value (MinP) test. The default is FALSE. Please note that implementing the GEE based MinP test is time consuming.... Other options of the GEE based MinP test. Defined same as in function test.minp". results First column contains the set ID; Second column contains the p-values; Third column contains the number of tested SNPs.

4 LGRF.SSD.All # * Since the Plink data files used here are hard to be included in a R package, # The usage is marked by "#" to pass the package check. #library(lgrf) ############################################## # Plink data files: File.Bed, File.Bim, File.Fam # Files defining the sets: File.SetID, File.SSD, File.Info # For longitudinal data, outcome and covariates are saved in a separate file: Y, time, X. # Null model was fitted using function null.lgrf. # Create the MW File # File.Bed<-"./example.bed" # File.Bim<-"./example.bim" # File.Fam<-"./example.fam" # File.SetID<-"./example.SetID" # File.SSD<-"./example.SSD" # File.Info<-"./example.SSD.info" # Generate SSD file # To use binary ped files, you have to generate SSD file first. # If you already have a SSD file, you do not need to call this function. # Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info) # SSD.INFO<-Open_SSD(File.SSD, File.Info) # Number of samples # SSD.INFO$nSample # Number of Sets # SSD.INFO$nSets ## Fit the null model # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # result.null<-null.lgrf(y,time,x=cbind(x,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. ## Test all regions # out_all<-lgrf.ssd.all(ssd.info, result.null) # Example result # out.all$results # SetID P.value N.Marker # 1 GENE_01 0.6568851 94 # 2 GENE_02 0.1822183 84 # 3 GENE_03 0.3836986 108 # 4 GENE_04 0.1265337 101

LGRF.SSD.OneSet_SetIndex 5 # 5 GENE_05 0.3236089 103 # 6 GENE_06 0.9401741 94 # 7 GENE_07 0.1043820 104 # 8 GENE_08 0.6093275 96 # 9 GENE_09 0.6351147 100 # 10 GENE_10 0.5631549 100 ## Test all regions, and compare with GEE based MinP test # out_all<-lgrf.ssd.all(ssd.info, result.null,minp.compare=t) # Example result # out.all$results # SetID P.value P.value.MinP N.Marker # 1 GENE_01 0.62842 1.0000 94 # 2 GENE_02 0.06558 0.2718 84 # 3 GENE_03 0.61795 1.0000 108 # 4 GENE_04 0.39667 0.7052 101 # 5 GENE_05 0.17371 0.5214 103 # 6 GENE_06 0.90104 1.0000 94 # 7 GENE_07 0.10143 0.1188 104 # 8 GENE_08 0.78082 0.3835 96 # 9 GENE_09 0.21966 0.5364 100 # 10 GENE_10 0.25468 0.3527 100 LGRF.SSD.OneSet_SetIndex LGRF tests for a single region/gene using SSD format files Test the association between an outcome variable and one region/gene using SSD format files. LGRF.SSD.OneSet_SetIndex(SSD.INFO, SetIndex, result.null, Gsub.id=NULL, intergxt=false, similarity= GR, impute.method= fixed, MinP.compare=FALSE,...) Arguments SSD.INFO SetIndex result.null Gsub.id SSD format information file, output of function Open_SSD". The sets are defined by this file. Set index. From 1 to the total number of sets. Output of function null.lgrf". The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time.

6 LGRF.SSD.OneSet_SetIndex Value intergxt similarity impute.method MinP.compare Whether to incorperate the gene-time interaction effect. Incorperating this effect can improve power if there is any gene-time interaction, but has slight power loss otherwise. The default is FALSE. *Please note that the second column of time should be included as a covairate when intergxt is TRUE. Choose the similarity measurement for the genetic variants. Can be either "GR" for genetic relationship or "IBS" for identity by state. The default is "GR" for better computational efficiency. Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. Whether to compare with the GEE based minimum p-value (MinP) test. The default is FALSE. Please note that implementing the GEE based MinP test is time consuming.... Other options of the GEE based MinP test. Defined same as in function test.minp". p.value n.marker p-value of the LGRF test. number of tested SNPs in the SNP set. # * Since the Plink data files used here are hard to be included in a R package, # The usage is marked by "#" to pass the package check. #library(lgrf) ############################################## # Plink data files: File.Bed, File.Bim, File.Fam # Files defining the sets: File.SetID, File.SSD, File.Info # For longitudinal data, outcome and covariates are saved in a separate file: Y, time, X. # Null model was fitted using function null.lgrf. # Create the MW File # File.Bed<-"./example.bed" # File.Bim<-"./example.bim" # File.Fam<-"./example.fam" # File.SetID<-"./example.SetID" # File.SSD<-"./example.SSD" # File.Info<-"./example.SSD.info" # Generate SSD file # To use binary ped files, you have to generate SSD file first. # If you already have a SSD file, you do not need to call this function. # Generate_SSD_SetID(File.Bed, File.Bim, File.Fam, File.SetID, File.SSD, File.Info) # SSD.INFO<-Open_SSD(File.SSD, File.Info)

null.lgrf 7 # Number of samples # SSD.INFO$nSample # Number of Sets # SSD.INFO$nSets ## Fit the null model # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # result.null<-null.lgrf(y,time,x=cbind(x,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. ## Test a single region # out_single<-lgrf.ssd.oneset_setindex(ssd.info=ssd.info, SetIndex=1, # result.null=result.null, MinP.compare=F) # Example result # $p.value # [1] 0.6284 # $n.marker # [1] 94 ## Test a single region, and compare with GEE based MinP test # out_single<-lgrf.ssd.oneset_setindex(ssd.info=ssd.info, SetIndex=1, # result.null=result.null,minp.compare=t) # $p.value # LGRF MinP # [1,] 0.6284 1 # $n.marker # [1] 94 null.lgrf Fit the null model for longitudinal genetic random field model Before testing a specific region using a score test, this function fits the longitudinal genetic random field model under the null hypothesis. null.lgrf(y, time, X = NULL)

8 test.lgrf Arguments Y time X The outcome variable, an n*1 matrix where n is the total number of observations An n*2 matrix describing how the observations are measured. The first column is the subject id. The second column is the measured exam (1,2,3,etc.). An n*p covariates matrix where p is the total number of covariates. Value It returns a list used for function test.lgrf(). library(lgrf) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(lgrf.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.lgrf(y,time,x=cbind(x,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated. test.lgrf Test the association between an outcome variable and a region/gene by LGRF Once the model under the null model is fitted using "null.lgrf()", this function tests a specifc region/gene. test.lgrf(z, result.null, Gsub.id=NULL, intergxt = FALSE, similarity = "GR", impute.method="fixed")

test.lgrf 9 Arguments Z result.null Gsub.id intergxt similarity impute.method Genetic variants in the target region/gene, an m*q matrix where m is the subject ID and q is the total number of genetic variables. Note that the number of rows in Z should be same as the number of subjects. The output of function "null.lgrf()" The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. Whether to incorperate the gene-time interaction effect. Incorperating this effect can improve power if there is any gene-time interaction, but has slight power loss otherwise. The default is FALSE. *Please note that the second column of time should be included as a covairate when intergxt is TRUE. Choose the similarity measurement for the genetic variants. Can be either "GR" for genetic relationship or "IBS" for identity by state. The default is "GR" for better computational efficiency. Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability. Value p.value n.marker p-value of the LGRF test. number of tested SNPs in the SNP set. ## null.lgrf fits the null model. # Input: Y, time, X (covariates) ## test.lgrf tests a region and give p-value. # Input: Z (genetic variants) and result of null.longgrf library(lgrf) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(lgrf.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.lgrf(y,time,x=cbind(x,time[,2])) # *Please note that the second column of time should be included as a covairate if # the gene by time interaction effect will be incorperated.

10 test.minp # The LGRF-G test plgrf_g<-test.lgrf(z,result.null) # The LGRF-GT test plgrf_gt<-test.lgrf(z,result.null,intergxt=true) # The LGRF-G test using the IBS similarity plgrf_g_ibs<-test.lgrf(z,result.null,similarity="ibs") # The LGRF-GT test, main effect is modeled using the IBS similarity plgrf_gt_ibs<-test.lgrf(z,result.null,intergxt=true,similarity="ibs") test.minp Test the association between an outcome variable and a region/gene by MinP If users want to compare LGRF with the minimum p-value (MinP) test, this function tests a specifc region/gene by a GEE based minimum p-value test after fitting "null.lgrf()". test.minp(z, result.null, Gsub.id=NULL, corstr="exchangeable", MinP.adjust=0.95, impute.method="fixed") Arguments Z result.null Gsub.id corstr MinP.adjust impute.method Genetic variants in the target region/gene, an m*q matrix where m is the subject ID and q is the total number of genetic variables. Note that the number of rows in Z should be same as the number of subject. The output of function "null.lgrf()". The subject id corresponding to the genotype matrix, an m dimensional vector. This is in order to match the phenotype and genotype matrix. The default is NULL, where the order is assumed to be matched with Y, X and time. The working correlation as specified in geeglm. The following are permitted: "independence", "exchangeable", "ar1", "unstructured" and "userdefined". The minimum p-value is adjusted by the number of independent tests. Choose the adjustment thereshold as specified in Gao, et al. (2008) "A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms". Values from 0 to 1 are permitted. Choose the imputation method when there is missing genotype. Can be "random", "fixed" or "bestguess". Given the estimated allele frequency, "random" simulates the genotype from binomial distribution; "fixed" uses the genotype expectation; "Best guess" uses the genotype with highest probability.

test.minp 11 Value p.value n.marker p-value of the MinP test. number of tested SNPs in the SNP set. ## null.lgrf fits the null model. # Input: Y, time, X (covariates) ## test.minp tests a region and give p-value. # Input: Z (genetic variants) and result of null.longgrf library(lgrf) # Load data example # Y: outcomes, n by 1 matrix where n is the total number of observations # X: covariates, n by p matrix # time: describe longitudinal structure, n by 2 matrix # Z: genotype matrix, m by q matrix where m is the total number of subjects data(lgrf.example) Y<-LGRF.example$Y;time<-LGRF.example$time;X<-LGRF.example$X;Z<-LGRF.example$Z # Fit the null model result.null<-null.lgrf(y,time,x=x) # The minimum p-value test based on GEE pminp<-test.minp(z,result.null,corstr="exchangeable",minp.adjust=0.95)

Index Topic IBS IBS_pseudo, 2 Topic datasets LGRF.example, 2 Topic null model null.lgrf, 7 Topic plink_test_all LGRF.SSD.All, 3 Topic plink_test_single LGRF.SSD.OneSet_SetIndex, 5 Topic test test.lgrf, 8 test.minp, 10 IBS_pseudo, 2 LGRF.example, 2 LGRF.SSD.All, 3 LGRF.SSD.OneSet_SetIndex, 5 null.lgrf, 7 test.lgrf, 8 test.minp, 10 12