MAGA: Meta-Analysis of Gene-level Associations

SYNOPSIS

    MAGA [--sfile] [--chr]

OPTIONS

    Option     Default              Description
    --sfile    specification.txt    Select a specification file
    --chr                           Select a chromosome

DESCRIPTION

MAGA is a command-line program written in C/C++ for meta-analysis of gene-level associations based on single-variant statistics (i.e., p-values of association tests and effect estimates) of rare variants from participating studies. MAGA recovers the multivariate statistics of gene-level association tests from single-variant statistics together with the correlation matrix of the single-variant test statistics, which is estimated from one of the participating studies or from a publicly available database.

MAGA accommodates any disease phenotype and any study design and produces all commonly used gene-level tests, i.e., the burden, variable threshold, and variance-component tests. MAGA can perform meta-analysis of gene-level associations by combining rare variants in sequencing studies or by combining low-frequency variants in genome-wide association studies (GWAS). By treating each variant as a gene, MAGA can also perform meta-analysis of single-variant associations, which is more stable than the inverse-variance method in the presence of rare or low-frequency variants.

We are working intensely to improve the capabilities of MAGA, so please check back frequently for updates.

INPUT FILES

Specification File

    REFERENCE_TYPE = internal # internal/external
    REFERENCE_PHENOTYPE_FILE = .//DataReference/Ref_pheno.dat
    REFERENCE_PHENOTYPE_FILE_HEADER = TRUE # TRUE/FALSE
    REGRESSION_MODEL = logistic # logistic/linear
    PHENOTYPE_COLUMN = 3
    COVARIATE_COLUMN = 2
    REFERENCE_MAP_FILE = .//DataReference/Ref_info_chr.dat
    REFERENCE_MAP_FILE_HEADER = TRUE # TRUE/FALSE
    SNP_ID_COLUMN = 3
    SNP_POS_COLUMN = 2
    SNP_FREQ_COLUMN = 4
    REFERENCE_GENOTYPE_FILE = .//DataReference/Ref_dose_chr.dat
    ANNOTATION_FILE = .//DataAnnotation/geneList_downloaded_plink.txt
    ANNOTATION_TYPE = gene
    STUDY_LIST_FILE = .//DataSummary/StudyList_.txt
    STUDY_DIR = .//DataSummary/
    OUTPUT_FILE = .//MetaResults/MetaResults chr.out
    MAF_CUTOFF = 0.05

The file describes the input/output files and the program parameters. The syntax follows KEYWORD = value1 [value2 ...] with spaces around =. All of the following lines are required unless stated as optional.

REFERENCE_TYPE = internal/external
    When REFERENCE_TYPE = external, the lines from REFERENCE_PHENOTYPE_FILE through COVARIATE_COLUMN are not used and need not be specified.

REFERENCE_PHENOTYPE_FILE = full_pathname

REFERENCE_PHENOTYPE_FILE_HEADER = TRUE/FALSE

REGRESSION_MODEL = linear/logistic
    Specify the regression model to fit the internal reference data.

PHENOTYPE_COLUMN = column_number_in_reference_phenotype_file
    Specify the column (starting with number 1) to be used as the phenotype.

COVARIATE_COLUMN = column_number_1 [column_number_2 ...]

    Specify column(s) in the reference phenotype file to be used as covariates in the regression model. Optional when REFERENCE_TYPE = internal; no covariate will be used by default.

REFERENCE_MAP_FILE = prefix affix
    Specify the prefix and affix of the pathname. The program will insert the chromosome number (single digit for 1-9 and two digits for 10-23), specified by --chr, to obtain the full pathname. For example, for the two strings in the example specification file, the reference map file for chromosome 1 is accessed through the pathname .//DataReference/Ref_info_chr1.dat

REFERENCE_MAP_FILE_HEADER = TRUE/FALSE

SNP_ID_COLUMN = column_number_in_reference_map_file

SNP_POS_COLUMN = column_number_in_reference_map_file

SNP_FREQ_COLUMN = column_number_in_reference_map_file
    Optional. If not specified, the coding-allele frequencies will be determined internally from the genotype data.

REFERENCE_GENOTYPE_FILE = prefix affix

ANNOTATION_FILE = full_pathname

ANNOTATION_TYPE = gene
    Specify the format of the annotation file. Currently, only the value gene is allowed.

STUDY_LIST_FILE = full_pathname

STUDY_DIR = directory_of_summary_result_files
    STUDY_DIR and the file names in STUDY_LIST_FILE together determine the full pathnames of the summary result files.

OUTPUT_FILE = prefix affix

MAF_CUTOFF = MAF_cutoff
    Only variants with MAFs up to MAF_CUTOFF are considered for meta-analysis.

Reference Phenotype File

    ID         sex    status
    W162798    0      1
    M129395    1      0
    F180062    1      0

The file provides information on the phenotype and covariates of the internal reference subjects. Each row contains space- or tab-delimited data specific to an individual; the header row is optional. The column for the phenotype is required, and those for the subject identifier and covariates are optional. In a case-control study, the disease variable should be coded 0/1 to represent unaffected/affected. Missing phenotypes or covariates are denoted as . or NA.

Reference Map File

    Chr    Pos        Rs          Freq
    21     9887804    rs885550    0.9793
    21     9928594    rs169757    0.9597
    21     9928860    rs210498    0.8986

The file provides information on the SNPs of the reference panel on the particular chromosome specified by --chr. Each row contains space- or tab-delimited data specific to a SNP; the rows do not need to be in genomic order and the header row is optional. The columns for the position and the SNP identifier are required, and the one for the coding-allele frequency is optional. If the position of a SNP is missing, it should be denoted as . or NA, and that SNP will be excluded from analysis. The SNP identifier is used to link the SNPs in the reference and the summary result files, so the identifiers should match across these files.

Reference Genotype File

    1.996    1.967    1.965
    1.986    1.976    1.976
    1.974    1.867    1.853

The file provides (imputed) genotype information for the reference subjects. Each row contains space- or tab-delimited data specific to a SNP; the SNPs must appear in the same order as in the reference map file. Each column pertains to a reference subject; the subjects must appear in the same order as in the reference phenotype file, if one is available. This file does not allow a header row or any extra columns.
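As an illustration of the conventions described so far (the KEYWORD = value syntax, the prefix/affix rule for per-chromosome files, and the row-by-row pairing of the map and genotype files), a minimal Python sketch follows. The names parse_spec, chrom_path, and load_reference are hypothetical and are not part of MAGA; the sketch only mirrors the rules stated in this manual and makes no claim about how MAGA itself reads these files.

```python
# Sketch only: all helper names here are hypothetical, not MAGA functions.

MISSING = {".", "NA"}  # missing-value codes named in this manual


def parse_spec(lines):
    """Parse 'KEYWORD = value1 [value2 ...]' lines; '#' starts a comment."""
    spec = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, sep, values = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEYWORD = value, got: {raw!r}")
        spec[keyword.strip()] = values.split()
    return spec


def chrom_path(prefix, affix, chrom):
    """Insert the chromosome number between prefix and affix (no zero padding)."""
    return f"{prefix}{chrom}{affix}"


def load_reference(map_lines, geno_lines, id_col, pos_col, header=True):
    """Pair each map row with the genotype row at the same index.

    Column numbers are 1-based, as in the specification file. SNPs whose
    position is '.' or 'NA' are dropped together with their genotype row.
    """
    map_rows = [ln.split() for ln in map_lines if ln.strip()]
    if header:
        map_rows = map_rows[1:]
    geno_rows = [[float(x) for x in ln.split()] for ln in geno_lines if ln.strip()]
    if len(map_rows) != len(geno_rows):
        raise ValueError("map and genotype files must list the same SNPs in the same order")
    panel = {}
    for fields, dosages in zip(map_rows, geno_rows):
        pos = fields[pos_col - 1]
        if pos in MISSING:
            continue
        panel[fields[id_col - 1]] = (int(pos), dosages)
    return panel
```

Keying the result by SNP identifier reflects the fact that the identifier, not the row position, is what links the reference panel to the summary result files.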

Annotation File

    21    10042712    10120796    BAGE5
    21    10079666    10120808    BAGE
    21    13904368    13935777    A26B3

The file provides annotation information on the SNPs or genes. The current version of MAGA (v1.0) only allows the gene annotation format. Specifically, each row contains data specific to a gene; the rows do not need to be in genomic order. The columns should be in the order chr pos_start pos_end gene_name, without column names.

Study List File

    1    AGES_HEIGHT_POOLED.txt
    2    B58C-T1DGC_HEIGHT_WOMEN.txt

The file provides the names of the summary result files for the participating studies. The first column lists arbitrary numbers indexing the studies, which are used by the program to track the meta-analysis. The second column lists the names of the summary result files. The path of the files can be specified in the second column or in STUDY_DIR. When REFERENCE_TYPE = internal, make sure the summary result file of the reference study is not listed here, so that the study does not contribute twice. When REFERENCE_TYPE = external, it is recommended that the study with the largest sample size be listed in the first row.

Summary Result Files

    MarkerName    N      EAF       BETA          SE          P
    rs3965725     323    0.5576    -0.0911831    0.219373    0.67766
    rs2261012     323    0.4854    -0.10154      0.187825    0.58878
    rs2259093     323    0.4795    -0.113684     0.186375    0.54188

All summary result files should contain at least the columns for the SNP identifier, sample size, effect allele frequency, effect size estimate, standard error of the effect size estimate, and p-value; their column names should be specified as in the example. Both the column order and the row order can be arbitrary.
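Because the summary result files are identified by column name rather than by column position, a reader for them has to look the columns up in the header. The following Python sketch illustrates this, together with combining STUDY_DIR with the names in the study list. The function names read_study_list and read_summary are hypothetical, and the sketch assumes whitespace-delimited files with a header row as in the example; it is not a description of MAGA's own implementation.

```python
import os

# Column names required in every summary result file, per this manual.
REQUIRED = ("MarkerName", "N", "EAF", "BETA", "SE", "P")


def read_study_list(lines, study_dir=""):
    """Return {study_index: path} from rows of the form 'index filename'."""
    studies = {}
    for ln in lines:
        fields = ln.split()
        if not fields:
            continue
        index, name = fields[0], fields[1]
        # If the second column already holds an absolute path, join keeps it.
        studies[index] = os.path.join(study_dir, name)
    return studies


def read_summary(lines):
    """Return {MarkerName: statistics}, locating columns by header name."""
    rows = iter(ln.split() for ln in lines if ln.strip())
    header = next(rows)
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    col = {name: header.index(name) for name in REQUIRED}
    results = {}
    for fields in rows:
        results[fields[col["MarkerName"]]] = {
            "N": int(fields[col["N"]]),
            "EAF": float(fields[col["EAF"]]),
            "BETA": float(fields[col["BETA"]]),
            "SE": float(fields[col["SE"]]),
            "P": float(fields[col["P"]]),
        }
    return results
```

Looking columns up by name means that extra columns and a different column order in a study's file do not affect the result.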

OUTPUT

Output File

    GeneName    chr    start       end         nrare    N      P_T5        P_VT        P_SKAT
    HSPA13      21     14665307    14677380    7        812    4.11e-01    7.40e-01    5.87e-01
    SAMSN1      21     14779419    14840535    14       812    6.27e-01    7.58e-01    3.05e-01

The file contains the number of variants included in each gene (nrare), the number of subjects contributing to each gene (N), and the p-values of the burden test with the MAF threshold of 5% (T5), the variable threshold test (VT), and SKAT.

EXAMPLE

Download and unzip the software package. Enter the command

    $ MAGA --sfile spec_example.txt --chr 21

to obtain the results given in MetaResults chr21.out.

REFERENCE

Hu, Y.J., Berndt, S.I., Gustafsson, S., Ganna, A., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, Hirschhorn, J., North, K.E., Ingelsson, E., and Lin, D.Y. Meta-Analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics. American Journal of Human Genetics 93: 236-248.

VERSION HISTORY

    v1.0    2013/04/30    First version released.
    v1.1    2013/09/11    Small bugs fixed.