MAGA: Meta-Analysis of Gene-level Associations

SYNOPSIS

    MAGA [--sfile] [--chr]

OPTIONS

    Option    Default            Description
    --sfile   specification.txt  Select a specification file
    --chr                        Select a chromosome

DESCRIPTION

MAGA is a command-line program written in C/C++ for meta-analysis of gene-level associations based on single-variant statistics (i.e., p-values of association tests and effect estimates) of rare variants from participating studies. MAGA recovers the multivariate statistics of gene-level association tests from single-variant statistics together with the correlation matrix of the single-variant test statistics, which is estimated from one of the participating studies or from a publicly available database.

MAGA accommodates any disease phenotype and any study design and produces all commonly used gene-level tests, i.e., the burden, variable threshold, and variance-component tests. MAGA can perform meta-analysis of gene-level associations by combining rare variants in sequencing studies or by combining low-frequency variants in genome-wide association studies (GWAS). By treating each variant as a gene, MAGA can also perform meta-analysis of single-variant associations, which is more stable than the inverse-variance method in the presence of rare or low-frequency variants.

We are working intensely to improve the capabilities of MAGA, so please check back frequently for updates.
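As an illustration of how a gene-level statistic can be recovered from single-variant results, the sketch below builds a burden test from effect estimates, standard errors, and a correlation matrix. It assumes the common approximation in which the score statistic of variant j is beta_j / se_j^2 and the covariance of two score statistics is r_jk / (se_j * se_k). The function name, the equal default weights, and the input numbers are hypothetical, and this is not MAGA's own code; the program's internal computations may differ.

```python
import math


def burden_test(betas, ses, corr, weights=None):
    """Burden statistic from single-variant summaries (illustrative sketch).

    betas, ses : per-variant effect estimates and standard errors
    corr       : correlation matrix of the single-variant test statistics
    weights    : per-variant weights (assumed equal when omitted)
    """
    m = len(betas)
    if weights is None:
        weights = [1.0] * m
    # Approximate score statistics: U_j = beta_j / se_j^2.
    u = [b / (s * s) for b, s in zip(betas, ses)]
    # Approximate covariance of the scores: V_jk = r_jk / (se_j * se_k).
    v = [[corr[j][k] / (ses[j] * ses[k]) for k in range(m)] for j in range(m)]
    numerator = sum(w * x for w, x in zip(weights, u)) ** 2
    denominator = sum(
        weights[j] * v[j][k] * weights[k] for j in range(m) for k in range(m)
    )
    stat = numerator / denominator
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up inputs: two variants with a correlation of 0.3 between their tests.
stat, p_value = burden_test([0.2, 0.15], [0.1, 0.12], [[1.0, 0.3], [0.3, 1.0]])
print(stat, p_value)
```

With a single variant the statistic reduces to (beta / se)^2, which is the sense in which a variant can be treated as a gene.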

INPUT FILES

Specification File

    REFERENCE_TYPE = internal # internal/external
    REFERENCE_PHENOTYPE_FILE = .//DataReference/Ref_pheno.dat
    REFERENCE_PHENOTYPE_FILE_HEADER = TRUE # TRUE/FALSE
    REGRESSION_MODEL = logistic # logistic/linear
    PHENOTYPE_COLUMN = 3
    COVARIATE_COLUMN = 2
    REFERENCE_MAP_FILE = .//DataReference/Ref_info_chr .dat
    REFERENCE_MAP_FILE_HEADER = TRUE # TRUE/FALSE
    SNP_ID_COLUMN = 3
    SNP_POS_COLUMN = 2
    SNP_FREQ_COLUMN = 4
    REFERENCE_GENOTYPE_FILE = .//DataReference/Ref_dose_chr .dat
    ANNOTATION_FILE = .//DataAnnotation/geneList_downloaded_plink.txt
    ANNOTATION_TYPE = gene
    STUDY_LIST_FILE = .//DataSummary/StudyList_.txt
    STUDY_DIR = .//DataSummary/
    OUTPUT_FILE = .//MetaResults/MetaResults chr.out
    MAF_CUTOFF = 0.05

The file describes the input/output files and the program parameters. The syntax is KEYWORD = value1 [value2 ...], with spaces around =. All of the following lines are required unless stated as optional.

REFERENCE_TYPE = internal/external
    When REFERENCE_TYPE = external, the lines REFERENCE_PHENOTYPE_FILE through COVARIATE_COLUMN are not used and do not need to be specified.

REFERENCE_PHENOTYPE_FILE = full_pathname

REFERENCE_PHENOTYPE_FILE_HEADER = TRUE/FALSE

REGRESSION_MODEL = linear/logistic
    Specify the regression model to fit to the internal reference data.

PHENOTYPE_COLUMN = column_number_in_reference_phenotype_file
    Specify the column (numbering starts at 1) to be used as the phenotype.

COVARIATE_COLUMN = column_number_1 [column_number_2 ...]
    Specify the column(s) in the reference phenotype file to be used as covariates in the regression model. Optional when REFERENCE_TYPE = internal; no covariate is used by default.

REFERENCE_MAP_FILE = prefix affix
    Specify the prefix and affix of the pathname. The program inserts the chromosome number (one digit for 1-9 and two digits for 10-23), specified by --chr, between them to obtain the full pathname. For example, for the two strings in the example specification file, the reference map file for chromosome 1 is accessed through the pathname .//DataReference/Ref_info_chr1.dat

REFERENCE_MAP_FILE_HEADER = TRUE/FALSE

SNP_ID_COLUMN = column_number_in_reference_map_file

SNP_POS_COLUMN = column_number_in_reference_map_file

SNP_FREQ_COLUMN = column_number_in_reference_map_file
    Optional. If not specified, the coding-allele frequencies are determined internally from the genotype data.

REFERENCE_GENOTYPE_FILE = prefix affix

ANNOTATION_FILE = full_pathname

ANNOTATION_TYPE = gene
    Specify the format of the annotation file. Currently, only the value gene is allowed.

STUDY_LIST_FILE = full_pathname

STUDY_DIR = directory_of_summary_result_files
    STUDY_DIR and the file names in STUDY_LIST_FILE together determine the full pathnames of the summary result files.

OUTPUT_FILE = prefix affix

MAF_CUTOFF = MAF_cutoff
    Only variants with MAFs ≤ MAF_CUTOFF are considered for meta-analysis.

Reference Phenotype File

    ID       sex  status
    W162798  0    1
    M129395  1    0
    F180062  1    0

The file provides information on the phenotype and covariates of the internal reference subjects. Each row contains space- or tab-delimited data specific to an individual; the header row is optional. The column for the phenotype is required, and those for the subject identifier and covariates are optional. In a case-control study, the disease variable should be coded 0/1 to represent unaffected/affected. Missing phenotypes or covariates are denoted as "." or "NA".

Reference Map File

    Chr  Pos      Rs        Freq
    21   9887804  rs885550  0.9793
    21   9928594  rs169757  0.9597
    21   9928860  rs210498  0.8986

The file provides information on the SNPs of the reference panel on the particular chromosome specified by --chr. Each row contains space- or tab-delimited data specific to a SNP; the rows do not need to be in genomic order and the header row is optional. The columns for the position and the SNP identifier are required, and the one for the coding-allele frequency is optional. If the position of a SNP is missing, it should be denoted as "." or "NA", and that SNP will be excluded from analysis. The SNP identifier is used to link the SNPs in the reference and the summary result files, and thus should be comparable between them.

Reference Genotype File

    1.996  1.967  1.965
    1.986  1.976  1.976
    1.974  1.867  1.853

The file provides (imputed) genotype information for the reference subjects. Each row contains space- or tab-delimited data specific to a SNP; the order of the SNPs should match their order in the reference map file. Each column pertains to a reference subject; the order of the subjects should match their order in the reference phenotype file, if available. This file does not allow a header row or extra columns.
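The reference map and genotype files are located through the specification file's prefix/affix convention. A minimal sketch of that lookup is given below. It assumes that "#" starts a comment (as the trailing annotations in the example specification file suggest) and that the prefix and affix are two whitespace-separated strings. The function names parse_spec and chrom_path are hypothetical and are not part of MAGA.

```python
def parse_spec(text):
    """Parse 'KEYWORD = value1 [value2 ...]' lines into a dict of value lists."""
    spec = {}
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'KEYWORD = value', got: {raw!r}")
        spec[key.strip()] = value.split()
    return spec


def chrom_path(spec, keyword, chrom):
    """Insert the chromosome number between the prefix and the affix."""
    prefix, affix = spec[keyword]
    # No zero padding: one digit for 1-9, two digits for 10-23.
    return f"{prefix}{int(chrom)}{affix}"


example = """
REFERENCE_TYPE = internal # internal/external
COVARIATE_COLUMN = 2
REFERENCE_MAP_FILE = .//DataReference/Ref_info_chr .dat
"""
spec = parse_spec(example)
print(chrom_path(spec, "REFERENCE_MAP_FILE", 1))
```

Keeping every value as a list of strings lets single-value keywords and multi-value keywords such as COVARIATE_COLUMN share one code path; conversion to numbers is left to the caller.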

Annotation File

    21  10042712  10120796  BAGE5
    21  10079666  10120808  BAGE
    21  13904368  13935777  A26B3

The file provides annotation information on the SNPs or genes. The current version of MAGA (v1.0) only allows the format of gene annotation. Specifically, each row contains data specific to a gene; the rows do not need to be in genomic order. The columns should be in the order of chr pos_start pos_end gene_name without column names.

Study List File

    1  AGES_HEIGHT_POOLED.txt
    2  B58C-T1DGC_HEIGHT_WOMEN.txt

The file provides the names of summary result files for participating studies. The first column lists arbitrary numbers indexing the studies, which are used by the program to track the meta-analysis. The second column lists the names of summary result files. The path of the files can be specified in the second column or in STUDY_DIR. When REFERENCE_TYPE = internal, make sure the summary result file of the reference study is not listed here to avoid double-contribution. When REFERENCE_TYPE = external, it is recommended that the study with the largest sample size be listed in the first row.

Summary Result Files

    MarkerName  N    EAF     BETA        SE        P
    rs3965725   323  0.5576  -0.0911831  0.219373  0.67766
    rs2261012   323  0.4854  -0.10154    0.187825  0.58878
    rs2259093   323  0.4795  -0.113684   0.186375  0.54188

All summary result files should contain at least the columns for the SNP identifier, sample size, effect allele frequency, effect size estimate, standard error of effect size estimate, and p-value; their column names should be specified as in the example. The column order can be arbitrary. The row order can be arbitrary.
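Because the columns of a summary result file are identified by name rather than by position, a reader can look them up from the header row. The sketch below shows one way to do that, assuming whitespace-delimited text and the column names from the example. The function name read_summary and the returned dictionary layout are hypothetical and are not part of MAGA.

```python
REQUIRED = ("MarkerName", "N", "EAF", "BETA", "SE", "P")


def read_summary(lines):
    """Return {MarkerName: {column: value}} from a summary result file."""
    rows = iter(lines)
    header = next(rows).split()
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")
    # Map each required column name to its position in this particular file.
    index = {name: header.index(name) for name in REQUIRED}
    result = {}
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        result[fields[index["MarkerName"]]] = {
            "N": int(fields[index["N"]]),
            "EAF": float(fields[index["EAF"]]),
            "BETA": float(fields[index["BETA"]]),
            "SE": float(fields[index["SE"]]),
            "P": float(fields[index["P"]]),
        }
    return result


# The same data as the first example row, with the columns in another order.
shuffled = [
    "P SE BETA EAF N MarkerName",
    "0.67766 0.219373 -0.0911831 0.5576 323 rs3965725",
]
print(read_summary(shuffled)["rs3965725"])
```

Keying the result by SNP identifier reflects how the identifier links the summary results to the reference map file; extra columns in a file are simply ignored.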

OUTPUT

Output File

    GeneName  chr  start     end       nrare  N    P_T5      P_VT      P_SKAT
    HSPA13    21   14665307  14677380  7      812  4.11e-01  7.40e-01  5.87e-01
    SAMSN1    21   14779419  14840535  14     812  6.27e-01  7.58e-01  3.05e-01

The file contains information on the number of variants included in each gene (nrare), the number of subjects contributing to each gene (N), and the p-values of the burden test with the MAF threshold of 5% (T5), the variable threshold test (VT), and SKAT.

EXAMPLE

Download and unzip the software package. Enter the command

    $ MAGA --sfile spec_example.txt --chr 21

to obtain the results given in MetaResults chr21.out.

REFERENCE

Hu, Y.J., Berndt, S.I., Gustafsson, S., Ganna, A., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, Hirschhorn, J., North, K.E., Ingelsson, E., and Lin, D.Y. (2013). Meta-Analysis of Gene-Level Associations for Rare Variants Based on Single-Variant Statistics. American Journal of Human Genetics 93: 236-248.

VERSION HISTORY

    v1.0  2013/04/30  First version released.
    v1.1  2013/09/11  Small bugs fixed.