SUGEN 8.6 Overview. Misa Graff, July 2017


General Information
- By Ran Tao, https://sites.google.com/site/dragontaoran/home
- Website: http://dlin.web.unc.edu/software/sugen/
- Standalone command-line software program in C++
- GEE to account for relatedness
- Capable of performing a pooled analysis of 50,000+ subjects
- Model-based variance estimator
  - accurate for low-frequency variants
  - equivalent to standard regression for independent samples
- Continuous and binary traits
- Single-variant analysis using Wald statistics
  - standard, conditional, and gene-environment interaction analysis
- Gene-based rare-variant analysis based on score statistics

Synopsis
$ SUGEN [--pheno pheno_file] [--formula formula] [--id-col iid]
        [--family-col fid] [--weight-col wt] [--vcf vcf_file.gz]
        [--probmatrix prob_file] [--unweighted] [--model model]
        [--robust-variance] [--cond cond_file] [--ge envi_covs]
        [--score] [--score-rescale rescale_rule] [--group group_file]
        [--group-maf maf_ub] [--group-callrate cr_lb]
        [--out-prefix out_prefix] [--out-zip] [--extract-chr chr]
        [--extract-range range] [--extract-file extract_file]

Input options
- --pheno {pheno_file}: specifies the phenotype file
  - Missing values are coded as NA
- --formula {formula}: specifies the regression formula
  - trait = age + gender + pc1 + pc2
  - (time, event) = age + gender + pc1 + pc2 (for time-to-event traits)
- --id-col {iid}: specifies the subject ID column in the pheno file
- --family-col {fid}: specifies the family ID column in the pheno file
  - when subjects are independent, set fid = iid

Input options (continued)
- --weight-col {WT}: for weighted analyses, specifies the weight column in pheno_file. The default column name is WT.
- --probmatrix {prob_file}: specifies the file that contains the file names of the pairwise inclusion probability matrices. The default name is probmatrix.txt. This option is optional in weighted analyses and ignored in unweighted analyses.
- --unweighted: specifies unweighted analyses
- --vcf {vcf_file.gz}: specifies the VCF file
- --dosage: analyzes dosage data in the VCF file. The dosages must be stored in the DS field of the VCF file.

Input options (continued)
- --model {model}: specifies the regression model. The default value is linear.
  - linear (linear regression: trait is continuous)
  - logistic (logistic regression: trait is binary 0/1)
  - coxph (Cox proportional hazards regression: event time is positive, and the event indicator is binary)
- --left-truncation {left_truncation_time}: specifies the left truncation time (if any) in Cox proportional hazards regression
- --robust-variance: uses the robust variance estimator; otherwise the model-based variance estimator is used
- --cond {cond_file}: performs conditional analysis, conditioning on the variants included in cond_file

Input options (continued)
- --ge {envi_covs}: in single-variant analysis, performs gene-environment interaction analysis
  - envi_covs are the names of the environment variables
  - the format of envi_covs is covariate_1,covariate_2,...,covariate_k (multiple environment variables are separated by commas)
  - either --cond cond_file or --ge envi_covs can be specified, but not both. If neither is specified, standard association analysis is performed.
- --score: uses score statistics
- --score-rescale {rescale_rule}: specifies the method used to rescale the score statistics. There are two options: naive and optimal. The default value is naive. This option applies to weighted analyses.
- --group {group_file}: performs gene-based association analysis. Gene memberships of variants are defined in group_file. This option is valid only when --score is specified.
- --hetero-variance {strata}: allows the residual variance in linear regression to differ across levels of strata. This can help reduce inflation.
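The constraints among these options can be checked before a run is launched. The following is a minimal sketch in Python, not part of SUGEN itself; the function names `check_options` and `format_envi_covs` and their parameters are hypothetical helpers introduced here for illustration. It encodes only the rules stated above: --cond and --ge are mutually exclusive, --group requires --score, and --score-rescale accepts naive or optimal.

```python
def format_envi_covs(names):
    """Join environment variable names with commas, as --ge expects."""
    return ",".join(names)


def check_options(cond_file=None, envi_covs=None, score=False,
                  group_file=None, score_rescale=None):
    """Return a list of problems with a planned option combination.

    Hypothetical helper: it only mirrors the rules stated on the slide
    and does not call SUGEN.
    """
    problems = []
    if cond_file and envi_covs:
        problems.append("--cond and --ge cannot both be specified")
    if group_file and not score:
        problems.append("--group is valid only when --score is specified")
    if score_rescale is not None and score_rescale not in ("naive", "optimal"):
        problems.append("--score-rescale must be naive or optimal")
    return problems


if __name__ == "__main__":
    # Covariate names here are taken from the example phenotype columns.
    print(format_envi_covs(["age", "sex_num"]))
    print(check_options(cond_file="cond.txt", envi_covs="age"))
```

An empty list means none of the stated rules is violated; it does not guarantee that SUGEN will accept the command.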

Output options
- --out-prefix {out_prefix}: specifies the prefix of the output files
- --out-zip: compresses the output files
- --extract-chr {chr}: restricts single-variant analysis to variants in chromosome chr
- --extract-range {range}: restricts single-variant analysis to variants in a specific range in chromosome chr
  - range format: 1000000-2000000
- --extract-file {extract_file}: restricts single-variant analysis to variants in extract_file
  - variant IDs in chromosome:position format
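Putting the input and output options together, a standard single-variant run can be assembled as an argument list. The sketch below is illustrative only: `build_sugen_command` is a hypothetical helper, the file names are placeholders, and the formula string is passed through unchanged (whether SUGEN tolerates spaces inside the formula should be checked against its own documentation). Only flags listed in the synopsis are used, and the command is printed, not executed.

```python
import shlex


def build_sugen_command(pheno, vcf, formula, id_col, family_col, out_prefix,
                        model="linear", unweighted=True,
                        chrom=None, pos_range=None, out_zip=False):
    """Assemble a SUGEN argument list from flags listed in the synopsis."""
    cmd = ["SUGEN",
           "--pheno", pheno,
           "--formula", formula,
           "--id-col", id_col,
           "--family-col", family_col,
           "--vcf", vcf,
           "--model", model,
           "--out-prefix", out_prefix]
    if unweighted:
        cmd.append("--unweighted")
    if out_zip:
        cmd.append("--out-zip")
    if pos_range is not None and chrom is None:
        # The slide describes the range as lying within chromosome chr.
        raise ValueError("--extract-range needs --extract-chr")
    if chrom is not None:
        cmd += ["--extract-chr", str(chrom)]
    if pos_range is not None:
        start, end = pos_range
        cmd += ["--extract-range", f"{start}-{end}"]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; column names follow the example phenotype file.
    cmd = build_sugen_command("pheno.txt", "vcf_file.gz",
                              "trait=age+sex_num+EV1+EV2",
                              id_col="scanid", family_col="fid",
                              out_prefix="results",
                              chrom=1, pos_range=(1000000, 2000000))
    print(shlex.join(cmd))
```

Building a list instead of a single string keeps each argument intact if the list is later handed to `subprocess.run`.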

Phenotype File

fid  scanid  EV1      EV2      sex  sex_num  age  trait   disease  group  group_unc  group_uw
1    p36     2.074    0.4816   M    1        23   89.075  0        uw     0          1
1    p207    0.9547   -0.0854  F    2        69   273.06  0        unc    1          0
2    p370    0.7555   -0.5965  F    2        59   234.74  1        unc    1          0
2    p202    0.9292   -0.8418  F    2        19   87.782  0        unc    1          0
3    p290    -0.71    1.2707   M    1        39   140.03  0        unc    1          0
3    p421    -0.2321  -0.9616  M    1        56   239.42  0        unc    1          0

- Tab-delimited text file. Missing data are denoted by "NA".
- Required columns: subject ID (must match the VCF file), family ID, trait
  - a binary trait is coded as 0/1
- Optional columns: covariates
- Trait and covariates must be numeric
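These requirements can be verified before running the analysis. The following is a rough sketch, not a SUGEN feature: `check_pheno` is a hypothetical helper, and the sample text reuses a few columns of the first two rows above with one trait value replaced by NA purely to illustrate missing data.

```python
import csv
import io


def check_pheno(text, id_col, family_col, trait, covariates=(), binary=False):
    """Return a list of problems found in tab-delimited phenotype text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    needed = [id_col, family_col, trait, *covariates]
    absent = [col for col in needed if col not in header]
    if absent:
        return [f"missing column: {col}" for col in absent]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in (trait, *covariates):
            value = row[col]
            if value == "NA":  # NA marks missing data
                continue
            try:
                float(value)
            except (TypeError, ValueError):
                problems.append(f"line {line_no}: {col}={value!r} is not numeric")
        if binary and row[trait] not in ("0", "1", "NA"):
            problems.append(f"line {line_no}: {trait} must be coded 0/1")
    return problems


# Illustrative sample only; the NA is inserted here, not in the original table.
SAMPLE = (
    "fid\tscanid\tsex\tsex_num\tage\ttrait\tdisease\n"
    "1\tp36\tM\t1\t23\t89.075\t0\n"
    "1\tp207\tF\t2\t69\tNA\t0\n"
)

if __name__ == "__main__":
    print(check_pheno(SAMPLE, "scanid", "fid", "trait", ["age", "sex_num"]))
    print(check_pheno(SAMPLE, "scanid", "fid", "trait", ["sex"]))
```

The second call reports the character-coded sex column, which is why the example file also carries the numeric sex_num column.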

VCF File
- Compressed by bgzip and indexed by tabix, using the following commands:
  - To produce vcf_file.gz:
      $ bgzip vcf_file
  - To produce vcf_file.gz.tbi:
      $ tabix -p vcf -f vcf_file.gz

Other input files
- cond_file: specified in option --cond cond_file
  - each row is a variant
  - variant IDs in chromosome:position format
- group_file: specified in option --group group_file
    gene1	1:1000,1:1003
    gene2	3:50000,3:50354,3:54352
  - group and variant IDs are separated by a tab
- extract_file: specified in option --extract-file extract_file
  - same format as cond_file
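Both file formats are simple enough to generate from a script. The sketch below assumes nothing beyond the layout described above; the helper names `variant_id`, `group_file_text`, and `variant_list_text` are hypothetical, and the gene and variant values are the ones shown in the group_file example.

```python
def variant_id(chrom, pos):
    """Format a variant ID as chromosome:position."""
    return f"{chrom}:{pos}"


def group_file_text(genes):
    """Build group_file text: gene name, a tab, then comma-separated variant IDs."""
    lines = []
    for gene, variants in genes.items():
        ids = ",".join(variant_id(chrom, pos) for chrom, pos in variants)
        lines.append(f"{gene}\t{ids}")
    return "\n".join(lines) + "\n"


def variant_list_text(variants):
    """Build cond_file or extract_file text: one variant ID per row."""
    return "".join(variant_id(chrom, pos) + "\n" for chrom, pos in variants)


if __name__ == "__main__":
    genes = {
        "gene1": [(1, 1000), (1, 1003)],
        "gene2": [(3, 50000), (3, 50354), (3, 54352)],
    }
    print(group_file_text(genes), end="")
    print(variant_list_text([(1, 1000), (3, 50000)]), end="")
```

Writing the returned text to disk with `open(path, "w")` yields files in the layout that --group, --cond, and --extract-file are described as expecting.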

Output files
- Single-variant analysis
  - chromosome, position, reference/alternative alleles, allele counts, allele frequency, effect estimates, standard error estimates, p-values
- Gene-based analysis
  - single-variant results
  - score statistics and their covariance matrices in MASS format

Gene-based Rare-variant Tests Using MASS
- MASS website: http://dlin.web.unc.edu/software/mass/
- Performs burden, CMC, variable threshold, SKAT, and SKAT-O tests
- Calculates asymptotic and resampling p-values
- Performs fixed-effects and trans-ethnic meta-analysis
- Performs conditional analysis
- Summary statistics generated by SUGEN can be loaded directly into the MASS gene association software.