SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data

SYNOPSIS
    SEQGWAS [--sfile] [--chr]

OPTIONS
    Option     Default              Description
    --sfile    specification.txt    Select a specification file
    --chr                           Select a chromosome

DESCRIPTION
SEQGWAS is a command-line program written in C/C++ for integrative analysis of sequencing and GWAS data. It computes the commonly used gene-level tests, including the burden test, the variable threshold (VT) test, and the sequence kernel association test (SKAT), all of which are based on the score statistic for assessing the effects of individual variants on the trait of interest. SEQGWAS calculates the score statistic from the observed genotypes of sequenced subjects and the imputed genotypes of non-sequenced subjects, and it constructs a robust variance estimator that reflects the true variability of the score statistic regardless of the sampling scheme and imputation quality, so that the corresponding association tests always have correct type I error.

We are actively working to improve the capabilities of SEQGWAS, so please check back frequently for updates.

INPUT FILES

Specification File

    REGRESSION_MODEL = linear # linear/logistic
    SUBJECT_FILE = .//subject.dat
    SUBJECT_FILE_HEADER = TRUE
    SUBJECT_PHENOTYPE_COLUMN = 4
    SUBJECT_COVARIATE_COLUMN = 2 3 # optional
    SUBJECT_SEQUENCED_INDICATOR_COLUMN = 5
    VARIANT_FILE = .//variant_chr .dat
    VARIANT_FILE_HEADER = TRUE
    VARIANT_ID_COLUMN = 2
    VARIANT_POS_COLUMN = 1
    VARIANT_FREQ_COLUMN = 5
    VARIANT_RSQ_COLUMN = 8 # optional
    DOSAGE_FILE = .//dosage_chr .dat
    DOSAGE_FILE_HEADER = FALSE
    DOSAGE_FILE_SKIP_COLUMNS = 2
    ANNOTATION_FILE = .//annotation_chr .dat
    ANNOTATION_FILE_HEADER = FALSE
    ANNOTATION_TYPE = SNP # SNP/gene
    ANNOTATION_POS_COLUMN = 2
    ANNOTATION_ACCESSION_COLUMN = 3
    ANNOTATION_FUNCTION_COLUMN = 4
    ANNOTATION_GENE_COLUMN = 5
    ANNOTATION_ID_COLUMN = 6
    OUTPUT_FILE = results_chr .out
    MAF_CUTOFF = 0.05

The specification file describes the input and output files and the program parameters. Each line follows the syntax KEYWORD = value1 [value2 ...], with spaces around the equals sign. All keywords below are required unless marked as optional.

REGRESSION_MODEL = linear/logistic
    Specify the regression model for the genotype-phenotype association.
SUBJECT_FILE = full_pathname
SUBJECT_FILE_HEADER = TRUE/FALSE
SUBJECT_PHENOTYPE_COLUMN = num
    Specify the column of the subject file (numbered from 1) to be used as the phenotype.
SUBJECT_COVARIATE_COLUMN = num_1 [num_2 ...]
    Optional. Specify the column(s) of the subject file to be used as covariates in the regression model.
SUBJECT_SEQUENCED_INDICATOR_COLUMN = num
    Specify the column of the subject file that indicates whether each subject was sequenced.
DOSAGE_FILE = prefix affix
    Specify the prefix and affix of the pathname. The program inserts the chromosome number given by --chr (a single digit for 1-9, two digits for 10-23) between the two strings to obtain the full pathname. For example, with the two strings in the example specification file, the dosage file for chromosome 1 is accessed through the pathname .//dosage_chr1.dat.
DOSAGE_FILE_HEADER = TRUE/FALSE
DOSAGE_FILE_SKIP_COLUMNS = num
    Skip the first num columns of the dosage file.
VARIANT_FILE = prefix affix
VARIANT_FILE_HEADER = TRUE/FALSE
VARIANT_ID_COLUMN = num
VARIANT_POS_COLUMN = num
VARIANT_FREQ_COLUMN = num
VARIANT_RSQ_COLUMN = num
    Optional. If not specified, the Rsq measure is calculated internally.
ANNOTATION_FILE = prefix affix
ANNOTATION_FILE_HEADER = TRUE/FALSE
ANNOTATION_TYPE = SNP
    Specify the format of the annotation file. Currently, only the value SNP is allowed.
ANNOTATION_POS_COLUMN = num
ANNOTATION_ACCESSION_COLUMN = num
ANNOTATION_FUNCTION_COLUMN = num
ANNOTATION_GENE_COLUMN = num
ANNOTATION_ID_COLUMN = num
OUTPUT_FILE = prefix affix
MAF_CUTOFF = MAF_cutoff
    Only variants with MAF ≤ MAF_CUTOFF are considered in the analysis.

All data files are space- or tab-delimited and may contain either one header row or none.

Subject File

    GWAS_ID  AfrIA        age  BMI       sequenced
    700001   0.779796662  74   33.17012  0
    700002   0.774728994  76   32.4515   0
    700003   0.765335395  59   22.94974  0

The subject file provides the phenotype, covariates, and sequencing indicator (whether or not the subject was sequenced) for all subjects in the GWAS cohort, one row per individual. The phenotype and sequencing-indicator columns are required; the subject-identifier and covariate columns are optional. In a case-control study, the disease status should be coded 0 for unaffected and 1 for affected subjects. Missing data are denoted by . or NA.

Variant File

    pos       SNP          Al1  Al2  Freq1    MAF      AvgCall  Rsq
    10862587  snp.1218005  C    C    1        0        1        0
    10862595  snp.1218006  A    A    1        0        1        0
    10862598  snp.1218007  C    T    0.99314  0.00686  0.99314  4e-05

The variant file provides information on the sequencing-identified variants as well as the GWAS SNPs on the chromosome specified by --chr, one row per SNP; the rows must be in genomic order. The position, SNP-identifier, and coding-allele-frequency columns are required; the Rsq column is optional. A missing SNP position should be denoted by . or NA, and that SNP will be excluded from the analysis. Because the SNP position is used to link SNPs between the variant file and the annotation file, positions in the two files must be on the same coordinate system.

Dosage File

    700001 1.996 1.967 1.965
    700002 1.986 1.976 1.976
    700003 1.974 1.867 1.853

The dosage file provides the (imputed) genotype dosages for all subjects in the GWAS cohort. Each row pertains to a subject, in the same order as in the subject file; each column pertains to a SNP, in the same order as in the variant file. The file may contain an arbitrary number of leading columns before the dosage data; DOSAGE_FILE_SKIP_COLUMNS specifies how many to skip.
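One way to see how the specification keywords and the data files fit together is the Python sketch below. It is not part of SEQGWAS, and every function name in it is hypothetical. It parses KEYWORD = value lines, builds a chromosome-specific pathname from a prefix/affix pair, and picks out the dosage columns of variants that have a known position and pass the MAF cutoff. It assumes that text after # is a comment, that MAF is min(f, 1 - f) where f is the coding-allele frequency, and that the cutoff is inclusive.

```python
"""Hypothetical helpers illustrating the SEQGWAS input layout (not SEQGWAS code)."""

MISSING = {".", "NA"}  # missing-value codes used by the data files


def parse_spec(text):
    """Parse 'KEYWORD = value1 [value2 ...]' lines into {KEYWORD: [values]}.

    Assumption: anything after '#' is a comment, as in the example file.
    """
    spec = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, rest = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'KEYWORD = value': {raw!r}")
        spec[key.strip()] = rest.split()
    return spec


def chromosome_path(prefix_affix, chrom):
    """Insert the chromosome number (no zero padding) between prefix and affix."""
    prefix, affix = prefix_affix
    return f"{prefix}{int(chrom)}{affix}"


def data_rows(lines, header):
    """Split space/tab-delimited lines into fields, dropping an optional header row."""
    rows = [line.split() for line in lines if line.strip()]
    return rows[1:] if header else rows


def rare_variant_indices(variant_rows, pos_col, freq_col, maf_cutoff):
    """0-based indices of variants with a known position and MAF <= maf_cutoff.

    Column numbers are 1-based, as in the specification file.
    """
    keep = []
    for i, row in enumerate(variant_rows):
        if row[pos_col - 1] in MISSING:
            continue  # SNPs with a missing position are excluded
        freq = float(row[freq_col - 1])  # coding-allele frequency
        if min(freq, 1.0 - freq) <= maf_cutoff:  # assumed inclusive cutoff
            keep.append(i)
    return keep


def dosage_matrix(dosage_rows, skip_columns, keep):
    """Subject-by-variant dosages for the kept variants, after the leading columns."""
    return [[float(row[skip_columns + j]) for j in keep] for row in dosage_rows]


if __name__ == "__main__":
    spec = parse_spec("DOSAGE_FILE = .//dosage_chr .dat\nMAF_CUTOFF = 0.05\n")
    print(chromosome_path(spec["DOSAGE_FILE"], 1))  # .//dosage_chr1.dat
```

Dosage column j is paired with variant row j, so variants are filtered by index rather than by deleting rows; this keeps the two files aligned, as the manual requires.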

Annotation File

    21 44473956 NM_000071 utr-3    CBS snp.1227710
    21 44473963 NM_000071 utr-3    CBS snp.1227711
    21 44473980 NM_000071 utr-3    CBS snp.1227714
    21 44474003 NM_000071 missense CBS snp.1227716

The annotation file provides annotation information for the SNPs. The current version of SEQGWAS (v1.0) supports only the SNP annotation format: each row pertains to a SNP, and the rows must be grouped by accession number.

OUTPUT

Output File

    chr  index  gene  accession  n_var  Rsq_gene  p_t1     p_t5     p_v      p_skat
    21   309    LIPI  NM_198996  9      0.7026    8.74e-1  3.83e-1  7.13e-1  5.03e-1
    21   311    TPTE  NM_199259  31     0.0012    2.99e-1  4.00e-1  2.67e-1  2.24e-1

For each gene, the output file reports the number of variants included (n_var), the gene-averaged Rsq (Rsq_gene), and the p-values of the burden tests with MAF thresholds of 1% (p_t1) and 5% (p_t5), the variable threshold test (p_v), and SKAT (p_skat).

EXAMPLE

Download and unzip the software package, then enter the command

    $ SEQGWAS -sfile specification.txt -chr 21

to obtain the results given in results_chr21.out.

REFERENCE

Hu, Y.J., Li, Y., Auer, P.L. and Lin, D.Y. Integrative Analysis of Sequencing and GWAS Data for Rare Variant Associations. Submitted.

VERSION HISTORY

    v1.0   2014/03/04   First version released.
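To relate the Output File columns to the input files, the Python sketch below does three things. It groups annotation rows into genes by accession number, which is why those rows must be grouped. It summarizes each gene's variants, and it computes a simple burden score test. This is a textbook version for a linear model with an intercept only and a naive variance estimate. It does not adjust for covariates, and it does not implement SEQGWAS's combination of observed and imputed genotypes or its robust variance estimator. The function names are hypothetical. Counting variants that appear in the variant file and pass MAF_CUTOFF for n_var, and taking an arithmetic mean for Rsq_gene, are both assumptions.

```python
"""Hypothetical sketch of gene-level grouping and a naive burden score test.

Not SEQGWAS code: no covariates, no robust variance, intercept-only linear model.
"""
import math
from itertools import groupby


def genes_from_annotation(rows, pos_col, acc_col, gene_col):
    """Yield (accession, gene, [positions]) for each run of rows sharing an accession.

    Column numbers are 1-based. Rows must be grouped by accession; an accession
    that reappears after a different one is reported as an error.
    """
    seen = set()
    for acc, group in groupby(rows, key=lambda r: r[acc_col - 1]):
        if acc in seen:
            raise ValueError(f"annotation rows for {acc} are not grouped")
        seen.add(acc)
        group = list(group)
        yield acc, group[0][gene_col - 1], [int(r[pos_col - 1]) for r in group]


def gene_summary(positions, variants, maf_cutoff):
    """Return (n_var, mean Rsq) over annotated positions found in `variants`.

    `variants` maps position -> (maf, rsq); the arithmetic mean is an assumption.
    """
    rsqs = [variants[p][1] for p in positions
            if p in variants and variants[p][0] <= maf_cutoff]
    return len(rsqs), (sum(rsqs) / len(rsqs) if rsqs else float("nan"))


def burden_score_test(y, dosages):
    """Score test of beta = 0 in y = b0 + beta * B + e, with B_i = sum_j dosage_ij.

    Returns (U, T, p) where U = sum_i (y_i - ybar) * B_i,
    Var(U) = sigma2 * sum_i (B_i - Bbar)^2, and T = U^2 / Var(U) ~ chi-square(1).
    """
    n = len(y)
    burden = [sum(row) for row in dosages]
    ybar = sum(y) / n
    bbar = sum(burden) / n
    resid = [yi - ybar for yi in y]
    u = sum(r * b for r, b in zip(resid, burden))
    sigma2 = sum(r * r for r in resid) / n  # null-model MLE of the residual variance
    var_u = sigma2 * sum((b - bbar) ** 2 for b in burden)
    if var_u == 0:
        raise ValueError("burden score has zero variance")
    t = u * u / var_u
    p = math.erfc(math.sqrt(t / 2))  # upper tail of chi-square with 1 df
    return u, t, p
```

For example, with y = [1, 2, 3, 4] and one rare variant carried once by each of the last two subjects (dosages [[0], [0], [1], [1]]), burden_score_test returns U = 2.0, T = 3.2 and p ≈ 0.074. Burden tests at the 1% and 5% thresholds would differ only in which variant columns are passed in.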