Polymorphism and Variant Analysis Lab


Polymorphism and Variant Analysis Lab Arian Avalos PowerPoint by Casey Hanson Polymorphism and Variant Analysis Matt Hudson 2018 1

Exercise
In this exercise, we will do the following:
1. Gain familiarity with a graphical user interface to PLINK.
2. Run a Quality Control (QC) analysis on genotype data of 90 individuals from two ethnic groups (Han Chinese and Japanese) genotyped for ~230,000 SNPs.
3. Use our QC data to perform a genome-wide association test (GWAS) across two phenotypes: case and control. We will compare the results of our GWAS with and without multiple hypothesis correction.

Step 0D: Local Files
For viewing and manipulating the files needed for this laboratory exercise, insert your flash drive. Denote the path to the flash drive as the following:
[course_directory]
We will use the files found in:
[course_directory]/09_variant_analysis/data/

Dataset Characteristics

filename        meaning
plink.exe       An executable of the PLINK GWAS toolkit. (Preinstalled)
gplink.jar      A Java graphical user interface (GUI) that interfaces with plink.exe.
Haploview.jar   A haplotype analysis program written in Java. Used to view PLINK results and SNP analysis.
wgas1.ped       Genotype data for 228,694 SNPs on 90 people.
wgas1.map       Map file for the SNPs in wgas1.ped.
extra.ped       Genotype data for 29 SNPs on the same 90 people.
extra.map       Map file for the SNPs in extra.ped.
pop.cov         Population membership of the 90 people. (1 = Han Chinese, 2 = Japanese)

The PED File Format
The PED file format specifies, for each individual, their genotype for each SNP and their phenotype.
Family ID begins with either CH (Chinese) or JA (Japanese).
Paternal and Maternal IDs of 0 indicate missing.
Sex is either Male = 1, Female = 2, Other = Unknown.
Phenotype is either 0 = missing, 1 = unaffected, 2 = affected.

Family ID   Individual ID   Paternal ID   Maternal ID   Sex   Phenotype   Genotype
CH18526     NA18526         0             0             2     1           A A G..
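The column layout above can be illustrated with a short parser. This is a minimal sketch, not part of the lab: the names `PedRecord` and `parse_ped_line` are hypothetical, and the example row is the one from the slide with an assumed second genotype (`G G`) filled in so that the allele columns pair up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: int
    phenotype: int
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per SNP


def parse_ped_line(line: str) -> PedRecord:
    """Split one whitespace-delimited PED row into its six fixed columns
    followed by allele pairs (two columns per SNP)."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 fixed columns plus an even number of alleles")
    alleles = fields[6:]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return PedRecord(fields[0], fields[1], fields[2], fields[3],
                     int(fields[4]), int(fields[5]), genotypes)


# Illustrative row: the "G G" genotype is an assumption, not taken from wgas1.ped.
record = parse_ped_line("CH18526 NA18526 0 0 2 1 A A G G")
print(record.family_id, record.sex, record.genotypes)
```

Each SNP occupies two columns, so a row for the full wgas1.ped file would have 6 + 2 × 228,694 fields.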

The MAP File Format
The MAP file format specifies the location of each SNP, one SNP per row: chromosome, SNP ID, genetic distance, and base-pair position.
Note: Morgans (M) are a unit of genetic distance derived from chromosomal recombination studies. Morgans can be used to reconstruct chromosomal maps.

chr   SNP ID       M      BasePair Position
8     rs17121574   12.8   12799052
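As a companion to the four-column layout above, a MAP row can be read like this. It is a sketch only; `MapRecord` and `parse_map_line` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class MapRecord(NamedTuple):
    chromosome: str
    snp_id: str
    genetic_distance: float
    bp_position: int


def parse_map_line(line: str) -> MapRecord:
    """Parse one whitespace-delimited MAP row into its four columns."""
    chromosome, snp_id, distance, position = line.split()
    return MapRecord(chromosome, snp_id, float(distance), int(position))


# The example row shown on the slide.
snp = parse_map_line("8 rs17121574 12.8 12799052")
print(snp)
```

The rows of the MAP file are in the same order as the genotype pairs in each PED row, which is how PLINK matches a genotype to its SNP.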

Configuring gplink
In this exercise, we will configure gplink to work with our data. Additionally, we will perform a format conversion to speed up our QC analysis. Finally, we will validate our conversion and see which individuals and SNPs would be filtered out with default filters for QC analysis.

Step 1A: Starting gplink
gplink is a graphical user interface, written in Java, to the command line program PLINK.
To start gplink, navigate to [course_directory]/09_variant_analysis/data/
Double click on gplink.jar.

Step 1B: Starting gplink
The gplink main window should appear.

Step 2A: Configuring gplink
Click on the Project item on the Menu Bar.
Select Open from the drop down menu. The Open Project pop-up window should appear.
Click on Browse.

Step 2B: Configuring gplink
In the file browser, navigate to the following directory:
[course_directory]/09_variant_analysis
Click on the data directory and click Open.
Click OK on the Open Project window.

Step 2C: Configuring gplink
You should see the files in the data folder in the Folder Viewer on the left hand side of gplink.

Step 3A: Creating a Binary Input File
Click the PLINK item on the Menu Bar.
Click Data Management.
Click Generate fileset.
In the next window, select Standard Input on the tab bar.
Select wgas1 under Quick Fileset.
Check Binary fileset.
Under Output File input wgas2.
Click OK.

Step 3B: Creating a Binary Input File
On the Execute Command window, click OK. This will convert our wgas1 files to a binary format.
Under the Operations Viewer, you will see wgas2 with an R next to it, indicating that it is running. Wait for it to turn GREEN.

Step 3C: Creating a Binary Input File
In the Folder Viewer, you should see several new wgas2 files created during the conversion.
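For reference, the conversion in Steps 3A to 3C corresponds to a single PLINK command. The sketch below only builds and prints that command; it assumes a PLINK 1.x executable named `plink` on the PATH (the lab ships `plink.exe`), and `make_bed_command` is a hypothetical helper. The exact command gplink shows in its Execute Command window may include additional options.

```python
from typing import List


def make_bed_command(in_prefix: str, out_prefix: str, plink: str = "plink") -> List[str]:
    """Build the PLINK call that reads <in_prefix>.ped/.map (--file)
    and writes a binary fileset named <out_prefix> (--make-bed, --out)."""
    return [plink, "--file", in_prefix, "--make-bed", "--out", out_prefix]


cmd = make_bed_command("wgas1", "wgas2")
print(" ".join(cmd))
```

Run from the data directory, this command writes wgas2.bed, wgas2.bim, and wgas2.fam plus a log file. The binary fileset holds the same information as the PED/MAP pair but loads much faster.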

Step 4A: Validating the Conversion
Click the PLINK item on the Menu Bar.
Click Summary Statistics.
Click Validate Fileset.
In the next window, select Binary Input on the tab bar.
Select wgas2 under Quick Fileset.
Under Output File input validate.
Click Threshold.

Step 4B: Validating the Conversion
On the Threshold window:
Set Minor allele frequency to 0.01.
Set Maximum SNP missingness rate to 0.05.
Set Maximum individual missingness rate to 0.05.
Click OK.

Step 4C: Validating the Conversion
On the Execute Command window, click OK.
Wait for the command to finish (the validate entry will show a completion icon).
Click on the validate track.

Step 4C: Validating the Conversion
Look in the Log viewer.
46,834 out of ~230,000 SNPs were removed because they failed the MAF filter.
2,728 SNPs were removed because they were not genotyped in enough individuals (minimum 95%).

Step 4D: Validating the Conversion
Click the + adjacent to the Validate track to expand it.
Click the + adjacent to the Output track to expand it.
Right click validate.irem and click Open in default viewer.
You should see the following:
JA19012 NA19012
The family ID is JA19012 (Japanese) and the individual ID is NA19012. This individual was removed because of a low genotyping rate.

Quality Control Analysis
In this exercise, we will perform Quality Control (QC) analysis to filter our data according to a set of criteria.

Quality Control Filters
The QC step will impose the following criteria on our data.

Filter: Minor Allele Frequency (MAF)
Threshold: 1%
Meaning: The frequency of the less common (minor) allele of a SNP, as a fraction of all genotyped alleles at that SNP in the population, must reach this threshold for the SNP to be included in the analysis.

Filter: Individual genotyping rate
Threshold: 95%
Meaning: The fraction of SNPs successfully genotyped for an individual must reach this threshold for the person to be analyzed.

Filter: SNP genotyping rate
Threshold: 95%
Meaning: The SNP must be successfully genotyped in at least this fraction of individuals.
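The per-SNP filters above can be made concrete with a small calculation. The following is a minimal sketch under stated assumptions: SNPs are biallelic, a missing allele is coded as "0" (as in PED files), and the function names and the toy genotypes are invented for illustration. It is not PLINK's implementation.

```python
from typing import List, Tuple

Genotype = Tuple[str, str]


def minor_allele_frequency(genotypes: List[Genotype]) -> float:
    """Frequency of the rarer allele among all non-missing alleles of one SNP."""
    counts = {}
    for a1, a2 in genotypes:
        for allele in (a1, a2):
            if allele != "0":
                counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # monomorphic or entirely missing
    return min(counts.values()) / total


def missing_rate(genotypes: List[Genotype]) -> float:
    """Fraction of individuals whose genotype is missing at this SNP."""
    missing = sum(1 for a1, a2 in genotypes if a1 == "0" or a2 == "0")
    return missing / len(genotypes)


# Toy SNP typed in five individuals, one of them missing.
snp = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A"), ("0", "0")]
maf = minor_allele_frequency(snp)
miss = missing_rate(snp)
keep = maf >= 0.01 and miss <= 0.05  # the 1% MAF and 95% genotyping-rate thresholds
print(maf, miss, keep)
```

Here the SNP passes the MAF filter but fails the genotyping-rate filter, because one of five individuals (20%) is missing. The individual genotyping rate is the same missingness calculation applied across one person's SNPs instead of across one SNP's people.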

Step 5A: Quality Control Analysis
Click the PLINK item on the Menu Bar.
Click Data Management.
Click Generate Fileset.
In the next window, select Binary Input on the tab bar.
Select wgas2 under Quick Fileset.
Check Binary fileset.
Under Output File input wgas3.
Click Threshold.

Step 5B: Quality Control Analysis
On the Threshold window:
Set Minor allele frequency to 0.01.
Set Maximum SNP missingness rate to 0.05.
Set Maximum individual missingness rate to 0.05.
Click OK.

Step 5C: Quality Control Analysis
Click OK.
On the Execute Command window, click OK.
This will create a new set of files prefixed wgas3 that are filtered according to the thresholds on the previous slide.
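The three threshold fields in Step 5B map onto three PLINK command-line options: `--maf` (minimum minor allele frequency), `--geno` (maximum per-SNP missingness), and `--mind` (maximum per-individual missingness). The sketch below assembles the equivalent command. It assumes a PLINK 1.x executable named `plink`, and `qc_command` is a hypothetical helper; the command gplink actually issues may differ in detail.

```python
from typing import List


def qc_command(in_prefix: str, out_prefix: str,
               maf: float, geno: float, mind: float,
               plink: str = "plink") -> List[str]:
    """Build a PLINK call that filters a binary fileset and writes a new one."""
    return [plink, "--bfile", in_prefix,
            "--maf", str(maf),    # drop SNPs with MAF below this value
            "--geno", str(geno),  # drop SNPs missing in more than this fraction of people
            "--mind", str(mind),  # drop people missing more than this fraction of SNPs
            "--make-bed", "--out", out_prefix]


qc_cmd = qc_command("wgas2", "wgas3", maf=0.01, geno=0.05, mind=0.05)
print(" ".join(qc_cmd))
```

A maximum missingness of 0.05 is the same requirement as a minimum genotyping rate of 95%, which is why the GUI values and the filter table use different numbers for the same threshold.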

Genome Wide Association Test (GWAS)
In this exercise, we will perform a GWAS on our filtered data across two phenotypes: cases and controls. We will then compare the results between unadjusted p-values and multiple hypothesis corrected p-values.

Step 6A: GWAS
Click the PLINK item on the Menu Bar.
Click Association.
Click Allelic Association Tests.
In the next window, select Binary Input on the tab bar.
Select wgas3 under Quick Fileset.
Check Adjusted p-values.
Under Output File input assoc1.
Click OK.

Step 6B: GWAS
On the Execute Command window, click OK.
This will perform the GWAS analysis on our data and store the results under assoc1 in the main window of gplink.
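To make the allelic association test less of a black box, here is a sketch of a 1-degree-of-freedom chi-square test on a 2x2 table of allele counts (minor and major alleles in cases versus controls), which is the kind of test a basic case/control allelic analysis performs for each SNP. The function name and the counts are invented for illustration, and PLINK's own output may differ in rounding or edge-case handling.

```python
import math
from typing import Tuple


def allelic_chi_square(case_minor: int, case_major: int,
                       ctrl_minor: int, ctrl_major: int) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 allele-count table."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical SNP: 50 cases and 50 controls, so 100 alleles in each group.
chi2, p = allelic_chi_square(30, 70, 15, 85)
print(round(chi2, 3), p)
```

Because every individual contributes two alleles, the table counts alleles rather than people. The test is repeated independently for every SNP that survived QC, which is what creates the multiple-testing problem addressed in the following steps.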

Step 7: GWAS Without Multiple Hypothesis Correction
The SNP p-values from our GWAS with no multiple hypothesis correction are located in the 9th column of assoc1.assoc.
You can inspect this file by right clicking it and selecting Open in default viewer. Open it in Excel if you want to sort by p-value.
Overall, 13,294 SNPs survive at a p-value threshold of 0.05 WITHOUT multiple hypothesis correction.
The top 5 shown on the original slide were obtained with the Unix sort, awk, and head commands.
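The counting and ranking described in this step can also be done without Excel or Unix tools. The sketch below assumes the usual whitespace-delimited PLINK .assoc layout with a header row containing `SNP` and `P` columns; the function name, the SNP IDs, and all numbers in the sample text are made up for illustration and are not results from this dataset.

```python
from typing import Iterable, List, Tuple


def read_assoc_pvalues(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (SNP, p-value) pairs from .assoc-style text, skipping NA p-values."""
    rows = iter(lines)
    header = next(rows).split()
    snp_col, p_col = header.index("SNP"), header.index("P")
    results = []
    for line in rows:
        fields = line.split()
        if not fields or fields[p_col] == "NA":
            continue
        results.append((fields[snp_col], float(fields[p_col])))
    return results


# Made-up rows in the .assoc column order; real use would pass open("assoc1.assoc").
sample = """ CHR          SNP         BP   A1      F_A      F_U   A2    CHISQ        P       OR
   1 rs_example_1     100000    A   0.2500   0.2000    G    0.600   0.4386    1.333
   1 rs_example_2     200000    C   0.4000   0.1500    T   12.500   0.0004    3.778
   2 rs_example_3     300000    G   0.1000   0.1000    A       NA       NA       NA
   2 rs_example_4     400000    T   0.3000   0.1800    C    4.200   0.0404    1.952
""".splitlines()

results = read_assoc_pvalues(sample)
significant = [r for r in results if r[1] < 0.05]
top5 = sorted(results, key=lambda r: r[1])[:5]
print(len(significant), top5)
```

Looking up the `P` column by name rather than by position keeps the sketch working even if the column order differs between PLINK versions.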

Step 8: GWAS With Multiple Hypothesis Correction
The SNP p-values from our GWAS with multiple hypothesis correction are located in assoc1.assoc.adjusted; the Bonferroni-corrected values are in the BONF column.
You can inspect this file by right clicking it and selecting Open in default viewer.
Overall, only 4 SNPs have a Bonferroni-corrected p-value of less than 1.
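The reason so few SNPs remain is easiest to see from the Bonferroni formula itself: each unadjusted p-value is multiplied by the number of tests and capped at 1. A minimal sketch follows; the test count of 200,000 is a round, hypothetical number, not the exact number of SNPs tested in this lab.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: p times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


n_tests = 200_000  # hypothetical number of SNPs tested
for p in (0.04, 1e-4, 1e-7):
    print(p, bonferroni(p, n_tests))
```

With this many tests, an unadjusted p-value must be below 1 / n_tests (5e-6 in this example) just to produce an adjusted value under 1, and below 0.05 / n_tests (2.5e-7) to remain significant at the 0.05 level.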

Visualization
In this exercise, we will generate a Manhattan Plot of our association results using Haploview from the Broad Institute.

Step 9A: Configuring Haploview
Open Haploview from the Desktop.
Click PLINK Format.

Step 9B: Configuring Haploview
Click on Browse next to Results File.

Step 9C: Configuring Haploview
Navigate to the directory where gplink saved the file assoc1.assoc.
Select assoc1.assoc and click Open.

Step 9D: Configuring Haploview
Click on Browse next to Map File.

Step 9E: Configuring Haploview
Navigate to the data directory containing wgas1.map.
Select wgas1.map and click Open.

Step 9F: Configuring Haploview
Click on OK.

Step 9G: Configuring Haploview
Your assoc1 results should be shown in Haploview in tabular format.
To create a Manhattan Plot, click Plot.

Step 9H: Configuring Haploview
Select Chromosomes for X-Axis.
Select P for Y-Axis.
Select log10 for Y-Axis Scale.
Click OK.

Step 10: Manhattan Plot
Haploview should then generate a Manhattan Plot of the association results.
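A Manhattan plot conventionally places SNPs along the x-axis by chromosome and position and plots -log10(p) on the y-axis, so that smaller p-values appear as taller points. The sketch below computes those coordinates without drawing anything; `manhattan_points` and the example records are hypothetical, and Haploview's internal handling may differ.

```python
import math
from typing import List, Tuple


def manhattan_points(records: List[Tuple[int, int, float]]) -> List[Tuple[int, int, float]]:
    """Turn (chromosome, base-pair position, p-value) records into
    (chromosome, position, -log10 p) points ordered along the genome."""
    return [(chrom, bp, -math.log10(p))
            for chrom, bp, p in sorted(records)
            if p > 0]  # a p-value of 0 has no finite -log10


# Made-up association results.
points = manhattan_points([(2, 500, 1e-5), (1, 900, 0.5), (1, 100, 0.01)])
for chrom, bp, height in points:
    print(chrom, bp, round(height, 2))
```

On this scale a p-value of 0.05 sits at a height of about 1.3 and a p-value of 1e-5 at 5, which is why the few strongly associated SNPs stand out above the rest like skyscrapers.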