GWAS Exercises 3 - GWAS with a Quantitative Trait

Peter Castaldi
January 28, 2013

PLINK can also test for genetic associations with a quantitative trait (i.e., a continuous variable). In this exercise, we will test the effect of SNPs on the expression level of a gene transcript. This type of analysis is often called eQTL analysis. We will use genotype and gene expression data from 90 cell lines obtained from HapMap CEU individuals. There is an R package called GGtools that contains a number of functions and data structures for eQTL analysis. We won't review this package in detail, but it was used to make many of the datasets that we will work with today. In this exercise we will test for association between SNPs on Chromosome 17 and mRNA levels of the ORMDL3 gene.

1 Data Files

Copy the genotype data and expression data for ORMDL3 to your directory with the following command:

    cp /cluster/tufts/cbicourse/gas/data/plink/ORMDL3* .

You should now have three files starting with "ORMDL3" in your directory. Take a peek at the MAP file (and, in the same way, the other files) by typing:

    head ORMDL3.map

The PED file is very large; you can try using the 'head' command on it, but as you'll see, the output is not very helpful. A Unix command that shows how many rows are in the file is:

    wc -l ORMDL3.ped

And how many columns:

    head -1 ORMDL3.ped | awk '{print NF}'

In the simplest case, all of the information needed for a GWAS is contained in the PED and MAP files. However, if you have multiple phenotypes or alternate phenotypes, it is often more convenient to have a separate file that contains your covariates and alternate phenotypes. The PLINK documentation refers to these as alternate phenotype files and covariate files, but in practice you can use the same file for both. The first two columns of an alternate phenotype file must be the Family ID and Individual ID columns; this is how PLINK merges the data in the PED file with the data in the phenotype file.

ORMDL3 is a gene on Chromosome 17 that has been associated with susceptibility to asthma in multiple GWAS. The PED file that you have contains genotype data for Chromosome 17 from 90 HapMap CEU samples. The phenotype file ORMDL3_Pheno.txt contains data on the expression of ORMDL3.

2 Examining Phenotype Distributions in R

When doing association testing with a continuous variable, it is important to examine the distributional properties of your phenotype. With non-normally distributed phenotypes, outlier values can have an undue influence on the analysis and distort the results.

Start R by typing:

    module add R/
    bsub -Ip -q int_public6 R

The R code below reads in the alternate phenotype file and displays the structure of the resulting object with the str command. The ORMDL3 expression values are contained in the ORMDL3 variable. We examine the quantiles of this variable and plot a histogram of its distribution.

    > pheno <- read.table("ORMDL3_Pheno.txt", header = T, stringsAsFactors = F)
    > str(pheno)

    'data.frame':   90 obs. of  4 variables:
     $ FID   : int
     $ IID   : int
     $ ORMDL3: num
     $ male  : int
    > quantile(pheno$ORMDL3)
      0%  25%  50%  75% 100%
    > pdf("ORMDL3_Histogram.pdf")
    > hist(pheno$ORMDL3)
    > dev.off()
    null device
              1

Take a look at your histogram by moving the file ORMDL3_Histogram.pdf to your desktop with WinSCP. Remember that PDFs can't be opened directly from the WinSCP window; you have to find the file in Windows and open it there (or open it directly with Adobe, etc.). The histogram should look like this:

[Figure: Histogram of pheno$ORMDL3. x-axis: pheno$ORMDL3; y-axis: Frequency.]
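Because PLINK joins the PED file and the phenotype file on Family ID and Individual ID, it can be worth confirming that every genotyped individual has a phenotype row before running an association test. The sketch below is an illustration rather than part of the original exercise: `missing_pheno_ids` is a hypothetical helper name, and it assumes that the phenotype file has a single header row with FID and IID as its first two columns and that the PED file has no header.

```shell
# Hypothetical helper: list (FID, IID) pairs present in a PED file but
# absent from a phenotype file.
# Assumptions: the phenotype file has one header row and FID, IID as its
# first two columns; the PED file has no header row.
missing_pheno_ids() {
    local ped="$1" pheno="$2"
    awk '
        NR == FNR {                            # first file: phenotype table
            if (FNR > 1) seen[$1 " " $2] = 1   # skip the header row
            next
        }
        {                                      # second file: PED
            key = $1 " " $2
            if (!(key in seen)) print $1, $2
        }
    ' "$pheno" "$ped"
}
```

For this exercise it might be called as `missing_pheno_ids ORMDL3.ped ORMDL3_Pheno.txt`; no output would mean that every individual in the PED file was found in the phenotype file.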

Run the following PLINK command to test all of the genotyped SNPs on Chromosome 17 for association with expression of ORMDL3 in this particular cell type. If you haven't already loaded PLINK, first type:

    module add plink/1.06

Then run:

    plink --file ORMDL3 --pheno ORMDL3_Pheno.txt --pheno-name ORMDL3 --assoc --out ORMDL3_Res

Now let's read the results into R and do some basic interpretation, starting with generating a Q-Q plot.

    > res <- read.table("ORMDL3_Res.qassoc", stringsAsFactors = F,
    +     header = T)
    > library(snpStats)
    > pdf("ORMDL3_QQ.pdf")
    > qq.chisq(-2 * log(res$P), df = 2, pvals = TRUE, overdisp = TRUE)
           N  omitted   lambda
    > dev.off()
    null device
              1

[Figure: QQ plot. x-axis: Expected; y-axis: Observed, with a P-value scale. Expected distribution: chi-squared (2 df).]
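The Q-Q plot is drawn from the statistic -2 log(P), compared against a chi-squared distribution with 2 degrees of freedom. A short derivation, added here as background and assuming only that P-values are uniformly distributed on (0, 1) under the null hypothesis, suggests why that reference distribution is a reasonable choice:

```latex
\begin{align*}
X &= -2\ln P, \qquad P \sim \mathrm{Uniform}(0,1) \\
\Pr(X > x) &= \Pr\!\left(P < e^{-x/2}\right) = e^{-x/2}, \qquad x \ge 0
\end{align*}
```

The last expression is the upper-tail probability of a chi-squared distribution with 2 degrees of freedom, so under the null hypothesis the transformed P-values should follow that distribution and fall along the line of identity.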

This is the QQ plot of your dreams. Probably the first thought should be that something is wrong, because there is such an excess of positive results. However, in this case, we know that there are strong signals between SNPs and the expression level of ORMDL3. A few questions:

Question 1: You may have noticed that some numbers were printed when you made the QQ plot. What is lambda, and what is it supposed to represent?

Question 2: Why is a QQ plot a good way to evaluate GWAS results? (Hint: What is the expected proportion of null to positive results in a large-scale genetic association analysis?)

Question 3: The gray area on the QQ plot indicates the 95% confidence interval around the line of identity. Completely null results should be expected to fall mostly within the gray area. Why does the gray area widen as the expected p-values become very low?

Let's look at the top ten results from our analysis. First we order the results by p-value, then we display the top ten rows.

    > res <- res[order(res$P), ]
    > print(res[1:10, ])
    CHR SNP BP NMISS BETA SE R2 T P

Move your results file and ORMDL3.map to your desktop using WinSCP. Open these files in the WGAViewer program:

• Click on 'File'
• Click on 'Open External Data File'
• Click on 'Open PLINK Output'

• Fill in the fields

If you can get the program to load your data, you'll see a genome-wide representation of our results. The database plotter window can be an easy way to browse your results, though I personally don't like it very much. It does show that there is one large peak of low p-values. Any guess as to where this may be located? You can explore this by pulling the genome coordinates from the browser and looking them up on the UCSC Genome Browser. Use the 2009 assembly.

Here is what the results look like in LocusZoom around the ORMDL3 gene.

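[Figure: LocusZoom plot of the plotted SNPs around ORMDL3. x-axis: Position on chr17 (Mb); y-axes: -log10(p-value) and recombination rate (cM/Mb); points colored by r². Genes shown: ERBB2, IKZF3, GSDMB, GSDMA, CSF3, THRA, MSL1, C17orf37, ZPBP2, ORMDL3, PSMD3, MED24, NR1D1, GRB7, SNORD.]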
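To look up the peak in a genome browser, it helps to have a coordinate string for the region around the strongest result. The sketch below is an illustration, not part of the original exercise: `top_hit_region` is a hypothetical helper name, the default flank of 250,000 bp is an arbitrary choice, and the code assumes a whitespace-delimited results file whose header row names the CHR, BP and P columns (as in the results table shown earlier). The coordinates are only meaningful if the positions in the MAP file are on the same assembly as the one selected in the browser.

```shell
# Hypothetical helper: print a browser-style region (chrN:start-end)
# centered on the SNP with the smallest non-missing p-value.
# Assumptions: whitespace-delimited file with a header row that contains
# columns named CHR, BP and P; missing p-values are written as NA.
top_hit_region() {
    local results="$1" flank="${2:-250000}"
    awk -v flank="$flank" '
        NR == 1 {                          # map column names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["P"]) != "NA" {
            p = $(col["P"]) + 0            # force numeric comparison
            if (!found || p < best) {
                best = p; found = 1
                chr = $(col["CHR"]); bp = $(col["BP"]) + 0
            }
        }
        END {
            if (!found) exit 1             # no usable p-values
            start = bp - flank
            if (start < 1) start = 1       # clamp at the chromosome start
            printf "chr%s:%d-%d\n", chr, start, bp + flank
        }
    ' "$results"
}
```

A call such as `top_hit_region ORMDL3_Res.qassoc` would print a single region string that can be pasted into the browser's position box.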

3 Other Stuff

At this point, we've covered how to do large-scale association analysis with both binary and continuous phenotypes in PLINK. Here are some additional tasks that you can try with this data:

• We saw a signal around ORMDL3. How could you determine whether there is one independent signal at that locus or multiple signals?
• Are there other significant hits for ORMDL3 expression on Chromosome 17? How would you decide?
• Do a case-control analysis testing for SNP associations on Chromosome 17 with gender.
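For the question about other significant hits, one common and conservative starting point is a Bonferroni-corrected threshold: the significance level divided by the number of SNPs tested. The sketch below is one possible approach, not the exercise's prescribed answer. `bonferroni_hits` is a hypothetical helper name, the default level of 0.05 is an assumption, and the code expects a whitespace-delimited results file with a header row naming the SNP, BP and P columns. Bonferroni correction treats the tests as independent, so it tends to be stricter than necessary when neighboring SNPs are correlated.

```shell
# Hypothetical helper: list SNPs whose p-value passes a Bonferroni
# threshold (alpha divided by the number of SNPs with a p-value).
# Assumptions: whitespace-delimited file with a header row that contains
# columns named SNP, BP and P; missing p-values are written as NA.
bonferroni_hits() {
    local results="$1" alpha="${2:-0.05}"
    awk -v alpha="$alpha" '
        NR == 1 {                          # map column names to positions
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["P"]) != "NA" {              # keep only SNPs that were tested
            n++
            snp[n] = $(col["SNP"]); bp[n] = $(col["BP"]); p[n] = $(col["P"])
        }
        END {
            if (n == 0) exit 1             # nothing to test
            threshold = alpha / n
            printf "# tests=%d threshold=%g\n", n, threshold
            for (i = 1; i <= n; i++)
                if (p[i] + 0 <= threshold) print snp[i], bp[i], p[i]
        }
    ' "$results"
}
```

Running `bonferroni_hits ORMDL3_Res.qassoc` would print the number of tests and the threshold on a comment line, followed by one line per SNP that passes it; the positions make it easy to see whether any of those SNPs lie away from the ORMDL3 peak.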


More information

Package geneslope. October 26, 2016

Package geneslope. October 26, 2016 Type Package Package geneslope October 26, 2016 Title Genome-Wide Association Study with SLOPE Version 0.37.0 Date 2016-10-26 Genome-wide association study (GWAS) performed with SLOPE, short for Sorted

More information

Package LGRF. September 13, 2015

Package LGRF. September 13, 2015 Type Package Package LGRF September 13, 2015 Title Set-Based Tests for Genetic Association in Longitudinal Studies Version 1.0 Date 2015-08-20 Author Zihuai He Maintainer Zihuai He Functions

More information

Package detectruns. February 6, 2018

Package detectruns. February 6, 2018 Type Package Package detectruns February 6, 2018 Title Detect Runs of Homozygosity and Runs of Heterozygosity in Diploid Genomes Version 0.9.5 Date 2018-02-05 Detection of runs of homozygosity and of heterozygosity

More information

Stat 290: Lab 2. Introduction to R/S-Plus

Stat 290: Lab 2. Introduction to R/S-Plus Stat 290: Lab 2 Introduction to R/S-Plus Lab Objectives 1. To introduce basic R/S commands 2. Exploratory Data Tools Assignment Work through the example on your own and fill in numerical answers and graphs.

More information

Estimating Variance Components in MMAP

Estimating Variance Components in MMAP Last update: 6/1/2014 Estimating Variance Components in MMAP MMAP implements routines to estimate variance components within the mixed model. These estimates can be used for likelihood ratio tests to compare

More information

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017

Tutorial. RNA-Seq Analysis of Breast Cancer Data. Sample to Insight. November 21, 2017 RNA-Seq Analysis of Breast Cancer Data November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information

Package Hapi. July 28, 2018

Package Hapi. July 28, 2018 Type Package Package Hapi July 28, 2018 Title Inference of Chromosome-Length Haplotypes Using Genomic Data of Single Gamete Cells Version 0.0.3 Author, Han Qu, Jinfeng Chen, Shibo Wang, Le Zhang, Julong

More information

Forensic Resource/Reference On Genetics knowledge base: FROG-kb User s Manual. Updated June, 2017

Forensic Resource/Reference On Genetics knowledge base: FROG-kb User s Manual. Updated June, 2017 Forensic Resource/Reference On Genetics knowledge base: FROG-kb User s Manual Updated June, 2017 Table of Contents 1. Introduction... 1 2. Accessing FROG-kb Home Page and Features... 1 3. Home Page and

More information

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018

Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Getting started with simulating data in R: some helpful functions and how to use them Ariel Muldoon August 28, 2018 Contents Overview 2 Generating random numbers 2 rnorm() to generate random numbers from

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

SISG/SISMID Module 3

SISG/SISMID Module 3 SISG/SISMID Module 3 Introduction to R Ken Rice Tim Thornton University of Washington Seattle, July 2018 Introduction: Course Aims This is a first course in R. We aim to cover; Reading in, summarizing

More information

Genomic Files. University of Massachusetts Medical School. October, 2015

Genomic Files. University of Massachusetts Medical School. October, 2015 .. Genomic Files University of Massachusetts Medical School October, 2015 2 / 55. A Typical Deep-Sequencing Workflow Samples Fastq Files Fastq Files Sam / Bam Files Various files Deep Sequencing Further

More information

JMP Genomics. Release Notes. Version 6.0

JMP Genomics. Release Notes. Version 6.0 JMP Genomics Version 6.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP, A Business Unit of SAS SAS Campus Drive

More information

Package REGENT. R topics documented: August 19, 2015

Package REGENT. R topics documented: August 19, 2015 Package REGENT August 19, 2015 Title Risk Estimation for Genetic and Environmental Traits Version 1.0.6 Date 2015-08-18 Author Daniel J.M. Crouch, Graham H.M. Goddard & Cathryn M. Lewis Maintainer Daniel

More information

SUGEN 8.6 GxE. Misa Graff, July 2017

SUGEN 8.6 GxE. Misa Graff, July 2017 SUGEN 8.6 GxE Misa Graff, July 2017 Running a GxE analysis For a GxE the environmental variable has to be numeric. If we want to use sex or group as the environmental variable, then we should use the binary

More information

Package GEM. R topics documented: January 31, Type Package

Package GEM. R topics documented: January 31, Type Package Type Package Package GEM January 31, 2018 Title GEM: fast association study for the interplay of Gene, Environment and Methylation Version 1.5.0 Date 2015-12-05 Author Hong Pan, Joanna D Holbrook, Neerja

More information

Week 7: The normal distribution and sample means

Week 7: The normal distribution and sample means Week 7: The normal distribution and sample means Goals Visualize properties of the normal distribution. Learning the Tools Understand the Central Limit Theorem. Calculate sampling properties of sample

More information

SUGEN 8.6 Overview. Misa Graff, July 2017

SUGEN 8.6 Overview. Misa Graff, July 2017 SUGEN 8.6 Overview Misa Graff, July 2017 General Information By Ran Tao, https://sites.google.com/site/dragontaoran/home Website: http://dlin.web.unc.edu/software/sugen/ Standalone command-line software

More information

Population structure Jerome Goudet and Bruce Weir

Population structure Jerome Goudet and Bruce Weir Population structure Jerome Goudet and Bruce Weir 2018-07-13 Contents Individual populations Betas....................................... 1 Hapmap data 6 Sliding windows...............................................

More information

The Lander-Green Algorithm in Practice. Biostatistics 666

The Lander-Green Algorithm in Practice. Biostatistics 666 The Lander-Green Algorithm in Practice Biostatistics 666 Last Lecture: Lander-Green Algorithm More general definition for I, the "IBD vector" Probability of genotypes given IBD vector Transition probabilities

More information