haplo.score Score Tests for Association of Traits with Haplotypes when Linkage Phase is Ambiguous

Similar documents
Package haplo.stats. R topics documented: April 1, Version Date

Step-by-Step Guide to Basic Genetic Analysis

Genetic Analysis. Page 1

The haplo.stats Package

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017

QTX. Tutorial for. by Kim M.Chmielewicz Kenneth F. Manly. Software for genetic mapping of Mendelian markers and quantitative trait loci.

Package asymld. August 29, 2016

Step-by-Step Guide to Advanced Genetic Analysis

Family Based Association Tests Using the fbat package

Package lodgwas. R topics documented: November 30, Type Package

Notes on QTL Cartographer

A comprehensive modelling framework and a multiple-imputation approach to haplotypic analysis of unrelated individuals

Creating a data file and entering data

QUICKTEST user guide

PRSice: Polygenic Risk Score software - Vignette

Running Minitab for the first time on your PC

Preparing for Data Analysis

Estimation of haplotypes

Package RSNPset. December 14, 2017

MAN Package for pedigree analysis. Contents.

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie

Estimating Variance Components in MMAP

Computer lab 2 Course: Introduction to R for Biologists

HaploHMM - A Hidden Markov Model (HMM) Based Program for Haplotype Inference Using Identified Haplotypes and Haplotype Patterns

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

Mr. Kongmany Chaleunvong. GFMER - WHO - UNFPA - LAO PDR Training Course in Reproductive Health Research Vientiane, 22 October 2009

User Manual ixora: Exact haplotype inferencing and trait association

SUB-PANEL SPREADSHEET GUIDE (SUB-PANELUG)

Polymorphism and Variant Analysis Lab

Package REGENT. R topics documented: August 19, 2015

CHAPTER 1 INTRODUCTION

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Preparing for Data Analysis

STATISTICS (STAT) Statistics (STAT) 1

FHIR Genomics Use Case: Reporting HLA Genotyping Consensus-Sequence-Block & Observation-genetics-Sequence Cardinality

Surviving SPSS.

Regression III: Advanced Methods

Association Analysis of Sequence Data using PLINK/SEQ (PSEQ)

CHAPTER 1: INTRODUCTION...

Introducing Categorical Data/Variables (pp )

Package SimGbyE. July 20, 2009

I Launching and Exiting Stata. Stata will ask you if you would like to check for updates. Update now or later, your choice.

SUGEN 8.6 Overview. Misa Graff, July 2017

Development of linkage map using Mapmaker/Exp3.0

Package citools. October 20, 2018

Genome-Wide Association Study Using

An Introduction to the R Commander

Quality control of array genotyping data with argyle Andrew P Morgan

Package DSPRqtl. R topics documented: June 7, Maintainer Elizabeth King License GPL-2. Title Analysis of DSPR phenotypes

SPSS. (Statistical Packages for the Social Sciences)

Brief Guide on Using SPSS 10.0

Data analysis using Microsoft Excel

Introduction to Stata Toy Program #1 Basic Descriptives

Release Notes and Installation Guide (Unix Version)

Chapter 3: The IF Function and Table Lookup

Introduction (SPSS) Opening SPSS Start All Programs SPSS Inc SPSS 21. SPSS Menus

Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.4 User Reference Manual

CHAPTER 11 EXAMPLES: MISSING DATA MODELING AND BAYESIAN ANALYSIS

Let s use Technology Use Data from Cycle 14 of the General Social Survey with Fathom for a data analysis project

The Power and Sample Size Application

Extensions to the Cox Model: Stratification

MHPE 494: Data Analysis. Welcome! The Analytic Process

MultiMap. User's Guide. A Program for Automated Genetic Linkage Mapping. version 1.1. Written by. Tara Matise. Mark Perlin Aravinda Chakravarti

Linkage Analysis Package. User s Guide to Support Programs

PediHaplotyper Manual

Step-by-Step Guide to Relatedness and Association Mapping Contents

Generalized Linear Models

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

Also, for all analyses, two other files are produced upon program completion.

JMP Book Descriptions

WHO STEPS Surveillance Support Materials. STEPS Epi Info Training Guide

Basic Medical Statistics Course

User s Guide. Version 2.2. Semex Alliance, Ontario and Centre for Genetic Improvement of Livestock University of Guelph, Ontario

Bayesian Multiple QTL Mapping

Tutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT

Convert Dosages to Genotypes Author: Autumn Laughbaum, Golden Helix, Inc.

8. MINITAB COMMANDS WEEK-BY-WEEK

A Simple Guide to Using SPSS (Statistical Package for the. Introduction. Steps for Analyzing Data. Social Sciences) for Windows

Download PLINK from

A short manual for LFMM (command-line version)

Package LEA. April 23, 2016

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Maximizing Statistical Interactions Part II: Database Issues Provided by: The Biostatistics Collaboration Center (BCC) at Northwestern University

The Imprinting Model

Data Analysis using SPSS

Experimental epidemiology analyses with R and R commander. Lars T. Fadnes Centre for International Health University of Bergen

Package RobustSNP. January 1, 2011

Importing and Merging Data Tutorial

IBMSPSSSTATL1P: IBM SPSS Statistics Level 1

Package JGEE. November 18, 2015

SPSS TRAINING SPSS VIEWS

Package GWAF. March 12, 2015

Laboratory for Two-Way ANOVA: Interactions

User manual search for suitable kidney recipient 21nov2017

JMP Genomics. Release Notes. Version 6.0

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

Small example of use of OmicABEL

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

Transcription:

haploscore Score Tests for Association of Traits with Haplotypes when Linkage Phase is Ambiguous Charles M Rowland, David E Tines, and Daniel J Schaid Mayo Clinic Rochester, MN E-mail contact: rowland@mayoedu I Brief Description A suite of S-PLUS routines, referred to as "haploscore", can be used to compute score statistics to test associations between haplotypes and a wide variety of traits, including binary, ordinal, quantitative, and Poisson These methods assume that all subjects are unrelated and that haplotypes are ambiguous (due to unknown linkage phase of the genetic markers) The methods provide several different global and haplotype-specific tests for association, as well as provide adjustment for non-genetic covariates and computation of simulation p-values (which may be needed for sparse data) Details on the background and theory of the score statistics can be found in the following reference: Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA Score tests for association of traits with haplotypes when linkage phase is ambiguous American J Human Genetics, February, 2002 II Important Details The current Version 10 of haploscore is written for S-PLUS version 60 on a Unix operating system Installation procedures are provided in a README file included with the software distribution III Getting Started by Example After installing the haploscore package, the routines and an example data set are available by starting S-PLUS and attaching the appropriate directory The easiest way to get started is by following an example The experienced S-PLUS user may want to skip the example and simply view the details in the help file for haploscore In the following S-PLUS example session, indented text following the prompt ">" is what the user would type into an S-PLUS session, as well as results printed by S-PLUS 1

Example S-PLUS Session To the illustrate use of the haploscore analyses, you will first need to attach the directory with the haploscore routines and example data set (hlademo) To do this, we use the local library concept, and type > libary(haplo,libloc="/installation Directory Path Here/haploscore") where "/Installation Directory Path Here/" is the directory path that leads to the directory haploscore (note that haplo is a subdirectory of haploscore) Now look at the names of variables in hlademo > names(hlademo) [1] "resp" "respcat" "male" "age" "DQBa1" "DQBa2" [7] "DRBa1" "DRBa2" "Ba1" "Ba2" The variables are defined as follows: resp quantitative antibody response to measles vaccination respcat a factor with levels "low", "normal", "high", for categorical antibody response male sex code with 1="male", 0="female" age age (months) at immunization The remaining variables are genotypes for 3 HLA loci, with a prefix name (eg, "DQB") and a suffix for each of two alleles ("a1" and "a2") The variables in hlademo can be accessed by typing hlademo$ before their names, such as hlademo$resp Alternatively, it is easier to attach hlademo, so the variables can be accessed by simply typing their names > attach(hlademo) For convenience, create a separate data frame for the 3 loci, and call this geno (mainly to make it easier to refer to the loci, distinct from the other variables in hlademo) > geno <- hlademo[,-c(1:4)] Create some labels for the loci > locuslabel <-c("dqb","drb","b") Quantitative Trait Analysis: First, analyze the quantitative trait called resp A quantitative trait is identified by the option traittype="gaussian" (reminding us that a gaussian distribution is assumed for the distribution of the error terms) 2

The other arguments in the function are defined in the help file, viewed by typing help(haploscore) > hapgaus <- haploscore(resp, geno, traittype="gaussian", + offset = NA, xadj = NA, skiphaplo=005, + locuslabel=locuslabel, missval=0, nsim=0) To view results, we can either type the name of the object, hapgaus, or print(hapgaus) > print(hapgaus) global-stat = 4660471, df = 40, p-val = 021918 DQB DRB B Hap-Freq Hap-Score p-val [1,] 21 3 8 010501-245007 001428 [2,] 21 7 13 001082-231499 002061 [3,] 31 4 44 002871-226739 002337 [4,] 63 13 60 000558-165137 009866 [5,] 31 11 27 000609-105623 029086 [6,] 62 2 35 001046-098429 032497 [7,] 51 1 44 001744-09311 03518 [8,] 63 13 44 001555-071345 047557 [9,] 33 7 57 000688-059746 05502 [10,] 63 2 7 001376-055206 058091 [11,] 31 11 44 001002-053055 059573 [12,] 32 4 60 00309-04914 062314 [13,] 21 7 44 002354-044142 065891 [14,] 33 9 60 000688-043678 066227 [15,] 21 3 35 000575-041964 067475 [16,] 62 2 44 00138-028164 077822 [17,] 62 2 60 000515-024514 080635 [18,] 62 2 18 001556-023514 08141 [19,] 21 7 62 000768-005942 095262 [20,] 51 1 27 001415 005732 095429 [21,] 32 8 7 000688 009216 092657 [22,] 31 4 13 000524 013048 089619 [23,] 31 11 37 000688 015753 087483 [24,] 31 11 51 001095 016185 087142 [25,] 31 4 60 000692 01961 084453 [26,] 31 11 38 000688 033018 074126 [27,] 64 13 7 000961 046477 06421 [28,] 31 11 35 001728 04883 062534 [29,] 51 1 51 00074 05015 061602 [30,] 63 13 38 000688 06685 050381 [31,] 63 13 62 00084 076891 044195 3

[32,] 51 1 35 002967 079307 042773 [33,] 64 13 63 000688 088207 037774 [34,] 31 11 62 000608 09689 03326 [35,] 32 4 7 001718 099641 031905 [36,] 64 13 35 000672 125833 020827 [37,] 21 7 7 001257 126578 020559 [38,] 63 13 7 001605 219393 002824 [39,] 32 4 62 002371 235151 00187 [40,] 62 2 7 005073 239238 001674 Explanation of table results: The first column (eg, [1, ]) gives row numbers The next 3 columns are the alleles making up the haplotypes Hap-Freq is the estimated frequency of the haplotype in the pool of all subjects Hap-Score is the score for the haplotype p-val is the asymptotic chi-square p-value In our example, the option nsim=0 in the function haploscore implied no simulation p-values (simulation p-values are computed when nsim >0) Note that this table is sorted according to Hap-Score Plots and Haplotype Labels Another convenient way to view results is a plot of the haplotype frequencies (Hap-Freq) versus the haplotype score statistics (Hap-Score) > plot(hapgaus) > title("figure 1 Haplotype Score Statistics\nQuantitative Response") Some points on the plot may be of interest, perhaps due to their score statistic, or their haplotype frequency To put haplotype labels on individual points in the plot, use the function locatorhaplo To use this feature, first type the following command: > locatorhaplo(hapgaus) where hapgaus is the object where the results are stored Now, with your left mouse button, click on all the points of interest After all your chosen points are clicked, click on the middle mouse button Viola! All the points are labeled with their haplotype labels, as illustrated in Figure 1 4

Figure 1 Haplotype Score Statistics Quantitative Response Haploltype Score Statistic -2-1 0 1 2 32:4:62 62:2:7 63:13:7 21:7:13 31:4:44 21:3:8 002 004 006 008 010 Haplotype Frequency Note that some of the haplotype labels may overlap, so that it may be necessary to clean up the coordinates To do this, you can save the x-y coordinates and haplotype text, > locgaus <- locatorhaplo(hapgaus) and then fix some of the x-y coordinates in the locgaus list Skipping Rare Haplotypes For the above analyses, we used the option skiphaplo=005, to pool all haplotypes with frequencies <005 into a common group Let's try a different cut-off, such as skiphaplo=01 > hapgaus2 <- haploscore(resp, geno, traittype="gaussian", + offset = NA, xadj = NA, skiphaplo=01, + locuslabel=locuslabel, missval=0, nsim=0) > print(hapgaus2) global-stat = 3501088, df = 21, p-val = 002816 5

DQB DRB B Hap-Freq Hap-Score p-val [1,] 21 3 8 010501-245007 001428 [2,] 21 7 13 001082-231499 002061 [3,] 31 4 44 002871-226739 002337 [4,] 62 2 35 001046-098429 032497 [5,] 51 1 44 001744-09311 03518 [6,] 63 13 44 001555-071345 047557 [7,] 63 2 7 001376-055206 058091 [8,] 31 11 44 001002-053055 059573 [9,] 32 4 60 00309-04914 062314 [10,] 21 7 44 002354-044142 065891 [11,] 62 2 44 00138-028164 077822 [12,] 62 2 18 001556-023514 08141 [13,] 51 1 27 001415 005732 095429 [14,] 31 11 51 001095 016185 087142 [15,] 31 11 35 001728 04883 062534 [16,] 51 1 35 002967 079307 042773 [17,] 32 4 7 001718 099641 031905 [18,] 21 7 7 001257 126578 020559 [19,] 63 13 7 001605 219393 002824 [20,] 32 4 62 002371 235151 00187 [21,] 62 2 7 005073 239238 001674 Note that by using a different value for skiphaplo, the global statistic and its p-value change (due to decreased df), but the haplotype-specific scores do not change Haplotype scores, adjusted for sex and age First set up a matrix, with the first column for sex (male=1/female=0), and the second column for age (in months) > xma <- cbind(male, age) Now use this matrix as an argument to haploscore > hapgausadj <- haploscore(resp, geno, traittype="gaussian", + offset = NA, xadj = xma, skiphaplo=005, + locuslabel=locuslabel, missval=0, nsim=0) > print(hapgausadj) global-stat = 466244, df = 40, p-val = 02186 6

DQB DRB B Hap-Freq Hap-Score p-val [1,] 21 3 8 010501-245887 001394 [2,] 21 7 13 001082-230155 002136 [3,] 31 4 44 002871-226812 002332 [38,] 63 13 7 001605 218738 002871 [39,] 32 4 62 002371 233344 001963 [40,] 62 2 7 005073 237943 001734 When adjusting for covariates, all score statistics can change, although not by much in this example Ordinal Traits We will create an ordinal trait, converting respcat (a factor with levels "low", "normal", "high") to numeric values, yord (with levels 1, 2, 3) > yord <- asnumeric(respcat) Now use the option traittype = "ordinal" > hapord <- haploscore(yord, geno, traittype="ordinal", + offset = NA, xadj = NA, skiphaplo=005, + locuslabel=locuslabel, missval=0, nsim=0) > print(hapord) global-stat = 6512193, df = 40, p-val = 000727 DQB DRB B Hap-Freq Hap-Score p-val [1,] 21 7 13 001082-369342 000022 [2,] 21 3 8 010501-283638 000456 [3,] 31 4 44 002871-263765 000835 [38,] 32 4 62 002371 189113 005861 [39,] 63 13 7 001605 189743 005777 [40,] 62 2 7 005073 242248 001542 7

WARNING FOR ORDINAL TRAITS: When analyzing an ordinal trait with adjustment for covariates (using the xadj option), the software requires the libraries Design and Hmisc, distributed by Frank Harrell, PhD (see Harrell, FE Regression Modeling Strategies, Springer-Verlag, NY, 2001) If the user does not have these libraries installed, then it will not be possible to use the xadj option However, the unadjusted scores for an ordinal trait (using the default option xadj=na)do not require these libraries To check whether your local system has these libraries, type > library() and you should then be able to examine the list of libraries available to you Binary Traits Because "low" responders are of primary interest, let's create a binary trait that has values of 1 when response is "low", and 0 otherwise > ybin <-ifelse(yord==1,1,0) Now use the option traittype = "binomial" in the haploscore routine > hapbin <- haploscore(ybin, geno, traittype="binomial", + offset = NA, xadj = NA, skiphaplo=005, + locuslabel=locuslabel, missval=0, nsim=0) > print(hapbin) global-stat = 6587372, df = 40, p-val = 000614 DQB DRB B Hap-Freq Hap-Score p-val [1,] 62 2 7 005073-214567 00319 [2,] 51 1 35 002967-170405 008837 [3,] 63 13 7 001605-169539 009 [38,] 31 4 44 002871 253351 001129 [39,] 21 7 13 001082 369342 000022 [40,] 21 3 8 010501 382268 000013 8

Simulation p-values Simulation p-values are computed when nsim > 0, and the value of nsim directs the number of simulations to perform The following example illustrates computation of 1,000 simulations > hapbinsim <- haploscore(ybin, geno, traittype="binomial", + offset = NA, xadj = NA, skiphaplo=005, + locuslabel=locuslabel, missval=0, nsim=1000) > print(hapbinsim) global-stat = 6587372, df = 40, p-val = 000614, sim p-val = 0004 max-stat sim p-val = 0001 DQB DRB B Hap-Freq Hap-Score p-val sim p-val [1,] 62 2 7 005073-214567 00319 0031 [2,] 51 1 35 002967-170405 008837 0112 [3,] 63 13 7 001605-169539 009 0188 [38,] 31 4 44 002871 253351 001129 0006 [39,] 21 7 13 001082 369342 000022 0003 [40,] 21 3 8 010501 382268 000013 0 When simulations are requested, simulated p-values are computed for the global-stat (which has an asymptotic chi-square distribution), as well as max-stat (the maximum absolute value of the haplotype-specific scores) Furthermore, simulated p-values are computed for each of the haplotype-specific scores 9

IV Future Directions and Feedback We would like your feedback on the use of this software If you find strange results or behavior of the software, or you would like to make suggestions on features that you would find helpful, please let us know Some of our future plans include: use of the S-PLUS standard model formula syntax, extensions for analysis of censored survival data, longitudinal data, and matched case-control designs You can send suggestions by email to rowland@mayoedu Acknowledgements This research was supported by United States Public Health Services, National Institutes of Health; Contract grant numbers R01 DE13276, N01 AI45240, and R01 2AI33144 The hlademo data is kindly provided by Gregory A Poland, MD and the Mayo Vaccine Research Group for illustration only, and may not be used for publication 10