Overview. Background. Locating quantitative trait loci (QTL)

Similar documents
Notes on QTL Cartographer

Lecture 7 QTL Mapping

Genet. Sel. Evol. 36 (2004) c INRA, EDP Sciences, 2004 DOI: /gse: Original article

Package EBglmnet. January 30, 2016

Bayesian Multiple QTL Mapping

Chapter 6: Linear Model Selection and Regularization

Development of linkage map using Mapmaker/Exp3.0

THERE is a long history of work to map genetic loci (quantitative

The Imprinting Model

And there s so much work to do, so many puzzles to ignore.

Package robustreg. R topics documented: April 27, Version Date Title Robust Regression Functions

Linear Model Selection and Regularization. especially usefull in high dimensions p>>100.

Package dlmap. February 19, 2015

Lecture 13: Model selection and regularization

The Lander-Green Algorithm in Practice. Biostatistics 666

QTX. Tutorial for. by Kim M.Chmielewicz Kenneth F. Manly. Software for genetic mapping of Mendelian markers and quantitative trait loci.

Biased estimators of quantitative trait locus heritability and location in interval mapping

PROC QTL A SAS PROCEDURE FOR MAPPING QUANTITATIVE TRAIT LOCI (QTLS)

Intersection Tests for Single Marker QTL Analysis can be More Powerful Than Two Marker QTL Analysis

STA121: Applied Regression Analysis

PLNT4610 BIOINFORMATICS FINAL EXAMINATION

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator

Parallel Algorithms and Implementations for Genetic Analysis of Quantitative Traits

Effective Recombination in Plant Breeding and Linkage Mapping Populations: Testing Models and Mating Schemes

What is machine learning?

Machine Learning. Topic 4: Linear Regression Models

Topics in Machine Learning-EE 5359 Model Assessment and Selection

Discussion Notes 3 Stepwise Regression and Model Selection

2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy

Breeding View A visual tool for running analytical pipelines User Guide Darren Murray, Roger Payne & Zhengzheng Zhang VSN International Ltd

CDAA No. 4 - Part Two - Multiple Regression - Initial Data Screening

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1

Lecture 25: Review I

CS249: ADVANCED DATA MINING

Simulation Study on the Methods for Mapping Quantitative Trait Loci in Inbred Line Crosses

Introduction to Evolutionary Computation

QUANTITATIVE genetics has developed rapidly, methods. Kruglyak and Lander (1995) apply the linear

Estimating Variance Components in MMAP

simulation with 8 QTL

Model selection Outline for today

Random Forest in Genomic Selection

PLNT4610 BIOINFORMATICS FINAL EXAMINATION

Practical OmicsFusion

Package DSPRqtl. R topics documented: June 7, Maintainer Elizabeth King License GPL-2. Title Analysis of DSPR phenotypes

Multivariate Analysis Multivariate Calibration part 2

MAPPING quantitative trait loci (QTL) is important

THE standard approach for mapping the genetic the binary trait, defined by whether or not an individual

QTL Analysis with QGene Tutorial

To finish the current project and start a new project. File Open a text data

Package smoothmest. February 20, 2015

Supporting Figures Figure S1. FIGURE S2.

A guide to QTL mapping with R/qtl

Exploratory model analysis

Akaike information criterion).

GenViewer Tutorial / Manual

Simultaneous search for multiple QTL using the global optimization algorithm DIRECT

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

Computational Biology and Chemistry

Regression Lab 1. The data set cholesterol.txt available on your thumb drive contains the following variables:

Precision Mapping of Quantitative Trait Loci

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Release Notes. JMP Genomics. Version 4.0

CPSC 340: Machine Learning and Data Mining. Feature Selection Fall 2017

For our example, we will look at the following factors and factor levels.

Mapping Quantitative Trait Loci Underlying Function-Valued Traits Using Functional Principal Component Analysis and Multi-Trait Mapping

Design of Experiments

LEA: An R Package for Landscape and Ecological Association Studies

CPSC 340: Machine Learning and Data Mining

Lecture 7: Linear Regression (continued)

Package GWRM. R topics documented: July 31, Type Package

Journal of Statistical Software

STATISTICS (STAT) Statistics (STAT) 1

Applied Regression Modeling: A Business Approach

An Introduction to the Bootstrap

EE613 Machine Learning for Engineers LINEAR REGRESSION. Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Nov.

CPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016

Minitab 17 commands Prepared by Jeffrey S. Simonoff

A Grid Portal Implementation for Genetic Mapping of Multiple QTL

Machine Learning (BSMC-GA 4439) Wenke Liu

Robust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson

Family Based Association Tests Using the fbat package

Bioconductor s stepnorm package

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Fathom Dynamic Data TM Version 2 Specifications

Step-by-Step Guide to Relatedness and Association Mapping Contents

22s:152 Applied Linear Regression

Modelling and Quantitative Methods in Fisheries

Statistical Pattern Recognition

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017

Package stepnorm. R topics documented: April 10, Version Date

Statistical Pattern Recognition

ATHENA Manual. Table of Contents...1 Introduction...1 Example...1 Input Files...2 Algorithms...7 Sample files...8

Version 2.0 Zhong Wang Department of Public Health Sciences Pennsylvannia State University

GMDR User Manual. GMDR software Beta 0.9. Updated March 2011

Lecture 06 Decision Trees I

Mutations for Permutations

Evaluation of Double Hierarchical Generalized Linear Model versus Empirical Bayes method A comparison on Barley dataset

Haplotype Analysis. 02 November 2003 Mendel Short IGES Slide 1

MTTS1 Dimensionality Reduction and Visualization Spring 2014 Jaakko Peltonen

QUICKTEST user guide

Transcription:

Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems University of Vienna Analysis of QTL data modified BIC Robust methods Implementation and Simulations in R Robust Methods for QTL Mapping in R Andreas Baierl 1 Robust Methods for QTL Mapping in R Andreas Baierl 2 Locating quantitative trait loci (QTL) Background Quantitative trait: evolution occurred in small steps characters, that are influenced by many genes Many relevant traits are quantitative: height, yield, A gene can obtains different forms (alleles) contribution of genetic effects to total (phenotypic) variation of a trait (heritability) determines rate at which characters respond to selection. (environmental variance reduces efficiency of response) Quantitative trait locus (QTL): gene (functional sequence of bases) that influences a certain quantitative trait Relevant questions: - How many genes influence a trait (How many QTL) - Find exact positions of QTL (- estimate size of genetic effects) Robust Methods for QTL Mapping in R Andreas Baierl 3 trait value = genetic influence + environmental influence partitioning genotypic variance into components with different impact on selection: additive, non-additive gene effects (epistasis) -> dependency on background population evolutionary reason: stabilization of phenotype phenotype: the form taken by some character in a specific individual. genotype: genetic makeup of individual Robust Methods for QTL Mapping in R Andreas Baierl 4

Data from experimental crosses Data matrix for backcross design A A a a BACKCROSS F0 Indiv. 1 QT 34.3 marker.1 marker.2 marker.m ~ 50-500 markers 2 65.4 * F1 3 23.2 * 4 45.4 INTERCROSS F2. ~ 200 1000 individuals Robust Methods for QTL Mapping in R Andreas Baierl 5 Robust Methods for QTL Mapping in R Andreas Baierl 6 Genetic map Analysis of QTL data 0 Genetic map Distance between markers is usually estimated from Find NUMBER, POSITIONS, EFFECT TYPES and SIZES of QTL 20 recombination frequency Challenges: large number of possible models If marker is close to QTL, then (main effects + interactions = m + m(m-1)/2 ~ 100 + 5.000) Location (cm) 40 60 marker genotype will be associated with QTL genotype (There would be a 1-1 -> efficient search strategy -> correct for test multiplicity deviation from normality of conditional distribution of trait given marker correspondence, if there were genotypes (especially when heavy tails or outliers) 80 no recombinations) recover unobserved / wrong / missing genotype information 100 No linkage between confounding of effect types 1 2 3 4 5 6 7 8 9 10 11121314151617181920 chromosomes selection bias for effect sizes, especially for small effects Chromosome Robust Methods for QTL Mapping in R Andreas Baierl 7 Robust Methods for QTL Mapping in R Andreas Baierl 8

Methods for QTL mapping Multiple regression approach marker based ANOVA on single markers multiple regression X ij : genotype of the i th individual (out of n) at the j th marker (out of m). univariate multiple X ij = ½ if individual has genotype (homozygous) X ij = -½ if individual has genotype (heterozygous) - interval mapping - composite interval - conditional interval mapping - multiple interval mapping strict Bayesian approach I: subset of the set N = {1,,m} marker U: subset of N x N mapping - Bayesian (Sen & Churchill) i : random error term with distribution f estimation of QTL location Robust Methods for QTL Mapping in R Andreas Baierl 9 Robust Methods for QTL Mapping in R Andreas Baierl 10 Model selection Behaviour of BIC depending on n & # of predictors aim: identify correct model, not minimise prediction error -> criterion for inclusion and exclusion of variables cross validation / bootstrap AIC: n log (RSS) + 2k minimises prediction error BIC: n log (RSS) + k log(n) more conservative than AIC, especially for small n n: sample size k = p + q = number of main effects (p) and interaction effects (q) under consideration RSS: residual sum of squares (assuming normal error distribution!) -> efficient search strategy forward selection + backward elimination step type 1 error und M0 0.00 0.05 0.10 0.15 0.20 0.25 0.30 number of predictors 0 2000 4000 6000 8000 10000 n 1 5 10 20 30 100 BIC Mi : BIC of 1-dimensional Model M i N: Number of 1-dim models n: sample size BIC chooses too many QTL every model has the same probability to be selected -> more likely to select large model. Robust Methods for QTL Mapping in R Andreas Baierl 11 Robust Methods for QTL Mapping in R Andreas Baierl 12

modified BIC Comparison of mbic and BIC Additional penalty term dependent on number of predictors under consideration (Bogdan et al 2004) modified BIC with E(p): expected number of main effects E(q): expected number of epistasis (=interaction) effects E(p) = E(q) = 2.2 controls the Type I error at a level of of 5% (for n = 200) type 1 error under M0 0.00 0.05 0.10 0.15 0.20 5 predictors (+10 two-way interaction terms) BIC mbic 0 1000 2000 3000 4000 5000 Robust Methods for QTL Mapping in R Andreas Baierl 13 n Robust Methods for QTL Mapping in R Andreas Baierl 14 Deviations from Normality Typically, non-parametric methods based on ranks are used Here we use robust regression techniques, in particular M-Estimators: minimise other measure of distance instead of residual sum of squares. popular alternatives are: rho.huber, k=0.05 rho.huber, k=1.3 Limiting Distribution has the following property: rho.bisquare rho.hampel with and error distribution f(x) Robust Methods for QTL Mapping in R Andreas Baierl 15 Robust Methods for QTL Mapping in R Andreas Baierl 16

Values for normalisation constant c e Robust mbic In practice, c e and therefore the error distribution f(x) have to be estimated. This leads to a robust version of the mbic: with for L 2 c e = 1 Robust Methods for QTL Mapping in R Andreas Baierl 17 Robust Methods for QTL Mapping in R Andreas Baierl 18 Simulation Setup Simulation Results Percentage correctly identified effects and false discovery rate 2 chromosomes with 11 marker each (m=22) 200 individuals (n=200) 1 additive effect 1 epistasis effect error distributions: Normal, Laplace, Cauchy, Tukey, 2 estimators: L 2, Huber (k=0.05) ~ L 1, Huber (k=1.3), Bisquare, Hampel Percentage 0.0 0.2 0.4 0.6 Huber-0.05 Huber-1.3 Bisquare Hampel L2-mBIC L2-BIC Normal Laplace Cauchy Tukey Chisq Chisq.med Robust Methods for QTL Mapping in R Andreas Baierl 19 Robust Methods for QTL Mapping in R Andreas Baierl 20

Implementation in R References Robust regression using procedure rlm of package MASS program structure: parameter specification generate realisation of genetic setup estimation of error distribution and c e in each forward step: estimate likelihood for m + m(m-1)/2 models generate output simulations: 1000 replications n=200-500, m=20-120 Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On Locating multiple interacting quantitative trait loci in intercross designs. To appear in Genetics. Bogdan, M., J. K. Ghosh and R. W. Doerge, 2004. Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci. Genetics, 167: 989-999. Broman, K. W. and T. P. Speed, 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J Roy Stat Soc B, 64: 641-656. Jureckova, J., Sen, P.K., 1996. Robust statistical procedures: asymptoticsand interrelations. Wiley, New York. Sen and Churchill (2001), A Statistical framework for quantitative trait mapping, Genetics, 159:371-387. Robust Methods for QTL Mapping in R Andreas Baierl 21 Robust Methods for QTL Mapping in R Andreas Baierl 22