Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems University of Vienna Analysis of QTL data modified BIC Robust methods Implementation and Simulations in R Robust Methods for QTL Mapping in R Andreas Baierl 1 Robust Methods for QTL Mapping in R Andreas Baierl 2 Locating quantitative trait loci (QTL) Background Quantitative trait: evolution occurred in small steps characters, that are influenced by many genes Many relevant traits are quantitative: height, yield, A gene can obtains different forms (alleles) contribution of genetic effects to total (phenotypic) variation of a trait (heritability) determines rate at which characters respond to selection. (environmental variance reduces efficiency of response) Quantitative trait locus (QTL): gene (functional sequence of bases) that influences a certain quantitative trait Relevant questions: - How many genes influence a trait (How many QTL) - Find exact positions of QTL (- estimate size of genetic effects) Robust Methods for QTL Mapping in R Andreas Baierl 3 trait value = genetic influence + environmental influence partitioning genotypic variance into components with different impact on selection: additive, non-additive gene effects (epistasis) -> dependency on background population evolutionary reason: stabilization of phenotype phenotype: the form taken by some character in a specific individual. genotype: genetic makeup of individual Robust Methods for QTL Mapping in R Andreas Baierl 4
Data from experimental crosses Data matrix for backcross design A A a a BACKCROSS F0 Indiv. 1 QT 34.3 marker.1 marker.2 marker.m ~ 50-500 markers 2 65.4 * F1 3 23.2 * 4 45.4 INTERCROSS F2. ~ 200 1000 individuals Robust Methods for QTL Mapping in R Andreas Baierl 5 Robust Methods for QTL Mapping in R Andreas Baierl 6 Genetic map Analysis of QTL data 0 Genetic map Distance between markers is usually estimated from Find NUMBER, POSITIONS, EFFECT TYPES and SIZES of QTL 20 recombination frequency Challenges: large number of possible models If marker is close to QTL, then (main effects + interactions = m + m(m-1)/2 ~ 100 + 5.000) Location (cm) 40 60 marker genotype will be associated with QTL genotype (There would be a 1-1 -> efficient search strategy -> correct for test multiplicity deviation from normality of conditional distribution of trait given marker correspondence, if there were genotypes (especially when heavy tails or outliers) 80 no recombinations) recover unobserved / wrong / missing genotype information 100 No linkage between confounding of effect types 1 2 3 4 5 6 7 8 9 10 11121314151617181920 chromosomes selection bias for effect sizes, especially for small effects Chromosome Robust Methods for QTL Mapping in R Andreas Baierl 7 Robust Methods for QTL Mapping in R Andreas Baierl 8
Methods for QTL mapping Multiple regression approach marker based ANOVA on single markers multiple regression X ij : genotype of the i th individual (out of n) at the j th marker (out of m). univariate multiple X ij = ½ if individual has genotype (homozygous) X ij = -½ if individual has genotype (heterozygous) - interval mapping - composite interval - conditional interval mapping - multiple interval mapping strict Bayesian approach I: subset of the set N = {1,,m} marker U: subset of N x N mapping - Bayesian (Sen & Churchill) i : random error term with distribution f estimation of QTL location Robust Methods for QTL Mapping in R Andreas Baierl 9 Robust Methods for QTL Mapping in R Andreas Baierl 10 Model selection Behaviour of BIC depending on n & # of predictors aim: identify correct model, not minimise prediction error -> criterion for inclusion and exclusion of variables cross validation / bootstrap AIC: n log (RSS) + 2k minimises prediction error BIC: n log (RSS) + k log(n) more conservative than AIC, especially for small n n: sample size k = p + q = number of main effects (p) and interaction effects (q) under consideration RSS: residual sum of squares (assuming normal error distribution!) -> efficient search strategy forward selection + backward elimination step type 1 error und M0 0.00 0.05 0.10 0.15 0.20 0.25 0.30 number of predictors 0 2000 4000 6000 8000 10000 n 1 5 10 20 30 100 BIC Mi : BIC of 1-dimensional Model M i N: Number of 1-dim models n: sample size BIC chooses too many QTL every model has the same probability to be selected -> more likely to select large model. Robust Methods for QTL Mapping in R Andreas Baierl 11 Robust Methods for QTL Mapping in R Andreas Baierl 12
modified BIC Comparison of mbic and BIC Additional penalty term dependent on number of predictors under consideration (Bogdan et al 2004) modified BIC with E(p): expected number of main effects E(q): expected number of epistasis (=interaction) effects E(p) = E(q) = 2.2 controls the Type I error at a level of of 5% (for n = 200) type 1 error under M0 0.00 0.05 0.10 0.15 0.20 5 predictors (+10 two-way interaction terms) BIC mbic 0 1000 2000 3000 4000 5000 Robust Methods for QTL Mapping in R Andreas Baierl 13 n Robust Methods for QTL Mapping in R Andreas Baierl 14 Deviations from Normality Typically, non-parametric methods based on ranks are used Here we use robust regression techniques, in particular M-Estimators: minimise other measure of distance instead of residual sum of squares. popular alternatives are: rho.huber, k=0.05 rho.huber, k=1.3 Limiting Distribution has the following property: rho.bisquare rho.hampel with and error distribution f(x) Robust Methods for QTL Mapping in R Andreas Baierl 15 Robust Methods for QTL Mapping in R Andreas Baierl 16
Values for normalisation constant c e Robust mbic In practice, c e and therefore the error distribution f(x) have to be estimated. This leads to a robust version of the mbic: with for L 2 c e = 1 Robust Methods for QTL Mapping in R Andreas Baierl 17 Robust Methods for QTL Mapping in R Andreas Baierl 18 Simulation Setup Simulation Results Percentage correctly identified effects and false discovery rate 2 chromosomes with 11 marker each (m=22) 200 individuals (n=200) 1 additive effect 1 epistasis effect error distributions: Normal, Laplace, Cauchy, Tukey, 2 estimators: L 2, Huber (k=0.05) ~ L 1, Huber (k=1.3), Bisquare, Hampel Percentage 0.0 0.2 0.4 0.6 Huber-0.05 Huber-1.3 Bisquare Hampel L2-mBIC L2-BIC Normal Laplace Cauchy Tukey Chisq Chisq.med Robust Methods for QTL Mapping in R Andreas Baierl 19 Robust Methods for QTL Mapping in R Andreas Baierl 20
Implementation in R References Robust regression using procedure rlm of package MASS program structure: parameter specification generate realisation of genetic setup estimation of error distribution and c e in each forward step: estimate likelihood for m + m(m-1)/2 models generate output simulations: 1000 replications n=200-500, m=20-120 Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On Locating multiple interacting quantitative trait loci in intercross designs. To appear in Genetics. Bogdan, M., J. K. Ghosh and R. W. Doerge, 2004. Modifying the Schwarz Bayesian Information Criterion to Locate Multiple Interacting Quantitative Trait Loci. Genetics, 167: 989-999. Broman, K. W. and T. P. Speed, 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J Roy Stat Soc B, 64: 641-656. Jureckova, J., Sen, P.K., 1996. Robust statistical procedures: asymptoticsand interrelations. Wiley, New York. Sen and Churchill (2001), A Statistical framework for quantitative trait mapping, Genetics, 159:371-387. Robust Methods for QTL Mapping in R Andreas Baierl 21 Robust Methods for QTL Mapping in R Andreas Baierl 22