Supplementary information for: Genomic and metabolic prediction of complex heterotic traits in hybrid maize


Christian Riedelsheimer 1, Angelika Czedik-Eysenberg 2, Christoph Grieder 1, Jan Lisec 2, Frank Technow 1, Ronan Sulpice 2, Thomas Altmann 3, Mark Stitt 2, Lothar Willmitzer 2,4 & Albrecht E. Melchinger 1

1 Institute of Plant Breeding, Seed Science, and Population Genetics, University of Hohenheim, Stuttgart, Germany
2 Max-Planck Institute of Molecular Plant Physiology, Potsdam, Germany
3 Department Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Gatersleben, Germany
4 King Abdulaziz University, Jeddah, Saudi Arabia

Correspondence should be sent to:
Prof. Dr. A.E. Melchinger
University of Hohenheim
Institute of Plant Breeding, Seed Science, and Population Genetics
Fruwirthstr Stuttgart
Germany
melchinger@uni-hohenheim.de

CONTENTS

SUPPLEMENTARY FIGURES
Supplementary Figure 1 Phenotypic variation of the predicted traits.
Supplementary Figure 2 Distribution of genetic distances.
Supplementary Figure 3 Genealogy of the population with labeled leaves.
Supplementary Figure 4 Distribution of repeatabilities of individual metabolites ($w^2_{M_i}$).
Supplementary Figure 5 Results of a principal component analysis (PCA).
Supplementary Figure 6 Comparison of genetic architecture of GCA for dry matter yield with the estimated SNP effects used for its prediction.
Supplementary Figure 7 Manhattan plots showing the genetic architectures of the investigated traits.
Supplementary Figure 8 Quantile-Quantile (QQ) plots of genome-wide association scans of the investigated traits.
Supplementary Figure 9 Effects for metabolites ($\hat{u}$) estimated with RR-BLUP for predicting GCA for dry matter yield.
Supplementary Figure 10 Analysis of genetic distances within the core set of 14 lines.
Supplementary Figure 11 Accuracy of whole-genome prediction of GCA for dry matter yield within the core set depending on number of SNPs.
Supplementary Figure 12 Observed versus whole-genome predicted GCA for female flowering within the core set.

SUPPLEMENTARY TABLES
Supplementary Table 1 Summary information about the inbred lines.
Supplementary Table 2 List of measured metabolites.
Supplementary Table 3 Summary of whole-genome and metabolic prediction within the core set.
Supplementary Table 4 Predictive abilities of whole-genome and metabolic prediction within the core set using different subgroups as validation sets.

SUPPLEMENTARY NOTE
1. Near-infrared spectroscopy (NIRS).
2. Statistical analysis of phenotypic data.
2.1. Metabolites.
2.2. General combining ability (GCA).
References.

Supplementary Figure 1 Phenotypic variation of the predicted traits. Distribution of GCA for (a) dry matter yield, (b) plant height, (c) dry matter concentration, (d) female flowering, (e) starch content, (f) sugar content, and (g) lignin content. The distributions are shown for the breeding subgroups Stiff Stalk (SS), tropical lines (T), and Non-Stiff Stalk (NSS) as well as for the geographical origins Europe (EU) and North America (NA).

Supplementary Figure 2 Distribution of genetic distances. Pairwise genetic distances were calculated as Euclidean distances scaled to lie between zero and one (modified Rogers distances). The mean value ($\bar{x}$) is indicated as a red line.
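To make the scaling concrete, here is a minimal sketch of a modified Rogers distance for biallelic SNPs. It assumes genotypes are coded as 0, 1, or 2 copies of one allele and that there are no missing calls; the function name and the toy genotypes are hypothetical and not taken from the study.

```python
import math


def modified_rogers_distance(x, y):
    """Modified Rogers distance between two lines.

    x, y: genotype vectors coded 0/1/2 (copies of one allele per SNP).
    With allele frequency p = code / 2, the two alleles of a SNP contribute
    2 * (p_x - p_y)**2, so the general formula
    sqrt(sum_over_alleles((p_x - p_y)**2) / (2 * m)) simplifies to
    sqrt(sum((x - y)**2) / (4 * m)), which lies between 0 and 1.
    """
    if len(x) != len(y) or not x:
        raise ValueError("genotype vectors must be non-empty and equally long")
    m = len(x)
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.sqrt(squared / (4 * m))


# Toy example (hypothetical genotypes for three inbred lines).
lines = {
    "line_a": [0, 0, 2, 2, 0],
    "line_b": [2, 2, 0, 0, 2],
    "line_c": [0, 2, 2, 2, 0],
}
names = sorted(lines)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        d = modified_rogers_distance(lines[first], lines[second])
        print(first, second, round(d, 3))
```

Two lines that are homozygous for opposite alleles at every SNP reach the maximum distance of one, and identical lines have distance zero.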


Supplementary Figure 3 Genealogy of the population with labeled leaves. The tree was reconstructed from SNP data with the balanced minimum evolution (BME) algorithm on genetic distances. The breeding subgroups Stiff Stalk (SS), tropical lines (T), and Non-Stiff Stalk (NSS) are distinguished by their color.

Supplementary Figure 4 Distribution of repeatabilities of individual metabolites ($w^2_{M_i}$). The mean value ($\bar{x}$) is indicated as a red line.

Supplementary Figure 5 Results of a principal component analysis (PCA). Results are shown for (a) SNPs and (b) metabolites. Variables were centered and scaled. Only metabolites with a repeatability greater than 0.9 were used. The explained variance is given in brackets. The breeding subgroups Stiff Stalk (SS), tropical lines (T), and Non-Stiff Stalk (NSS) are distinguished by their color. The weak correspondence of the grouping pattern was also reflected by the low Mantel correlation of 0.31 between (i) genetic distances and (ii) Euclidean distances on standardized levels of metabolites.
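The Mantel correlation quoted for Supplementary Figure 5 is the Pearson correlation between the off-diagonal entries of two distance matrices measured on the same lines. The sketch below shows the idea, together with the usual permutation test. The function names, the number of permutations, and the toy matrices are assumptions for illustration; the source does not state how the authors computed the statistic.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(dist, order=None):
    """Entries above the diagonal, optionally after relabeling the objects."""
    n = len(dist)
    idx = list(range(n)) if order is None else order
    return [dist[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, n_perm=999, seed=1):
    """Mantel correlation and one-sided permutation P-value.

    Rows and columns of dist_b are permuted jointly, which keeps the matrix
    a valid distance matrix while breaking its link to dist_a.
    """
    rng = random.Random(seed)
    a = upper_triangle(dist_a)
    r_obs = pearson(a, upper_triangle(dist_b))
    n = len(dist_a)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        if pearson(a, upper_triangle(dist_b, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: distances between five objects on a line, and a noisy copy.
points = [0.0, 1.0, 3.0, 7.0, 12.0]
genetic = [[abs(p - q) for q in points] for p in points]
noise = [0.0, 0.4, -0.3, 0.8, -0.5]
shifted = [p + e for p, e in zip(points, noise)]
metabolic = [[abs(p - q) for q in shifted] for p in shifted]
r, p_value = mantel(genetic, metabolic)
print(round(r, 3), p_value)
```

Because the entries of a distance matrix are not independent, the permutation test, rather than the usual t-test for a correlation, is the customary way to attach a P-value.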


Supplementary Figure 6 Comparison of genetic architecture of GCA for dry matter yield with the estimated SNP effects used for its prediction. (a) Manhattan plot showing the obtained P-values on a $\log_{10}$ scale of a genome-wide association scan with correction for population structure and cryptic relatedness using a Q + K model. (b) Estimated normally distributed SNP effects ($\hat{u}$) obtained with RR-BLUP.
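As a rough illustration of how RR-BLUP shrinks marker effects, the sketch below solves the ridge equations $\hat{u} = (Z'Z + \lambda I)^{-1} Z'y$ for column-centered marker codes $Z$ and centered phenotypes $y$. In RR-BLUP, $\lambda$ is the ratio of error variance to marker-effect variance and is normally estimated (for example by REML); here it is simply passed in. The function names, the small linear solver, and the toy data are assumptions for illustration and are not the authors' ASReml-R code.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rr_blup(z, y, lam):
    """Ridge-regression estimates of marker effects.

    z: list of genotype rows (one row per line, one column per marker).
    y: phenotypes (for example GCA estimates), one per line.
    lam: shrinkage parameter, assumed known in this sketch.
    """
    n, p = len(z), len(z[0])
    col_means = [sum(row[j] for row in z) / n for j in range(p)]
    zc = [[row[j] - col_means[j] for j in range(p)] for row in z]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    # Left-hand side Z'Z + lam * I and right-hand side Z'y.
    lhs = [
        [sum(zc[i][a] * zc[i][b] for i in range(n)) + (lam if a == b else 0.0)
         for b in range(p)]
        for a in range(p)
    ]
    rhs = [sum(zc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    return solve(lhs, rhs)


# Toy example: four lines, two markers, two shrinkage levels.
genotypes = [[0, 2], [1, 1], [2, 0], [2, 2]]
phenotypes = [1.0, 2.0, 3.0, 5.0]
weak = rr_blup(genotypes, phenotypes, lam=0.1)
strong = rr_blup(genotypes, phenotypes, lam=10.0)
print([round(u, 3) for u in weak], [round(u, 3) for u in strong])
```

Larger values of `lam` pull all effects toward zero, which is why the RR-BLUP effects in panel (b) are small and spread across the genome rather than concentrated at a few loci.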


Supplementary Figure 7 Manhattan plots showing the genetic architectures of the investigated traits. Results are shown for GCA for (a) plant height, (b) dry matter concentration, (c) female flowering, (d) starch content, (e) sugar content, and (f) lignin content. The P-values are shown on a $\log_{10}$ scale and were obtained with genome-wide association scans with correction for population structure and cryptic relatedness using a Q + K model.

Supplementary Figure 8 Quantile-Quantile (QQ) plots of genome-wide association scans of the investigated traits. Results are shown for GCA for (a) dry matter yield, (b) plant height, (c) dry matter concentration, (d) female flowering, (e) starch content, (f) sugar content, and (g) lignin content.

Supplementary Figure 9 Effects for metabolites ($\hat{u}$) estimated with RR-BLUP for predicting GCA for dry matter yield. The model includes all metabolites $M_i$, which differ in their individual Pearson correlations with GCA for dry matter yield, $r(M_i, y)$.

Supplementary Figure 10 Analysis of genetic distances within the core set of 14 lines. (a) Distribution of genetic distances. The mean value ($\bar{x}$) is indicated as a red line. (b) Heatmap of pairwise genetic distances. (c) Decay of the within-group sum of squares as a function of the number of k-means clusters estimated on genetic distances.

Supplementary Figure 11 Accuracy of whole-genome prediction of GCA for dry matter yield within the core set depending on number of SNPs. Accuracies $r(g, \hat{g})$ averaged over all cross-validation runs are shown for 15, 50, 500, 1,000, 2,500, 5,000 and 10,000 evenly spaced SNPs. The red line shows the accuracy obtained with the full model of 38,019 SNPs.

Supplementary Figure 12 Observed versus whole-genome predicted GCA for female flowering within the core set. Results were averaged over all cross-validation runs and colored according to (a) the breeding subgroups Stiff Stalk (SS), tropical lines (T), and Non-Stiff Stalk (NSS), and (b) their geographical origins Europe (EU), North America (NA), and other regions.

Supplementary Table 1 Summary information about the inbred lines.

Inbred line | Maturity group | Geographical origin | Breeding subgroup | Used for prediction | Part of core set
 | Late | USA | NSS | Yes | Yes
A148 | Early | USA | NSS | Yes | Yes
A188 | Intermediate | USA | NSS | Yes | Yes
A347 | Late | USA | NSS | Yes | Yes
A374 | Early | USA | NSS | Yes | -
A375 | Intermediate | USA | NSS | Yes | Yes
A619 | Intermediate | USA | NSS | Yes | -
A63 | Late | USA | SS | Yes | -
A654 | Early | USA | NSS | Yes | Yes
B100 | Late | USA | NSS | Yes | -
B101 | Late | USA | SS | Yes | Yes
B10 | Late | USA | NSS | Yes | -
B103 | Early | USA | NSS | Yes | Yes
B106 | Late | USA | NSS | Yes | Yes
B107 | Late | USA | NSS | Yes | -
B108 | Intermediate | USA | NSS | Yes | Yes
B109 | Intermediate | USA | SS | Yes | -
B110 | Late | USA | SS | Yes | Yes
B111 | Late | USA | SS | Yes | Yes
B11 | Late | USA | NSS | Yes | Yes
B113 | Late | USA | NSS | Yes | Yes
B14a | Late | USA | SS | Yes | -
B37 | Late | USA | SS | Yes | Yes
B68 | Late | USA | SS | Yes | -
B73 | Intermediate | USA | SS | Yes | -
B97 | Late | USA | NSS | Yes | Yes
B98 | Late | USA | NSS | Yes | Yes
B99 | Intermediate | USA | NSS | Yes | Yes
CG1 | Intermediate | Canada | NSS | Yes | Yes
CH39 | Early | Switzerland | NSS | Yes | Yes
CI187 | Late | USA | NSS | Yes | Yes
CL30 | Early | Canada | NSS | Yes | Yes
CM105 | Intermediate | Canada | SS | Yes | -
CM174 | Early | Canada | SS | Yes | Yes
CML103 | Late | Mexico | T | Yes | -
CML46 | Late | Mexico | T | Yes | Yes
CML3 | Late | Mexico | T | Yes | Yes
CML333 | Late | Mexico | T | Yes | Yes
CML91 | Late | Mexico | T | Yes | Yes
Co15 | Early | Canada | NSS | - | -
Co151 | Early | Canada | NSS | Yes | Yes
Co158 | Early | Canada | NSS | Yes | Yes
Co431 | Early | Canada | NSS | Yes | Yes
Co43 | Early | Canada | NSS | Yes | Yes
Co441 | Early | Canada | NSS | Yes | Yes
CQ01 | Early | Canada | NSS | Yes | Yes
CQ50 | Early | Canada | NSS | Yes | Yes
D01 | Early | Germany | NSS | Yes | -
D06 | Early | Germany | NSS | Yes | -
D09 | Early | Germany | NSS | Yes | Yes
D17 | Early | Germany | NSS | Yes | Yes
D | Late | Germany | NSS | Yes | -
D3 | Early | Germany | NSS | Yes | -
D4 | Early | Germany | NSS | Yes | -
D3 | Intermediate | Germany | NSS | Yes | -
D403 | Early | Germany | NSS | - | -
D408 | Early | Germany | NSS | Yes | Yes
D46 | Early | Germany | NSS | Yes | -
D48 | Late | Germany | NSS | Yes | -
D51 | Intermediate | Germany | NSS | Yes | -
D60 | Intermediate | Germany | NSS | Yes | -
D61 | Intermediate | Germany | NSS | Yes | -
D63 | Early | Germany | NSS | Yes | -
D66 | Intermediate | Germany | NSS | Yes | -
D67 | Intermediate | Germany | NSS | - | -
D800 | Early | Germany | NSS | Yes | Yes
D83 | Intermediate | Germany | NSS | Yes | -
D851 | Intermediate | Germany | NSS | Yes | Yes
D95 | Early | Germany | NSS | Yes | Yes
De811 | Late | USA | SS | Yes | Yes
Dent_1 | Intermediate | Germany | NSS | - | -
Dent_2 | Early | Germany | NSS | - | -
Dent_3 | Late | Germany | NSS | - | -
Dent_4 | Intermediate | Germany | NSS | - | -
Dent_5 | Late | Germany | NSS | - | -
Dent_6 | Late | Germany | NSS | - | -
Dent_7 | Late | Germany | NSS | - | -
Dent_8 | Late | Germany | NSS | - | -
Dent_9 | Intermediate | Germany | NSS | - | -
F5 | Early | France | NSS | Yes | -
F544 | Early | France | NSS | Yes | -
F7009 | Intermediate | France | NSS | Yes | Yes
F7019 | Early | France | NSS | Yes | Yes
F705 | Early | France | SS | Yes | Yes
F708 | Intermediate | France | NSS | Yes | Yes
F7038 | Early | France | NSS | Yes | -
F7057 | Intermediate | France | NSS | - | -
F7058 | Intermediate | France | NSS | Yes | -
F7059 | Early | France | NSS | Yes | -
F71 | Late | France | NSS | Yes | -
F748 | Late | France | SS | Yes | Yes
F75 | Late | France | SS | Yes | Yes
F838 | Late | France | NSS | Yes | Yes
F888 | Late | France | NSS | Yes | Yes
F904 | Intermediate | France | NSS | Yes | Yes
F908 | Early | France | NSS | Yes | Yes
F91 | Late | France | NSS | Yes | Yes
F918 | Late | France | SS | Yes | Yes
FC185 | Early | France | NSS | Yes | Yes
FC1890 | Early | France | NSS | Yes | -
FV18 | Early | France | NSS | Yes | Yes
FV30 | Early | France | NSS | Yes | Yes
FV5 | Early | France | NSS | Yes | -
FV71 | Early | France | NSS | Yes | Yes
FV84 | Early | France | NSS | Yes | Yes
FV88 | Early | France | NSS | Yes | Yes
FV317 | Early | France | NSS | Yes | Yes
FV330 | Early | France | NSS | Yes | Yes
FV33 | Intermediate | France | NSS | Yes | Yes
FV335 | Early | France | NSS | Yes | Yes
FV353 | Early | France | NSS | Yes | Yes
FV354 | Early | France | NSS | Yes | Yes
FV356 | Early | France | NSS | Yes | Yes
GL7 | Intermediate | Canada | NSS | Yes | Yes
GL6 | Late | Canada | NSS | Yes | -
GY93 | Late | China | NSS | Yes | Yes
H95 | Late | USA | NSS | Yes | Yes
H99 | Intermediate | USA | NSS | Yes | Yes
Ia153 | Early | USA | NSS | Yes | -
M16W | Late | USA | NSS | Yes | Yes
M37W | Late | USA | NSS | Yes | Yes
M01 | Early | Germany | NSS | Yes | -
M016 | Early | Germany | NSS | Yes | -
Mo17 | Late | USA | NSS | Yes | Yes
Mo18W | Late | USA | T | Yes | Yes
Mo4W | Late | USA | NSS | Yes | Yes
Ms71 | Late | USA | NSS | Yes | Yes
Mt4 | Early | USA | NSS | Yes | Yes
N19 | Early | USA | SS | Yes | Yes
N | Late | USA | SS | Yes | Yes
N5 | Late | USA | NSS | Yes | Yes
N6 | Late | USA | NSS | Yes | Yes
NC50 | Late | USA | SS | Yes | Yes
NC58 | Late | USA | NSS | Yes | -
NC60 | Late | USA | NSS | Yes | -
NC6B | Late | USA | NSS | Yes | -
NC88 | Late | USA | NSS | Yes | -
NC90 | Intermediate | USA | NSS | Yes | Yes
NC96 | Late | USA | T | Yes | Yes
NC98 | Late | USA | T | Yes | -
NC30 | Late | USA | T | Yes | Yes
NC348 | Late | USA | T | Yes | Yes
NC350 | Late | USA | T | Yes | Yes
NC358 | Late | USA | T | Yes | Yes
ND11 | Early | USA | NSS | Yes | Yes
ND46 | Early | USA | NSS | Yes | Yes
NDB8 | Early | USA | NSS | Yes | Yes
Oh0 | Late | USA | NSS | Yes | Yes
Oh33 | Late | USA | NSS | Yes | Yes
Oh40B | Late | USA | NSS | Yes | Yes
OH43 | Late | USA | NSS | Yes | Yes
Oh7B | Late | USA | NSS | Yes | Yes
Os46 | Late | USA | NSS | Yes | Yes
P001 | Late | Germany | NSS | Yes | -
P006 | Early | Germany | NSS | Yes | -
P009 | Intermediate | Germany | NSS | Yes | -
P017 | Early | Germany | NSS | Yes | -
P0 | Intermediate | Germany | NSS | Yes | -
P07 | Late | Germany | NSS | Yes | -
P09 | Intermediate | Germany | NSS | Yes | -
P031 | Early | Germany | NSS | Yes | -
P033 | Early | Germany | NSS | Yes | -
P034 | Intermediate | Germany | NSS | Yes | -
P036 | Early | Germany | NSS | Yes | -
P038 | Intermediate | Germany | NSS | Yes | -
P040 | Intermediate | Germany | NSS | Yes | -
P04 | Intermediate | Germany | NSS | Yes | -
P043 | Early | Germany | NSS | Yes | -
P045 | Early | Germany | NSS | Yes | -
P046 | Early | Germany | NSS | Yes | -
P047 | Early | Germany | NSS | - | -
P053 | Intermediate | Germany | NSS | Yes | -
P054 | Early | Germany | NSS | Yes | -
P057 | Early | Germany | NSS | Yes | -
P060 | Intermediate | Germany | NSS | Yes | Yes
P063 | Intermediate | Germany | NSS | Yes | -
P064 | Late | Germany | NSS | Yes | -
P065 | Late | Germany | NSS | Yes | -
P066 | Intermediate | Germany | NSS | Yes | -
P068 | Late | Germany | NSS | Yes | -
P070 | Late | Germany | NSS | Yes | -
P071 | Intermediate | Germany | NSS | Yes | -
P074 | Intermediate | Germany | NSS | Yes | -
P075 | Intermediate | Germany | NSS | Yes | -
P079 | Intermediate | Germany | NSS | Yes | -
P080 | Intermediate | Germany | NSS | Yes | -
P081 | Intermediate | Germany | NSS | Yes | -
P083 | Intermediate | Germany | NSS | Yes | -
P084 | Intermediate | Germany | NSS | Yes | Yes
P086 | Intermediate | Germany | NSS | Yes | -
P087 | Intermediate | Germany | NSS | Yes | -
P089 | Late | Germany | NSS | Yes | Yes
P091 | Late | Germany | NSS | Yes | -
P09 | Intermediate | Germany | NSS | Yes | -
P094 | Intermediate | Germany | NSS | Yes | -
P095 | Early | Germany | NSS | Yes | -
P096 | Early | Germany | NSS | Yes | -
P097 | Early | Germany | NSS | Yes | -
P100 | Intermediate | Germany | NSS | Yes | -
P101 | Intermediate | Germany | NSS | Yes | -
P103 | Intermediate | Germany | NSS | Yes | -
P104 | Intermediate | Germany | NSS | Yes | -
P105 | Early | Germany | NSS | Yes | -
P106 | Intermediate | Germany | NSS | Yes | -
P107 | Late | Germany | NSS | Yes | -
P110 | Early | Germany | NSS | Yes | -
P111 | Intermediate | Germany | NSS | Yes | -
P11 | Early | Germany | NSS | Yes | -
P114 | Late | Germany | NSS | Yes | -
P115 | Intermediate | Germany | NSS | Yes | -
P118 | Intermediate | Germany | NSS | Yes | -
P119 | Late | Germany | NSS | Yes | -
P10 | Late | Germany | NSS | Yes | -
P1 | Intermediate | Germany | NSS | Yes | -
P16 | Intermediate | Germany | NSS | Yes | -
P17 | Late | Germany | NSS | Yes | -
P18 | Late | Germany | NSS | Yes | -
P19 | Intermediate | Germany | NSS | Yes | -
P130 | Early | Germany | NSS | Yes | -
P131 | Intermediate | Germany | NSS | Yes | -
P133 | Intermediate | Germany | NSS | Yes | -
P135 | Intermediate | Germany | NSS | Yes | Yes
P136 | Intermediate | Germany | NSS | Yes | -
P137 | Early | Germany | NSS | Yes | -
P14 | Intermediate | Germany | NSS | Yes | -
P144 | Intermediate | Germany | NSS | Yes | -
P145 | Early | Germany | NSS | Yes | -
P146 | Early | Germany | NSS | Yes | -
P148 | Early | Germany | NSS | Yes | -
P39 | Late | USA | NSS | Yes | Yes
Pa31 | Early | USA | SS | Yes | Yes
Pa405 | Early | USA | NSS | Yes | Yes
Pa91 | Late | USA | NSS | Yes | Yes
S015 | Late | Germany | NSS | Yes | -
S016 | Intermediate | Germany | NSS | Yes | -
S018 | Late | Germany | NSS | Yes | -
S00 | Intermediate | Germany | NSS | Yes | Yes
S01 | Late | Germany | NSS | Yes | -
S05 | Intermediate | Germany | NSS | Yes | Yes
S033 | Late | Germany | NSS | Yes | -
S034 | Late | Germany | NSS | Yes | -
S035 | Intermediate | Germany | NSS | Yes | -
S036 | Intermediate | Germany | NSS | Yes | -
S037 | Intermediate | Germany | NSS | Yes | -
S040 | Intermediate | Germany | NSS | Yes | -
S044 | Intermediate | Germany | NSS | Yes | -
S046 | Early | Germany | NSS | Yes | -
S048 | Intermediate | Germany | NSS | Yes | -
S049 | Intermediate | Germany | NSS | Yes | -
S050 | Late | Germany | NSS | Yes | -
S051 | Intermediate | Germany | NSS | Yes | -
S05 | Intermediate | Germany | NSS | Yes | -
S053 | Intermediate | Germany | NSS | Yes | -
S054 | Late | Germany | NSS | Yes | -
S055 | Early | Germany | NSS | Yes | -
S058 | Early | Germany | NSS | Yes | -
S060 | Early | Germany | NSS | Yes | -
S065 | Intermediate | Germany | NSS | Yes | -
S066 | Intermediate | Germany | NSS | Yes | -
S067 | Intermediate | Germany | NSS | Yes | -
S069 | Intermediate | Germany | NSS | Yes | Yes
S070 | Early | Germany | NSS | Yes | -
S073 | Early | Germany | NSS | Yes | -
SDp54 | Early | USA | NSS | Yes | -
T3 | Late | USA | NSS | Yes | Yes
T8 | Late | USA | NSS | Yes | Yes
UH00 | Intermediate | Germany | NSS | Yes | -
UH50 | Intermediate | Germany | NSS | Yes | -
UH301 | Intermediate | Germany | NSS | Yes | Yes
UH303 | Intermediate | Germany | NSS | Yes | Yes
UH304 | Intermediate | Germany | NSS | Yes | Yes
W117 | Early | USA | NSS | Yes | -
W117HT | Early | USA | NSS | Yes | -
W153R | Intermediate | USA | NSS | Yes | Yes
W18B | Early | USA | NSS | Yes | -
W18E | Intermediate | USA | NSS | Yes | Yes
W3 | Late | USA | NSS | Yes | Yes
W401 | Early | USA | NSS | Yes | Yes
W60S | Late | USA | NSS | Yes | Yes
W604S | Intermediate | USA | NSS | Yes | Yes
W64A | Late | USA | NSS | Yes | Yes
W79A | Early | USA | NSS | Yes | Yes
W9 | Early | USA | NSS | Yes | Yes
WH | Early | USA | NSS | Yes | Yes
WJ | Early | USA | NSS | Yes | Yes

Supplementary Table 2 List of measured metabolites.

Metabolite: 1,3-Diaminopropane; 2-Aminobutyric acid; 2-oxo-glutaric acid; 6-alpha-Mannobiose; β-alanine; γ-aminobutyric acid; p-coumaric acid; Adenosine; Alanine; Amino acids; Ascorbic acid; Asparagine; Aspartic acid; Benzoic acid; Caffeic acid; Chlorogenic acid; Chlorophyll A; Chlorophyll B; Citramalic acid; Dopamine; Ethanolamine; Ferulic acid; Fructose; Fumarate; Galactinol dihydrate; Gentiobiose; Glucopyranose; Glucose; Glucose-6-phosphate; Glutamic acid; Glyceric acid; Glyceric acid-3-phosphate; Glycerol; Glycine; Homoserine; Hydroxypyridine; Isoleucine; Itaconic acid; Leucine; Lysine; Malate; Maltose; Methionine; Myo-Inositol; Nitrate; O-Acetylserine; Octacosanoic acid; Oxaloacetate; Palmitic acid; Phenylalanine; Phenylpyruvic acid; Phosphoric acid; Phosphoric acid monomethyl ester; Proline; Protein (total); Putrescine; Pyruvic acid; Quinic acid; Raffinose; Rhamnose; Ribitol; Serine; Spermidine; Starch; Succinic acid; Sucrose; Threonic acid-1,4-lactone; Threonine; Triacontanoic acid; Tyramine; Tyrosine; Valine; Xylose; 57 metabolites with unknown chemical structure.

Supplementary Table 3 Summary of whole-genome and metabolic prediction within the core set.

Columns: GCA trait; $h^2_{\mathrm{GCA}}$; $w^2_M$; SNPs: $r(g, \hat{g})$, s.d., $r(y, \hat{y})$; Metabolites: $r(g, \hat{g})$, s.d., $r(y, \hat{y})$.
Rows: Dry matter yield; Plant height; Dry matter concentration; Female flowering; Starch content; Sugar content; Lignin content.

Prediction accuracies $r(g, \hat{g})$ averaged over all cross-validation runs and their standard deviations (s.d.) are shown for models using either SNPs or metabolites within the core set of 14 lines only. Repeatabilities of the used metabolic profile ($w^2_M$) were calculated as the weighted sum of the repeatabilities of the individual metabolites (Methods). Heritabilities of the predicted traits ($h^2_{\mathrm{GCA}}$) were calculated using raw data of the core set only.

Supplementary Table 4 Predictive abilities of whole-genome and metabolic prediction within the core set using different subgroups as validation sets.

Entries are $r(y, \hat{y})$ (s.d.).

GCA | SNPs: SS | SNPs: NSS | SNPs: EU | SNPs: NA | Metabolites: SS | Metabolites: NSS | Metabolites: EU | Metabolites: NA
Dry matter yield | 0.40 (0.34) | 0.68 (0.11) | 0.6 (0.0) | 0.76 (0.10) | 0.74 (0.14) | 0.39 (0.18) | 0.41 (0.36) | 0.44 (0.0)
Plant height | 0.45 (0.8) | 0.64 (0.11) | 0.41 (0.33) | 0.74 (0.10) | 0.6 (0.34) | 0.45 (0.16) | 0.34 (0.31) | 0.49 (0.17)
Dry matter concentration | 0.89 (0.1) | 0.60 (0.13) | 0.7 (0.37) | 0.68 (0.11) | 0.74 (0.8) | 0.51 (0.15) | 0.37 (0.35) | 0.53 (0.16)
Female flowering | 0.91 (0.14) | 0.69 (0.10) | 0.4 (0.34) | 0.80 (0.08) | 0.49 (0.41) | 0.55 (0.13) | 0.5 (0.34) | 0.58 (0.15)
Starch content | 0.51 (0.9) | 0.56 (0.15) | 0.1 (0.37) | 0.70 (0.11) | 0.67 (0.1) | 0.48 (0.16) | 0.16 (0.34) | 0.54 (0.16)
Sugar content | 0.51 (0.33) | 0.51 (0.16) | 0.11 (0.40) | 0.70 (0.10) | 0.33 (0.34) | 0.37 (0.17) | 0.14 (0.38) | 0.47 (0.17)
Lignin content | 0.54 (0.3) | 0.66 (0.1) | 0.49 (0.30) | 0.78 (0.09) | 0.47 (0.34) | 0.4 (0.18) | 0.36 (0.9) | 0.48 (0.19)

Predictive abilities $r(y, \hat{y})$ averaged over all cross-validation runs and their standard deviations (s.d.) are shown for models using either SNPs or metabolites within the core set of 14 lines only. For each validation population, predictive abilities were calculated separately for four subgroups of lines. The division was based on either breeding subgroups (Stiff Stalk, SS; Non-Stiff Stalk, NSS) or geographical origin (Europe, EU; North America, NA). The average numbers of lines in the validation populations were 6 for SS, 0 for NSS, 8 for EU, and 16 for NA.

Supplementary Note

1. Near-infrared spectroscopy (NIRS).

A set of 55 maize genotypes served as the reference set to build the NIRS calibrations. The reference set was laid out as a randomized complete block design with two field replications and grown in the same environments as the testcrosses. Reference samples of 1.5 kg chopped whole-plant material were collected from 66 plots in 2008 (the second field replication was missing in some cases) and from 18 genotypes from one field replication in 2009. Each field-plot sample was dried to constant weight at 55 °C and ground with a Retsch mill to pass through a 1 mm grit. Starch and sugar contents in the reference samples were determined following the procedures prescribed by the European Commission. Lignin was determined as acid detergent lignin (ADL) following Goering and Van Soest (ref. 1).

Near-infrared spectra were collected from the reference samples and from plant samples of one field replication of all field trials of all testcrosses. The spectra were measured using a laboratory NIRSystems 6500 spectrometer (FOSS NIRSystems, Inc., Silver Spring, MD, USA) equipped with sample cups of 3.5 cm diameter for measuring ground material. Spectra were collected in the nm spectral range at an interval of nm. For each field-plot sample, one sample cup was filled with sample material and measured in duplicate. Spectra were averaged over all technical and field replications.

Partial least squares regression (PLSR) was used to develop calibration models. The first derivative of the spectra was used. Additionally, spectra were subjected to smoothing over a gap of four wavelengths and to multiplicative scatter correction. For calibration development, the spectra were scaled to unit variance and centered to mean zero. From the complete spectral range, wavelengths at 8 nm intervals were included in the calibrations.

The following validation scheme was applied. A set of 60 samples from a random draw of 18 genotypes was chosen as a prediction set, whereas the remaining samples were used for developing the PLSR models. The model parameters were estimated in a cross-validation repeated 500 times, using three quarters of the samples for calibration and one quarter for validation. The minimum root mean square error of prediction from cross-validation was used as the criterion for choosing the optimum number of components, with an upper limit of 15 components. In each cross-validation run, the developed model with the optimum number of components was applied to the prediction set to obtain performance statistics. The average coefficient of determination ($R^2$) in the prediction set was 0.90 for starch content, 0.95 for sugar content, and 0.77 for lignin content. The final calibration models were applied to NIR spectra collected from all field trials of all testcrosses to predict phenotypic values per field plot. These were further analyzed to estimate GCA values of the lines (see section 2.2).
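The multiplicative scatter correction mentioned in the note can be sketched in a few lines. Each spectrum is regressed on a reference spectrum, and the fitted intercept and slope are then removed. Using the mean spectrum as the reference is the common choice and is assumed here; the note does not say which reference the authors used. The function name and the toy spectra are hypothetical.

```python
def multiplicative_scatter_correction(spectra):
    """Apply multiplicative scatter correction (MSC) to a list of spectra.

    Each spectrum s is modeled as s = intercept + slope * reference + residual,
    with the mean spectrum as the reference (an assumption of this sketch).
    The corrected spectrum is (s - intercept) / slope.
    """
    n_wavelengths = len(spectra[0])
    reference = [
        sum(s[j] for s in spectra) / len(spectra) for j in range(n_wavelengths)
    ]
    ref_mean = sum(reference) / n_wavelengths
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_wavelengths
        # Ordinary least-squares slope and intercept of s on the reference.
        slope = sum(
            (r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)
        ) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


# Toy example: the same underlying signal with different offsets and gains.
signal = [1.0, 2.0, 4.0, 3.0]
raw = [
    [0.5 + 1.0 * v for v in signal],
    [-0.2 + 2.0 * v for v in signal],
]
for row in multiplicative_scatter_correction(raw):
    print([round(v, 3) for v in row])
```

In the toy example both corrected spectra coincide, because they differ only by an additive offset and a multiplicative gain, which are exactly the scatter effects that MSC is meant to remove before calibration.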

2. Statistical analysis of phenotypic data.

2.1. Metabolites.

Estimates of metabolite levels on a genotype-mean basis were obtained using the following linear mixed model in the notation of Patterson and Piepho (refs. 2, 3):

G + M + M·R + M·R·A + M·R·A·N : M·R·A·B

with effects genotype (G), trial of maturity group (M) captured with the common checks, field replication (R), block (B), batch (A), and technical replication (N). Fixed and random effects are separated by a colon, and random effects follow fixed effects. To achieve homoscedasticity of the residuals, a Box-Cox power transformation was applied to all metabolic traits, with the optimum transformation parameter estimated by the maximum likelihood method described by Piepho (ref. 4) using a grid search between 0 and 1 with 100 steps. The 1.1 % missing values in the metabolite matrix were imputed using a Bayesian PCA (ref. 5). The variance components $\sigma^2_g$ (genotypic variance) and $\sigma^2_e$ (error variance) were estimated with REML, considering their corresponding effects as random. Repeatability of metabolite $i$ was obtained as

$$w^2_{M_i} = \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_e / r}$$

where $r$ is the number of field replications.

2.2. General combining ability (GCA).

GCA values of the inbred lines were obtained in a joint linear mixed model analysis of both testcross populations over all six environments:

GCA + T + SCA : GCA·E + T·E + SCA·E + E + E·M + E·M·R + E·M·R·B

with factors GCA of inbred lines, tester (T), and specific combining ability (SCA), the latter being the interaction between inbred lines and testers. Factors M, R, and B are the same as in the model for the metabolites. Block variances and error variances were allowed to be independent in every maturity group trial in each environment. The variance components $\sigma^2_{\mathrm{GCA}}$ (GCA variance), $\sigma^2_{\mathrm{GCA} \times E}$ (GCA-by-environment interaction variance), $\sigma^2_{\mathrm{SCA}}$ (SCA variance), $\sigma^2_{\mathrm{SCA} \times E}$ (SCA-by-environment interaction variance), and $\sigma^2_e$ (pooled error variance) were estimated with REML, considering their corresponding effects as random. Dummy variables were used to estimate these variance components only for the investigated testcrosses, in order to eliminate the inflation of these variance components due to the superior performing commercial check hybrids (ref. 6).
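The Box-Cox grid search described for the metabolite data can be illustrated with a simplified sketch. It assumes a plain sample of positive, independent observations and maximizes the profile log-likelihood of a normal model over 101 grid points from 0 to 1. The method of Piepho used by the authors estimates the parameter within the mixed model, so this is only an approximation of the idea; all function names and the toy data are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(y) for lam = 0, else (y**lam - 1) / lam."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def profile_log_likelihood(values, lam):
    """Normal profile log-likelihood of lam, up to an additive constant.

    Assumes independent observations with a common mean and variance,
    which is a simplification of the mixed-model setting in the text.
    """
    n = len(values)
    z = box_cox(values, lam)
    mean = sum(z) / n
    variance = sum((v - mean) ** 2 for v in z) / n
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(variance) + jacobian


def best_lambda(values, steps=100):
    """Grid search between 0 and 1 with the given number of steps."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


# Toy example: data that are symmetric on the log scale.
metabolite_levels = [math.exp(v) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(best_lambda(metabolite_levels))
```

A value near 0 indicates that a log transformation is adequate, whereas a value near 1 indicates that the data can be left essentially untransformed.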

Heritability of GCA values was obtained as

$$h^2_{\mathrm{GCA}} = \frac{\sigma^2_{\mathrm{GCA}}}{\sigma^2_{\mathrm{GCA}} + \dfrac{\sigma^2_{\mathrm{GCA} \times E}}{e} + \dfrac{\sigma^2_{\mathrm{SCA}}}{t} + \dfrac{\sigma^2_{\mathrm{SCA} \times E}}{te} + \dfrac{\sigma^2_e}{ter}}$$

where $e$ is the number of environments, $t$ is the number of testers, and $r$ is the number of field replications. Mixed model calculations were performed using ASReml-R (ref. 7).

References.

1. Goering, H.K. & Van Soest, P.J. Forage Fiber Analysis. In: Agr. Handbook No. 379 (USDA-ARS, Washington D.C., USA, 1970).
2. Patterson, H.D. Analysis of series of variety trials. In: Kempton, R.A., Fox, P.N. (eds.) Statistical methods for plant variety evaluation (Chapman & Hall, London, UK, 1997).
3. Piepho, H.P., Büchse, A. & Emrich, K. A Hitchhiker's Guide to Mixed Models for Randomized Experiments. J. Agronomy & Crop Sci. 189, 10-3 (2003).
4. Piepho, H.-P. Data transformation in statistical analysis of field trials with changing treatment variance. Agronomy J. 101, (2009).
5. Oba, S. et al. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19, (2003).
6. Piepho, H.P., Williams, E.R. & Fleck, M. A Note on the Analysis of Designed Experiments with Complex Treatment Structure. HortScience 41, (2006).
7. Butler, D.G., Cullis, B.R., Gilmour, A.R. & Gogel, B.J. ASReml-R reference manual. Version 3 (Queensland Department of Primary Industries and Fisheries, Australia, 2009).


More information

MACAU User Manual. Xiang Zhou. March 15, 2017

MACAU User Manual. Xiang Zhou. March 15, 2017 MACAU User Manual Xiang Zhou March 15, 2017 Contents 1 Introduction 2 1.1 What is MACAU...................................... 2 1.2 How to Cite MACAU................................... 2 1.3 The Model.........................................

More information

Feed Check Sample No Preconditioning/Receiving Chow, Med Association of American Feed Control Officials

Feed Check Sample No Preconditioning/Receiving Chow, Med Association of American Feed Control Officials Feed Check Sample No. - 200929 Preconditioning/Receiving Chow, Med Association of American Feed Control Officials - Pass 1 Results for 212 Labs - - Pass 2 Results for 211 Labs - No. Average No. Average

More information

Overview. Background. Locating quantitative trait loci (QTL)

Overview. Background. Locating quantitative trait loci (QTL) Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

Fly wing length data Sokal and Rohlf Box 10.1 Ch13.xls. on chalk board

Fly wing length data Sokal and Rohlf Box 10.1 Ch13.xls. on chalk board Model Based Statistics in Biology. Part IV. The General Linear Model. Multiple Explanatory Variables. Chapter 13.6 Nested Factors (Hierarchical ANOVA ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6,

More information

Non-destructive techniques in seed quality determination

Non-destructive techniques in seed quality determination Merete Halkjær Olesen Aarhus University Science & Technology Department of Agroecology TATION presen Outline Seed anatomy and the importance of

More information

Pre-processing method minimizing the need for reference analyses

JOURNAL OF CHEMOMETRICS J. Chemometrics 2001; 15: 123 131 Pre-processing method minimizing the need for reference analyses Per Waaben Hansen* Foss Electric A/S, Slangerupgade 69, DK-3400 Hillerød, Denmark

More information

Building and Animating Amino Acids and DNA Nucleotides in ShockWave Using 3ds max

1 Building and Animating Amino Acids and DNA Nucleotides in ShockWave Using 3ds max MIT Center for Educational Computing Initiatives THIS PDF DOCUMENT HAS BOOKMARKS FOR NAVIGATION CLICK ON THE TAB TO THE

More information

Validation of a Direct Analysis in Real Time Mass Spectrometry (DART-MS) Method for the Quantitation of Six Carbon Sugar in Saccharification Matrix

Validation of a Direct Analysis in Real Time Mass Spectrometry (DART-MS) Method for the Quantitation of Six Carbon Sugar in Saccharification Matrix Daudi Saang onyo, a,b Gary Selby b and Darrin L. Smith*

More information

1. Estimation equations for strip transect sampling, using notation consistent with that used to

1. Estimation equations for strip transect sampling, using notation consistent with that used to Web-based Supplementary Materials for Line Transect Methods for Plant Surveys by S.T. Buckland, D.L. Borchers, A. Johnston, P.A. Henrys and T.A. Marques Web Appendix A. Introduction In this on-line appendix,

More information

Feed Check Sample No Chicken Starter/Grower, Medicated Association of American Feed Control Officials

Feed Check Sample No Chicken Starter/Grower, Medicated Association of American Feed Control Officials Feed Check Sample No. - 200926 Chicken Starter/Grower, Medicated Association of American Feed Control Officials - Pass 1 Results for 207 Labs - - Pass 2 Results for 206 Labs - No. Average No. Average AOAC

More information

NIR Technologies Inc. FT-NIR Technical Note TN 005

NIR Technologies Inc. FT-NIR Technical Note TN 00 IDENTIFICATION OF PURE AND BLENDED FOOD SPICES Fourier Transform Near Infrared Spectroscopy (FT-NIR) (Fast, Accurate, Reliable, and Non-destructive) Identification

More information

Experiment 5: Exploring Resolution, Signal, and Noise using an FTIR CH3400: Instrumental Analysis, Plymouth State University, Fall 2013

Experiment 5: Exploring Resolution, Signal, and Noise using an FTIR CH3400: Instrumental Analysis, Plymouth State University, Fall 2013 Adapted from JP Blitz and DG Klarup, "Signal-to-Noise Ratio, Signal

More information

2016 Stat-Ease, Inc. & CAMO Software

2016 Stat-Ease, Inc. & CAMO Software Multivariate Analysis and Design of Experiments in practice using The Unscrambler X Frank Westad CAMO Software fw@camo.com Pat Whitcomb Stat-Ease pat@statease.com Agenda Goal: Part 1: Part 2: Show how

More information

2014 Stat-Ease, Inc. All Rights Reserved.

2014 Stat-Ease, Inc. All Rights Reserved. What s New in Design-Expert version 9 Factorial split plots (Two-Level, Multilevel, Optimal) Definitive Screening and Single Factor designs Journal Feature Design layout Graph Columns Design Evaluation

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Feed Check Sample No Foundation Cattle Mineral, Medicated Association of American Feed Control Officials

Feed Check Sample No Foundation Cattle Mineral, Medicated Association of American Feed Control Officials Feed Check Sample No. - 200927 Foundation Cattle Mineral, Medicated Association of American Feed Control Officials - Pass 1 Results for 170 Labs - - Pass 2 Results for 168 Labs - No. Average No. Average

More information

Database system. Régis Mollard

Database system. Régis Mollard Database system The use of an on-line anthropometric database system for morphotype analysis and sizing system adaptation for different world market apparel sportwear Régis Mollard Paris Descartes University

More information

) I R L Press Limited, Oxford, England. The protein identification resource (PIR)

) I R L Press Limited, Oxford, England. The protein identification resource (PIR) Volume 14 Number 1 Volume 1986 Nucleic Acids Research 14 Number 1986 Nucleic Acids Research The protein identification resource (PIR) David G.George, Winona C.Barker and Lois T.Hunt National Biomedical

More information

Multivariate Analysis Multivariate Calibration part 2

Multivariate Analysis Multivariate Calibration part 2 Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Linear Latent Variables An essential concept in multivariate data

More information

Package EBglmnet. January 30, 2016

Package EBglmnet. January 30, 2016 Type Package Package EBglmnet January 30, 2016 Title Empirical Bayesian Lasso and Elastic Net Methods for Generalized Linear Models Version 4.1 Date 2016-01-15 Author Anhui Huang, Dianting Liu Maintainer

More information

Visualizing and Exploring Data

Visualizing and Exploring Data Sargur University at Buffalo The State University of New York Visual Methods for finding structures in data Power of human eye/brain to detect structures Product of eons

More information

Introduction to GE Microarray data analysis Practical Course MolBio 2012

Introduction to GE Microarray data analysis Practical Course MolBio 2012 Claudia Pommerenke Nov-2012 Transkriptomanalyselabor TAL Microarray and Deep Sequencing Core Facility Göttingen University Medical

More information

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533. Time: 50 minutes 40 Marks FRST Marks FRST 533 (extra questions)

THE UNIVERSITY OF BRITISH COLUMBIA FORESTRY 430 and 533 MIDTERM EXAMINATION: October 14, 2005 Instructor: Val LeMay Time: 50 minutes 40 Marks FRST 430 50 Marks FRST 533 (extra questions) This examination

More information

The TargetSearch Package

The TargetSearch Package Alvaro Cuadros-Inostroza, Jan Lisec, Henning Redestig and Matthew A Hannah Max Planck Institute for Molecular Plant Physiology Potsdam, Germany http://www.mpimp-golm.mpg.de/ October

More information

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.

More information

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 CTL mapping in R Danny Arends, Pjotr Prins, and Ritsert C. Jansen University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 First written: Oct 2011 Last modified: Jan 2018 Abstract: Tutorial

More information

powered by TAS Technology

powered by TAS Technology Highest Performance - Unmatched Accuracy and Reliability Built Tough for Plant Floor and Lab Environments Fast - Results in 30 Seconds or Less for Immediate Feedback Easy to Implement, Operate and Maintain

More information

Article A Novel Principal Component Analysis Method for the Reconstruction of Leaf Reflectance Spectra and Retrieval of Leaf Biochemical Contents

Article A Novel Principal Component Analysis Method for the Reconstruction of Leaf Reflectance Spectra and Retrieval of Leaf Biochemical Contents Liangyun Liu *, Bowen Song, Su Zhang and Xinjie Liu Key

More information

Positional Amino Acid Frequency Patterns for Automatic Protein Annotation

UNIVERSIDADE DE LISBOA FACULDADE DE CIÊNCIAS DEPARTAMENTO DE INFORMÁTICA Positional Amino Acid Frequency Patterns for Automatic Protein Annotation Mestrado em Bioinformática e Biologia Computacional Bioinformática

More information

Tutorial of the Breeding Planner (BP) for Marker Assisted Recurrent Selection (MARS)

Tutorial of the Breeding Planner (BP) for Marker Assisted Recurrent Selection (MARS) BP system consists of three tools relevant to molecular breeding. MARS: Marker Assisted Recurrent Selection MABC: Marker

More information

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE

More information

Multivariate Calibration Quick Guide

Multivariate Calibration Quick Guide Last Updated: 06.06.2007 Table Of Contents 1. HOW TO CREATE CALIBRATION MODELS...1 1.1. Introduction into Multivariate Calibration Modelling... 1 1.1.1. Preparing Data... 1 1.2. Step 1: Calibration Wizard

More information

Dynamic Programming: Sequence alignment. CS 466 Saurabh Sinha

Dynamic Programming: Sequence alignment CS 466 Saurabh Sinha DNA Sequence Comparison: First Success Story Finding sequence similarities with genes of known function is a common approach to infer a newly

More information

powered by TRUE NIR Technology

powered by TRUE NIR Technology Accurate Results in 30 Seconds Built-in Computer with Solid State Drive Easy Implementation Flexible Sample Presentation Pre-Calibrated or Custom Applications Scans up to 2500 nm IDEAL FOR: Incoming raw

More information

Chemometrics. Description of Pirouette Algorithms. Technical Note. Abstract

Chemometrics. Description of Pirouette Algorithms. Technical Note. Abstract 19-1214 Chemometrics Technical Note Description of Pirouette Algorithms Abstract This discussion introduces the three analysis realms available in Pirouette and briefly describes each of the algorithms

More information

New Technology in NIR Spectroscopy. Rachael Glenister and Bethany Steevens IAOM, 2017

New Technology in NIR Spectroscopy Rachael Glenister and Bethany Steevens IAOM, 2017 About Unity Unity Scientific was founded to focus on customer satisfaction and technical expertise. We knew that if

More information

SELECTION OF A MULTIVARIATE CALIBRATION METHOD

SELECTION OF A MULTIVARIATE CALIBRATION METHOD 0. Aim of this document Different types of multivariate calibration methods are available. The aim of this document is to help the user select the proper

More information

Predicting Percentage of Intramuscular Fat Using Two Types of Real-Time Ultrasound Equipment

Predicting Percentage of Intramuscular Fat Using Two Types of Real-Time Ultrasound Equipment A. S. Leaflet R1732 Abebe Hassen, assistant scientist Doyle Wilson, professor of animal science Viren Amin,

More information

NANOCOLOR PC Software for Spectrophotometers

NANOCOLOR PC Software for Spectrophotometers PC Software for Spectrophotometers Version 4.0 Rev. 8 (October, 2010) Instructions for Enzymatic Tests Software Manual Addendum III (2009) Manual Addendum II, Version 4.0, June 2010 Page 2 of 21 1 Introduction

More information

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.

GAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K. GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential

More information

Expectations from 2D NIRS

Expectations from 2D NIRS Sven-Olof Lundqvist, Thomas Grahn Innventia Presentation at Trees4Future Final Conference Brussels, 4-6th April, 2016 Example of application of 2D NIRS in tree breeding Expansion

More information

Analysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation

Analysis of (cdna) Microarray Data: Part I. Sources of Bias and Normalisation MICROARRAY ANALYSIS My (Educated?) View 1. Data included in GEXEX a. Whole data stored and securely available b. GP3xCLI on

More information

Bayesian analysis of genetic population structure using BAPS: Exercises

Bayesian analysis of genetic population structure using BAPS: Exercises p S u k S u p u,s S, Jukka Corander Department of Mathematics, Åbo Akademi University, Finland Exercise 1: Clustering of groups of

More information

STATISTICS (STAT) Statistics (STAT) 1

STATISTICS (STAT) Statistics (STAT) 1 Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).

More information

lme4: Mixed-effects modeling with R

Douglas M. Bates lme4: Mixed-effects modeling with R February 17, 2010 Springer Page: 1 job: lmmwr macro: svmono.cls date/time: 17-Feb-2010/14:23 Page: 2 job: lmmwr macro: svmono.cls date/time: 17-Feb-2010/14:23

More information

JMP Genomics. Release Notes. Version 6.0

JMP Genomics. Release Notes. Version 6.0 JMP Genomics Version 6.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP, A Business Unit of SAS SAS Campus Drive

More information

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

PLS 802 Spring 2018 Professor Jacoby THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE This handout shows the log of a Stata session

More information

TMRPres2D High quality visual representation of transmembrane protein models. User's manual

TMRPres2D High quality visual representation of transmembrane protein models Version 0.91 User's manual Ioannis C. Spyropoulos, Theodore D. Liakopoulos, Pantelis G. Bagos and Stavros J. Hamodrakas Department

More information

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator 20 Investigating Monhybrid Crosses Using the Graphing Calculator This activity will use the graphing calculator s random number generator to simulate the production of gametes in a monohybrid cross. The

More information

2008 Cemagref. Reprinted with permission.

2008 Cemagref. Reprinted with permission. O. Haavisto and H. Hyötyniemi. 8. Partial least squares estimation of mineral flotation slurry contents using optical reflectance spectra. In: Proceedings of the th Conference on Chemometrics in Analytical

More information

Lecture 11: Classification

Lecture 11: Classification 1 2009-04-28 Patrik Malm Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University 2 Reading instructions Chapters for this lecture 12.1 12.2 in

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Summer School in Statistics for Astronomers & Physicists June 15-17, Cluster Analysis

Summer School in Statistics for Astronomers & Physicists June 15-17, 2005 Session on Computational Algorithms for Astrostatistics Cluster Analysis Max Buot Department of Statistics Carnegie-Mellon University

More information

Recent advances in Metamodel of Optimal Prognosis. Lectures. Thomas Most & Johannes Will

Lectures Recent advances in Metamodel of Optimal Prognosis Thomas Most & Johannes Will presented at the Weimar Optimization and Stochastic Days 2010 Source: www.dynardo.de/en/library Recent advances in

More information

Applying the Pathway Activity Profiling - PAPi

Applying the Pathway Activity Profiling - PAPi Raphael Aggio October 30, 2017 Introduction This document describes how to use the Pathway Activity Profiling - PAPi. PAPi is an R package for predicting

More information

Package soil.spec. July 24, 2010

Package soil.spec. July 24, 2010 Package soil.spec July 24, 2010 Type Package Title Soil spectral data exploration and regression functions Version 1.4 Date 2010-07-10 Author Maintainer This package combines existing

More information

Application of genetic algorithm PLS for feature selection in spectral data sets

JOURNAL OF CHEMOMETRICS J. Chemometrics 2000; 14: 643 655 Application of genetic algorithm PLS for feature selection in spectral data sets Riccardo Leardi* Department of Pharmaceutical and Food Chemistry

More information

Predicting Percentage of Intramuscular Fat Using Two Types of Real-Time Ultrasound Equipment

Beef Research Report, 2000 Animal Science Research Reports 2001 Predicting Percentage of Intramuscular Fat Using Two Types of Real-Time Ultrasound Equipment Abebe Hassen Doyle Wilson Viren Amin Gene Rouse

More information

Example Workflow for the Analysis of the Provided Maize Dataset

Image Analysis with IAP Example Workflow for the Analysis of the Provided Maize Dataset Image Analysis Group - Leibniz Institute of Plant Genetics and Crop Plant Research IPK, Corrensstr. 3, 06466 Gatersleben

More information

Hyperspectral Chemical Imaging: principles and Chemometrics.

Hyperspectral Chemical Imaging: principles and Chemometrics aoife.gowen@ucd.ie University College Dublin University College Dublin 1,596 PhD students 6,17 international students 8,54 graduate students

More information

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations:

Lecture Overview. Sequence search & alignment. Searching sequence databases. Sequence Alignment & Search. Goals: Motivations: Lecture Overview Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating

More information

Chapter 13 Multivariate Techniques. Chapter Table of Contents

Chapter 13 Multivariate Techniques Chapter Table of Contents Introduction...279 Principal Components Analysis...280 Canonical Correlation...289 References...298 278 Chapter 13. Multivariate Techniques

More information

(DNA#): Molecular Biology Computation Language Proposal

(DNA#): Molecular Biology Computation Language Proposal Aalhad Patankar, Min Fan, Nan Yu, Oriana Fuentes, Stan Peceny {ap3536, mf3084, ny2263, oif2102, skp2140} @columbia.edu Motivation Inspired by the

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Data transformation in multivariate quality control

Motto: Is it normal to have normal data? Data transformation in multivariate quality control J. Militký and M. Meloun The Technical University of Liberec Liberec, Czech Republic University of Pardubice

More information

Sequence Alignment & Search

Sequence Alignment & Search Karin Verspoor, Ph.D. Faculty, Computational Bioscience Program University of Colorado School of Medicine With credit and thanks to Larry Hunter for creating the first version

More information

Supplementary Figures

Supplementary Figures re co rd ed ch annels darkf ield MPM2 PI cell cycle phases G1/S/G2 prophase metaphase anaphase telophase bri ghtfield 55 pixel Supplementary Figure 1 Images of Jurkat cells captured

More information

Chapter 6: Linear Model Selection and Regularization

Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) comes close to or exceeds n (the sample size) standard linear regression is faced with problems. The variance of the

More information

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R

DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24 Overview 1 Background to the book 2 Crack growth example 3 Contents

More information

The latest trend of hybrid instrumentation

The latest trend of hybrid instrumentation Multivariate Data Processing of Spectral Images: The Ugly, the Bad, and the True The results of various multivariate data-processing methods of Raman maps recorded with a dispersive Raman microscope are

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and

More information

Outline. Multivariate analysis: Least-squares linear regression Curve fitting

Outline. Multivariate analysis: Least-squares linear regression Curve fitting DATA ANALYSIS Outline Multivariate analysis: principal component analysis (PCA) visualization of high-dimensional data clustering Least-squares linear regression Curve fitting e.g. for time-course data

More information

Multivariate Analysis

Multivariate Analysis Cluster Analysis Prof. Dr. Anselmo E de Oliveira anselmo.quimica.ufg.br anselmo.disciplinas@gmail.com Unsupervised Learning Cluster Analysis Natural grouping Patterns in the data

More information

Global food production must increase by 70%

Global food production must increase by 70% Published August 23, 2018 The Plant Genome sci en ce note An R Package for Multitrait and Multienvironment Data with the Item-Based Collaborative Filtering Algorithm Osval A. Montesinos-López,* Francisco

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

Applied Regression Modeling: A Business Approach

i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information
