Trait Analysis by association, Evolution and Linkage (TASSEL) User Manual

Size: px
Start display at page:

Download "Trait Analysis by association, Evolution and Linkage (TASSEL) User Manual"

Transcription

1 Trait Analysis by association, Evolution and Linkage (TASSEL) User Manual

2 Testing and Validation of TASSEL were performed by the Buckler Lab at North Carolina State University and Cornell University Last update: March 13, 2009 For more quick and precise answers, please address your questions to the most pertinent person in the current programming team. General Information Ed Buckler (Project leader) Data import, GDPC, Pipeline Terry Casstevens Data analysis by GLM/MLM Peter Bradbury Zhiwu Zhang Acknowledgement of contributors: Yogesh Ramdoss, Michael E. Oak, and Karin J. Holmberg. The technical editing was provided by N. Stevens for this manuscript. This project is supported by the USDA-ARS and the National Science Foundation. Open source code: Extensive use of the PAL library is made ( Database access is achieved by GDPC middleware ( ii

3 Table of Contents INTRODUCTION 1 1 GETTING START WEB START STAND-ALONE OPEN SOURCE CODE MENU FILE TOOLS PANELS METHOD BUTTONS 6 2 DATA MODE GDPC FILE SEQUENCE ALIGNMENT POLYMORPHISM ALIGNMENT ANNOTATED ALIGNMENT NUMERICAL TRAIT OR COVARIATE SQUARE NUMERICAL MATRIX SITES TAXA GENOTYPE CONVERTER TRANSFORM, STANDARDIZE AND IMPUTE DATA GENOTYPE NUMERICALIZATION TRANSFORM AND/OR STANDARDIZE DATA IMPUTING MISSING DATA PCA SYNONYMIZE TAXA NAMES UNION JOIN INTERSECTION JOIN 22 3 ANALYSIS MODE DIVERSITY 24 iii

4 3.2 LINKAGE DISEQUILIBRIUM CLADOGRAM KINSHIP MATRIX GENERAL LINEAR MODEL MIXED LINEAR MODEL STRUCTURED ASSOCIATION TEST MANUAL SNP EXTRACTOR BULK SNP EXTRACTION 35 4 RESULTS MODE TABLE LD PLOT TREE PLOT CHART D PLOT GENE PLOT VISUAL GENOTYPE HAPLOTYPE VIEWER 43 5 TUTORIAL IMPORTING DATA FROM FLAT FILES SNP DATA PHENOTYPE DATA POPULATION STRUCTURE DATA KINSHIP DATA ANNOTATED ALIGNMENT IMPORTING DATA FROM A DATABASE (VIA GDPC) CONNECTING WITH A DATABASE DATA QUERY IMPORTING GDPC DATA INTO TASSEL SAVING GDPC QUERY RESULTS IMPORTING/EXPORTING DATA TO/FROM A DATA TREE DATA MANIPULATION FILTERING DATA JOINING DATASETS 55 iv

5 5.4.3 MISSING VALUE IMPUTATION PRINCIPAL COMPONENT ANALYSIS (PCA) USING PRINCIPAL COMPONENTS AS STRUCTURE 59 ASSOCIATION ANALYSIS USING GLM ASSOCIATION ANALYSIS USING MLM WORKING WITH GENOTYPE DATA DATA VISUALIZATION D PLOT XY PLOT LD PLOT TREE PLOT 76 6 ASSOCIATION CASE STUDIES CASE 1: PHENOTYPE + CANDIDATE SNP MARKERS (HAPLOID) CASE 2: PHENOTYPE + CANDIDATE SNP MARKERS (HAPLOID) + Q CASE 3: PHENOTYPE + CANDIDATE SNP MARKERS (HAPLOID) + Q +K CASE 4: PHENOTYPE + CANDIDATE SNP MARKERS (HAPLOID) WITH A SET OF RANDOM MARKERS CASE 5: PHENOTYPE + CANDIDATE SSR MARKERS (HAPLOID) CASE 6: PHENOTYPE + CANDIDATE SNP MARKERS (DIPLOID) PRESENTATION OF ASSOCIATION RESULT 81 7 APPENDIX NUCLEOTIDE CODES ASSIGNED BY IUB AS USED IN THE BULK SNP METHOD TASSEL TUTORIAL DATASET BIOGRAPHY OF TASSEL FREQUENTLY ASKED QUESTIONS IN GENERAL WHAT DO I DO IF TASSEL MISBEHAVES? HOW DO I CHANGE THE AMOUNT OF MEMORY USED? WHAT DO I DO WHEN THE EXCEPTION JAVA.LANG.OUTOFMEMORYERROR APPEARS? HOW DO I JOIN THE FUN: TASSEL ON SOURCEFORGE? WHERE DO I TURN FOR MORE INFORMATION? WHEN I CLICK ON THE MOST CURRENT VERSION OF TASSEL WEB START, A PREVIOUS VERSION APPEARS. WHAT SHOULD I DO? WHAT SHOULD I SUBSTITUTE FOR MISSING VALUES IN TASSEL? I HAVE 30 MARKER LOCI (295 GERMPLASM LINES) BUT EACH LOCI HAS 2 ALLELES (DIFFERING IN SIZE). IT SEEMS THAT TASSEL CONSIDERS THE TWO ALLELES AS DIFFERENT LOCI, THUS ACCEPTING 60 LOCI INSTEAD OF 30. IS THERE A WAY TO FIX THIS? IS IT POSSIBLE TO CHANGE DATA NAMES IN THE DATA TREE? HOW TO CREATE TAASEL ICON ON DESKTOP? FREQUENTLY ASKED QUESTIONS ON GLM/ML WHY DO I GET EMPTY SQUARES IN MLM ASSOCIATION ANALYSIS? HOW TO SOLVE NON-CONVERGE PROBLEM? WHY SHOULD I EXCLUDE ONE COLUMN OF THE POPULATION STRUCTURE? I RAN BOTH SAS AND TASSEL ON THE SAME DATA AND GOT RESULTS FROM SAS BUT TASSEL DID NOT OUTPUT AN ANSWER. WHY? IF I HAVE KINSHIP DATA, IS POPULATION STRUCTURE STILL NEEDED WHEN USING THE MLM? OR IS ONLY KINSHIP, TRAIT AND GENOTYPE DATA NEEDED? MY POPULATION CONSISTS OF OUTBRED DIPLOID INDIVIDUALS. THE MATRIX THAT I GENERATED USING THE KIN FUNCTION IN TASSEL WAS DRASTICALLY DIFFERENT FROM THE KINSHIP COEFFICIENT MATRIX THAT I GENERATED USING SPAGEDI (USING THE LOISELLE '95 KINSHIP COEFFICIENT). WHY? 87 v

6 7. HOW CAN I GET R SQUARE FROM SAS PROC MIXED? HOW R SQUARE ARE CALCULATED FOR MODEL AND MARKER IN MLM? HOW HERITABILITY IS DEFINED? DOES MLM FIND MORE ASSOCIATION THAN GLM?? DO I NEED MULTIPLE TEST CORRECTION FOR THE P VALUE FROM TASSEL? HOW TO CREATE KINSHIP MATRIX? REFERENCES 89 vi

7 INTRODUCTION With advances in genotyping technology, including rapid increases in the number of genetic markers available for QTL studies, association analysis is now a viable approach for the dissection of complex genetic traits (CHURCHILL et al. 2004). Trait Analysis by Association, Evolution and Linkage, or TASSEL, makes use of the most advanced statistical methods to maximize statistical power for finding QTL. Both a structured association approach (PRITCHARD et al. 2000; THORNSBERRY et al. 2001) and a unified mixed model method have been implemented to minimize the risk of false positives by integrating population structure and family relatedness within populations (YU et al. 2006). To aid in the interpretation of results, TASSEL allows for linkage disequilibrium statistics to be calculated and visualized graphically. Linkage disequilibrium is estimated by the standardized disequilibrium coefficient, D, as well as r 2 and P-values. Diversity analysis tools are also available, where diversity estimates include average pair-wise divergence (π) and segregating sites. Other notable features of TASSEL include a sequence alignment viewer, extraction of SNPs and indels (insertions & deletions) from alignments, a neighbor-joining cladogram, and a variety of data graphing functions. TASSEL can merge data from different sources into a single analysis data set, impute missing data using a k-nearest-neighbor algorithm (COVER and HART 1967), and conduct principal components analysis (PCA) to reduce a set of correlated phenotypes. Due to the increasing availability of open data sources, TASSEL utilizes a data browser from the Genomic Diversity and Phenotype Connection (GDPC) project (CASSTEVENS and BUCKLER 2004) to provide an interface to relational databases. As a result, TASSEL users can access many open data sources, including genomic diversity data such as SNPs, SSRs and sequence alignments, as well as phenotypic data collected from field, genetic or physiological experiments. By incorporating this middleware which functions by means of a common graphical interface, TASSEL users are freed from creating a SQL query for each new database. At the time of this writing GDPC is connected to Panzea, Gramene, Germinate, and GRIN. TASSEL is written in Java, thereby enabling its use with virtually any operating system. It can be installed very simply using Java Web Start technology, which automatically checks for the most recent version of TASSEL each time the application is instantiated. In addition, Java Web Start will ensure that the correct version of Java Runtime Environment is running, thus avoiding complicated installation and upgrade procedures. 1

8 1 Getting Start The TASSEL software package can be installed using several different methods. At the user level, Web Start or Stand-alone installation can be used. Alternatively, at the developer level, the open source code can be downloaded and used to build the software package. 1.1 Web Start TASSEL can be installed very simply using Java Web Start technology, which automatically checks for the most recent version of TASSEL each time the application is instantiated. In addition, Java Web Start will ensure that the correct version of Java Runtime Environment is running, thus avoiding complicated installation and upgrade procedures. To begin, Java Web Start (JWS) must first be installed (prior to the installation of TASSEL). JWS is included as part of Java Runtime Environment (JRE) 5.0 and above. After the successful installation of JWS, TASSEL can be launched by clicking the following link: If you will be using TASSEL frequently and would prefer to launch the application from your desktop rather than by revisiting the website, Java Web Start can be used to manually launch TASSEL each time and/or to create a shortcut. Access the Java Application Cache Viewer by going to Start > Settings > Control Panel > Java. From the General tab, click on Settings in the Temporary Internet Files section and then click on View Applications and the Java Application Cache Viewer will appear. (Another way of achieving this is by going to Start > Run and typing in javaws). The TASSEL icon should now be visible and can be used to launch the application. Shortcuts can be created from the menu of the Java Application Cache Viewer: Application > Install Shortcuts. 1.2 Stand-alone Downloading a stand-alone version is recommended for anyone who has a slow Internet connection. While Java Web Start is a very good way of deploying software, it does not ask the user before attempting to download updates. Thus, a slow Internet connection may start a download process that requires an unreasonable amount of time to complete. If you are not interested in disabling your network connection each time before starting TASSEL, we recommend downloading the stand-alone version which does not attempt to update the program. However, given that TASSEL is a Java application, a Java Runtime Environment (version or greater) is still required for installation. To run the stand-alone version, double-click on the JAR file (TASSEL<version #>.jar). Alternatively, at a command prompt (in Windows go to Start > Run and type in cmd or command, then type cd to access the directory containing the stand-alone jar file) type the following: java jar <name of tassel jar file> 1.3 Open source code Open source code for the TASSEL software package is available at: The package uses the standard PAL library ( the COLT library ( and jfreechart ( Database access is achieved by GDPC middleware ( 2

9 1.4 Menu The TASSEL menu has several features designed to save the user time and effort File Save Data Tree This feature allows you to save the entire contents of the Data Tree panel to a default location. This is helpful when the user does not wish to recreate a Data Tree panel that is already well populated with information the next time he/she initializes the program. To save a Data Tree, select File > Save Data Tree Open Data Tree To restore a Data Tree that was saved during a previous instantiation of the program, select File > Open Data Tree Save Data Tree As To save the contents of a Data Tree panel to a specific location or to give them a specific name, select File > Save Data Tree As. A dialog will then appear which allows for user input. NOTE: The information outlined above for saving a Data Tree is applicable to files that are, in general, version specific. A Data Tree saved using TASSEL version may reopen using version or it may not. We recommend that you include the TASSEL version number when naming your saved Data Tree file so that, if necessary, you can download a particular standalone version of TASSEL from the downloads archive at Open Multiple Alignment File This feature is used to open multiple sequence alignment flat files that are in FASTA format Save Selected Dataset As This feature allows the user to save a data set as a flat file in Phylip sequential format by selecting a sequence alignment Tools The tools menu contains contingency test and option to set preference. 3

10 Contingency Test This utility calculates a chi-square contingency test or Fisher exact test (when using only the 2 x 2 table of observations) using the same algorithm as is used in determining linkage disequilibrium Preferences The Quality Score Colors tab, found in the Preferences dialog box, allows the user to set cutoff values for visualizing quality score values on a sequence alignment or a set of called SNPs. 4

11 To set a desired threshold, simply adjust the slider on the left side of the dialog. Ns, - (dashes), and alignments without any quality score information have a default value of -1 (minus one). 1.5 Panels TASSEL is organized into four main panels: the Options panel, Data Tree panel, Report panel and Main panel. The Options panel runs across the top of the users viewing area. All method buttons are located within this panel. The Data Tree panel is located beneath the Options panel on the left side of the viewing area. This panel organizes retrieved data and results into a structured format. Dataset(s) displayed in the Data Tree panel must first be selected before a desired function or analysis can be performed. To select multiple datasets, press the CTRL key in combination with mouse selecting. The Report panel is located on the bottom left of the viewing area. It displays information about a selected dataset(s) from the Data Tree panel, such as the type of data and how it was created. The Main panel occupies the right side of the viewing area. It displays the content of a selected dataset(s) from the Data Tree panel. 5

12 Option panel Main panel Data tree panel Report panel When a sequence alignment is loaded into TASSEL, users have the option of several different views. To select your preferred view, hold down the Control button while clicking on the Main Panel (or simply right-click if you are using a two-button mouse). This brings up a context-sensitive menu with the following options: Match: Displays only those bases that do not match the top row. All other bases appear as.. Color Based on Top Row: All bases are displayed, but those bases that differ from the top row will be red in color. Color Consensus: All indels and all sites of variation will be red in color. Color Based on Quality Scores: If the alignment or called SNPs has associated quality scores, bases will be colored according to the cutoffs set in Tools > Preferences > Quality Score Colors from the Menu. 1.6 Method Buttons 6

13 All method buttons are placed in options panel. The buttons on top row are the buttons for Mode Selectors, Delete, Progress Bar, Print, Save and help. The buttons on the bottom row depend the mode of chosen. By default, the buttons for Data Mode are displayed. To make something happen in TASSEL, data must be imported into data tree panel. Click data set in the Data Tree and then click on the buttons across the top to work with those data set. All of TASSEL s functions are accessed from the button toolbar which is found below the Menu and above the Main Panel and Data Tree. The button toolbar has three buttons which divide the applications behavior into three categories or modes: 1) Data: import and manipulate data sets 2) Analysis: linkage disequilibrium, diversity, selection, association, etc. 3) Results: visualization tools The various functions available of each mode are described in the next three main sections. 7

14 2 Data Mode Data mode serves the purposes of importing data and preliminary data manipulation. Data mode is the default mode when TASSEL starts. The Data mode can be switched back from other mode (Analysis and Result) by clicking the Data mode button. Tassel has two ways of importing data. One way is via GDPC to import data from databases. The other way is through flat files in format of SNP genotype, polymorphism genotype, phenotype (Trait), population structure and kinship. The preliminary data manipulations includes filtering data by site or taxa, joining data (union or intersection), and etc. 2.1 GDPC Genotype and phenotype data generated from numerous genomic research projects are still valuable resources for the public, even after results are published. Some of these data have been migrated to several databases and can be accessed using Genotype Data and Phenotype Connection (GDPC). GDPC is middleware that eliminates the need for end users of data to understand various database schemas and write raw SQL queries to extract data. Instead, the GDPC browser provides a single, easy-to-use interface which can extract genotype and phenotype data from a variety of sources (CASSTEVENS and BUCKLER 2004). Currently, GDPC has connections to the following databases: Gramene diversity for maize, wheat and rice (JAISWAL et al. 2002; WARE et al. 2002a; WARE et al. 2002b; YAMAZAKI and JAISWAL 2005). Panzea (CANARAN et al. 2006; DU et al. 2003; ZHAO et al. 2006). GRIN: Germinate: GDPC has been integrated into TASSEL. Operation of GDPC in TASSEL is very similar to the operation of GDPC alone. GDPC can be displayed in TASSEL by clicking on the GDPC button in Data mode. 8

15 An additional feature allows TASSEL users to load GDPC data into a TASSEL dataset. Data is available for import once the user has parameterized a query and data is visible in either the Genotypes or Phenotypes tab. To load data, activate either the Genotypes or Phenotypes tab (depending on the data you wish to import) and then click the Load button. For additional information about GDPC, please see File This function provides multiple options to Import files for genotypes, phenotype, populations structure and kinship. Several genotype formats are allowed for genotype data, including sequence, FASTA, polymorphism and annotated alignments. Phenotype and population structure can be imported as numerical trait or covariates. Kinship must be load as squared numerical matrix in order to be used by mixed model. TASSEL datasets generally consist of two parts: a header that defines data structure and a body containing the main data. The first row of header usually contains number of rows, number of columns (or number sequences) and number of rows that name the columns. Tabs used as the delimiters. Missing values are represented by -999 for numerical data, and N,?, or - for genotype data. Users can either specify the file type or use the Guess option to let the program to determine the file type through a dialog box as follows. 9

16 Here we use the Guess function to import all the files from the tutorial dataset. After highlighting all the file and clicking Open button, these files will be displayed on the data tree window. 10

17 2.2.1 Sequence alignment TASSEL accepts the following format for haploid sequence alignment file: TaxaNumber SiteNumber Taxa1Name Sequence Taxa2Name Sequence Note: TASSEL s treatment of taxon names is case-sensitive. For example, b73 and B73 are treated as separate and distinct taxa. Here is an example: 6 30 A272.ID2 A554.ID4 A6.ID5 B103.ID8 AG--GGGGGG GGGGGGGGGG GGG-GGGGGG AGGGGGCG-G GGGGGGGGGG GG--GGGGGG GGGGGGCG-- GGGGGGGGGG G---GGGGGG GGGGGGCG-- -GGGGGGGGG GGG---GGGG 11

18 B14A.ID10?????????????????????????????? B73.ID13 G G TASSEL also accepts some NEXUS formats, so we encourage some trial and error by the user. Files can be interleaved or not and include spaces or not Polymorphism alignment To input multiple allele SSRs, or SNP data, a file must contain a heading with first row consists of three numbers. The first is the number of individuals or lines. The second is the number of loci. The last is the number of ploy (2 for diploid). The first two numbers are separated by a Tab. The last two numbers are separated by a colon. The second row of the heading contains names of loci. In situation of haploid, the ploidy and the colon in front of can cab be omitted. The body consists of one column for the identification of individual or line, followed by additional columns for genotypes over different loci. Columns in the body are separated by a Tab. For each locus, the home alleles are separated by a colon. A missing value is indicated by a question mark (?). Example 1: Haploid SSR data 99 6:1 S01 S02 S03 S04 S05 S06 TX TX TX ? TX Or 99 6 S01 S02 S03 S04 S05 S06 TX TX TX ? TX Example 2: Diploid SSR data 99 6:2 S01 S02 S03 S04 S05 S06 TX01 123: : : : : :228 TX02 121: : : : : :230 TX04 119: : : : :227?:? TX05 119: : : : : :228 Example 3: Diploid SNP data 12

19 99 6:02 S01 S02 S03 S04 S05 S06 TX01 T:T A:T T:T A:A A:T G:T TX02 T:T A:A A:A A:A T:T T:T TX03 A:A T:T T:T T:T A:A T:T Annotated alignment Here is an example: <Annotated> <Transposed> Yes <Taxa_Number> 5 <Locus_Number> 3 <Poly_Type> Catagorical <Delimited_Values> No <Taxon_Name> MO001 MO003 MO005 MO007 MO008 <Chromosome_Number> <Genetic_Position> <Locus_Name> <Value> 1 0 umc1354 BABAB umc1566 AABAB AY AABAB Data for the above annotated alignment would be displayed as follows: Numerical trait or covariate To input trait values or phenotypes by flat file, a text file can be used. Population structure information can be entered using the same format. TASSEL accepts the following format: TaxaNumber TraitNumber HeaderRows Header1.1 Header1.2 etc. (These are usually trait names) 13

20 Header2.1 Header2.2 etc. (These are usually environment names) Taxa1Name Value1 Value2 Taxa2Name Value1 Value2 Example 1: Three traits on 301 maize lines EarHT dpoll EarDia Example 2: Population structure of 286 maize lines Q1 Q2 Q3 STIFFSTALK NONSTIFF TROPICAL Missing values are indicated by Header rows are optional--if more than two are used, the additional headers will be ignored. Population structure information can be loaded from flat files in the same manner as trait files Square numerical matrix Kinship can either be calculated externally from pedigree by using SAS Procedure Inbreeding (SAS 2002.) or from markers by using software packages such as SPAGedi (HARDY and VEKEMANS 2002). If n represents the number of taxa, the format for kinship files is as follows: n Taxa1Name r11 r12 r1n Taxa2Name r21 r22 r2n TaxanName rn1 rn2 rnn Here rij (i, j=1,2,, n) is the element in the kinship matrix located at row i and column j. Missing values are not allowed for kinship matrix. 14

21 Important note: The current format in TASSEL 2.1 and after is different from the format in previous version. 2.3 Sites Subset data The alignment can be filtered in several ways. Monomorphic sites can be eliminated, and various regions of a sequence can be eliminated. Minimum Count - the minimum number of taxa in which the site must have been scored to be included in the filtered dataset (GAP or missing data does not count). Minimum Frequency - the minimum frequency of the minority polymorphisms for the site to be included in the filtered data set. Start Position, End Position, Distance from End establishes the range of sites for filtering. Extract Indels - if selected, indels are extracted from the alignment. If not selected, only point substitutions are extracted. Remove minor SNP states converts tertiary and rarer states to missing data (? ), thereby forcing sites to have only two types of segregating sites at a locus. This can help remove bad sequence effects. 15

22 Generate haplotypes via sliding window creates haplotypes from an ordered set of SNPs. 2.4 Taxa Subset data: remove/select specific taxa from a data set. This sub setting is achieved by selecting either genotypic, phenotypic, or population structure data from the data tree. The resulting dialog box displays the selected data in table format. By using either the CTRL or SHIFT key in conjunction with mouse clicking, the user can select or deselect taxa rows. Once desired taxa has been selected, the Capture Selection or Capture Unselected buttons will send the desired data back to the data tree as a new dataset. 2.5 Genotype Converter This button generates a data set that represents allele heterozygosity, including additive and dominance models. 16

23 Genetic analysis requires that genotype data be converted from allele format into formats more compatible with genetic analysis tools. One such format is genotype status. To convert genotype data to genotype status, highlight the genotype data you wish to convert and then click the Genotype button located in the Options panel. In the resulting dialog box, select the Create alignment based on genotypic state (eg. A:a > Aa) option and then click OK. 2.6 Transform, Standardize and Impute Data This suite of functions allows multiple data manipulation on genotype and phenotype (numerical) data. When a genotype date is selected, a numerical transformation is performed for genotype data. When a numerical data is selected, mathematical transformation, data imputation and principal component analysis (PCA) are performed. The Transform The columns tags will be displayed in a Data dialog box with three tabs: Trans, Impute and PCA Genotype numericalization Two options are provided to transform genotype from character to numerical as shown in the following dialog box. 17

24 2.6.2 Transform and/or Standardize Data The Trans dialog box is the default selection. It has the display as follows. In the Column list, select the column(s) you wish to transform. Then select the type of transformation you wish to execute. Selecting the Standardize checkbox will transform data by subtracting the column mean from the value and then dividing by the column s standard deviation. Clicking on the Create Dataset button will result in the placement of a data set containing only the selected columns in the Data Tree. 18

25 2.6.3 Imputing Missing Data The k-nearest-neighbor algorithm (COVER and HART 1967) is used to impute missing data. Click on the Impute tab to display the following: PCA Principal component analysis can only be performed on a numerical data without missing values. Two options are available: correlation or covarianc. 19

26 2.7 Synonymize Taxa Names This button makes taxa names uniform to permit the joining of data sets. Both the union and intersection joining functions that generate fused data sets work by matching taxa names. Consequently, if multiple names exist for a given taxa (an added suffix, alternative spellings, different naming conventions, etc.) then the two data sets will not join correctly. To help remedy this, the Synonymizer function allows the taxa names of one data set to replace the taxa names of the second data set. It relies on an algorithm that calculates the degree of similarity between names, using that name which is most similar to those in the first data set. When using the Synonymizer, keep in mind that order of selection matters. Always select the data set with the names you wish to use (the real name) first, and then, while holding down the CTRL key, click on the second data set with the taxa names you wish to change (the synonym ). Then click on the Synonymizer button. A synonym data set will be placed on the Data Tree panel under Synonyms. Each name in the data set selected second is now listed in the TaxaSynonym column. Next to this column is a TaxaRealName column listing the highest scoring match derived from the real name data set. The MatchScore column gives an indication of the amount of similarity between the two names (where 0 is no similarity and 1.0 is identity). 20

27 Caution! Before the synonyms are applied, we strongly encourage the user to check the match score, especially for those taxa with low match scores. Once it has been determined that these taxa were interpreted correctly, the synonyms can be applied as following: With the synonyms selected, hold down the CTRL key while clicking on the second/synonym data set (the data set whose names you would like to change). Then once again click on the Synonymizer button to apply the new names to the data set. In the event that some pf taxas are not interpreted satisfactorily, matches can be modified manually. Select the taxa you wish to modify on the left side, and then choose a replacement taxa from the right side. Click the arrow button to substitute the taxa. Click OK to save the change. 21

28 2.8 Union Join This button joins multiple datasets by a union of their taxa. Missing data will be inserted if taxa is missing from one data set. To achieve this function, select multiple datasets using the CTRL key in conjunction with mouse clicks, and then click on the union button to join the datasets. Because this function uses taxa names to join datasets, any variation in taxa names can prevent proper joining. Taxa names can be made uniform by the Synonymizer. 2.9 Intersection Join This button joins multiple datasets by the intersection of their taxa. Taxa must be present in both datasets to be included. 22

29 To achieve this function, select multiple datasets using the CTRL key in conjunction with mouse clicks, and then click on the intersection button to join the datasets. Because this function uses taxa names to join datasets, any variation in taxa names can prevent proper joining. Taxa names can be made uniform by the Synonymizer. 23

30 3 Analysis Mode Analysis mode consists of the following options: Each option is described in detail below: 3.1 Diversity This button executes a basic analysis of diversity. Average pairwise divergence (π), segregating sites, and θ estimates (4Nμ) can be calculated, as well as sliding windows of diversity. To run a diversity analysis, click on a raw sequence alignment, and then select Analysis Diversity. In the resulting Diversity Surveys dialog box, the various site classes available for analysis are listed on the left. If the sequence has no annotation, then only the Overall and Indels options will be active. A sliding window of diversity can also be calculated across the region. To produce a sliding window, check the box next to Sliding Window, and then enter the desired step size and size of the sliding window. Results can be plotted using Results Chart or viewed in a table via Results Table. 24

31 3.2 Linkage Disequilibrium This button generates a linkage disequilibrium data set from SNP data. NOTE: It is important to use only filtered datasets (apply Data Sites first) when estimating linkage disequilibrium, as a raw alignment with numerous invariant bases will take a very long time and consume a large amount of memory to calculate. Linkage disequilibrium between any set of polymorphisms can be estimated by clicking on a filtered set of polymorphisms and then using Analysis Link. Diseq. At this time, D', r2 and P-values will be estimated. The current version calculates LD between haplotypes with known phase only (genotypes are not supported; see PowerMarker or Arlequin for genotype support). D' is the standardized disequilibrium coefficient, a useful statistic for determining whether recombination or homoplasy has occurred between a pair of alleles. r 2 represents the correlation between alleles at two loci, which is informative for evaluating the resolution of association approaches. D' and r2 can be calculated (Weir 1996) when only two alleles are present. If multiple alleles are present, a weighted average of D' or r2 is calculated between the two loci (Farnir 2000). This weighted average is determined by calculating D' or r2 for all possible combinations of alleles, and then weighting them according to the allele's frequency. Note: It is not entirely certain that this procedure fully accounts for allele number effects. P-values are determined by two methods. If only two alleles are present at both loci, then a two-sided Fisher's Exact test is calculated. NOTE: Previous editions of TASSEL used a one-sided test, but TASSEL version and later use a two-sided test. If more than two alleles are present, permutations are used to calculate the proportion of permuted gamete distributions that are less probable then the observed gamete distribution under the null hypothesis of independence (Weir 1996). When calculating linkage disequilibrium, users have the option of employing Rapid Permutations. If this option is selected, the algorithm will compute either a fixed number of permutations or run until 10 permutations are found that are more significant than the observed P-value. While this slightly reduces P- values, it also saves a large amount of computational time. If an unbiased p-value is desired, then the user must unselect the Rapid Permutations check box. 25

32 Linkage disequilibrium results can be plotted using Results LD Plot or viewed in a table via (Results Table). Farnir, F., Coppieters, W., Arranz, J.-J., Berzi, P., Cambisano, N., Grisart, B., Karim, L., Marcq, F., Moreau, L., Mni, M., Nezer, C., Simon, P., Vanmanshoven, P., Wagenaar, D. & Georges, M. (2000). Extensive Genome-wide Linkage Disequilibrium in Cattle. Genome Res 10, Cladogram This function generates a tree or cladogram data set. TASSEL produces neighbor-joining trees using only simple parsimony substitution models. To retrieve cladogram data, first select genotypic data from the Data Tree panel and then click on the Analysis button, followed by the Cladogram button. The resulting tree data and the corresponding matrix will appear as separate data sets on the Data Tree panel. Results can be plotted using Results Tree Plot. 3.4 Kinship Matrix The function generates a kinship matrix from a set of random SNPs. To do so, first highlight SNP data then click on the Analysis button, followed by the Kin button. The resulting kinship data will be added as a data set on the Data Tree panel. 26

33 When a genotype file is selected, the kinship matrix is generated by converting the distance matrix calculated from TASSEL s Cladogram function to a similarity matrix by subtracting all values from 2. Kinship can be derived from a set of random SNP data (a minimum of several hundred SNPs spread over the whole genome is recommended). Users may also load their own kinship data using Data Kin. Kinship matrices can be calculated using the SPAGeDi sortware package ( A kinship matrix can also be calculated based on percent shared alleles (Stich et al. 2008) 3.5 General Linear Model This function performs association analysis by a least squares fixed effects linear model. TASSEL utilizes a fixed effects linear model built by the user to test for association between segregating sites and phenotypes, while also accounting for population structure. To establish the model, the user must first define the input data types so that the software knows how to treat each input variable. In the Input Data Definition window, users will find two columns marked Input and Type. Each row in the Input column represents data from the phenotypic data set. A corresponding drop down box in the Type column allows users to indicate whether that row contains data, a covariate, or a factor (classification variable), or whether it should be excluded from the analysis. It is important that each variable be correctly identified in order to obtain valid results. A separate analysis will be performed for each trait. If the box is checked for Analyze each column of data separately, then each column will be analyzed independently of the others. For example, if the same trait is analyzed in three separate environments, each environment will be analyzed separately rather than combined into a single analysis. Population structure can be incorporated into the model by using covariates that indicate percent contribution to each taxon by specific populations. These must be identified as covariates not as data. If population structure percents sum to 100% for each taxon, one of the populations should be excluded from the analysis in order to obtain valid F-tests of the population 27

34 covariates. If the covariates are linearly dependent, then the F-test for each of them will be zero because the F-test for each covariate is calculated after fitting all the others. Each item in the Data Headers section represents a header in the input file. Data imported from GDPC will have only two header rows: trait and environment. Data imported from flat files can contain additional headers. These additional headers may indicate additional levels of nesting, such as locations and reps within locations. The first header row is assumed to be trait and the second is assigned a label of env (for environment), which can be changed by the user. The user must type in a label for any additional headers or they will not be utilized in the analysis. Pressing OK on the Input Data Definition window opens the GLM Dialog. This dialog box allows the user to select the effects to be used in the model: To specify main effects, select one or more input fields and then click the > button. To specify an interaction effect, select the factors involved in the Available Factors list then click the * button. Hold down CTRL to make multiple selections. To specify a nested effect, include it as part of an interaction but not as a separate main effect. To remove an effect from the model, click <. To add or remove all traits click >> or << The next step will be either to run the model or to modify the default output to be generated by the analysis. Clicking the button labeled Define Output displays the F-tests that will be included in the output as well as any effect estimates to be generated. F-tests are defined by selecting one factor for the hypothesis and one for the error term, then clicking Add F-Test. A permutation test for the significance of an F-test will be calculated if the user enters the number of permutations in the Ftest table. A 28

35 permutation test can only be requested when the residual is used as the error term. Additionally, means will be calculated for a selected hypothesis by clicking the add LSMean button. When data for taxa are replicated, the proper error term for testing markers is not the residual but taxa nested within markers. This term can be specified in the model by including marker:taxa but not including taxa as a main effect. Alternatively, the analysis can be run using the taxa means rather than the replicated data. In fact, taxa means must be used in order to specify a permutation test because permutations will not be run if the error term is marker:taxa. The reason is that if permutations are run using anything except the model residual as the error term, the analysis takes much longer. Calculating the taxa means then using that data set for the permutation test of markers gives approximately the same results and saves a lot of computation time. Means for taxa can be calculated by selecting the phenotype data set and using GLM to generate a taxa means data set, which can then be combined with genotype and population data as usual. In addition, GLM can estimate effect means using a Fusion data set to provide greater flexibility in subsetting the data. When data is replicated across environments, a good scenario to follow is to use GLM without permutations to test markers and marker:environment interactions. When marker:environment interaction is significant, the user should consider analyzing each environment separately. Taxa means can then be 29

36 used to test markers across locations if a permutation test is desired. In the presence of marker:environment interaction, care should be taken in the interpretation of tests of markers across locations. Taxa means should be calculated without population covariates or markers in the model. Population covariates could then be included in the subsequent analysis of marker association based on the taxa means. Background: The association test computes a least squares solution (SEARLE 1987) to the fixed effects linear model constructed in the GLM dialog. The software uses a Σ-restricted model, which leads to a marker SS that is a good test of marker effects even when data is unbalanced. Data is unbalanced when there are an unequal number of observations for marker states or for other main effect levels. For each effect in the model, the sum of squares is computed after fitting all other model effects. In the case of missing data (no values for some marker by main effects combinations), the resulting F-test for markers is not exact in the presence of significant interaction effects and care must be taken in interpreting the results. In general, one must exercise care in interpreting tests of main effects when interactions are significant. The primary purpose of the permutation test (CHURCHILL 1994) is to provide a test of significance that corresponds to the experiment-wise error in order to correct for the fact that multiple comparisons are being made. The output lists both a site-wise and an experiment-wise permutation test. After each permutation, the p-values for each site are recalculated. For the site-wise value, the new p-value for each site is compared to the original p-value for that site, which was calculated from the unpermuted data. For the experiment-wise value, the minimum p-value across all hypotheses from that permutation is compared to the original p-value for each site. The value reported in the output table is the percentage of instances in which the permutation p-value is lower than the original p-value. The site-wise value should be very close to the p-value from the original F-test for normally distributed data. The experiment-wise value reflects the experiment-wise error rate, correcting for multiple comparisons, and is the best value to use in making decisions about significance of marker effects. Churchill and Doerge recommend using at least 1000 permutations for a.05 level of significance. Permutations are carried out for simple models containing only a single main effect by randomly shuffling the data across all observational units. For more complex models, the residuals from the reduced model are shuffled and added back to the individual observation estimates. The reduced model is the model which contains all the effects except the one being tested (Anderson and Ter Braak 2003). Description of Output: The following table shows an example of an output table from a simple analysis: In addition to displaying the F-statistics and p-values for the requested F-tests, the table also contains information about degrees of freedom, the error mean square for the model, R-square of the model, and R- square for the marker. The model R-square is the portion of total variation explained by the full model. 30

37 The marker R-square is the portion of total variation explained by the marker but not by the other terms in the model. When permutations are requested, #perm_marker is the number of permutations run, p- perm_marker is a test of individual markers, and p-adj_marker is the marker p-value adjusted for multiple tests. The p-adj_marker value is a permutation test derived using a step-down MinP procedure (Ge et al. 2003) and controls the family-wise error rate (FWER). For example, if only markers with p-adj values of.05 or less are accepted as significant, then the probability of rejecting a single true null hypothesis across the entire set of hypotheses is held to.05 or less. This test takes dependence between hypotheses into account and does not assume that hypotheses are independent as do other multiple test correction procedures. 3.6 Mixed Linear Model This conducts association analysis via a mixed linear model (MLM). Background: Complex traits are usually controlled by multiple QTL. The primary goal of QTL mapping is to find associated markers for each QTL. One common approach of association studies is to fit markers into a linear model as fixed effects. Any QTL not linked to the markers contribute to the residual error, thus inflating the error term and reducing statistical power. Further, if the remaining QTL are not randomly distributed across the tested markers, they can cause spurious associations between phenotypes and the tested markers. These false associations can be partially corrected with a structured association method that uses a Q-matrix of population membership estimates, and further reduced by incorporating multiple background QTLs as a random effect in a mixed model based on the premise that the genome of each individual or line is a sample of its population s gene pool. The average relationship between individuals or lines can be estimated by kinship (K) calculated from either pedigree or a suitable amount of random markers across the entire genome. The Q + K method, which combines information from both Q and K, was shown to be superior to more conventional linear models in association analyses (YU et al. 2006). The Q + K method has been implemented in TASSEL as a MLM (mixed linear model) function. The statistical model can be described in Henderson s notation (HENDERSON 1975) as follows: y = Xβ + Zu + e where y is the vector of observations; β is an unknown vector containing fixed effects, including genetic marker and population structure (Q); u is an unknown vector of random additive genetic effects from multiple background QTL for individuals/lines; X and Z are the known design matrices; and e is the unobserved vector of random residual. The u and e vectors are assumed to be normally distributed with null mean and variance of Var u = G 0 e 0 R where G = σ 2 a K with σ 2 a as the additive genetic variance and K as the kinship matrix. Homogeneous variance was assumed for residual effect which means R=I σ 2 e, where σ 2 e is residual variance. The REML estimate of σ 2 a and σ 2 e were obtained through the expectation and maximization algorithm (LAIRD and WARE 1982). Disadvantages of using a mixed linear model The number of equations to be solved in a mixed linear model is usually two or more orders of magnitude greater than with a GLM. In addition, the solution to each of these equations is achieved by means of iteration. Consequently, a mixed linear model requires much more computational time. 31

38 MLM provides the user with a few different options for solving mixed models. These options are available on the Heritability dialog box. The user can choose a method named P3D to estimate heritability, can estimate the variance components and heritability separately for each marker, or can use a previously determined heritability. P3D uses the mixed model specified by the user but without a marker effect to estimate the variance components, σ 2 a and σ 2 g. Tests of significance for each marker are calculated using those initial estimates. When a heritability value is supplied by the user, the variance components can be calculated directly from the heritability without iteration. In addition, the user can specify an analysis method. The available methods are EM and EMMA. The EM method uses an expectation-maximization algorithm to derive a restricted maximum likelihood (REML) estimate of the variance components. The EMMA method uses a Java implementation of the efficient mixed-model association algorithm (KANG et al. 2008) Using the P3D method or using a prior heritability can result in a substantial time savings with the EM method compared to calculating heritability for each marker. The same two methods do not provide much time savings when using the EMMA method. In the Parameters dialog that follows this screen, the EMMA method does not use the Aitken Step Size parameter or maximum number of iterations. 32

39 3.7 Structured Association Test Association analysis by logistic regression. This function will carry out associations between segregating sites and phenotypes, while accounting for population structure. No experiment-wise error correction is provided. Because biallelic genotypic data is used, any rare, multi-allelic states will be fused. A logistic regression ratio test is used to evaluate associations involving quantitative traits while controlling for population structure (Thornsberry, 2001). In the null hypothesis H0, candidate polymorphisms are independent of phenotype, while in the alternative hypothesis H1, candidate polymorphisms are associated with the phenotype. The probability of each hypothesis is compared in the following way: Pr1 ( C; T, Qˆ) Λ = Pr ( C; Qˆ) 0 where C is the genotype of the candidate polymorphism for all lines, and T is the trait value for all lines. In this test, the difference in the natural logarithm likelihoods of the model with (Pr1) and without (Pr0) the trait is the test statistic Λ. Since the distribution of Λ is not known precisely, permutations should be used to determine significance. If several sites with high LD are being scored, then the maximum Λ over all sites is used as the test statistic Λmax. Permutations are calculated based on this Λmax statistic. Permutations to determine significance: These statistical tests will result in a p-value associated with each polymorphism-trait pair. However, for many association tests there will be tens or hundreds of polymorphisms to test. The normal modification for multiple tests would be a Bonferroni correction, but this is far too conservative for highly correlated polymorphisms. The goal of permutations is to determine the number of independent tests, which is confounded by LD, and account for non-normality in trait distributions. The trait values should be permuted relative to the fixed haplotypes (CHURCHILL 1994), and then associations recalculated for 100 to 1000 permutations. (PRITCHARD et al. 2000) suggests permutations based on population structure, and this approach is implemented by TASSEL. To run a structured association test, select a dataset that contains phenotype, polymorphisms, and population structure information and then click on Analysis Struct. Assoc. to open the menu below. Use the arrow keys to move estimates of population structure into the Population Structure Estimates list. To ensure a quality significance estimate, make sure the permutation box has been checked and enter an appropriate number of permutations. 33

40 Two different sets of results sets are produced from this analysis. The first, LogRat_Perm_Trait_Gene, provides summary statistics for the locus. These results can be explored using a table or a 2-D Plot. The second result set is LogRat_Trait_Gene. This provides results for individual sites by trait and environment. Users are encouraged to consider two key columns: (1) DiffLL the difference in log likelihoods for the two models above (H0 and H1), and (2) PermutedP the p-value based on the permutations described above. The remaining statistics describe the fit of each model for that specific site. The first line of this table provides information on the very best site-trait-environment combination (NOTE: This may be eliminated in future versions of TASSEL, as this information is captured in the LogRat_Perm_Trait_Gene result table). The permuted p-value for this first site is the locus-wise p-value, equivalent to Site_Statistic_Over_Environment_And_Trait described above (this accounts for multiple testing issues). The remaining p-values are for a specific site, and DO NOT ACCOUNT FOR MULTIPLE TEST ISSUES. 34

41 3.8 Manual SNP Extractor Manually extracts a SNP from raw gene sequences. The extracted SNP is added under Results SNP Data in the Data Tree panel. To append additional SNPs to the results, hold down the CTRL key while selecting both the raw gene sequence and the extracted SNP data set to which you wish to add. Then click SNP Extr. In the SNP Extractor Dialog, enter the base number in What SNP? For example, if a polymorphism exists at base 95, then type in 95. Continuing this example, the Surrounding Bases feature works as follows: if the user types a 5 in the box, then the SNP will be extracted along with bases 90 to 100. Two format options are available for output: Format 1: GGTAT Y TCTAC Format 2: GGTAT[C/T]TCTAC The resulting data can be viewed or exported using a table. 3.9 Bulk SNP Extraction Extracts SNPs from a raw sequence alignment into a useful format for export. Additionally, this function provides information for designing genotyping assays. Below is a detailed explanation of the SNP Extractor Dialog: 35

42 Minimum Site Frequency: the minimum frequency for which the site must have a good base Minimum SNP Frequency: the minimum frequency of the minority polymorphisms for the site to be included in the resulting data set Minimum Surrounding Bases: the minimum number of good bases on at least one side of the SNP Minimum Good SBE Bases: the minimum number of good bases on at least one side of SNP Filter SNPs to Biallelic: converts tertiary and rarer states to missing data (? ), thereby forcing sites to have only two types of segregating sites at any particular locus. This helps to remove bad sequence effects. Results are displayed on the Data Tree panel and include SNPs along with their context in two different formats (IUPAC-IUB ambiguous nucleotide and another see Appendix for more information). Additional information is also provided, including: the location of the nearest polymorphisms on either side, polymorphism information content ( PIC ), and Haplotype PIC. Overall score is essentially an estimate of the ability to design a single-base pair extension reaction in the region. These results can be exported by using a table (Results Table). 36

43 4 Results Mode Results mode consists of the following options: 4.1 Table Allows data to be displayed in a spreadsheet view and exported into a flat file. To create a table, select a data set from the Data Tree panel, then click on the Results button followed by the Table button (Results Table). Shown below is an example in which diversity estimates are displayed. Data can be sorted by clicking on the column header of interest. A secondary sort can be done by holding down the CTRL key and clicking on a second column. Data can be exported to flat files that are either comma-separated (Comma Separated Values = CSV) or tab-delimited. Both these formats can then be imported into a spreadsheet program such as Excel. Tables can also be printed. 37

44 4.2 LD Plot Displays the results of linkage disequilibrium analysis. After selecting the desired result from the Data Tree panel, click on the LD Plot button while in Results mode (Results LD Plot). The graph that is generated displays LD between all possible pairs of sites. The black diagonal represents LD between each site and itself. The default setting graphs r 2 in the upper right and p-values in the lower left. This default can be modified by clicking on the radio buttons in the lower left. The left side of the graph contains a text description of the gene (or chromosome) and the site within the gene (or genetic position within the chromosome). At the bottom of the graph is a display of the position of each site along the gene or chromosome. This display can be hidden by deselecting the Schematic checkbox. Legends describing the color scheme appear on the right hand side of the graph. LD plots can be printed, saved in JPEG format, or saved as a Scalable Vector Graphics (SVG) file. An SVG file is useful for creating publication quality graphics which can be easily sized using an editor such as Adobe Illustrator, Corel Draw, and OpenOffice.org Draw

45 4.3 Tree Plot Displays the results of cladogram analysis. After running Analysis Cladogram, select the desired dataset and then click Tree Plot in the Results mode (Results Tree Plot). Trees can be visualized in either a Normal or Circular layout. These image can be printed, saved in JPEG format, or saved as a Scalable Vector Graphics (SVG) file. 4.4 Chart Method for visualizing numeric data. This feature can be used to display histograms, XY plots, bar charts and/or pie charts. Any numeric table data can be charted, including LD results, phenotypic data, diversity results, and association results. Histograms: Use the graph type combo box to select the desired graph type (Histogram) from the list of options. Up to two different series of data can be plotted together. Users may specify the number of bins to be used in the histogram. 39

46 Scatter plots: Use the graph type combo box to select the desired graph type (XY Plot) from the list of options. Select data to be plotted in X and Y axes using the appropriate drop down boxes. If two data series are plotted simultaneously on the Y axis, the 2 Y Axes checkbox will provide an axis for each. 40

47 4.5 2D Plot Displays 2D plots and determines color thresholds. This function is useful for plotting associations in multiple environments. Note: This view is limited to Struct Assoc results, as neither GLM nor MLM report results by environment. First select the desired result set from a Structured Association analysis. Using the drop down boxes provided, populate rows with Environment, columns with Site, and value with PermuteP. The cutoff value for coloring can be chosen either by inputting a value in the text box or by using the slider tool to the right of the text box. Users can mouse over any box to view the value associated with that box, as shown here: If P-value coloring is desired, simply check the P-value box as shown below: By checking the P-value box, Cutoff selection tools will be disabled and fields will instead be colored according to the following grayscale: 41

48 This key can be shown by clicking on the? icon next to the P-value check box. 4.6 Gene Plot Allows for gene annotation to be displayed visually. Note: Although this feature is designed for imported data sets, TASSEL does not currently support a method for importing annotated data sets. For ideas and/or programming collaboration, TASSEL source code is available on SourceForge. 4.7 Visual Genotype Sorts and displays genotypes using visual color coding from SNP data. This feature provides a visual abstraction of a genotype in which each SNP is represented by a colored box, thus simplifying identification of major and minor variants. To create a visual genotype, first select a SNP dataset (as generated from a sequence alignment by Data Sites in Section 4.1.7). Then click on VGType (in Results mode). Users can double-click on a box to move that particular taxa to the top row. All rows that share the same variant will then be grouped immediately below it. Another useful feature allows users to pause the mouse icon over any box to reveal specific taxa/sequence information for that particular box (see image below). 42

49 Checking the Color by Reference box will make the selected taxa (the top row) a single color and will then sort and color SNPs of the remaining taxa relative to the top row. Thus, taxa in the bottom rows share the least in common with the selected taxa/top row. As seen above, double-clicking on a box that is already positioned in the top row will make the actual SNP variant visible (verses re-ordering the entire grid). The Refresh button will refresh the view after selecting/deselecting Color by Reference and will also resort taxa by alphabetical order. This graphic can be exported as an SVG file. 4.8 Haplotype Viewer Creates a haplotype network view using SNP or sequence data. This feature allows viewers to create a minimum spanning tree that depicts the relationships (lines) between distinct entities (nodes or boxes). Each line between nodes represents the number of changes between entities. Each node contains the number of taxa that share a given gene variant. To access this feature, first select a raw gene sequence alignment or SNP dataset (as generated from a sequence alignment by Data Sites in Section 4.1.7). While in Results mode, click on HapView. If the resulting network tends to migrate across the screen over time, click the Freeze button to force the network to be static. By clicking and dragging, nodes can be moved to more suitable locations. The Blank button toggles the node numbers. As seen below, users can view the gene variant as well as all taxa that share it by clicking on a node while holding down the CTRL key. 43

50 44

51 5 Tutorial This tutorial reviews several common scenarios for the use of TASSEL, helping the user better understand its capabilities for data manipulation and association analyses. The TASSEL software package includes a tutorial dataset that can be downloaded from the TASSEL website (please unzip all files to a directory of your choice). This tutorial dataset contains real, raw data on phenotype, genotype, population structure and kinship. The results generated by TASSEL are included, as well as results generated by SAS procedure Mixed so that users can compare output for association analysis via a mixed linear model. File details are described in the Appendix. The Tassel 2.1 has user interface as shown 5.1 Importing Data from flat files Flat files are the most common type used for importing data into TASSEL. Flat files should be in the form of Tab-delimited text files. Files can be prepared using the instructions provided earlier in this manual or by following the examples in the TASSEL tutorial dataset. On Data panel, click File button to display File Loader dialog box as follows. At this time we use the default (Guess) function to load following files from the tutorial dataset. 45

52 These files will be displayed in the following sections SNP data 46

53 5.1.2 Phenotype data Population structure data Kinship data 47

54 5.1.5 Annotated Alignment This format can not be Guess correctly. You have to selection the Load annotated alignment (custom) radio option to load the demonstration file: TestAnnotatedAlignment.txt. The file is displayed in TASSEL as follows. 5.2 Importing data from a database (via GDPC) GDPC, middleware that is integrated into TASSEL, allows the user to import data from a database. To display GDPC in TASSEL, click on the GDPC button in Data mode. General rules for working with databases include: 1. First establish a connection with the database. 2. Then define a query. 3. Finally, once the desired data is in GDPC, load the data from GDPC into TASSEL Connecting with a database To establish a connection with a database, click the Add Conn button followed by the button of the database you wish to add. Then click Ok. In the example below, we chose Panzea. 48

55 The information about Panzea is displayed as follows: To connect to more than one database, simply repeat the process outlined above. In the following figures, only the GDPC area will be displayed if other areas are deemed irrelevant Data query GDPC is equipped with several tabs to query data, namely Taxa, Taxon Parents, Loci, Genotype Experiments, Environment Experiment, and Localities. Within each tab, any retrieved data will be displayed in the Filtered List. Choose interested attributes by checking the desired boxes (located beneath the Filtered List). These selections subsequently reveals the available values by which to filter data. Here, using the Taxa tab, we chose Germplasm type (field) and then selected Inbred (value for filtering). After clicking the Get Data button, the subset of taxa from the database that meets these criteria are now included in both the Filtered List and the Working List. 49

56 Items listed in the Working List can be modified by the user. To do so, first break the link between the Filtered List and the Working List by clicking on the Link/Unlink button. The button will now appear as. This activates the Add selected items, Add all items, Remove and Remove all buttons. Here we first removed all items from the working list, then selected items with a name starting with the letter D. Clicking on the Add selected items button moved them to the Working List. The Working List is shown as follows. 50

57 To filter data by polymorphism type, first click on the Genotype Experiments tab, check the Polymorphism Type and Producer checkbox (field), and then select SNP and Jim Holland (value for filtering). Finally, click the Get Data button to reveal the subset of data that meets these criteria. Results for this examples are shown below: Now genotype data can be extracted from the database by clicking on the Genotypes tab, followed by the Get Data button. After a moment, genotype data will be displayed as follows: Users can either save this genotype data in several forms or upload it to TASSEL. However, before outlining these procedures, let us finish the query by exploring phenotypes. Supposing the user would like to acquire data from experiments conducted in 2000, we first selected the Environment Experiments tab, followed by the Repetition checkbox. We then selected the desired repetitions in 2000 as the values 51

58 to be used for filtering, and clicked the Get Data button. The subset of data that meets these criteria is returned as follows: Now the user can extract phenotype data by first clicking on the Phenotypes tab. Traits can only be extracted one at a time. Here we chose Days to Silk listed in the Ontology field. Make sure no Taxa are selected and all Environment Experiments are selected that were retrieved in the previous step. Then click the Get Data button. Then click the Merge button, leaving only Accession checked under the Taxa Properties section. Leave Locality and Repetition checked under the Environment Experiments Properties section. Data are merged as follows: Importing GDPC data into TASSEL Genotype and phenotype data must be loaded in separate steps. To load genotype data, first click on GDPC in Data mode. Then click on the Genotypes tab, followed by the Load button. The genotype data 52

59 is then loaded into TASSEL and labeled as Genotype. To view the uploaded data, click on Genotype within the Data folder in TASSEL. Results will look as follows: To load phenotype data from GDPC into TASSEL, first click on the GDPC button in Data mode. Then choose the Phenotypes tab, followed by the Load button. The phenotype data is then loaded into TASSEL and labeled as 4 traits/environ. To view the uploaded data, select 4 traits/environ from the Phenotypes folder in TASSEL. Results will appear as follows: 53

60 5.2.4 Saving GDPC query results All query results, including both genotype and phenotype queries, can be saved as either Tab-delimited text files or XML files. Results are exported as tab-delimited text files by first choosing the Query Tab and then clicking on the Export button, or by clicking the Save As button to save results in XML format. Location and file name must be specified in both situations. Data in XML format can be imported back into GDPC by clicking on the Open button. 5.3 Importing/Exporting data to/from a data tree The TASSEL tutorial data set includes a file named demodata.zip. This file is a binary file that stores a loaded data tree for each type of data (Gene, Polymorphism, Phenotype, Population Structure and Kinship). To import the saved data tree, first click on the File menu, and then click Open to locate the demodata.zip file. Finally, click the OK button to import the data tree. Warning: The current data tree will be replaced with the imported data tree. To save a current data tree, use the File menu to select Save data tree as and then specify a location and name for the saved tree. 5.4 Data manipulation Filtering data While in Data mode, select the data set dwarf8 and then click on the Sites button. A Filter Alignment Dialog will open. Enter the following data by typing in the text boxes: Minimum count of 69 and Minimum freq of 0.05 (5%). Then select Remove minor SNP states. 54

61 Now click the Filter button. This will add a file named Point in the Data Tree panel Joining datasets In Data mode, select the datasets you wish to join while holding down the CTRL key. For this tutorial,select three_traits_rn_rename and popstructure_taxa286_rn_rename and then click on the U Join button. This adds the dataset three_traits_rn_rename + popstructure_taxa286_rn_rename to the Data Tree panel, as shown below. 55

62 5.4.3 Missing value imputation The phenotype file three_traits.txt will be used to demonstrate the process of imputing missing data. Note that the dataset below contains missing values (NaN). To impute missing data, first select the 3 traits/environ dataset in the Data Tree panel and then click the Transform button (Data Transform). The Transform Column Data window will open. Click on the Impute tab in this window. Finally, click on the Create Dataset button to add the new dataset with missing values imputed to the data tree. Note that missing values are now filled. 56

63 5.4.4 Principal component analysis (PCA) Because PCA requires no missing values, data must be thoroughly checked and any missing values imputed. Here we use the dataset from three_traits_rn_rename_3_imputed file with missing values imputed (three_traits_rn_rename). First select the desired dataset from the Data Tree panel and then click on the Transform button (Data Transform). When the Transform Column Data window appears, click on the PCA tab. The window will now look as follows: 57

64 Clicking on the Create Dataset button will add three datasets to the Data Tree panel, one each for principal components, eigenvectors and eigenvalues. 58

65 5.5 Using principal components as structure It was shown that principal components had the equivalent effect as population structure estimated through the maximum likelihood approach (Zhao et al. 2007). Principal components can be quickly derived from a set of random markers. Here we use the 553 random marker (Yu et al 2006) for demonstration. First we convert the character data (in format of ATGCN) to numerical data by extracting the major allele and collapsing the non major alleles. Highlight the data and click Transform button to bring next windows: 59

66 After click the Create Dataset button, a new dataset named d8coding_rn_rename_collasped was created. Please notice that a NaN value was filled for missing SNP data. To use the complete data, we impute the missing data before the principal component analysis. Therefore, select the data set of d8coding_rn_rename_collasped, click Transform button to bring next windows: Chose the Impute panel and click Create Dataset button to fill in the missing data as following: 60

67 Now we are ready to create principal components. First we highlight the imputed data, then click Transform button again and select PCA panel. We choose two principal components as following: After the Create Dataset button is clicked, the PAC result is created, including eigen value, eigen vector and principal components. The first two principal components contribute 9.8% of total variation. They can be used as population structure in association analysis. 61

68 Association analysis using GLM We use following files from the Tutorial dataset to perform association study by using general linear mode (GLM). Phenotype: three_traits_rn_rename.txt Marker: d8coding_rn_rename.txt Population structure: popstructure_taxa286_rn_rename.txt First we extract SNP from the sequence file (d8coding_rn_rename.txt) by highlighting it and then click Sites button on Data panel to show following dialog box. After the Filter button is clicked, a dataset name Point will be added to data tree. 62

69 63

70 Now we need to join phenotype, population structure and genotype (Point) dataset together. First is to select these data from the data tree then click ANF Join button. A dataset is added to data tree with following name. three_traits_rn_rename + popstructure_taxa286_rn_rename + Point Then click on the dataset from the Data Tree panel, followed by the GLM button. The Input Data Definition dialog box will open. Use the drop down lists under Input Data Types to change Q1 and Q2 to Covariate and Q3 to Exclude as pictured in the screen shot below. Then click OK. Note that the Data Header labels have been filled in by the application. Header 1 is always the trait name. Header 2 is labeled Env (environment) and can be changed by the user. Additional headers, if they exist, must be assigned label names by the user. The Build a Linear Model dialog box will appear. Based on the factors chosen in the Data Definition dialog, TASSEL creates a default model, which appears in the Model box. To change which effects are included in the model, use the > or >> buttons to add selected factors or all factors from the Available Factors list. Use the < or << buttons to remove selected factors or all factors from the model. Select factors in the Available Factors list then click * to add an interaction term to the model. For this example, do not make any changes to the default model. Click Define Output. 64

71 The Define Output dialog box appears. TASSEL provides a default F-test for markers using the model residual as the error. That is appropriate in this case. Type 1000 in the # permutations column of the Define F-Tests table to run permutations permutations will estimate p-values of 0.05 or higher with reasonable accuracy. For estimating lower p-values use more permutations. Optionally, additional F-tests can be added by selected a hypothesis and error term then clicking Add F-Test. To create a report of marker mean estimates, select Marker in the hypothesis list and click Add LSMean. The dialog box should now appear as shown below. Click Run. The results GLM_Point + 3 traits/environ + 3 element Q and Lsmeans_Marker_Point + 3 traits/environ + 3 element Q will be added to the Data Tree panel under Association. Viewing Results - Table: To view the results of this association analysis, click on the Results button in the upper left of the screen. Select the association result GLM_ three_traits_rn_rename + popstructure_taxa286_rn_rename + Point and then click Table. F_marker is the F-test calculated as MS_marker / MS_error. p-value is probability of a larger F based on the F-distribution. #-permutations is the number of permutations run. p-perm_marker is the permutation based test for marker significance individual markers. padj_marker is the experiment-wise p-value for marker significance that controls the error-rate for the entire set of hypotheses. Rsq Model is the fraction of the total variation explained by the full model. Rsq_Marker is the fraction of the total variation explained by the marker after fitting the other model effects. 65

72 5.6 Association analysis using MLM Here we illustrate association mapping by MLM with the following model: Trait = marker effect + population structure + polygene effect + residual To begin, phenotype data, SNP data, population structure and kinship are loaded into TASSEL from the following files: Phenotype: three_traits_rn_rename.txt Marker: d8coding_rn_rename.txt Population structure: popstructure_taxa286_rn_rename.txt Kinship: kinship_277taxa_by_spagedi_rn_sq_rename.txt After extraction of SNP from the sequence file, we join it with phenotype, and population structure dataset. Caution: Do not join kinship. Select the joint dataset and kinship dataset, click MLM button in Analysis mode. The Input data definition dialog box will pop-up. Use the drop down lists under Input Data Types to keep dpoll as Data and change stiff-stalk and nonstiff to Covariate and exclude the others as seen in the screen shot below. Then Click OK. 66

73 After the OK button is clicked, the following dialog box is shown to list all the available factors (on left side). These factors can be displayed or removed from on the right side (factors in the model) by clicking the buttons in the middle. After the Run button is clicked, the Heritability dialog box is shown for user to choose different options. 67

74 In case the Use prior heritability given above is not checked, the following dialog box is shown for defining the parameters used for iterations. The computation will be finished after a few seconds for this example. The computing time manly depends on number of taxas, number of traits and markers. Association mapping results and estimates of variance components will be put on the Datatree shown below in the same order 68

75 The traits with squares mean none converge. 5.7 Working with genotype data For the purposes of this tutorial, we created a demonstration dataset to guide the user through scenarios involving genotype data. The dataset includes SSR genotype data (diploid_ssr.txt), phenotype data (diploid_traits.txt) and population structure data (diploid_stru.txt). These files are loaded into TASSEL as follows: 69

76 The example below demonstrates the application of GLM for association analysis between SSR genotypes and phenotypes. The first step involves using the Genotype Converter to transfer genotypes from raw form into genotypic state. To do this, first click on the Genotype button (Data Genotype) and then click on the Genotype Converter button. In the Genotype Converter window, activate the Create alignment based on genotypic state (eg. A:a>Aa) radio button as follows: Clicking OK will add the dataset GenoStates to the Data Tree panel. 70

77 Now GenoStates can be joined with phenotype and structure datasets. To continue with the analysis, switch from Data mode into Analysis mode by clicking on the Analysis button in the Options panel. Then click on the GLM button to show the following dialog box. In the Input Data Definition window, use the drop down boxes to include all the phenotypic traits, take Q1 and Q2 as covariates and exclude Q3. 71

78 Then click the OK button to define the model as follows: Click the Run button to add the association results to the Data Tree. 72

79 5.8 Data visualization TASSEL is equipped with many visualization tools so that the user can better interpret raw data, manipulated data and analysis results D Plot To create a 2D Plot, first click on the Results button in the Options panel. Select LogisticRat_5T_18 under Association from the Data Tree panel and then click on the 2D plot button. In the 2-D chart dialogue box that opens, use the drop down lists to select sources for Row, Column and Value. Then activate the check boxes next to Row, Column and Value. This example used Row-Environment, Column-Site, and Value-PermuteP to get the graph below: XY Plot To create an XY plot, click on the Analysis button in the upper left of the screen. Select d8coding_rn_rename (raw gene) in the Data Tree panel and then click on the Diversity button. In 73

80 the Diversity Surveys dialog box, activate the Sliding Window checkbox. Clicking the Run button will add two datasets to the Data Tree panel under Diversity. To visualize these results in an XY plot, click on the Results button in the Options panel. Select Diversity: d8coding_rn_rename from the Data Tree panel and then click on the Chart button. Using the Graph Type drop down list, select XY Plot. Then select Pi for the Y1 axis, Theta for the Y2 axis, and Site for the X axis. A Site vs. Pi and Theta graph will be generated as shown below. 74

81 5.8.3 LD Plot To create an LD plot, first click on the Analysis button in the Option panel. Select Point from the Data Tree panel (under Genes ) and then click the Link. Diseq button. Click Run in the dialog box that appears. The result LD: Point will be added to the Data Tree panel under LD. To visualize these results using a LD plot, click on the Results button in the Options panel. Select LD: Point from the Data Tree panel and then click on the LD Plot button. The following graph is displayed. 75

Genetic Analysis. Page 1

Genetic Analysis. Page 1 Genetic Analysis Page 1 Genetic Analysis Objectives: 1) Set up Case-Control Association analysis and the Basic Genetics Workflow 2) Use JMP tools to interact with and explore results 3) Learn advanced

More information

Step-by-Step Guide to Basic Genetic Analysis

Step-by-Step Guide to Basic Genetic Analysis Step-by-Step Guide to Basic Genetic Analysis Page 1 Introduction This document shows you how to clean up your genetic data, assess its statistical properties and perform simple analyses such as case-control

More information

Step-by-Step Guide to Advanced Genetic Analysis

Step-by-Step Guide to Advanced Genetic Analysis Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options

More information

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

Estimating Variance Components in MMAP

Estimating Variance Components in MMAP Last update: 6/1/2014 Estimating Variance Components in MMAP MMAP implements routines to estimate variance components within the mixed model. These estimates can be used for likelihood ratio tests to compare

More information

Release Notes. JMP Genomics. Version 4.0

Release Notes. JMP Genomics. Version 4.0 JMP Genomics Version 4.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP. A Business Unit of SAS SAS Campus Drive

More information

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017 BICF Nano Course: GWAS GWAS Workflow Development using PLINK Julia Kozlitina Julia.Kozlitina@UTSouthwestern.edu April 28, 2017 Getting started Open the Terminal (Search -> Applications -> Terminal), and

More information

Chapter 6: Linear Model Selection and Regularization

Chapter 6: Linear Model Selection and Regularization Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) comes close to or exceeds n (the sample size) standard linear regression is faced with problems. The variance of the

More information

User Manual ixora: Exact haplotype inferencing and trait association

User Manual ixora: Exact haplotype inferencing and trait association User Manual ixora: Exact haplotype inferencing and trait association June 27, 2013 Contents 1 ixora: Exact haplotype inferencing and trait association 2 1.1 Introduction.............................. 2

More information

MLSTest Tutorial Contents

MLSTest Tutorial Contents MLSTest Tutorial Contents About MLSTest... 2 Installing MLSTest... 2 Loading Data... 3 Main window... 4 DATA Menu... 5 View, modify and export your alignments... 6 Alignment>viewer... 6 Alignment> export...

More information

The Power and Sample Size Application

The Power and Sample Size Application Chapter 72 The Power and Sample Size Application Contents Overview: PSS Application.................................. 6148 SAS Power and Sample Size............................... 6148 Getting Started:

More information

fasta2genotype.py Version 1.10 Written for Python Available on request from the author 2017 Paul Maier

fasta2genotype.py Version 1.10 Written for Python Available on request from the author 2017 Paul Maier 1 fasta2genotype.py Version 1.10 Written for Python 2.7.10 Available on request from the author 2017 Paul Maier This program takes a fasta file listing all sequence haplotypes of all individuals at all

More information

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,

More information

JMP Genomics. Release Notes. Version 6.0

JMP Genomics. Release Notes. Version 6.0 JMP Genomics Version 6.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP, A Business Unit of SAS SAS Campus Drive

More information

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland

Genetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland Genetic Programming Charles Chilaka Department of Computational Science Memorial University of Newfoundland Class Project for Bio 4241 March 27, 2014 Charles Chilaka (MUN) Genetic algorithms and programming

More information

GMDR User Manual. GMDR software Beta 0.9. Updated March 2011

GMDR User Manual. GMDR software Beta 0.9. Updated March 2011 GMDR User Manual GMDR software Beta 0.9 Updated March 2011 1 As an open source project, the source code of GMDR is published and made available to the public, enabling anyone to copy, modify and redistribute

More information

QTX. Tutorial for. by Kim M.Chmielewicz Kenneth F. Manly. Software for genetic mapping of Mendelian markers and quantitative trait loci.

QTX. Tutorial for. by Kim M.Chmielewicz Kenneth F. Manly. Software for genetic mapping of Mendelian markers and quantitative trait loci. Tutorial for QTX by Kim M.Chmielewicz Kenneth F. Manly Software for genetic mapping of Mendelian markers and quantitative trait loci. Available in versions for Mac OS and Microsoft Windows. revised for

More information

STEM. Short Time-series Expression Miner (v1.1) User Manual

STEM. Short Time-series Expression Miner (v1.1) User Manual STEM Short Time-series Expression Miner (v1.1) User Manual Jason Ernst (jernst@cs.cmu.edu) Ziv Bar-Joseph Center for Automated Learning and Discovery School of Computer Science Carnegie Mellon University

More information

Overview. Background. Locating quantitative trait loci (QTL)

Overview. Background. Locating quantitative trait loci (QTL) Overview Implementation of robust methods for locating quantitative trait loci in R Introduction to QTL mapping Andreas Baierl and Andreas Futschik Institute of Statistics and Decision Support Systems

More information

GBS Bioinformatics Pipeline(s) Overview

GBS Bioinformatics Pipeline(s) Overview GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Terry Casstevens With supporting information from

More information

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion

More information

Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.4 Graphical User Interface (GUI) Manual

Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.4 Graphical User Interface (GUI) Manual Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.4 Graphical User Interface (GUI) Manual Department of Epidemiology and Biostatistics Wolstein Research Building 2103 Cornell Rd Case Western

More information

STATS PAD USER MANUAL

STATS PAD USER MANUAL STATS PAD USER MANUAL For Version 2.0 Manual Version 2.0 1 Table of Contents Basic Navigation! 3 Settings! 7 Entering Data! 7 Sharing Data! 8 Managing Files! 10 Running Tests! 11 Interpreting Output! 11

More information

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression analysis. Analysis of Variance: one way classification,

More information

Documentation for BayesAss 1.3

Documentation for BayesAss 1.3 Documentation for BayesAss 1.3 Program Description BayesAss is a program that estimates recent migration rates between populations using MCMC. It also estimates each individual s immigrant ancestry, the

More information

Lecture 7 QTL Mapping

Lecture 7 QTL Mapping Lecture 7 QTL Mapping Lucia Gutierrez lecture notes Tucson Winter Institute. 1 Problem QUANTITATIVE TRAIT: Phenotypic traits of poligenic effect (coded by many genes with usually small effect each one)

More information

An introduction to SPSS

An introduction to SPSS An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible

More information

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual Miao-Xin Li, Jiang Li Department of Psychiatry Centre for Genomic Sciences Department

More information

Intro to NGS Tutorial

Intro to NGS Tutorial Intro to NGS Tutorial Release 8.6.0 Golden Helix, Inc. October 31, 2016 Contents 1. Overview 2 2. Import Variants and Quality Fields 3 3. Quality Filters 10 Generate Alternate Read Ratio.........................................

More information

Tutorial: Resequencing Analysis using Tracks

Tutorial: Resequencing Analysis using Tracks : Resequencing Analysis using Tracks September 20, 2013 CLC bio Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 Fax: +45 86 20 12 22 www.clcbio.com support@clcbio.com : Resequencing

More information

Enterprise Miner Tutorial Notes 2 1

Enterprise Miner Tutorial Notes 2 1 Enterprise Miner Tutorial Notes 2 1 ECT7110 E-Commerce Data Mining Techniques Tutorial 2 How to Join Table in Enterprise Miner e.g. we need to join the following two tables: Join1 Join 2 ID Name Gender

More information

StatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved.

StatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved. StatCalc User Manual Version 9 for Mac and Windows Copyright 2018, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Introduction... 4 Getting Help... 4 Uninstalling StatCalc...

More information

Breeding View A visual tool for running analytical pipelines User Guide Darren Murray, Roger Payne & Zhengzheng Zhang VSN International Ltd

Breeding View A visual tool for running analytical pipelines User Guide Darren Murray, Roger Payne & Zhengzheng Zhang VSN International Ltd Breeding View A visual tool for running analytical pipelines User Guide Darren Murray, Roger Payne & Zhengzheng Zhang VSN International Ltd January 2015 1. Introduction The Breeding View is a visual tool

More information

Also, for all analyses, two other files are produced upon program completion.

Also, for all analyses, two other files are produced upon program completion. MIXOR for Windows Overview MIXOR is a program that provides estimates for mixed-effects ordinal (and binary) regression models. This model can be used for analysis of clustered or longitudinal (i.e., 2-level)

More information

MACAU User Manual. Xiang Zhou. March 15, 2017

MACAU User Manual. Xiang Zhou. March 15, 2017 MACAU User Manual Xiang Zhou March 15, 2017 Contents 1 Introduction 2 1.1 What is MACAU...................................... 2 1.2 How to Cite MACAU................................... 2 1.3 The Model.........................................

More information

Importing and Merging Data Tutorial

Importing and Merging Data Tutorial Importing and Merging Data Tutorial Release 1.0 Golden Helix, Inc. February 17, 2012 Contents 1. Overview 2 2. Import Pedigree Data 4 3. Import Phenotypic Data 6 4. Import Genetic Data 8 5. Import and

More information

Notes on QTL Cartographer

Notes on QTL Cartographer Notes on QTL Cartographer Introduction QTL Cartographer is a suite of programs for mapping quantitative trait loci (QTLs) onto a genetic linkage map. The programs use linear regression, interval mapping

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

Product Catalog. AcaStat. Software

Product Catalog. AcaStat. Software Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,

More information

CHAPTER 2. GENERAL PROGRAM STRUCTURE

CHAPTER 2. GENERAL PROGRAM STRUCTURE CHAPTER 2. GENERAL PROGRAM STRUCTURE Windows Latent GOLD contains a main window called the Viewer. Viewer. When you estimate a model, all statistical results, tables and plots are displayed in the Viewer.

More information

Package GWAF. March 12, 2015

Package GWAF. March 12, 2015 Type Package Package GWAF March 12, 2015 Title Genome-Wide Association/Interaction Analysis and Rare Variant Analysis with Family Data Version 2.2 Date 2015-03-12 Author Ming-Huei Chen

More information

QTL Analysis with QGene Tutorial

QTL Analysis with QGene Tutorial QTL Analysis with QGene Tutorial Phillip McClean 1. Getting the software. The first step is to download and install the QGene software. It can be obtained from the following WWW site: http://qgene.org

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide. The NESTED Procedure SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Introduction to the workbook and spreadsheet

Introduction to the workbook and spreadsheet Excel Tutorial To make the most of this tutorial I suggest you follow through it while sitting in front of a computer with Microsoft Excel running. This will allow you to try things out as you follow along.

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

Intermediate SAS: Statistics

Intermediate SAS: Statistics Intermediate SAS: Statistics OIT TSS 293-4444 oithelp@mail.wvu.edu oit.wvu.edu/training/classmat/sas/ Table of Contents Procedures... 2 Two-sample t-test:... 2 Paired differences t-test:... 2 Chi Square

More information

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial)

Lesson 3: Building a Market Basket Scenario (Intermediate Data Mining Tutorial) From this diagram, you can see that the aggregated mining model preserves the overall range and trends in values while minimizing the fluctuations in the individual data series. Conclusion You have learned

More information

24 - TEAMWORK... 1 HOW DOES MAXQDA SUPPORT TEAMWORK?... 1 TRANSFER A MAXQDA PROJECT TO OTHER TEAM MEMBERS... 2

24 - TEAMWORK... 1 HOW DOES MAXQDA SUPPORT TEAMWORK?... 1 TRANSFER A MAXQDA PROJECT TO OTHER TEAM MEMBERS... 2 24 - Teamwork Contents 24 - TEAMWORK... 1 HOW DOES MAXQDA SUPPORT TEAMWORK?... 1 TRANSFER A MAXQDA PROJECT TO OTHER TEAM MEMBERS... 2 Sharing projects that include external files... 3 TRANSFER CODED SEGMENTS,

More information

The NESTED Procedure (Chapter)

The NESTED Procedure (Chapter) SAS/STAT 9.3 User s Guide The NESTED Procedure (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 9.3 User s Guide. The correct bibliographic citation for the complete manual

More information

Overview and Implementation of the GBS Pipeline. Qi Sun Computational Biology Service Unit Cornell University

Overview and Implementation of the GBS Pipeline. Qi Sun Computational Biology Service Unit Cornell University Overview and Implementation of the GBS Pipeline Qi Sun Computational Biology Service Unit Cornell University Overview of the Data Analysis Strategy Genotyping by Sequencing (GBS) ApeKI site (GCWGC) ( )

More information

SAS/STAT 13.1 User s Guide. The Power and Sample Size Application

SAS/STAT 13.1 User s Guide. The Power and Sample Size Application SAS/STAT 13.1 User s Guide The Power and Sample Size Application This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as

More information

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT

CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT This chapter provides step by step instructions on how to define and estimate each of the three types of LC models (Cluster, DFactor or Regression) and also

More information

EDIT202 Spreadsheet Lab Prep Sheet

EDIT202 Spreadsheet Lab Prep Sheet EDIT202 Spreadsheet Lab Prep Sheet While it is clear to see how a spreadsheet may be used in a classroom to aid a teacher in marking (as your lab will clearly indicate), it should be noted that spreadsheets

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

SNP HiTLink Manual. Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1

SNP HiTLink Manual. Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1 SNP HiTLink Manual Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1 1 Department of Neurology, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan 2 Dynacom Co., Ltd, Kanagawa,

More information

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018 Contents Introduction... 1 Start DIONE... 2 Load Data... 3 Missing Values... 5 Explore Data... 6 One Variable... 6 Two Variables... 7 All

More information

MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping. Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September

MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping. Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September 27 2014 Static Dynamic Static Minimum Information for Reporting

More information

Breeding Guide. Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel

Breeding Guide. Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel Breeding Guide Customer Services PHENOME-NETWORKS 4Ben Gurion Street, 74032, Nes-Ziona, Israel www.phenome-netwoks.com Contents PHENOME ONE - INTRODUCTION... 3 THE PHENOME ONE LAYOUT... 4 THE JOBS ICON...

More information

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM

Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM Frequently Asked Questions Updated 2006 (TRIM version 3.51) PREPARING DATA & RUNNING TRIM * Which directories are used for input files and output files? See menu-item "Options" and page 22 in the manual.

More information

m6aviewer Version Documentation

m6aviewer Version Documentation m6aviewer Version 1.6.0 Documentation Contents 1. About 2. Requirements 3. Launching m6aviewer 4. Running Time Estimates 5. Basic Peak Calling 6. Running Modes 7. Multiple Samples/Sample Replicates 8.

More information

Lab 4: Multiple Sequence Alignment (MSA)

Lab 4: Multiple Sequence Alignment (MSA) Lab 4: Multiple Sequence Alignment (MSA) The objective of this lab is to become familiar with the features of several multiple alignment and visualization tools, including the data input and output, basic

More information

Fathom Dynamic Data TM Version 2 Specifications

Fathom Dynamic Data TM Version 2 Specifications Data Sources Fathom Dynamic Data TM Version 2 Specifications Use data from one of the many sample documents that come with Fathom. Enter your own data by typing into a case table. Paste data from other

More information

Genetic type 1 Error Calculator (GEC)

Genetic type 1 Error Calculator (GEC) Genetic type 1 Error Calculator (GEC) (Version 0.2) User Manual Miao-Xin Li Department of Psychiatry and State Key Laboratory for Cognitive and Brain Sciences; the Centre for Reproduction, Development

More information

Gegenees genome format...7. Gegenees comparisons...8 Creating a fragmented all-all comparison...9 The alignment The analysis...

Gegenees genome format...7. Gegenees comparisons...8 Creating a fragmented all-all comparison...9 The alignment The analysis... User Manual: Gegenees V 1.1.0 What is Gegenees?...1 Version system:...2 What's new...2 Installation:...2 Perspectives...4 The workspace...4 The local database...6 Populate the local database...7 Gegenees

More information

Spreadsheet definition: Starting a New Excel Worksheet: Navigating Through an Excel Worksheet

Spreadsheet definition: Starting a New Excel Worksheet: Navigating Through an Excel Worksheet Copyright 1 99 Spreadsheet definition: A spreadsheet stores and manipulates data that lends itself to being stored in a table type format (e.g. Accounts, Science Experiments, Mathematical Trends, Statistics,

More information

MIXREG for Windows. Overview

MIXREG for Windows. Overview MIXREG for Windows Overview MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal

More information

Wilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment

Wilson Leung 01/03/2018 An Introduction to NCBI BLAST. Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment An Introduction to NCBI BLAST Prerequisites: Detecting and Interpreting Genetic Homology: Lecture Notes on Alignment Resources: The BLAST web server is available at https://blast.ncbi.nlm.nih.gov/blast.cgi

More information

Graphical Analysis of Data using Microsoft Excel [2016 Version]

Graphical Analysis of Data using Microsoft Excel [2016 Version] Graphical Analysis of Data using Microsoft Excel [2016 Version] Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

Lab #9: ANOVA and TUKEY tests

Lab #9: ANOVA and TUKEY tests Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for

More information

Introduction to GDS. Stephanie Gogarten. July 18, 2018

Introduction to GDS. Stephanie Gogarten. July 18, 2018 Introduction to GDS Stephanie Gogarten July 18, 2018 Genomic Data Structure CoreArray (C++ library) designed for large-scale data management of genome-wide variants data format (GDS) to store multiple

More information

To finish the current project and start a new project. File Open a text data

To finish the current project and start a new project. File Open a text data GGEbiplot version 5 In addition to being the most complete, most powerful, and most user-friendly software package for biplot analysis, GGEbiplot also has powerful components for on-the-fly data manipulation,

More information

Vector Xpression 3. Speed Tutorial: III. Creating a Script for Automating Normalization of Data

Vector Xpression 3. Speed Tutorial: III. Creating a Script for Automating Normalization of Data Vector Xpression 3 Speed Tutorial: III. Creating a Script for Automating Normalization of Data Table of Contents Table of Contents...1 Important: Please Read...1 Opening Data in Raw Data Viewer...2 Creating

More information

BEAGLECALL 1.0. Brian L. Browning Department of Medicine Division of Medical Genetics University of Washington. 15 November 2010

BEAGLECALL 1.0. Brian L. Browning Department of Medicine Division of Medical Genetics University of Washington. 15 November 2010 BEAGLECALL 1.0 Brian L. Browning Department of Medicine Division of Medical Genetics University of Washington 15 November 2010 BEAGLECALL 1.0 P a g e i Contents 1 Introduction... 1 1.1 Citing BEAGLECALL...

More information

Spotter Documentation Version 0.5, Released 4/12/2010

Spotter Documentation Version 0.5, Released 4/12/2010 Spotter Documentation Version 0.5, Released 4/12/2010 Purpose Spotter is a program for delineating an association signal from a genome wide association study using features such as recombination rates,

More information

An Interactive GUI Front-End for a Credit Scoring Modeling System by Jeffrey Morrison, Futian Shi, and Timothy Lee

An Interactive GUI Front-End for a Credit Scoring Modeling System by Jeffrey Morrison, Futian Shi, and Timothy Lee An Interactive GUI Front-End for a Credit Scoring Modeling System by Jeffrey Morrison, Futian Shi, and Timothy Lee Abstract The need for statistical modeling has been on the rise in recent years. Banks,

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Week 6, Week 7 and Week 8 Analyses of Variance

Week 6, Week 7 and Week 8 Analyses of Variance Week 6, Week 7 and Week 8 Analyses of Variance Robyn Crook - 2008 In the next few weeks we will look at analyses of variance. This is an information-heavy handout so take your time reading it, and don

More information

QUICKTEST user guide

QUICKTEST user guide QUICKTEST user guide Toby Johnson Zoltán Kutalik December 11, 2008 for quicktest version 0.94 Copyright c 2008 Toby Johnson and Zoltán Kutalik Permission is granted to copy, distribute and/or modify this

More information

An Interactive GUI Front-End for a Credit Scoring Modeling System

An Interactive GUI Front-End for a Credit Scoring Modeling System Paper 6 An Interactive GUI Front-End for a Credit Scoring Modeling System Jeffrey Morrison, Futian Shi, and Timothy Lee Knowledge Sciences & Analytics, Equifax Credit Information Services, Inc. Abstract

More information

Package EBglmnet. January 30, 2016

Package EBglmnet. January 30, 2016 Type Package Package EBglmnet January 30, 2016 Title Empirical Bayesian Lasso and Elastic Net Methods for Generalized Linear Models Version 4.1 Date 2016-01-15 Author Anhui Huang, Dianting Liu Maintainer

More information

LEA: An R Package for Landscape and Ecological Association Studies

LEA: An R Package for Landscape and Ecological Association Studies LEA: An R Package for Landscape and Ecological Association Studies Eric Frichot and Olivier François Université Grenoble-Alpes, Centre National de la Recherche Scientifique, TIMC-IMAG UMR 5525, Grenoble,

More information

freebayes in depth: model, filtering, and walkthrough Erik Garrison Wellcome Trust Sanger of Iowa May 19, 2015

freebayes in depth: model, filtering, and walkthrough Erik Garrison Wellcome Trust Sanger of Iowa May 19, 2015 freebayes in depth: model, filtering, and walkthrough Erik Garrison Wellcome Trust Sanger Institute @University of Iowa May 19, 2015 Overview 1. Primary filtering: Bayesian callers 2. Post-call filtering:

More information

Chapter 13 Multivariate Techniques. Chapter Table of Contents

Chapter 13 Multivariate Techniques. Chapter Table of Contents Chapter 13 Multivariate Techniques Chapter Table of Contents Introduction...279 Principal Components Analysis...280 Canonical Correlation...289 References...298 278 Chapter 13. Multivariate Techniques

More information

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition Online Learning Centre Technology Step-by-Step - Minitab Minitab is a statistical software application originally created

More information

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics.

Exploring Data. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics. This guide describes the facilities in SPM to gain initial insights about a dataset by viewing and generating descriptive statistics. 2018 by Minitab Inc. All rights reserved. Minitab, SPM, SPM Salford

More information

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D. Introduction to Minitab The interface for Minitab is very user-friendly, with a spreadsheet orientation. When you first launch Minitab, you will see

More information

Polymorphism and Variant Analysis Lab

Polymorphism and Variant Analysis Lab Polymorphism and Variant Analysis Lab Arian Avalos PowerPoint by Casey Hanson Polymorphism and Variant Analysis Matt Hudson 2018 1 Exercise In this exercise, we will do the following:. 1. Gain familiarity

More information

wgmlst typing in the Brucella demonstration database

wgmlst typing in the Brucella demonstration database BioNumerics Tutorial: wgmlst typing in the Brucella demonstration database 1 Introduction This guide is designed for users to explore the wgmlst functionality present in BioNumerics without having to create

More information

PLNT4610 BIOINFORMATICS FINAL EXAMINATION

PLNT4610 BIOINFORMATICS FINAL EXAMINATION 9:00 to 11:00 Friday December 6, 2013 PLNT4610 BIOINFORMATICS FINAL EXAMINATION Answer any combination of questions totalling to exactly 100 points. The questions on the exam sheet total to 120 points.

More information

High dimensional data analysis

High dimensional data analysis High dimensional data analysis Cavan Reilly October 24, 2018 Table of contents Data mining Random forests Missing data Logic regression Multivariate adaptive regression splines Data mining Data mining

More information

HORIZONTAL GENE TRANSFER DETECTION

HORIZONTAL GENE TRANSFER DETECTION HORIZONTAL GENE TRANSFER DETECTION Sequenzanalyse und Genomik (Modul 10-202-2207) Alejandro Nabor Lozada-Chávez Before start, the user must create a new folder or directory (WORKING DIRECTORY) for all

More information

jackstraw: Statistical Inference using Latent Variables

jackstraw: Statistical Inference using Latent Variables jackstraw: Statistical Inference using Latent Variables Neo Christopher Chung August 7, 2018 1 Introduction This is a vignette for the jackstraw package, which performs association tests between variables

More information

A comprehensive modelling framework and a multiple-imputation approach to haplotypic analysis of unrelated individuals

A comprehensive modelling framework and a multiple-imputation approach to haplotypic analysis of unrelated individuals A comprehensive modelling framework and a multiple-imputation approach to haplotypic analysis of unrelated individuals GUI Release v1.0.2: User Manual January 2009 If you find this software useful, please

More information

PROC QTL A SAS PROCEDURE FOR MAPPING QUANTITATIVE TRAIT LOCI (QTLS)

PROC QTL A SAS PROCEDURE FOR MAPPING QUANTITATIVE TRAIT LOCI (QTLS) A SAS PROCEDURE FOR MAPPING QUANTITATIVE TRAIT LOCI (QTLS) S.V. Amitha Charu Rama Mithra NRCPB, Pusa campus New Delhi - 110 012 mithra.sevanthi@rediffmail.com Genes that control the genetic variation of

More information

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA

CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent

More information

Asset Arena InvestOne

Asset Arena InvestOne Asset Arena InvestOne 1 21 AD HOC REPORTING 21.1 OVERVIEW Ad Hoc reporting supports a range of functionality from quick querying of data to more advanced features: publishing reports with complex features

More information

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator

Mendel and His Peas Investigating Monhybrid Crosses Using the Graphing Calculator 20 Investigating Monhybrid Crosses Using the Graphing Calculator This activity will use the graphing calculator s random number generator to simulate the production of gametes in a monohybrid cross. The

More information

Learning Worksheet Fundamentals

Learning Worksheet Fundamentals 1.1 LESSON 1 Learning Worksheet Fundamentals After completing this lesson, you will be able to: Create a workbook. Create a workbook from a template. Understand Microsoft Excel window elements. Select

More information