PLINK tutorial

Amended from two tutorials written by the PLINK author, Shaun Purcell (see 'Teaching materials and example dataset' on the PLINK website). Download PLINK from the PLINK website.

In this tutorial, we will use PLINK to analyse some real and some example large-scale SNP data, to give a demonstration of what the program can do (e.g. data management, summary statistics and basic association analysis). What we do today might not be particularly realistic or accurate, but I hope it gives an idea of what PLINK is capable of!

EXAMPLE DATASETS AND SOFTWARE

1. Approximately 80,000 autosomal SNPs from the 89 Asian HapMap individuals (Han Chinese from Beijing and Japanese from Tokyo). A phenotype has been simulated based on the genotype at one SNP.
2. A bigger file with all ~250,000 SNPs for 90 Asian HapMap individuals (Han Chinese from Beijing and Japanese from Tokyo), along with the simulated disease phenotype. In addition, a small subset of SNPs (N=29) genotyped on the same individuals, representing a "follow-up genotyping" exercise, is included, as well as a file with population membership (Chinese or Japanese).
3. A file of 771 SNPs genotyped on a captive zebra finch pedigree.

CARDINAL RULES & CAVEATS

When using PLINK there are a few key points to remember.

- Always consult the LOG file (console output)
- PLINK has no memory
    o each run loads data anew, previous filters lost
- Exact syntax and spelling is very important
    o "minus minus" (--) comes before every option name

- Not every option can be combined with every other option
    o For example, basic haplotype tests cannot take covariates
    o PLINK doesn't always warn you
    o LOG file often shows what has happened (or not)
- Consult the web documentation

GETTING STARTED

PLINK is a command line program, so we need to operate in a command line window. All commands involve typing plink at the command prompt (e.g. DOS window or Unix terminal) followed by a number of options (all starting with --option) to specify the data files / methods to be used. All results are written to files with various extensions.

Putting your files in the same directory as the "plink.exe" file will let you do all analysis from this directory. Navigate to the plink directory and, to check you are in the correct folder, type plink and press enter, which should start PLINK and generate some output describing the program. If you get an error message, you are in the wrong directory.

For most options, PLINK needs two plain text files:

1) A file with the extension .ped, holding family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female, other = unknown), phenotype and genotypes in columns for each individual. This file has NO HEADER so looks like this:

    FAM A A G G A C
    FAM A A A G 0 0

2) A file with the extension .map, holding marker information: chromosome number, the SNP name, the position in morgans on the chromosome, and the position in base pairs on the chromosome. Again, this file has NO HEADER so looks like this:

    1 rs rs rs rs

HINT! It is easiest if these files have the same name (e.g. sheep.ped and sheep.map). There are lots of options to change the format of these files (for example, you can provide genotypes as "AG" instead of "A G"); see the information on the PLINK website.
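As a rough illustration of the two file layouts just described, the following Python sketch splits one PED row and one MAP row into named fields. The example rows, the SNP name snp_example and the function names are hypothetical and chosen only for illustration; the sketch assumes whitespace-delimited files with alleles written as separate columns ("A G").

```python
def parse_ped_line(line):
    """Split one PED row into its six leading fields and a list of allele pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns followed by allele pairs")
    fid, iid, father, mother, sex, phenotype = fields[:6]
    alleles = fields[6:]
    # Two consecutive allele columns make up the genotype at one SNP.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return {
        "fid": fid,
        "iid": iid,
        "father": father,
        "mother": mother,
        "sex": sex,              # 1 = male, 2 = female, other = unknown
        "phenotype": phenotype,
        "genotypes": genotypes,
    }


def parse_map_line(line):
    """Split one MAP row: chromosome, SNP name, position in morgans, position in bp."""
    chrom, snp, morgans, bp = line.split()
    return {"chr": chrom, "snp": snp, "morgans": float(morgans), "bp": int(bp)}


# Hypothetical rows, not taken from the tutorial data.
ped_row = parse_ped_line("FAM001 1 0 0 1 2 A A G G A C")
map_row = parse_map_line("1 snp_example 0 5000830")

print(ped_row["genotypes"])
print(map_row)
```

The number of genotype pairs in every PED row should equal the number of rows in the MAP file, which is a quick consistency check before handing the files to PLINK.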

HINT! PLINK is designed for humans, so unless you tell it otherwise, it will assume your genome has 23 chromosomes, with chr23 being the X chromosome! Use the option --dog for up to 39 chromosomes.

1. HAPMAP1 DATA

Download the example data and unzip the contents into your plink folder. A phenotype was simulated, so that a single SNP (rs ) should be associated with the 'disease'. The files are

    hapmap1.ped   Genotype data for 83,000 SNPs on 90 individuals
    hapmap1.map   Map file for these SNPs
    pop.phe       Population membership coding (coded 1=CH / 2=JP)
    qt.phe        Quantitative phenotype - we won't use this one

Just typing plink and specifying a file with no further options is a good way to check that the file is intact, and to get some basic summary statistics about the file.

    plink --file hapmap1

The --file option takes a single parameter, the input file name, and will look for two files: a .ped file and a .map file with the name hapmap1 (i.e. hapmap1.ped and hapmap1.map). The above command should generate something like the following output in the console window. It will also save this information to a file called plink.log.

    PLINK! v0.99l 27/Jul/
    (C) 2006 Shaun Purcell, GNU General Public License, v
    Web-based version check ( --noweb to skip )
    Connecting to web OK, v0.99l is current
    *** Pre-Release Testing Version ***
    Writing this text to log file [ plink.log ]
    Analysis started: Mon Jul 31 09:00:
    Options in effect:
        --file hapmap1
    83534 (of 83534) markers to be included from [ hapmap1.map ]
    89 individuals read from [ hapmap1.ped ]
    89 individuals with nonmissing phenotypes
    Assuming a binary trait (1=unaff, 2=aff, 0=miss)

    Missing phenotype value is also -9
    Before frequency and genotyping pruning, there are SNPs
    Applying filters (SNP-major mode)
    89 founders and 0 non-founders found
    0 of 89 individuals removed for low genotyping ( MIND > 0.1 )
    859 SNPs failed missingness test ( GENO > 0.1 )
    16994 SNPs failed frequency test ( MAF < 0.01 )
    After frequency and genotyping pruning, there are SNPs
    Analysis finished: Mon Jul 31 09:00:

The information contained here can be summarized as follows:

- A banner showing copyright information and the version number. The web-based version check shows that this is an up-to-date version of PLINK and displays a message that v0.99l is a pre-release testing version.
- A message indicating that the log file will be saved in plink.log. The name of the output file can be changed with the --out option, e.g. specifying --out anal1 will generate a log file called anal1.log instead.
- A list of the command options specified is given next: in this case it is only a single option, --file hapmap1. Keeping the log files, and naming each analysis with its own --out name, makes it easier to keep track of when and how the different output files were generated.
- Next is some information on the number of markers and individuals read from the MAP and PED file. In total, just over 80,000 SNPs were read in from the MAP file. It is written "83534 (of 83534)" because some SNPs might be excluded (by making the physical position a negative number in the MAP file), in which case the first number would indicate how many SNPs are included. In this case, all SNPs are read in from the PED file. We also see that 89 individuals were read in from the PED file, and that all these individuals had valid phenotype information.
- Next, PLINK tells us that the phenotype is an affection status variable, as opposed to a quantitative trait, and lets us know what the missing values are.
- The next stage is the filtering stage: individuals and/or SNPs are removed on the basis of thresholds. See the PLINK web documentation for more information on setting thresholds. In this case we see that no individuals were removed, but almost 20,000 SNPs were removed, based on missingness (859) and frequency (16994). This particularly high proportion of removed SNPs arises because these are random HapMap SNPs in the Chinese and Japanese samples, rather than preselected markers on a whole-genome association product: there will be many more rare and monomorphic markers here than one would normally expect.
- Finally, a line is given that indicates when this analysis finished. You can see that it took 8 seconds (on my machine at least) to read in the file and apply the filters.

If other analyses had been requested, then the other output files generated would have been indicated in the log file.

HINT! All output files that PLINK generates have the same format: root.extension, where root is, by default, "plink" but can be changed with the --out option, and the extension depends on the type of output file.

Making a binary PED file

The first thing we will do is to make a binary PED file. This more compact representation of the data saves space and speeds up subsequent analysis. To make a binary PED file, use the following command.

    plink --file hapmap1 --make-bed --out hapmap1

If it runs correctly on your machine, you should see the following in your output:

    (above as before)
    Before frequency and genotyping pruning, there are SNPs
    Applying filters (SNP-major mode)
    89 founders and 0 non-founders found
    0 SNPs failed missingness test ( GENO > 1 )
    0 SNPs failed frequency test ( MAF < 0 )
    After frequency and genotyping pruning, there are SNPs
    Writing pedigree information to [ hapmap1.fam ]
    Writing map (extended format) information to [ hapmap1.bim ]
    Writing genotype bitfile to [ hapmap1.bed ]
    Using (default) SNP-major mode
    Analysis finished: Mon Jul 31 09:10:

There are several things to note:

- When using the --make-bed option, the threshold filters for missing rates and allele frequency were automatically set to exclude nobody. Although these filters can be specified manually (using --mind, --geno and --maf) to exclude people, this default tends to be wanted when creating a new PED or binary PED file. The commands --extract / --exclude and --keep / --remove can also be applied at this stage.
- Three files are created with this command: the binary file that contains the raw genotype data, hapmap1.bed; a revised map file, hapmap1.bim, which contains two extra columns that give the allele names for each SNP; and hapmap1.fam, which is just the first six columns of hapmap1.ped. You can view the .bim and .fam files, but do not try to view the .bed file. None of these three files should be manually edited.

If, for example, you wanted to create a new file that only includes individuals with high genotyping (at least 95% complete), you would run:

    plink --file hapmap1 --make-bed --mind 0.05 --out highgeno

which would create the files

    highgeno.bed
    highgeno.bim
    highgeno.fam

Working with the binary PED file

To specify that the input data are in binary format, as opposed to the normal text PED/MAP format, just use the --bfile option instead of --file. To repeat the first command we ran (which just loads the data and prints some basic summary statistics):

    plink --bfile hapmap1

    Writing this text to log file [ plink.log ]
    Analysis started: Mon Jul 31 09:12:

    Options in effect:
        --bfile hapmap1
    Reading map (extended format) from [ hapmap1.bim ]
    markers to be included from [ hapmap1.bim ]
    Reading pedigree information from [ hapmap1.fam ]
    89 individuals read from [ hapmap1.fam ]
    89 individuals with nonmissing phenotypes
    Reading genotype bitfile from [ hapmap1.bed ]
    Detected that binary PED file is v1.00 SNP-major mode
    Before frequency and genotyping pruning, there are SNPs
    Applying filters (SNP-major mode)
    89 founders and 0 non-founders found
    0 of 89 individuals removed for low genotyping ( MIND > 0.1 )
    859 SNPs failed missingness test ( GENO > 0.1 )
    SNPs failed frequency test ( MAF < 0.01 )
    After frequency and genotyping pruning, there are SNPs
    Analysis finished: Mon Jul 31 09:12:

The things to note here:

- Three files, hapmap1.bim, hapmap1.fam and hapmap1.bed, were loaded instead of the usual two. That is, hapmap1.ped and hapmap1.map are not used in this analysis, and could in fact be deleted now.
- The data are loaded much more quickly: based on the timestamps at the beginning and end of the log output, this took 2 seconds instead of 10.

Summary statistics: missing rates

Next, we shall generate some simple summary statistics on rates of missing data in the file, using the --missing option:

    plink --bfile hapmap1 --missing --out miss_stat

which should generate the following output:

    0 of 89 individuals removed for low genotyping ( MIND > 0.1 )
    Writing individual missingness information to [ miss_stat.imiss ]
    Writing locus missingness information to [ miss_stat.lmiss ]

Here we see that no individuals were removed for low genotyping (MIND > 0.1 implies that we accept people with less than 10 percent missingness). The per-individual and per-SNP rates (the latter after excluding individuals on the basis of low genotyping) are then output to the files miss_stat.imiss and miss_stat.lmiss respectively. If we had not specified an --out option, the root output filename would have defaulted to "plink".

These output files are standard, plain text files that can be viewed in any text editor, pager, spreadsheet or statistics package (albeit one that can handle large files). Taking a look at the file miss_stat.lmiss, for example using the more command which is present on most systems:

    more miss_stat.lmiss

we see one row per SNP under the columns

    CHR  SNP  N_MISS  F_MISS

HINT! To exit from more, type 'q' to quit.

That is, for each SNP, we see the number of missing individuals (N_MISS) and the proportion of individuals missing (F_MISS). Similarly:

    more miss_stat.imiss

we see

    FID     IID  MISS_PHENO  N_MISS  F_MISS
    HCB181  1    N
    HCB182  1    N
    HCB183  1    N
    HCB184  1    N
    HCB185  1    N
    HCB186  1    N
    HCB187  1    N

The final column is the proportion of missing genotypes for that individual, so the genotyping rate is one minus this value; we see the genotyping rate is very high here.

HINT If you are using a spreadsheet package that can only display a limited number of rows (some popular packages can handle just over 65,000 rows) then it might be desirable to ask PLINK to analyse the data by chromosome, using the --chr option. For example, to perform the above analysis for chromosome 1:

    plink --bfile hapmap1 --chr 1 --out res1 --missing

then for chromosome 2:

    plink --bfile hapmap1 --chr 2 --out res2 --missing

and so on.
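To make the N_MISS and F_MISS columns concrete, here is a minimal Python sketch that computes them from a small genotype table. The genotypes are made up for illustration, and the sketch assumes the PED convention that "0 0" marks a missing genotype. It also skips the step in which PLINK first removes individuals who fail the --mind threshold, so it is a simplification rather than a reproduction of PLINK's calculation.

```python
MISSING = ("0", "0")  # assumed PED coding for a missing genotype


def locus_missingness(genotypes):
    """Per-SNP (N_MISS, F_MISS): count and proportion of individuals missing."""
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    rows = []
    for j in range(n_snps):
        n_miss = sum(1 for person in genotypes if person[j] == MISSING)
        rows.append((n_miss, n_miss / n_individuals))
    return rows


def individual_missingness(genotypes):
    """Per-individual (N_MISS, F_MISS): count and proportion of SNPs missing."""
    rows = []
    for person in genotypes:
        n_miss = sum(1 for genotype in person if genotype == MISSING)
        rows.append((n_miss, n_miss / len(person)))
    return rows


# Hypothetical data: 4 individuals (rows) typed at 3 SNPs (columns).
genotypes = [
    [("A", "A"), ("G", "G"), ("0", "0")],
    [("A", "C"), ("0", "0"), ("0", "0")],
    [("A", "A"), ("G", "T"), ("C", "C")],
    [("C", "C"), ("G", "G"), ("C", "T")],
]

for snp_index, (n_miss, f_miss) in enumerate(locus_missingness(genotypes)):
    print("SNP", snp_index, "N_MISS", n_miss, "F_MISS", round(f_miss, 3))
for person_index, (n_miss, f_miss) in enumerate(individual_missingness(genotypes)):
    print("individual", person_index, "N_MISS", n_miss, "F_MISS", round(f_miss, 3))
```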

Summary statistics: allele frequencies

Next we perform a similar analysis, except requesting allele frequencies instead of genotyping rates. The following command generates a file called freq_stat.frq which contains the minor allele frequency and allele codes for each SNP.

    plink --bfile hapmap1 --freq --out freq_stat

It is also possible to perform this frequency analysis (and the missingness analysis) stratified by a categorical cluster variable. In this case, we shall use the file that indicates whether the individual is from the Chinese or the Japanese sample, pop.phe. This cluster file contains three columns; each row is an individual. The format is described more fully in the main documentation. To perform a stratified analysis, use the --within option.

    plink --bfile hapmap1 --freq --within pop.phe --out freq_stat

The output will now indicate that a file called freq_stat.frq.strat has been generated instead of freq_stat.frq. If we view this file:

    more freq_stat.frq.strat

we see each row is now the allele frequency for each SNP stratified by subpopulation, under the columns

    CHR  SNP  CLST  A1  A2  MAF

Here we see that each SNP is represented twice: the CLST column indicates whether the frequency is from the Chinese or Japanese population, coded as per the pop.phe file.

If you were just interested in a specific SNP, and wanted to know what the frequency was in the two populations, you can use the --snp option to select this SNP:

    plink --bfile hapmap1 --snp rs --freq --within pop.phe --out snp1_frq_stat

would generate a file snp1_frq_stat.frq.strat containing only the population-specific frequencies for this single SNP. You can also specify a range of SNPs by adding the --window kb option or using the options --from and --to, following each with a different SNP (they must be in the correct order and be on the same chromosome).

Basic association analysis

Let's now perform a basic association analysis on the disease trait for all single SNPs. The basic command is

    plink --bfile hapmap1 --assoc --out as1

This generates an output file as1.assoc, in which each row is a single SNP association result, with the following fields:

    CHR    Chromosome
    SNP    SNP identifier
    A1     Code for allele 1 (the minor, rare allele based on the entire sample frequencies)
    F_A    The frequency of this variant in cases
    F_U    The frequency of this variant in controls
    A2     Code for the other allele
    CHISQ  The chi-squared statistic for this test (1 df)
    P      The asymptotic significance value for this test
    OR     The odds ratio for this test

If a test is not defined (for example, if the variant is monomorphic but was not excluded by the filters) then values of NA (not applicable) will be given. The package R reads these as missing data, which is convenient if using R to analyse the set of results.

HINT In a Unix/Linux environment, you can use the available command line tools to sort the list of association statistics and print out the top ten, for example:

    sort --key=7 -nr as1.assoc | head

would give the ten rows with the largest chi-squared statistics. Here we see that the simulated disease variant rs is actually the second most significant SNP in the list, with a large difference in allele frequencies of 0.28 in cases versus 0.62 in controls. However, we also see that, just by chance, a second SNP on chromosome 13 shows a slightly higher test result, with coincidentally similar allele frequencies in cases and controls. When performing so many tests, particularly in a small sample, we often expect the distribution of true positive results to be virtually indistinguishable from the best false positive results. That our variant appears in the top ten list is reassuring, however.

To get a sorted list of association results that also includes a range of significance values adjusted for multiple testing, use the --adjust flag.
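The CHISQ, P and OR fields can be illustrated with a small worked example. The sketch below applies the standard 1 df chi-squared test to a 2 x 2 table of allele counts, which is an assumption about what the basic allelic test amounts to rather than a copy of PLINK's code. The allele counts are hypothetical, chosen only so that the allele 1 frequencies are 0.28 in cases and 0.62 in controls, as in the text.

```python
import math


def allelic_association(case_a1, case_a2, control_a1, control_a2):
    """Chi-squared (1 df), asymptotic p-value and odds ratio for a 2 x 2 allele table.

    Returns None when the test is undefined (an empty margin or a zero cell in
    the odds-ratio denominator), which corresponds to the NA case described above.
    """
    a, b, c, d = case_a1, case_a2, control_a1, control_a2
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0 or b * c == 0:
        return None
    chisq = n * (a * d - b * c) ** 2 / margins
    # Upper tail of a chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chisq, p_value, odds_ratio


# Hypothetical counts: 50 cases and 50 controls, i.e. 100 alleles per group.
chisq, p_value, odds_ratio = allelic_association(28, 72, 62, 38)
print("CHISQ", round(chisq, 2), "P", p_value, "OR", round(odds_ratio, 3))
```

An odds ratio below 1 here means that allele 1 is less common in cases than in controls.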

The command

    plink --bfile hapmap1 --assoc --adjust --out as2

generates the file as2.assoc.adjusted in addition to the basic as2.assoc output file. Using more (or opening the file), one can easily look at one's most significant associations:

    more as2.assoc.adjusted

    CHR  SNP  UNADJ  GC  BONF  HOLM  SIDAK_SS  SIDAK_SD  FDR_BH  FDR_BY

Here we see a pre-sorted list of association results. The fields are as follows:

    CHR       Chromosome
    SNP       SNP identifier
    UNADJ     Unadjusted, asymptotic significance value
    GC        Genomic control adjusted significance value. This is based on a simple estimation of the inflation factor from the median chi-squared statistic. These values therefore do not control for multiple testing.
    BONF      Bonferroni adjusted significance value
    HOLM      Holm step-down adjusted significance value
    SIDAK_SS  Sidak single-step adjusted significance value
    SIDAK_SD  Sidak step-down adjusted significance value
    FDR_BH    Benjamini & Hochberg (1995) step-up FDR control
    FDR_BY    Benjamini & Yekutieli (2001) step-up FDR control

In this particular case, we see that no single variant is significant at the 0.05 level after genome-wide correction. Different correction measures have different properties which are beyond the scope of this tutorial to discuss: it is up to the investigator to decide which to use and how to interpret them.

When the --adjust command is used, the log file records the inflation factor calculated for the genomic control analysis, and the mean chi-squared statistic (that should be 1 under the null):

    Genomic inflation factor (based on median chi-squared) is
    Mean chi-squared statistic is

These values would actually suggest that although no very strong stratification exists, there is perhaps a hint of an increased false positive rate, as both values are greater than 1.

HINT The adjusted significance values that control for multiple testing are, by default, based on the unadjusted significance values.
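For readers who want to see how some of these corrections behave, the following Python sketch implements the textbook Bonferroni, Holm step-down and Benjamini-Hochberg step-up adjustments, plus a median-based inflation factor. It is an illustration under stated assumptions, not PLINK's code: the p-values are made up, and the constant 0.4549 is the median of a 1 df chi-squared distribution, which PLINK may round differently.

```python
import statistics

CHISQ_1DF_MEDIAN = 0.4549  # median of a chi-squared distribution with 1 df


def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down: scale the k-th smallest p-value by (m - k + 1), keep monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up: scale the k-th smallest p-value by m / k, keep monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, min(1.0, pvals[i] * m / (rank + 1)))
        adjusted[i] = running_min
    return adjusted


def genomic_inflation(chisq_values):
    """Inflation factor: observed median chi-squared over its expected value under the null."""
    return statistics.median(chisq_values) / CHISQ_1DF_MEDIAN


# Hypothetical unadjusted p-values for four SNPs.
pvals = [0.01, 0.04, 0.03, 0.005]
print("BONF  ", bonferroni(pvals))
print("HOLM  ", holm(pvals))
print("FDR_BH", benjamini_hochberg(pvals))
```

With only four tests the three methods stay close together; with tens of thousands of SNPs the Bonferroni and Holm values become far more conservative than the FDR-based ones.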
If the flag --gc is specified as well as --adjust then these adjusted values will be based on the genomic-control significance value instead.

In this particular instance, where we already know about the Chinese/Japanese subpopulations, it might be of interest to directly look at the inflation factor that results from having population membership as the phenotype in a case/control analysis, just to provide extra information about the sample. That is, running the command using the alternate phenotype option (i.e. replacing the disease phenotype with the one in pop.phe, which is actually subpopulation membership):

    plink --bfile hapmap1 --pheno pop.phe --assoc --adjust --out as3

we see that, when testing for frequency differences between Chinese and Japanese individuals, there is some departure from the null distribution:

    Genomic inflation factor (based on median chi-squared) is
    Mean chi-squared statistic is

That is, the inflation factor of 1.7 represents the maximum possible inflation that could arise from the Chinese/Japanese split in the sample, which would occur if the disease were perfectly correlated with subpopulation (this does not account for any possible within-subpopulation structure, of course, that might also increase SNP-disease false positive rates). This is a good test of whether it is appropriate to do an association study without adjusting for population stratification.

Extracting a SNP of interest

Finally, given you've identified a SNP, set of SNPs or region of interest, you might want to extract those SNPs as a separate, smaller, more manageable file. In particular, for other applications to analyse the data, you will need to convert from the binary PED file format to a standard PED format. This is done using the --recode options. There are a few forms of this option: we will use --recode12, which codes the genotypes in a manner that is convenient for subsequent analysis. To extract only this single SNP, use:

    plink --bfile hapmap1 --snp rs --recode12 --out rec_snp1

This particular recode feature codes genotypes as 1/2 alleles, and outputs new .ped and .map files with this SNP.

2. LARGER SET OF HAPMAP1 DATA

The files are

    wgas1.ped   Genotype data for 250,000 SNPs on 90 individuals
    wgas1.map   Map file for these SNPs
    extra.ped   Genotype data for an additional 29 SNPs genotyped for the same individuals
    extra.map   Map file for these SNPs
    pop.cov     Population membership coding (coded 1=CH / 2=JP)

First, make a new binary file of the data. Note this operation may take a while.

    plink --file wgas1 --make-bed --out wgas3

Previous analyses have shown that a SNP rs was the most highly associated with the phenotype. We now want to extract the data for rs and perform a series of more detailed analyses on this single SNP.

HINT Remember everything in the command should be typed on a single line (not across lines as shown in the boxes below).

    Purpose:  Extract data for single SNP rs
    Command:  plink --bfile wgas3 --recode --snp rs --out tophit
    Input:    wgas3.bed    QC+ whole genome SNP binary fileset
              wgas3.bim
              wgas3.fam
    Output:   tophit.ped   Standard PED file for this single SNP
              tophit.map   Corresponding marker information
    Notes:    We are converting back from the binary format to standard text format. The --snp command is a filter, just extracting data for this one SNP.

For this single SNP, we shall next examine the genotyping rate and, second, the Hardy-Weinberg test statistic. In all cases, here and below, the analysis output files are small. These can be viewed by typing, for example, either the "more" or "type" DOS commands:

    more plink.lmiss

or

    type plink.assoc

    Purpose:  Examine genotyping rate for rs
    Command:  plink --file tophit --all --missing
    Input:    tophit.ped   Standard PED file for single SNP
              tophit.map
    Output:   plink.lmiss  Missing rate per locus (SNP)
              plink.imiss  Missing rate per individual
    Notes:    The --all flag is added because otherwise PLINK would first remove any individual with missing genotypes for this SNP, before calculating the per-SNP genotyping rate. Also note use of --file instead of --bfile, as tophit is in standard PED format. Finally, note that we do not always need to specify a unique output name when using PLINK directly, so all output files start plink.ext by default.

    Purpose:  Examine Hardy-Weinberg equilibrium P-value for rs
    Command:  plink --file tophit --hardy
    Input:    tophit.ped   Standard PED file for single SNP
              tophit.map   Corresponding marker information
    Output:   plink.hwe    Hardy-Weinberg statistic and genotype counts
    Notes:    For case/control datasets, tests are given for all individuals, as well as for cases and controls separately.

Next, we can ask whether allele frequency differs between the two groups. This involves using the population label as the phenotype of an association test rather than as a covariate.

    Purpose:  Explicitly test whether allele frequency for rs differs between populations
    Command:  plink --file tophit --assoc --pheno pop.cov
    Input:    tophit.ped   Standard PED file for single SNP
              tophit.map   Corresponding marker information
              pop.cov      Indicates Chinese (1) or Japanese (2)
    Output:   plink.assoc  Association (with population) results
    Notes:    Here we specify population as the phenotype, not a covariate.

    Purpose:  Explicitly test whether allele frequency for rs differs between populations, allowing for association with disease
    Command:  plink --file tophit --logistic --pheno pop.cov --covar tophit.ped --covar-number 4
    Input:    tophit.ped   Standard PED file for single SNP
              tophit.map   Corresponding marker information
              pop.cov      Indicates Chinese (1) or Japanese (2)
    Output:   plink.assoc.logistic  Association (with population) results
    Notes:    We treat the PED file as a covariate file, extracting just the phenotype (i.e. the 4th column after family ID and individual ID).

These results would suggest that the frequency does indeed differ (again, make a note of exactly why this is).

Population stratification

Initially, we used the known population labels of Chinese versus Japanese. In many studies, we might not have this direct information, or the potential differences in ancestry can be subtle. Analyses of population stratification should be performed on a set of SNPs that are approximately in linkage equilibrium: we achieve this by using PLINK's command to remove highly correlated, nearby SNPs. Note: this operation may take a while.

    Purpose:  Create a LD pruned set of markers (first step)
    Command:  plink --bfile wgas3 --indep-pairwise 50 10 0.2 --out prune1
    Input:    wgas3.bed    QC+ whole genome SNP binary fileset
              wgas3.bim
              wgas3.fam
    Output:   prune1.prune.in   List of SNPs included after pruning
              prune1.prune.out  List of SNPs excluded after pruning
    Notes:    This option does not actually remove any SNPs, it just creates two lists of SNPs, which we use below. The pruning marks for removal any SNP that has r-squared > 0.2 with another SNP within a 50-SNP window; this window is shifted across the chromosome 10 SNPs at a time.

We next calculate identity-by-state (IBS) allelic similarity between all possible pairs of the 89 QC+ individuals, and store this information in a file.

    Purpose:  Calculate genome-wide IBS sharing based on pruned marker list
    Command:  plink --bfile wgas3 --extract prune1.prune.in --genome --out ibs1
    Input:    wgas3.bed    QC+ whole genome SNP binary fileset
              wgas3.bim
              wgas3.fam
    Output:   ibs1.genome  IBS sharing data (1 row per pair of individuals)
    Notes:    Equivalently, one could --exclude prune1.prune.out

Finally, using the pairwise IBS information in ibs1.genome, we perform stratification analysis. Note this may take a while.

    Purpose:  Cluster individuals into homogeneous groups and perform a multidimensional scaling analysis
    Command:  plink --bfile wgas3 --read-genome ibs1.genome --cluster --ppc 1e-3 --cc --mds-plot 2 --out strat1
    Input:    wgas3.bed    QC+ whole genome SNP binary fileset
              wgas3.bim
              wgas3.fam
              ibs1.genome  Pre-calculated pairwise IBS values
    Output:   strat1.cluster2  Assignment to cluster for each individual
              strat1.mds       First 2 MDS components for each individual
    Notes:    Constraints on clustering are the PPC test (--ppc 1e-3) and the requirement that each cluster contains at least one case and one control (--cc).

Merge in new genotype data

The files extra.ped and extra.map contain new SNP data on the same set of individuals. These are SNPs taken from the region around rs , the best SNP in the previous WGAS analysis. We first examine these SNPs by themselves, and then merge them into the SNPs in that region from the original WGAS dataset.

    Purpose:  Examine the new SNPs, testing for association stratified by population
    Command:  plink --file extra --mh --within pop.cov --out strat2
    Input:    extra.ped    New followup SNP genotyping
              extra.map
              pop.cov      Population label
    Output:   strat2.cmh   CMH results for new genotypes
    Notes:    As evident in the result file strat2.cmh, there are some very strongly associated SNPs in this new set, in particular rs (its P-value is given in strat2.cmh).

We next merge this new data with the old.

    Purpose:  Focus on region of association in WGAS data, and merge in new genotype data, creating a new fileset
    Command:  plink --bfile wgas3 --snp rs --window kb --merge extra.ped extra.map --make-bed --out followup
    Input:    wgas3.bed    QC+ binary fileset
              wgas3.bim
              wgas3.fam
              extra.ped    New genotype data (same individuals)
              extra.map
    Output:   followup.bed  Merged fileset for region around top hit
              followup.bim
              followup.fam
    Notes:    The --snp and --window commands extract a particular region from wgas3 first, and then merge in the new genotype data in extra.ped.

We can check that the associations remain the same after merging these two filesets:

    Purpose:  Re-run association to check integrity of file
    Command:  plink --bfile followup --mh --within pop.cov --out followup-cmh
    Input:    followup.bed  Merged binary fileset for best region
              followup.bim
              followup.fam
    Output:   followup-cmh.cmh  CMH for top region in merged dataset
    Notes:    Now focusing on the top region, using --adjust is no longer appropriate.

Explore linkage disequilibrium

Further analysis indicates four other SNPs that are associated and in LD with the primary SNP rs :

    rs rs rs rs rs

Finally, we will extract just these five SNPs into another dataset.

    Purpose:  For convenience, focus on the 5 clumped SNPs for further analysis (and so create a new dataset containing just these)
    Command:  plink --bfile followup --snps rs,rs,rs,rs,rs --make-bed --out followup2
    Input:    followup.bed   Merged binary fileset for best region
              followup.bim
              followup.fam
    Output:   followup2.bed  Binary fileset of 5 SNPs in LD in top region
              followup2.bim
              followup2.fam
    Notes:    Note that --snps (versus --snp) can take a comma-delimited list of SNPs.

The pairwise LD (r-squared) between these SNPs can also be calculated using PLINK. By default, only SNP pairs with high LD are shown in the output file.

    Purpose:  Report pairwise LD (r-squared) for SNPs in this region
    Command:  plink --bfile followup2 --r2
    Input:    followup2.bed  Merged binary fileset for best region
              followup2.bim
              followup2.fam
    Output:   plink.ld       List of r-squared LD values (above threshold)
    Notes:    Add the --matrix option to get a 5 x 5 matrix of r-squared statistics.

3. ZEBRA FINCH DATA

Finally, we will examine some SNPs typed in a three generation zebra finch pedigree. The idea is to check the SNPs and individuals (i.e. quality control), then select unlinked SNPs for analysis in programs such as COANCESTRY and COLONY to investigate the pedigree and relatedness structure of the data.

17 HINT! PLINK is designed for humans - so unless you tell it otherwise, it will assume your genome has 23 chromosomes, with chr23 being the X chromosome! Use the option --dog for up to 39 chromosomes. The zebra finch genome has around 30 chromosomes. The zf.ped file is missing the column with the sex of the birds, so we use --allow-no-sex The zf.map file is in centimorgans not Morgans, so we use --cm Make bed file plink --file zf --cm --dog --make-bed --out zf How many SNPs are there? How many individuals are there? How many founders in the pedgiree? Is there phenotype information in the file? Test Hardy Weinberg equilibrium plink --bfile zf --cm --dog --hardy --out zf_hwe do any SNPs fail the HWE test? Test for parent-offspring mismatches plink --bfile zf --cm --dog --mendel --out zf_mendel do any individuals look like they're not related when we thought they were? Test for parent-offspring mismatches again, this time in a file with an error! plink --file zf_with_ped_error --cm --dog --mendel --out zf_mendel1 what is the incorrect pedigree link? missing rates per individual and locus plink --bfile zf --cm --dog --all --missing --out zf_miss_stat what is the maximum of missing genotypes for an individual? what is the maximum number of missing genotypes for a SNP? calculate allele frequencies in founders plink --bfile zf --cm --dog --freq --out zf_freq_stat what are the minimum and the maximum allele frequencies? delete individuals with more than 1% missing genotypes plink --bfile zf --cm --dog --mind out zf_highgeno how many individuals have been removed? Report pairwise LD (r-squared) for all SNPs plink --bfile zf --cm --dog --r2 --out zf_r2

How many SNPs are in perfect LD (r2 = 1)?

Create an LD-pruned set of markers (first step)
plink --bfile zf --cm --dog --indep-pairwise 50 5 0.9 --out zf_prune
How many SNPs are pruned from chromosome 1? How many SNPs are left in the dataset?
Try to change the pruning parameters so that we end up with a dataset of around 550 SNPs in 'zf_prune.prune.in' (hint: currently we have removed any SNP that has r-squared > 0.9 with another SNP within a 50-SNP window; this window is shifted across the chromosome 5 SNPs at a time).

Calculate genome-wide IBS sharing based on the pruned marker list
plink --bfile zf --cm --dog --extract zf_prune.prune.in --genome --out zf_ibs

Extract the pruned SNPs into a file we might use in further analysis in other programs
plink --bfile zf --allow-no-sex --cm --dog --extract zf_prune.prune.in --recode12 --out zf_pruned
Can you extract the file so it shows genotypes as "12" rather than "1 2"?
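To make the genome-wide IBS sharing step above more concrete, the Python sketch below computes an average IBS sharing value for one pair of individuals. It assumes genotypes are coded as counts of one allele (0, 1 or 2) and that the summary is (IBS2 + 0.5 x IBS1) / number of SNPs, which is how I understand the DST column of the --genome output to be defined. The function name ibs_distance and the example genotypes are hypothetical, and PLINK's own calculation may differ in details such as the handling of missing data.

```python
def ibs_distance(geno_a, geno_b):
    """Average IBS sharing between two individuals.

    Genotypes are allele counts (0, 1 or 2) per SNP; None marks a missing
    call, and SNPs missing in either individual are skipped.
    """
    ibs_counts = [0, 0, 0]  # number of SNPs sharing 0, 1 or 2 alleles IBS
    for a, b in zip(geno_a, geno_b):
        if a is None or b is None:
            continue
        # Two identical genotypes share 2 alleles; opposite homozygotes share 0.
        ibs_counts[2 - abs(a - b)] += 1
    n = sum(ibs_counts)
    if n == 0:
        raise ValueError("no SNPs with both genotypes present")
    return (ibs_counts[2] + 0.5 * ibs_counts[1]) / n


# Hypothetical genotypes at five SNPs; the last SNP is missing in bird_a.
bird_a = [0, 1, 2, 2, None]
bird_b = [0, 1, 0, 1, 2]
print(ibs_distance(bird_a, bird_b))  # 0.625
```

Using a pruned marker list matters here because SNPs in strong LD carry largely the same information, so leaving them in would count the same sharing signal several times.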


More information

Chapter One: Getting Started With IBM SPSS for Windows

Chapter One: Getting Started With IBM SPSS for Windows Chapter One: Getting Started With IBM SPSS for Windows Using Windows The Windows start-up screen should look something like Figure 1-1. Several standard desktop icons will always appear on start up. Note

More information

The fgwas software. Version 1.0. Pennsylvannia State University

The fgwas software. Version 1.0. Pennsylvannia State University The fgwas software Version 1.0 Zhong Wang 1 and Jiahan Li 2 1 Department of Public Health Science, 2 Department of Statistics, Pennsylvannia State University 1. Introduction Genome-wide association studies

More information

Estimating. Local Ancestry in admixed Populations (LAMP)

Estimating. Local Ancestry in admixed Populations (LAMP) Estimating Local Ancestry in admixed Populations (LAMP) QIAN ZHANG 572 6/05/2014 Outline 1) Sketch Method 2) Algorithm 3) Simulated Data: Accuracy Varying Pop1-Pop2 Ancestries r 2 pruning threshold Number

More information

The Lander-Green Algorithm in Practice. Biostatistics 666

The Lander-Green Algorithm in Practice. Biostatistics 666 The Lander-Green Algorithm in Practice Biostatistics 666 Last Lecture: Lander-Green Algorithm More general definition for I, the "IBD vector" Probability of genotypes given IBD vector Transition probabilities

More information