PLINK tutorial

Amended from two tutorials written by the PLINK author, Shaun Purcell; see the PLINK website and its 'Teaching materials and example dataset' page. PLINK itself can be downloaded from the PLINK website.

In this tutorial, we will use PLINK to analyse some real and some example large-scale SNP data, to give a demonstration of what the program can do (e.g. data management, summary statistics and basic association analysis). What we do today might not be particularly realistic or accurate, but I hope it gives an idea of what PLINK is capable of!

EXAMPLE DATASETS AND SOFTWARE

1. approximately 80,000 autosomal SNPs from the 89 Asian HapMap individuals (Han Chinese from Beijing and Japanese from Tokyo). A phenotype has been simulated based on the genotype at one SNP.
2. a bigger file with all ~250,000 SNPs for 90 Asian HapMap individuals (Han Chinese from Beijing and Japanese from Tokyo), along with the simulated disease phenotype. In addition, a small set of SNPs (N=29) genotyped on the same individuals, representing a "follow-up genotyping" exercise, is included, as well as a file with population membership (Chinese or Japanese).
3. a file of 771 SNPs genotyped on a captive zebra finch pedigree

CARDINAL RULES & CAVEATS

When using PLINK there are a few key points to remember.
- Always consult the LOG file (console output)
- PLINK has no memory
  o each run loads the data anew; filters from previous runs are lost
- Exact syntax and spelling are very important
  o every option starts with two minus signs (--)
- Not every option can be combined with every other option
  o For example, basic haplotype tests cannot take covariates
  o PLINK doesn't always warn you
  o The LOG file often shows what has happened (or not)
- Consult the web documentation

GETTING STARTED

PLINK is a command line program, so we need to operate in a command line window. All commands involve typing plink at the command prompt (e.g. DOS window or Unix terminal) followed by a number of options (all starting with --option) to specify the data files / methods to be used. All results are written to files with various extensions.

Putting your files in the same directory as the "plink.exe" file will let you do all analysis from this directory. Navigate to the plink directory, and to check you are in the correct folder, type plink and press enter, which should start PLINK and generate some output describing the program. If you get an error message, you are in the wrong directory.

For most options, PLINK needs two plain text files:

1) a file with family ID, individual ID, father ID, mother ID, sex (1 = male, 2 = female, other = unknown), phenotype and genotypes in columns for each individual, with the extension .ped. This file has NO HEADER so looks like this:

    FAM A A G G A C
    FAM A A A G 0 0

2) a file with marker information, including chromosome number, the SNP name, the position in Morgans on the chromosome, and the position in base pairs on the chromosome, with the extension .map. Again, this file has NO HEADER so looks like this:

    1 rs rs rs rs

HINT! It is easiest if these files have the same name (e.g. sheep.ped and sheep.map). There are lots of options to change the format of these files (for example, you can provide genotypes as "AG" instead of "A G") - see the information on the PLINK website.
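To make the two file layouts concrete, here is a minimal Python sketch that splits one .ped row and one .map row into named fields. The helper names (`PedRecord`, `parse_ped_line`, `parse_map_line`) and the example rows are made up for illustration and are not taken from the tutorial data; the sketch assumes the default whitespace-delimited format with two allele columns per SNP.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str          # 1 = male, 2 = female, other = unknown
    phenotype: str
    genotypes: List[Tuple[str, str]]  # one (allele1, allele2) pair per SNP


def parse_ped_line(line: str) -> PedRecord:
    """Split one headerless .ped row into six ID columns plus allele pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 leading columns plus two alleles per SNP")
    alleles = fields[6:]
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


def parse_map_line(line: str) -> Tuple[str, str, float, int]:
    """Split one headerless .map row: chromosome, SNP name, Morgans, base pairs."""
    chrom, snp, morgans, bp = line.split()
    return chrom, snp, float(morgans), int(bp)


# Hypothetical example rows (invented for this sketch).
ped = parse_ped_line("FAM001 1 0 0 1 2 A A G G A C")
snp = parse_map_line("1 snp_example 0 1000")
print(ped.individual_id, ped.genotypes)
print(snp)
```

The number of allele pairs in every .ped row should equal the number of rows in the .map file, which is a quick sanity check if PLINK refuses to load a fileset.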
HINT! PLINK is designed for humans - so unless you tell it otherwise, it will assume your genome has 23 chromosomes, with chr23 being the X chromosome! Use the option --dog for up to 39 chromosomes.

1. HAPMAP1 data

Download the example data and unzip the contents into your plink folder. A phenotype was simulated, so that a single SNP (rs ) should be associated with the 'disease'. The files are

    hapmap1.ped   Genotype data for 83,000 SNPs on 90 individuals
    hapmap1.map   Map file for these SNPs
    pop.phe       Population membership coding (coded 1=CH / 2=JP)
    qt.phe        Quantitative phenotype - we won't use this one

Just typing plink and specifying a file with no further options is a good way to check that the file is intact, and to get some basic summary statistics about the file.

    plink --file hapmap1

The --file option takes a single parameter, the input file name, and will look for two files: a .ped file and a .map file with the name hapmap1 (i.e. hapmap1.ped and hapmap1.map). The above command should generate something like the following output in the console window. It will also save this information to a file called plink.log.

    PLINK! v0.99l 27/Jul/
    (C) 2006 Shaun Purcell, GNU General Public License, v
    Web-based version check ( --noweb to skip )
    Connecting to web OK, v0.99l is current
    *** Pre-Release Testing Version ***
    Writing this text to log file [ plink.log ]
    Analysis started: Mon Jul 31 09:00:
    Options in effect:
    --file hapmap1
    83534 (of 83534) markers to be included from [ hapmap1.map ]
    89 individuals read from [ hapmap1.ped ]
    89 individuals with nonmissing phenotypes
    Assuming a binary trait (1=unaff, 2=aff, 0=miss)
    Missing phenotype value is also -9
    Before frequency and genotyping pruning, there are 83534 SNPs
    Applying filters (SNP-major mode)
    89 founders and 0 non-founders found
    0 of 89 individuals removed for low genotyping ( MIND > 0.1 )
    859 SNPs failed missingness test ( GENO > 0.1 )
    16994 SNPs failed frequency test ( MAF < 0.01 )
    After frequency and genotyping pruning, there are SNPs
    Analysis finished: Mon Jul 31 09:00:

The information contained here can be summarized as follows:
- A banner showing copyright information and the version number -- the web-based version check shows that this is an up-to-date version of PLINK and displays a message that v0.99l is a pre-release testing version.
- A message indicating that the log file will be saved in plink.log. The name of the output file can be changed with the --out option -- e.g. specifying --out anal1 will generate a log file called anal1.log instead.
- A list of the command options specified is given next: in this case it is only a single option, --file hapmap1. Keeping the log files, and naming each analysis with its own --out name, makes it easier to keep track of when and how the different output files were generated.
- Next is some information on the number of markers and individuals read from the MAP and PED file. In total, just over 80,000 SNPs were read in from the MAP file. It is written "83534 (of 83534)" because some SNPs might be excluded (by making the physical position a negative number in the MAP file), in which case the first number would indicate how many SNPs are included. In this case, all SNPs are read in from the PED file. We also see that 89 individuals were read in from the PED file, and that all these individuals had valid phenotype information.
- Next, PLINK tells us that the phenotype is an affection status variable, as opposed to a quantitative trait, and lets us know what the missing values are.
- The next stage is the filtering stage -- individuals and/or SNPs are removed on the basis of thresholds. See the PLINK documentation for more information on setting thresholds. In this case we see that no individuals were removed, but almost 20,000 SNPs were removed, based on missingness (859) and frequency (16994). This particularly high proportion of removed SNPs arises because these are random HapMap SNPs in the Chinese and Japanese samples, rather than preselected markers on a whole-genome association product: there will be many more rare and monomorphic markers here than one would normally expect.
- Finally, a line is given that indicates when this analysis finished. You can see that it took 8 seconds (on my machine at least) to read in the file and apply the filters.

If other analyses had been requested, then the other output files generated would have been indicated in the log file.

HINT! All output files that PLINK generates have the same format: root.extension where root is, by default, "plink" but can be changed with the --out option, and the extension will depend on the type of output file it is.

Making a binary PED file

The first thing we will do is to make a binary PED file. This more compact representation of the data saves space and speeds up subsequent analysis. To make a binary PED file, use the following command.
    plink --file hapmap1 --make-bed --out hapmap1

If it runs correctly on your machine, you should see the following in your output:

    (start of log as before)
    Before frequency and genotyping pruning, there are 83534 SNPs
    Applying filters (SNP-major mode)
    89 founders and 0 non-founders found
    0 SNPs failed missingness test ( GENO > 1 )
    0 SNPs failed frequency test ( MAF < 0 )
    After frequency and genotyping pruning, there are 83534 SNPs
    Writing pedigree information to [ hapmap1.fam ]
    Writing map (extended format) information to [ hapmap1.bim ]
    Writing genotype bitfile to [ hapmap1.bed ]
    Using (default) SNP-major mode
    Analysis finished: Mon Jul 31 09:10:

There are several things to note:
- When using the --make-bed option, the threshold filters for missing rates and allele frequency were automatically set to exclude nobody. Although these filters can be specified manually (using --mind, --geno and --maf) to exclude people or SNPs, this default tends to be wanted when creating a new PED or binary PED file. The commands --extract / --exclude and --keep / --remove can also be applied at this stage.
- Three files are created with this command: the binary file that contains the raw genotype data, hapmap1.bed; a revised map file, hapmap1.bim, which contains two extra columns that give the allele names for each SNP; and hapmap1.fam, which is just the first six columns of hapmap1.ped. You can view the .bim and .fam files -- but do not try to view the .bed file. None of these three files should be manually edited.
- If, for example, you wanted to create a new file that only includes individuals with high genotyping (at least 95% complete), you would run:

    plink --file hapmap1 --make-bed --mind 0.05 --out highgeno

  which would create the files highgeno.bed, highgeno.bim and highgeno.fam.

Working with the binary PED file

To specify that the input data are in binary format, as opposed to the normal text PED/MAP format, just use the --bfile option instead of --file.
To repeat the first command we ran (which just loads the data and prints some basic summary statistics):

    plink --bfile hapmap1

    Writing this text to log file [ plink.log ]
    Analysis started: Mon Jul 31 09:12:
    Options in effect:
    --bfile hapmap1
    Reading map (extended format) from [ hapmap1.bim ]
    83534 (of 83534) markers to be included from [ hapmap1.bim ]
    Reading pedigree information from [ hapmap1.fam ]
    89 individuals read from [ hapmap1.fam ]
    89 individuals with nonmissing phenotypes
    Reading genotype bitfile from [ hapmap1.bed ]
    Detected that binary PED file is v1.00 SNP-major mode
    Before frequency and genotyping pruning, there are 83534 SNPs
    Applying filters (SNP-major mode)
    89 founders and 0 non-founders found
    0 of 89 individuals removed for low genotyping ( MIND > 0.1 )
    859 SNPs failed missingness test ( GENO > 0.1 )
    16994 SNPs failed frequency test ( MAF < 0.01 )
    After frequency and genotyping pruning, there are SNPs
    Analysis finished: Mon Jul 31 09:12:

The things to note here:
- Three files (hapmap1.bim, hapmap1.fam and hapmap1.bed) were loaded instead of the usual two. That is, hapmap1.ped and hapmap1.map are not used in this analysis, and could in fact be deleted now.
- The data are loaded in much more quickly -- based on the timestamp at the beginning and end of the log output, this took 2 seconds instead of 10.

Summary statistics: missing rates

Next, we shall generate some simple summary statistics on rates of missing data in the file, using the --missing option:

    plink --bfile hapmap1 --missing --out miss_stat

which should generate the following output:

    0 of 89 individuals removed for low genotyping ( MIND > 0.1 )
    Writing individual missingness information to [ miss_stat.imiss ]
    Writing locus missingness information to [ miss_stat.lmiss ]

Here we see that no individuals were removed for low genotyping (MIND > 0.1 means that individuals with more than 10 percent missing genotypes are removed, so we accept people with up to 10 percent missingness). The per individual and per SNP (after excluding individuals on the basis of low genotyping) rates are then output to the files miss_stat.imiss and miss_stat.lmiss respectively. If we had not specified an --out option, the root output filename would have defaulted to "plink".
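The MIND threshold can be illustrated with a short Python sketch. It assumes genotypes are held as allele pairs with "0" marking a missing allele (as in the "0 0" entries of a PED file); the function names and the example individuals are hypothetical, and this is not PLINK's own code.

```python
def missing_rate(genotypes):
    """Return (number missing, fraction missing) for one individual.

    genotypes is a list of (allele1, allele2) pairs; a pair containing "0"
    is treated as a missing genotype.
    """
    n_missing = sum(1 for a1, a2 in genotypes if a1 == "0" or a2 == "0")
    return n_missing, n_missing / len(genotypes)


def passes_mind(genotypes, mind=0.1):
    """True if the individual would be kept: missing rate not above the threshold."""
    _, f_miss = missing_rate(genotypes)
    return f_miss <= mind


# Two made-up individuals typed at ten SNPs each.
good = [("A", "G")] * 9 + [("0", "0")]          # 1 of 10 missing
poor = [("A", "G")] * 8 + [("0", "0")] * 2      # 2 of 10 missing
print(missing_rate(good), passes_mind(good))
print(missing_rate(poor), passes_mind(poor))
```

With the default threshold of 0.1 the first individual is kept and the second is removed; lowering `mind` to 0.05 mirrors the stricter --mind 0.05 setting.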
These output files are standard, plain text files that can be viewed in any text editor, pager, spreadsheet or statistics package (albeit one that can handle large files). Taking a look at the file miss_stat.lmiss, for example using the more command which is present on most systems:

    more miss_stat.lmiss

we see

    CHR SNP N_MISS F_MISS
    1 rs rs rs rs rs rs rs rs rs

HINT! To exit from more, type 'q' to quit.

That is, for each SNP, we see the number of missing individuals (N_MISS) and the proportion of individuals missing (F_MISS). Similarly:

    more miss_stat.imiss

we see

    FID     IID  MISS_PHENO  N_MISS  F_MISS
    HCB181  1    N
    HCB182  1    N
    HCB183  1    N
    HCB184  1    N
    HCB185  1    N
    HCB186  1    N
    HCB187  1    N

The final column is the proportion of missing genotypes for that individual -- the values are very low, i.e. the genotyping rate is very high here.

HINT If you are using a spreadsheet package that can only display a limited number of rows (some popular packages can handle just over 65,000 rows) then it might be desirable to ask PLINK to analyse the data by chromosome, using the --chr option. For example, to perform the above analysis for chromosome 1:

    plink --bfile hapmap1 --chr 1 --out res1 --missing

then for chromosome 2:

    plink --bfile hapmap1 --chr 2 --out res2 --missing

and so on.
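As an alternative to a spreadsheet, a whitespace-delimited output file such as miss_stat.lmiss can be summarised with a few lines of Python. The sketch below is only an illustration: the helper names are hypothetical, the three example rows are invented rather than taken from the real output, and it relies only on the column headers CHR, SNP, N_MISS and F_MISS shown above.

```python
import io


def read_plink_table(handle):
    """Read a whitespace-delimited table with one header row into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append(dict(zip(header, fields)))
    return rows


def worst_snps(rows, n=3):
    """Return the n rows with the highest proportion of missing genotypes."""
    return sorted(rows, key=lambda r: float(r["F_MISS"]), reverse=True)[:n]


# Invented stand-in for the contents of an .lmiss file.
example = io.StringIO(
    " CHR    SNP   N_MISS   F_MISS\n"
    "   1   snpA        0        0\n"
    "   1   snpB        2   0.0225\n"
    "   2   snpC        1   0.0112\n"
)
rows = read_plink_table(example)
for row in worst_snps(rows, n=2):
    print(row["SNP"], row["F_MISS"])
```

To use it on real output, replace the `io.StringIO(...)` object with `open("miss_stat.lmiss")`.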
Summary statistics: allele frequencies

Next we perform a similar analysis, except requesting allele frequencies instead of genotyping rates. The following command generates a file called freq_stat.frq which contains the minor allele frequency and allele codes for each SNP.

    plink --bfile hapmap1 --freq --out freq_stat

It is also possible to perform this frequency analysis (and the missingness analysis) stratified by a categorical, cluster variable. In this case, we shall use the file that indicates whether the individual is from the Chinese or the Japanese sample, pop.phe. This cluster file contains three columns; each row is an individual. The format is described more fully in the main documentation. To perform a stratified analysis, use the --within option.

    plink --bfile hapmap1 --freq --within pop.phe --out freq_stat

The output will now indicate that a file called freq_stat.frq.strat has been generated instead of freq_stat.frq. If we view this file:

    more freq_stat.frq.strat

we see each row is now the allele frequency for each SNP stratified by subpopulation:

    CHR SNP CLST A1 A2 MAF
    1 rs rs rs rs rs rs rs rs

Here we see that each SNP is represented twice - the CLST column indicates whether the frequency is from the Chinese or Japanese populations, coded as per the pop.phe file.

If you were just interested in a specific SNP, and wanted to know what the frequency was in the two populations, you can use the --snp option to select this SNP:

    plink --bfile hapmap1 --snp rs --freq --within pop.phe --out snp1_frq_stat

would generate a file snp1_frq_stat.frq.strat containing only the population-specific frequencies for this single SNP. You can also specify a range of SNPs by adding the --window kb option or using the options --from and --to, following each with a different SNP (they must be in the correct order and be on the same chromosome).

Basic association analysis

Let's now perform a basic association analysis on the disease trait for all single SNPs. The basic command is

    plink --bfile hapmap1 --assoc --out as1

which generates an output file as1.assoc which contains the following fields

    CHR SNP A1 F_A F_U A2 CHISQ P OR
    1 rs rs rs rs rs rs rs rs rs

where each row is a single SNP association result. The fields are:
- Chromosome
- SNP identifier
- Code for allele 1 (the minor, rare allele based on the entire sample frequencies)
- The frequency of this variant in cases
- The frequency of this variant in controls
- Code for the other allele
- The chi-squared statistic for this test (1 df)
- The asymptotic significance value for this test
- The odds ratio for this test

If a test is not defined (for example, if the variant is monomorphic but was not excluded by the filters) then values of NA for not applicable will be given (as these are read by the package R to indicate missing data, which is convenient if using R to analyse the set of results).

HINT In a Unix/Linux environment, you can use the available command line tools to sort the list of association statistics and print out the top ten, for example:

    sort --key=7 -nr as1.assoc | head

would give

    13 rs e rs e

Here we see that the simulated disease variant rs is actually the second most significant SNP in the list, with a large difference in allele frequencies of 0.28 in cases versus 0.62 in controls. However, we also see that, just by chance, a second SNP on chromosome 13 shows a slightly higher test result, with coincidentally similar allele frequencies in cases and controls. When performing so many tests, particularly in a small sample, we often expect the distribution of true positive results to be virtually indistinguishable from the best false positive results. That our variant appears in the top ten list is reassuring however.

To get a sorted list of association results, that also includes a range of significance values that are adjusted for multiple testing, use the --adjust flag:
    plink --bfile hapmap1 --assoc --adjust --out as2

This generates the file as2.assoc.adjusted in addition to the basic as2.assoc output file. Using more (or opening the file), one can easily look at one's most significant associations:

    more as2.assoc.adjusted

    CHR SNP UNADJ GC BONF HOLM SIDAK_SS SIDAK_SD FDR_BH FDR_BY
    13 rs e e rs e e rs e e rs e e rs e e rs e e rs e e rs e rs e rs e rs e

Here we see a pre-sorted list of association results. The fields are as follows:
- Chromosome
- SNP identifier
- Unadjusted, asymptotic significance value
- Genomic control adjusted significance value. This is based on a simple estimation of the inflation factor based on the median chi-square statistic. These values therefore do not control for multiple testing.
- Bonferroni adjusted significance value
- Holm step-down adjusted significance value
- Sidak single-step adjusted significance value
- Sidak step-down adjusted significance value
- Benjamini & Hochberg (1995) step-up FDR control
- Benjamini & Yekutieli (2001) step-up FDR control

In this particular case, we see that no single variant is significant at the 0.05 level after genome-wide correction. Different correction measures have different properties which are beyond the scope of this tutorial to discuss: it is up to the investigator to decide which to use and how to interpret them.

When the --adjust command is used, the log file records the inflation factor calculated for the genomic control analysis, and the mean chi-squared statistic (that should be 1 under the null):

    Genomic inflation factor (based on median chi-squared) is
    Mean chi-squared statistic is

These values would actually suggest that although no very strong stratification exists, there is perhaps a hint of an increased false positive rate, as both values are greater than 1.

HINT The adjusted significance values that control for multiple testing are, by default, based on the unadjusted significance values. If the flag --gc is specified as well as --adjust then these adjusted values will be based on the genomic-control significance value instead.

In this particular instance, where we already know about the Chinese/Japanese subpopulations, it might be of interest to directly look at the inflation factor that results from having population membership as the phenotype in a case/control analysis, just to provide extra information about the sample. That is, running the command using the alternate phenotype option (i.e. replacing the disease phenotype with the one in pop.phe, which is actually subpopulation membership):

    plink --bfile hapmap1 --pheno pop.phe --assoc --adjust --out as3

When testing for frequency differences between Chinese and Japanese individuals, we do see some departure from the null distribution:

    Genomic inflation factor (based on median chi-squared) is
    Mean chi-squared statistic is

That is, the inflation factor of 1.7 represents the maximum inflation that could arise from the Chinese/Japanese split in the sample, i.e. the value we would see if the disease were perfectly correlated with subpopulation (this does not account for any possible within-subpopulation structure, of course, that might also increase SNP-disease false positive rates). This is a good test of whether it is appropriate to do an association study without adjusting for population stratification.

Extracting a SNP of interest

Finally, given you've identified a SNP, set of SNPs or region of interest, you might want to extract those SNPs as a separate, smaller, more manageable file. In particular, for other applications to analyse the data, you will need to convert from the binary PED file format to a standard PED format. This is done using the --recode options. There are a few forms of this option: we will use --recode12, which codes the genotypes in a manner that is convenient for subsequent analysis. To extract only this single SNP, use:

    plink --bfile hapmap1 --snp rs --recode12 --out rec_snp1

This particular recode feature codes genotypes as 1/2 alleles, and outputs new .ped and .map files with this SNP.

2. LARGER SET OF HAPMAP1 DATA

The files are

    wgas1.ped   Genotype data for 250,000 SNPs on 90 individuals
    wgas1.map   Map file for these SNPs
    extra.ped   Genotype data for an additional 29 SNPs genotyped for the same individuals
    extra.map   Map file for these SNPs
    pop.cov     Population membership coding (coded 1=CH / 2=JP)

First, make a new binary file of the data. Note this operation may take a while.

    plink --file wgas1 --make-bed --out wgas3

Previous analyses have shown that a SNP rs was the most highly associated with the phenotype. We now want to extract the data for rs and perform a series of more detailed analyses on this single SNP.
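One of the per-SNP analyses is the basic allelic case/control test behind the CHISQ, P and OR fields of the --assoc output. As a rough sketch of that calculation (a standard 1 df chi-squared test on a 2x2 table of allele counts, which is assumed here to be what the output reports), the Python function below takes made-up allele counts. The function name and the counts are hypothetical.

```python
import math


def allelic_association(case_a1, case_a2, control_a1, control_a2):
    """Chi-squared (1 df), asymptotic p-value and odds ratio for allele counts.

    Returns None when the test is undefined (e.g. a monomorphic SNP),
    mirroring the NA values described for the PLINK output.
    """
    table = [[case_a1, case_a2], [control_a1, control_a2]]
    row_totals = [case_a1 + case_a2, control_a1 + control_a2]
    col_totals = [case_a1 + control_a1, case_a2 + control_a2]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals or case_a2 * control_a1 == 0:
        return None

    chisq = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chisq += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2.0))
    odds_ratio = (case_a1 * control_a2) / (case_a2 * control_a1)
    return chisq, p_value, odds_ratio


# Invented counts: 100 case alleles and 100 control alleles.
chisq, p_value, odds_ratio = allelic_association(20, 80, 40, 60)
print(round(chisq, 3), round(odds_ratio, 3))
```

An odds ratio below 1 means allele 1 is rarer in cases than in controls, which is the direction described for the simulated disease variant in section 1.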
HINT Remember everything in the command should be typed on a single line (even where a long command wraps across lines in the boxes below).

    Purpose: Extract data for single SNP rs
    Command: plink --bfile wgas3 --recode --snp rs --out tophit
    Input:   wgas3.bed, wgas3.bim, wgas3.fam   QC+ whole genome SNP binary fileset
    Output:  tophit.ped   Standard PED file for this single SNP
             tophit.map   Corresponding marker information
    Notes:   We are converting back from the binary format to standard text format. The --snp command is a filter, just extracting data for this one SNP.

For this single SNP, we shall first examine the genotyping rate and, second, the Hardy-Weinberg test statistic. In all cases, here and below, the analysis output files are small. These can be viewed by typing, for example, either the "more" or "type" DOS commands:

    more plink.lmiss

or

    type plink.assoc

    Purpose: Examine genotyping rate for rs
    Command: plink --file tophit --all --missing
    Input:   tophit.ped   Standard PED file for single SNP
             tophit.map   Corresponding marker information
    Output:  plink.lmiss  Missing rate per locus (SNP)
             plink.imiss  Missing rate per individual
    Notes:   The --all flag is added because otherwise PLINK would first remove any individual with missing genotypes for this SNP, before calculating the per-SNP genotyping rate. Also note the use of --file instead of --bfile, as tophit is in standard PED format. Finally, note that we do not always need to specify a unique output name when using PLINK directly, so all output files start plink.ext by default.

    Purpose: Examine Hardy-Weinberg equilibrium P-value for rs
    Command: plink --file tophit --hardy
    Input:   tophit.ped   Standard PED file for single SNP
             tophit.map   Corresponding marker information
    Output:  plink.hwe    Hardy-Weinberg statistic and genotype counts
    Notes:   For case/control datasets, tests are given for all individuals, as well as for cases and controls separately.

Next, we can ask whether allele frequency differs between the two groups. This involves using the population label as the phenotype of an association test rather than as a covariate.

    Purpose: Explicitly test whether allele frequency for rs differs between populations
    Command: plink --file tophit --assoc --pheno pop.cov
    Input:   tophit.ped   Standard PED file for single SNP
             tophit.map   Corresponding marker information
             pop.cov      Indicates Chinese (1) or Japanese (2)
    Output:  plink.assoc  Association (with population) results
    Notes:   Here we specify population as the phenotype, not a covariate.

    Purpose: Explicitly test whether allele frequency for rs differs between populations, allowing for association with disease
    Command: plink --file tophit --logistic --pheno pop.cov --covar tophit.ped --covar-number 4
    Input:   tophit.ped   Standard PED file for single SNP
             tophit.map   Corresponding marker information
             pop.cov      Indicates Chinese (1) or Japanese (2)
    Output:  plink.assoc.logistic   Association (with population) results
    Notes:   We treat the PED file as a covariate file, extracting just the phenotype (i.e. the 4th column after family ID and individual ID).

These results would suggest that the frequency does indeed differ (again, make a note of exactly why this is).

Population stratification

Initially, we used the known population labels of Chinese versus Japanese. In many studies, we might not have this direct information, or the potential differences in ancestry can be subtle.

Analyses of population stratification should be performed on a set of SNPs that are approximately in linkage equilibrium: we achieve this by using PLINK's command to remove highly correlated, nearby SNPs. Note: this operation may take a while.

    Purpose: Create a LD pruned set of markers (first step)
    Command: plink --bfile wgas3 --indep-pairwise 50 10 0.2 --out prune1
    Input:   wgas3.bed, wgas3.bim, wgas3.fam   QC+ whole genome SNP binary fileset
    Output:  prune1.prune.in    List of SNPs included after pruning
             prune1.prune.out   List of SNPs excluded after pruning
    Notes:   This option does not actually remove any SNPs, it just creates two lists of SNPs, which we use below. The excluded list contains any SNP that has r-squared > 0.2 with another SNP within a 50-SNP window; this window is shifted across the chromosome 10 SNPs at a time.

We next calculate identity-by-state (IBS) allelic similarity between all possible pairs of the 89 QC+ individuals, and store this information in a file.

    Purpose: Calculate genome-wide IBS sharing based on pruned marker list
    Command: plink --bfile wgas3 --extract prune1.prune.in --genome --out ibs1
    Input:   wgas3.bed, wgas3.bim, wgas3.fam   QC+ whole genome SNP binary fileset
    Output:  ibs1.genome   IBS sharing data (1 row per pair of individuals)
    Notes:   Equivalently, one could --exclude prune1.prune.out

Finally, using the pairwise IBS information in ibs1.genome, we perform stratification analysis. Note this may take a while.

    Purpose: Cluster individuals into homogeneous groups and perform a multidimensional scaling analysis
    Command: plink --bfile wgas3 --read-genome ibs1.genome --cluster --ppc 1e-3 --cc --mds-plot 2 --out strat1
    Input:   wgas3.bed, wgas3.bim, wgas3.fam   QC+ whole genome SNP binary fileset
             ibs1.genome       Pre-calculated pairwise IBS values
    Output:  strat1.cluster2   Assignment to cluster for each individual
             strat1.mds        First 2 MDS components for each individual
    Notes:   Constraints on clustering are the PPC test (--ppc 1e-3) and to ensure that each cluster contains at least one case and one control (--cc).

Merge in new genotype data
The files extra.ped and extra.map contain new SNP data on the same set of individuals. These are SNPs taken from the region around rs , the best SNP in the previous WGAS analysis. We first examine these SNPs by themselves, and then merge them into the SNPs in that region from the original WGAS dataset.

    Purpose: Examine the new SNPs, testing for association stratified by population
    Command: plink --file extra --mh --within pop.cov --out strat2
    Input:   extra.ped, extra.map   New followup SNP genotyping
             pop.cov                Population label
    Output:  strat2.cmh   CMH results for new genotypes

As evident in the result file strat2.cmh, there are some very strongly associated SNPs in this new set, in particular rs (with a P-value = ). We next merge this new data with the old.

    Purpose: Focus on region of association in WGAS data, and merge in new genotype data, creating a new fileset
    Command: plink --bfile wgas3 --snp rs --window --merge extra.ped extra.map --make-bed --out followup
    Input:   wgas3.bed, wgas3.bim, wgas3.fam   QC+ binary fileset
             extra.ped, extra.map              New genotype data (same individuals)
    Output:  followup.bed, followup.bim, followup.fam   Merged fileset for region around top hit
    Notes:   The --snp and --window commands extract a particular region from wgas3 first, and then merge in the new genotype data in extra.ped.

We can check that the associations remain the same after merging these two filesets:

    Purpose: Re-run association to check integrity of file
    Command: plink --bfile followup --mh --within pop.cov --out followup-cmh
    Input:   followup.bed, followup.bim, followup.fam   Merged binary fileset for best region
    Output:  followup-cmh.cmh   CMH for top region in merged dataset
    Notes:   Now focusing on the top region, using --adjust is no longer appropriate.

Explore linkage disequilibrium

Further analysis indicates four other SNPs that are associated and in LD with the primary SNP rs :

    rs rs rs rs rs

Finally, we will extract just these five SNPs into another dataset.

    Purpose: For convenience, focus on the 5 clumped SNPs for further analysis (and so create a new dataset containing just these)
    Command: plink --bfile followup --snps rs ,rs ,rs ,rs ,rs --make-bed --out followup2
    Input:   followup.bed, followup.bim, followup.fam      Merged binary fileset for best region
    Output:  followup2.bed, followup2.bim, followup2.fam   Binary fileset of 5 SNPs in LD in top region
    Notes:   Note that --snps (versus --snp) can take a comma-delimited list of SNPs.

The pairwise LD (r-squared) between these SNPs can also be calculated using PLINK. By default, only SNP pairs with high LD are shown in the output file.

    Purpose: Report pairwise LD (r-squared) for SNPs in this region
    Command: plink --bfile followup2 --r2
    Input:   followup2.bed, followup2.bim, followup2.fam   Binary fileset of 5 SNPs in LD in top region
    Output:  plink.ld   List of r-squared LD values (above threshold)
    Notes:   Add the --matrix option to get a 5 x 5 matrix of r-squared statistics.

3. ZEBRA FINCH DATA

Finally, we will examine some SNPs typed in a three generation zebra finch pedigree. The idea is to check the SNPs and individuals (i.e. quality control), then select unlinked SNPs for analysis in programs such as COANCESTRY and COLONY to investigate the pedigree and relatedness structure of the data.
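Both the LD report and the selection of unlinked SNPs rest on a pairwise r-squared value. As an illustration only, the Python sketch below computes the squared Pearson correlation between two SNPs coded as 0/1/2 copies of one allele, with None for a missing genotype. This is an assumption-laden stand-in: PLINK's own calculation may differ in detail, and the function name and the genotype vectors are made up.

```python
def r_squared(x, y):
    """Squared correlation between two SNPs given as lists of 0/1/2 allele counts.

    Individuals with a missing genotype (None) at either SNP are skipped.
    Returns None if either SNP shows no variation among the remaining individuals.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None
    return (cov * cov) / (var_x * var_y)


# Invented genotypes for three SNPs in five individuals.
snp1 = [0, 1, 2, 1, 0]
snp2 = [2, 1, 0, 1, 2]        # same information as snp1, opposite allele counted
snp3 = [0, 2, None, 0, 2]     # one missing genotype
print(r_squared(snp1, snp2))
print(r_squared(snp1, snp3))
```

Two SNPs with r-squared equal to 1 carry the same information, which is why one of them can be dropped when building a set of unlinked markers.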
17 HINT! PLINK is designed for humans - so unless you tell it otherwise, it will assume your genome has 23 chromosomes, with chr23 being the X chromosome! Use the option --dog for up to 39 chromosomes. The zebra finch genome has around 30 chromosomes. The zf.ped file is missing the column with the sex of the birds, so we use --allow-no-sex The zf.map file is in centimorgans not Morgans, so we use --cm Make bed file plink --file zf --cm --dog --make-bed --out zf How many SNPs are there? How many individuals are there? How many founders in the pedgiree? Is there phenotype information in the file? Test Hardy Weinberg equilibrium plink --bfile zf --cm --dog --hardy --out zf_hwe do any SNPs fail the HWE test? Test for parent-offspring mismatches plink --bfile zf --cm --dog --mendel --out zf_mendel do any individuals look like they're not related when we thought they were? Test for parent-offspring mismatches again, this time in a file with an error! plink --file zf_with_ped_error --cm --dog --mendel --out zf_mendel1 what is the incorrect pedigree link? missing rates per individual and locus plink --bfile zf --cm --dog --all --missing --out zf_miss_stat what is the maximum of missing genotypes for an individual? what is the maximum number of missing genotypes for a SNP? calculate allele frequencies in founders plink --bfile zf --cm --dog --freq --out zf_freq_stat what are the minimum and the maximum allele frequencies? delete individuals with more than 1% missing genotypes plink --bfile zf --cm --dog --mind out zf_highgeno how many individuals have been removed? Report pairwise LD (r-squared) for all SNPs plink --bfile zf --cm --dog --r2 --out zf_r2
    How many SNPs are in perfect LD (r2 = 1)?

Create an LD-pruned set of markers (first step)
    plink --bfile zf --cm --dog --indep-pairwise 50 5 0.9 --out zf_prune
    How many SNPs are pruned from chromosome 1?
    How many SNPs are left in the dataset?
    Try to change the pruning parameters so that we end up with a dataset of around 550 SNPs in 'zf_prune.prune.in' (hint: currently we have removed any SNP that has r-squared > 0.9 with another SNP within a 50-SNP window; this window is shifted across the chromosome 5 SNPs at a time).

Calculate genome-wide IBS sharing based on the pruned marker list
    plink --bfile zf --cm --dog --extract zf_prune.prune.in --genome --out zf_ibs

Extract the pruned SNPs into a file we might use in further analysis in other programs
    plink --bfile zf --allow-no-sex --cm --dog --extract zf_prune.prune.in --recode12 --out zf_pruned
    Can you extract the file so it shows genotypes as "12" rather than "1 2"?
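To get a feel for how the three pruning parameters in the hint above interact (window size, step and r-squared threshold), here is a minimal Python sketch of window-based pruning. It is a simplification under stated assumptions: the names r2 and prune are hypothetical, genotypes are complete allele counts for a single chromosome, and the later SNP of a correlated pair is always the one dropped, which is not necessarily the rule PLINK applies.

```python
def r2(x, y):
    """Squared correlation of two allele-count vectors (no missing data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov * cov / (vx * vy)


def prune(genotypes, window=50, step=5, threshold=0.9):
    """Return the indices of SNPs kept after simplified LD pruning.

    genotypes: one list of allele counts per SNP, in chromosome order.
    """
    removed = set()
    n = len(genotypes)
    start = 0
    while start < n:
        # SNPs in the current window that have not been removed yet.
        idx = [i for i in range(start, min(start + window, n))
               if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j not in removed and r2(genotypes[i], genotypes[j]) > threshold:
                    removed.add(j)  # simplification: drop the later SNP
        start += step  # slide the window along the chromosome
    return [i for i in range(n) if i not in removed]


# Hypothetical genotypes: 4 SNPs typed in 6 individuals.
snps = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],  # identical to SNP 0
    [0, 0, 1, 2, 2, 1],  # uncorrelated with SNP 0
    [2, 1, 0, 1, 2, 0],  # mirror image of SNP 0
]
print(prune(snps))                    # [0, 2]
print(prune(snps, window=2, step=2))  # [0, 2, 3]
```

In the second call SNP 3 survives because it never shares a window with SNP 0. This is the kind of effect to expect when changing the parameters: a smaller window or a higher threshold generally leaves more SNPs in the pruned set.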