Package SimGbyE. July 20, 2009
Type Package
Title Simulated case/control or survival data sets with genetic and environmental interactions.
Author Melanie Wilson
Maintainer Melanie Wilson
Description The functions in this package create simulated case/control or survival data sets with one or more of the following assumed effects: genetic main effects (G), environmental main effects (E), gene-by-gene interactions (GbyG), and gene-by-environment interactions (GbyE). The assumed one- and two-locus genetic models of epistasis are chosen randomly from a set of models taken from Li and Reich. Then, given a set of assumed coefficients on the effects mentioned above, an outcome variable (case/control or survival) is simulated based on a set of user-specified distribution parameters.
Version
License Unlimited
Depends R

R topics documented:
    gene.data
    sim
    sim.gbye
    sim.geno
    Index
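The two-locus models mentioned in the Description are each given as a vector of 9 values, one per combination of genotypes at two markers. The following is a minimal sketch of how such a vector could be turned into one column of a design matrix. It is not taken from the package source: the function name two.locus.effect, the toy genotypes, the example model vector, and in particular the ordering of the nine genotype combinations are all assumptions made for illustration.

```r
# Hypothetical sketch, not SimGbyE source code.
# Genotypes coded as 0, 1, 2 at two SNPs for six individuals
snp1 <- c(0, 1, 2, 2, 0, 1)
snp2 <- c(2, 1, 0, 2, 0, 1)

# A two-locus model as a vector of 9 values, one per genotype combination.
# Assumed ordering: snp1 varies slowest, so index = 3 * snp1 + snp2 + 1.
# This example vector marks only the (2, 2) combination as high risk.
model <- c(0, 0, 0,
           0, 0, 0,
           0, 0, 1)

# Look up the model value for each individual's genotype pair
two.locus.effect <- function(g1, g2, model) {
  stopifnot(length(model) == 9, all(g1 %in% 0:2), all(g2 %in% 0:2))
  model[3 * g1 + g2 + 1]
}

design.col <- two.locus.effect(snp1, snp2, model)
```

Under the assumed ordering, only the fourth individual (genotype 2 at both SNPs) receives a nonzero value. A one-locus model can be written in the same form by repeating the same three values across the second marker.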
gene.data    Example Genetic Data

Description
    Genotypes for 155 individuals at 5 different SNPs.

Usage
    data(gene.data)

Format
    A data frame with 155 observations on 5 SNPs.

Details
    A set of genetic data on 155 individuals that is taken as input by the function sim.gbye and used to create a simulated survival data set and a case/control data set.

Examples
    data(gene.data)


sim    Wrapper function to simulate genotypes and output

Description
    This wrapper function simulates genotypes with sim.geno.r for a set of chromosomes and positions, and then simulates output with sim.gbye.r.

Usage
    sim<-function(chromo.pos,popn="ceu",n=100,
                  outpath="./",
                  exepath="/home/fac/iversen/bin/linux/",
                  mappath="/local/recombrates/",
                  happath="/local/hapmap.phased/",
                  perlpath="/proj/design/simstudy/rcode/",
                  rr1=1.0,rr2=1.0,minmaf=0.05,
                  eniv,family="case.control",pop.char,
                  G=0,E=0,GbyG=0,GbyE=0,
                  dist.coef,outfile=NULL)
Arguments
    chromo.pos  Vector of three values: (1) integer in c(1:22) giving the specific chromosome to sample from, (2) nucleotide position of the start of the region on the given chromosome, and (3) nucleotide position of the end of the region on the given chromosome. (Note: chr17: is the 40KB region centered on TP53)
    popn        The population in HapMap to simulate the genetic samples from: must be one of ("CEU","YRI","JPT+CHB")
    n           Number of individuals to simulate the genetic data on
    outpath     Path to directory where outfiles will be deposited
    exepath     Path to hapgen executable file
    mappath     Path to Oxford genetic map
    happath     Path to HapMap phased genotype data
    perlpath    Path to Perl script
    rr1         Heterozygote relative risk (should be 1.0 for population samples)
    rr2         Rare homozygote relative risk (should be 1.0 for population samples)
    minmaf      Smallest minor allele frequency for the disease locus
    eniv        Matrix of environmental data of dim (m X q)
    family      Character vector giving the family type, where "case.control" assumes a binomial family and "survival" assumes a Weibull family
    pop.char    If family="case.control", the user must specify the fraction of cases in the simulated data set; if family="survival", the user must specify the median survival time and the scale and shape parameters of the baseline survival function, respectively.
    G           Integer giving the number of genetic main effects
    E           Integer giving the number of environmental main effects
    GbyG        Integer giving the number of GxG interactions
    GbyE        Integer giving the number of GxE interactions
    dist.coef   Discrete distribution on the coefficients of each effect, specified as a list of two vectors: the first vector gives the possible values for the coefficients and the second vector gives the support (weight) for each of these values.
    outfile     File path for the simulation files once they are created

Value
    This function outputs a list of the following values if a file "outfile" is not given:

    model       A list of two matrices. The first is a matrix of dimension (2 X (G+E+GbyG+GbyE)) that specifies the variables (genetic or environmental) involved in each effect: (1) genetic main effect: one SNP in the first row and "NA" in the second row; (2) environmental main effect: an environmental variable in the first row and "NA" in the second row; (3) GbyG: two SNPs, in rows one and two; (4) GbyE: a SNP in the first row and an environmental variable in the second row. The SNP variables are specified by their rs numbers. Each SNP genotype is normally parametrized as 0 for the rare homozygous genotype (aa), 1 for the heterozygous genotype (Aa), and 2 for the common homozygous genotype (AA). If the SNP rs number is given with a ".c" after the name, the reverse parameterization is used (0 for AA, 1 for Aa, and 2 for aa). The second matrix is of dimension (9 X (G+E+GbyG+GbyE)) and specifies the genetic epistasis models of the genetic markers for the genetic main effects, GbyG interactions, and GbyE interactions. Each model is specified by a vector of length 9 giving the "penetrance" at each possible set of genotypes for two markers (aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb); this is the same model specification as in Li and Reich. For environmental main effects, no model specification is given; for GbyE effects, the specification given is that of the genetic main effect interacting with the given environmental variable.
    design      The design matrix created from the models described above and used for the simulation
    coef        A vector of beta coefficients for each of the effects in the simulation (of length G+E+GbyG+GbyE)
    data        The simulated data set

    If a file is given, the above items are saved to the files outfile.model.out, outfile.design.out, outfile.coef.out, and outfile.data.out, respectively.

Author(s)
    Melanie Wilson <maw27@stat.duke.edu>

References
    Li W and Reich J (1999). A complete enumeration and classification of two-locus disease models. Human Heredity; 50:

Examples
    ##Simulated Genotypes for TP53 Region
    chromo = 17
    pos1 =
    pos2 =
    chromo.pos = c(chromo,pos1,pos2)
    popn = "CEU"
    n = 100
    outpath = "./"
    exepath = "/home/fac/iversen/bin/linux/"
    mappath = "/local/recombrates/"
    happath = "/local/hapmap.phased/"
    ##make the environment data
    ##AGE
    age = rnorm(n,mean=55,sd=11)
    eniv = cbind(age)
    colnames(eniv) = c("age")
    eniv = data.frame(eniv)
    pop.char = .4
    coef.dist = list(log(c(1.25,1.5,1.75)),c(1/3,1/3,1/3))
    p53.assoc = sim(chromo.pos,popn=popn,n=n,outpath=outpath,
                    exepath=exepath,mappath=mappath,happath=happath,
                    eniv=eniv,family="case.control",pop.char=pop.char,
                    G=3,E=1,GbyG=0,GbyE=0,dist.coef=coef.dist)


sim.gbye    Simulation Function for case/control or Survival Data sets with genetic and environmental interactions

Description
    This function takes genetic data and environmental data for a set of individuals and simulates either a case/control data set or a survival data set that assumes GbyG and GbyE interactions.

Usage
    genetic.sim = function(gene,eniv,family,n,pop.char,
                           G=0,E=0,GbyG=0,GbyE=0,
                           dist.coef,outpath="./")

Arguments
    gene        Matrix of genetic data of dim (m X p), where m >= n (the total number of individuals wanted in the final simulated data set)
    eniv        Matrix of environmental data of dim (m X q)
    family      Character vector giving the family type, where "case.control" assumes a binomial family and "survival" assumes a Weibull family
    n           The total number of individuals requested in the final simulated data set, where n <= m
    pop.char    If family="case.control", the user must specify the fraction of cases in the simulated data set; if family="survival", the user must specify the median survival time and the scale and shape parameters of the baseline survival function, respectively.
    G           Integer giving the number of genetic main effects
    E           Integer giving the number of environmental main effects
    GbyG        Integer giving the number of GxG interactions
    GbyE        Integer giving the number of GxE interactions
    dist.coef   Discrete distribution on the coefficients of each effect, specified as a list of two vectors: the first vector gives the possible values for the coefficients and the second vector gives the support (weight) for each of these values.
    outpath     Path that the genetic data is written to in the sim.geno function
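The discrete coefficient distribution and the case fraction given in pop.char can be illustrated with base R alone. The sketch below draws one coefficient per effect from the two-vector list and then simulates case/control status from a logistic model whose intercept is tuned so that the expected case fraction equals pop.char. This is an assumed mechanism for illustration, not the package's documented algorithm; the toy design matrix and the use of uniroot to set the intercept are choices made here.

```r
# Hypothetical sketch, not SimGbyE source code.
set.seed(1)

# Coefficient distribution as in the manual's examples:
# possible values and the weight of each value
coef.dist <- list(log(c(1.25, 1.5, 1.75)), c(1/3, 1/3, 1/3))

# Draw one coefficient per effect (here G = 2 genetic + E = 1 environmental)
n.effects <- 3
beta <- sample(coef.dist[[1]], size = n.effects, replace = TRUE,
               prob = coef.dist[[2]])

# Toy design matrix: 155 individuals, one column per effect (assumed layout)
n <- 155
design <- cbind(g1  = rbinom(n, 2, 0.3),
                g2  = rbinom(n, 2, 0.2),
                age = as.numeric(scale(rnorm(n, mean = 55, sd = 11))))

# Linear predictor without intercept
eta <- as.numeric(design %*% beta)

# Assumption: pick the intercept so the mean case probability equals pop.char
pop.char <- 0.4
intercept <- uniroot(function(a) mean(plogis(a + eta)) - pop.char,
                     interval = c(-20, 20))$root

# Bernoulli draw of case (1) / control (0) status
y <- rbinom(n, size = 1, prob = plogis(intercept + eta))
```

Because the outcome is drawn at random, the observed fraction of cases in y varies around pop.char from run to run; only the expected fraction is fixed by the intercept.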
Value
    This function outputs a list of the following values:

    model       A list of two matrices. The first is a matrix of dimension (2 X (G+E+GbyG+GbyE)) that specifies the variables (genetic or environmental) involved in each effect: (1) genetic main effect: one SNP in the first row and "NA" in the second row; (2) environmental main effect: an environmental variable in the first row and "NA" in the second row; (3) GbyG: two SNPs, in rows one and two; (4) GbyE: a SNP in the first row and an environmental variable in the second row. The SNP variables are specified by their rs numbers. Each SNP genotype is normally parametrized as 0 for the rare homozygous genotype (aa), 1 for the heterozygous genotype (Aa), and 2 for the common homozygous genotype (AA). If the SNP rs number is given with a ".c" after the name, the reverse parameterization is used (0 for AA, 1 for Aa, and 2 for aa). The second matrix is of dimension (9 X (G+E+GbyG+GbyE)) and specifies the genetic epistasis models of the genetic markers for the genetic main effects, GbyG interactions, and GbyE interactions. Each model is specified by a vector of length 9 giving the "penetrance" at each possible set of genotypes for two markers (aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb,aa/bb); this is the same model specification as in Li and Reich. For environmental main effects, no model specification is given; for GbyE effects, the specification given is that of the genetic main effect interacting with the given environmental variable.
    design      The design matrix created from the models described above and used for the simulation
    coef        A vector of beta coefficients for each of the effects in the simulation (of length G+E+GbyG+GbyE)
    data        The simulated data set

Author(s)
    Melanie Wilson <maw27@stat.duke.edu>

References
    Li W and Reich J (1999). A complete enumeration and classification of two-locus disease models. Human Heredity; 50:

Examples
    ##Load the genetic data (155 individuals and 5 SNPs)
    data(gene.data)
    gene = gene.data
    ##make the environment data
    ##AGE
    age = rnorm(155,mean=55,sd=11)
    eniv = cbind(age)
    colnames(eniv) = c("age")
    eniv = data.frame(eniv)
    ##Case control simulation parameters
    family = "case.control"
    n = 155
    pop.char = .4
    coef.dist = list(log(c(1.25,1.5,1.75)),c(1/3,1/3,1/3))
    ##create one simulation with only 2 genetic main effects and 1 environmental effect
    sim = genetic.sim(gene=gene,eniv=eniv,family=family,n=n,pop.char=pop.char,
                      G=2,E=1,GbyG=0,GbyE=0,dist.coef=coef.dist)


sim.geno    Simulation Function for Genetic data

Description
    This function creates genetic data for a population using R binary versions of the HapMap and recombination map input files.

Usage
    sim.geno<-function(regnum=1,chromo=17,pos1= ,pos2= ,popn="ceu",n=100,
                       outpath="./",
                       exepath="/home/fac/iversen/bin/linux/",
                       mappath="/local/recombrates/",
                       happath="/local/hapmap.phased/",
                       rr1=1.0,rr2=1.0,minmaf=0.05)

Arguments
    regnum      Integer specifying the index of the chromosomal region
    chromo      Integer in c(1:22) giving the specific chromosome to sample from
    pos1        Nucleotide position of the start of the region on the given chromosome
    pos2        Nucleotide position of the end of the region on the given chromosome. (Note: chr17: is the 40KB region centered on TP53)
    popn        The population in HapMap to simulate the genetic samples from: must be one of ("CEU","YRI","JPT+CHB")
    n           Number of individuals to simulate the genetic data on
    outpath     Path to directory where outfiles will be deposited
    exepath     Path to hapgen executable file
    mappath     Path to Oxford genetic map
    happath     Path to HapMap phased genotype data
    rr1         Heterozygote relative risk (should be 1.0 for population samples)
    rr2         Rare homozygote relative risk (should be 1.0 for population samples)
    minmaf      Smallest minor allele frequency for the disease locus
Details
    This function calls hapgen to simulate one replicate from a specified chromosomal region, given data from one of the HapMap II populations. The code generates samples of genotypes in a contiguous range of DNA using HapMap release 21 (NCBI build 35) data (it drops all evidently monomorphic variants). The default (rr1=1.0, rr2=1.0) is to generate population-based samples. If either rr1 or rr2 is >1.0, hapgen will randomly choose a variant with MAF in (minmaf,0.5) as the disease allele and generate a case-control sample (the code forces n.case=n.control=(n/2)). This code can therefore be used to generate main-effects simulations that have 1 or 0 associated SNPs per region generated. The position range may encompass an entire chromosome or simply bracket a gene or locus of interest. Generating a candidate gene/pathway sample or a genome-wide sample will require multiple calls to this function, one for each independent chromosomal range (if there is LD between variants in two ranges, they should be combined into one).

Value
    This function outputs a list of the following values:

    geno        An n*p matrix of genotypes coded as the number of minor alleles present (0, 1 or 2), where p is the number of non-monomorphic SNPs in the chosen chromosomal range
    y           An n-vector of zeros and ones denoting case/control status (1/0), returned even when both rr1 and rr2 are set to 1.0 (in that case, the case/control designation is meaningless)
    legend      The HapMap legend file for the in-range SNPs; columns are rs number, chromosomal position, allele 0 and allele 1
    assoc       Data from the hapgen.aux file when one or more of rr1 and rr2 are not 1.0: rows are the index of the disease SNP in the geno matrix, disease SNP position, minor allele (allele 0 or 1), estimated MAF, and assumed relative risks (the specified rr1 and rr2)

Author(s)
    Melanie Wilson <maw27@stat.duke.edu>

Examples
    ##Simulated Genotypes for TP53 Region
    chromo = 17
    pos1 =
    pos2 =
    popn = "CEU"
    n = 100
    outpath = "./"
    exepath = "/home/fac/iversen/bin/linux/"
    mappath = "/local/recombrates/"
    happath = "/local/hapmap.phased/"
    p53.null = sim.geno(n=10000)
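The geno matrix described above stores minor-allele counts, so minor allele frequencies, the removal of monomorphic SNPs, and the minmaf threshold can all be checked with a few lines of base R. The sketch below uses a toy matrix rather than real hapgen output; the SNP names, the sample size, and the allele frequencies are made up for illustration, and the filtering shown is an assumption about what such a check could look like, not the package's own code.

```r
# Hypothetical sketch, not SimGbyE source code.
set.seed(2)
n <- 200

# Toy genotype matrix: minor-allele counts (0, 1, 2) at 4 SNPs;
# the third SNP is monomorphic by construction
geno <- cbind(rs1 = rbinom(n, 2, 0.30),
              rs2 = rbinom(n, 2, 0.10),
              rs3 = rep(0, n),
              rs4 = rbinom(n, 2, 0.02))

# Frequency of the counted allele, folded so that it never exceeds 0.5
freq <- colMeans(geno) / 2
maf <- pmin(freq, 1 - freq)

# Drop monomorphic SNPs (MAF of exactly zero)
geno.poly <- geno[, maf > 0, drop = FALSE]

# SNPs eligible as a disease locus for a given minmaf, i.e. MAF in (minmaf, 0.5)
minmaf <- 0.05
eligible <- colnames(geno)[maf > minmaf & maf < 0.5]
```

With real output, the same calculation applied to the geno element of the returned list gives a quick check that every retained SNP is polymorphic.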
Index

    Topic datasets
        gene.data
    Topic models
        sim
        sim.gbye
        sim.geno
More informationLinkage Disequilibrium Map by Unidimensional Nonnegative Scaling
The First International Symposium on Optimization and Systems Biology (OSB 07) Beijing, China, August 8 10, 2007 Copyright 2007 ORSC & APORC pp. 302 308 Linkage Disequilibrium Map by Unidimensional Nonnegative
More informationPackage LEA. April 23, 2016
Package LEA April 23, 2016 Title LEA: an R package for Landscape and Ecological Association Studies Version 1.2.0 Date 2014-09-17 Author , Olivier Francois
More informationPackage LGRF. September 13, 2015
Type Package Package LGRF September 13, 2015 Title Set-Based Tests for Genetic Association in Longitudinal Studies Version 1.0 Date 2015-08-20 Author Zihuai He Maintainer Zihuai He Functions
More informationPackage MOJOV. R topics documented: February 19, 2015
Type Package Title Mojo Variants: Rare Variants analysis Version 1.0.1 Date 2013-02-25 Author Maintainer Package MOJOV February 19, 2015 A package for analysis between rare variants
More informationHandling sam and vcf data, quality control
Handling sam and vcf data, quality control We continue with the earlier analyses and get some new data: cd ~/session_3 wget http://wasabiapp.org/vbox/data/session_4/file3.tgz tar xzf file3.tgz wget http://wasabiapp.org/vbox/data/session_4/file4.tgz
More informationM(ARK)S(IM) Dec. 1, 2009 Payseur Lab University of Wisconsin
M(ARK)S(IM) Dec. 1, 2009 Payseur Lab University of Wisconsin M(ARK)S(IM) extends MS by enabling the user to simulate microsatellite data sets under a variety of mutational models. Simulated data sets are
More informationPhD: a web database application for phenotype data management
Bioinformatics Advance Access published June 28, 2005 The Author (2005). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oupjournals.org PhD:
More informationLD vignette Measures of linkage disequilibrium
LD vignette Measures of linkage disequilibrium David Clayton June 13, 2018 Calculating linkage disequilibrium statistics We shall first load some illustrative data. > data(ld.example) The data are drawn
More informationBeviMed Guide. Daniel Greene
BeviMed Guide Daniel Greene 1 Introduction BeviMed [1] is a procedure for evaluating the evidence of association between allele configurations across rare variants, typically within a genomic locus, and
More informationCircosVCF workshop, TAU, 9/11/2017
CircosVCF exercise In this exercise, we will create and design circos plots using CircosVCF. We will use vcf files of a published case "X-linked elliptocytosis with impaired growth is related to mutated
More informationLASER: Locating Ancestry from SEquence Reads version 2.04
LASER: Locating Ancestry from SEquence Reads version 2.04 Chaolong Wang 1 Computational and Systems Biology Genome Institute of Singapore A*STAR, Singapore 138672, Singapore Xiaowei Zhan 2 Department of
More informationA short manual for LFMM (command-line version)
A short manual for LFMM (command-line version) Eric Frichot efrichot@gmail.com April 16, 2013 Please, print this reference manual only if it is necessary. This short manual aims to help users to run LFMM
More informationSOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie
SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing
More informationSUGEN 8.6 Overview. Misa Graff, July 2017
SUGEN 8.6 Overview Misa Graff, July 2017 General Information By Ran Tao, https://sites.google.com/site/dragontaoran/home Website: http://dlin.web.unc.edu/software/sugen/ Standalone command-line software
More informationStep-by-Step Guide to Advanced Genetic Analysis
Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options
More informationGenome-Wide Association Study Using
has to Department of Epidemiology UT MD Anderson Cancer Center Houston, TX April 2, 2008 Programmers Cross Training Outline has to 1 has 2 to 3 Going object-oriented: Outline has Brief introduction to
More informationGMDR User Manual. GMDR software Beta 0.9. Updated March 2011
GMDR User Manual GMDR software Beta 0.9 Updated March 2011 1 As an open source project, the source code of GMDR is published and made available to the public, enabling anyone to copy, modify and redistribute
More informationPopulation Genetics (52642)
Population Genetics (52642) Benny Yakir 1 Introduction In this course we will examine several topics that are related to population genetics. In each topic we will discuss briefly the biological background
More informationPopulation Genetics in BioPerl HOWTO
Population Genetics in BioPerl HOW Jason Stajich, Dept Molecular Genetics and Microbiology, Duke University $Id: PopGen.xml,v 1.2 2005/02/23 04:56:30 jason Exp $ This document
More informationPackage EMLRT. August 7, 2014
Package EMLRT August 7, 2014 Type Package Title Association Studies with Imputed SNPs Using Expectation-Maximization-Likelihood-Ratio Test LazyData yes Version 1.0 Date 2014-08-01 Author Maintainer
More informationA comprehensive modelling framework and a multiple-imputation approach to haplotypic analysis of unrelated individuals
A comprehensive modelling framework and a multiple-imputation approach to haplotypic analysis of unrelated individuals GUI Release v1.0.2: User Manual January 2009 If you find this software useful, please
More informationEmile R. Chimusa Division of Human Genetics Department of Pathology University of Cape Town
Advanced Genomic data manipulation and Quality Control with plink Emile R. Chimusa (emile.chimusa@uct.ac.za) Division of Human Genetics Department of Pathology University of Cape Town Outlines: 1.Introduction
More informationPackage triogxe. April 3, 2013
Type Package Package triogxe April 3, 2013 Title A data smoothing approach to explore and test gene-environment interaction in case-parent trio data Version 0.1-1 Date 2013-04-02 Author Ji-Hyung Shin ,
More informationDevelopment of linkage map using Mapmaker/Exp3.0
Development of linkage map using Mapmaker/Exp3.0 Balram Marathi 1, A. K. Singh 2, Rajender Parsad 3 and V.K. Gupta 3 1 Institute of Biotechnology, Acharya N. G. Ranga Agricultural University, Rajendranagar,
More informationUser s Guide. Version 2.2. Semex Alliance, Ontario and Centre for Genetic Improvement of Livestock University of Guelph, Ontario
User s Guide Version 2.2 Semex Alliance, Ontario and Centre for Genetic Improvement of Livestock University of Guelph, Ontario Mehdi Sargolzaei, Jacques Chesnais and Flavio Schenkel Jan 2014 Disclaimer
More informationCTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1
CTL mapping in R Danny Arends, Pjotr Prins, and Ritsert C. Jansen University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 First written: Oct 2011 Last modified: Jan 2018 Abstract: Tutorial
More informationMIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping. Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September
MIRING: Minimum Information for Reporting Immunogenomic NGS Genotyping Data Standards Hackathon for NGS HACKATHON 1.0 Bethesda, MD September 27 2014 Static Dynamic Static Minimum Information for Reporting
More informationPackage poolfstat. September 14, 2018
Package poolfstat September 14, 2018 Maintainer Mathieu Gautier Author Mathieu Gautier, Valentin Hivert and Renaud Vitalis Version 1.0.0 License GPL (>= 2) Title Computing F-Statistics
More informationGenetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland
Genetic Programming Charles Chilaka Department of Computational Science Memorial University of Newfoundland Class Project for Bio 4241 March 27, 2014 Charles Chilaka (MUN) Genetic algorithms and programming
More information