Package SimGbyE. July 20, 2009


Type: Package
Title: Simulated case/control or survival data sets with genetic and environmental interactions
Author: Melanie Wilson
Maintainer: Melanie Wilson
Description: The functions in this package create simulated case/control or survival data sets with one or more of the following assumed effects: genetic main effects (G), environmental main effects (E), gene-by-gene interactions (GbyG), and gene-by-environment interactions (GbyE). The assumed one- and two-locus genetic models of epistasis are chosen at random from a set of models taken from Li and Reich. Then, given a set of assumed coefficients on these effects, an outcome variable (case/control or survival) is simulated based on a set of user-specified distribution parameters.
Version:
License: Unlimited
Depends: R

R topics documented: gene.data, sim, sim.gbye, sim.geno
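The Description can be illustrated with a small sketch of how a case/control outcome might be drawn once the effects and their coefficients are fixed. This is not the package's code: the logistic model, the intercept calibration with uniroot, and every variable name below (snp1, snp2, design, case.frac) are assumptions made for illustration only.

```r
# Minimal sketch (assumed logistic model, not the SimGbyE implementation):
# simulate case/control status from a design matrix and log-odds coefficients.
set.seed(1)
n    <- 500
snp1 <- rbinom(n, 2, 0.3)               # hypothetical SNP coded 0/1/2
snp2 <- rbinom(n, 2, 0.2)               # hypothetical SNP coded 0/1/2
age  <- as.numeric(scale(rnorm(n, mean = 55, sd = 11)))  # standardized age

# One column per effect: two G effects, one E effect, one GbyE effect
design <- cbind(G1 = snp1, G2 = snp2, E1 = age, GbyE1 = snp1 * age)
beta   <- log(c(1.25, 1.5, 1.75, 1.5))  # assumed coefficients (log odds ratios)
case.frac <- 0.4                        # plays the role of pop.char

eta <- as.numeric(design %*% beta)

# Choose the intercept so that the average case probability equals case.frac
gap <- function(b0) mean(plogis(b0 + eta)) - case.frac
b0  <- uniroot(gap, interval = c(-20, 20))$root

y <- rbinom(n, 1, plogis(b0 + eta))     # 1 = case, 0 = control
table(y)
```

Calibrating the intercept is one simple way to make the expected fraction of cases match a target; the package may well use a different mechanism.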

gene.data
Example Genetic Data

Description
    Genotypes for 155 individuals at 5 different SNPs.

Usage
    data(gene.data)

Format
    A data frame with 155 observations on 5 SNPs.

Details
    A set of genetic data on 155 individuals that is taken as input by the function sim.gbye and used to create a simulated survival data set and a case/control data set.

Examples
    data(gene.data)


sim
Wrapper function to simulate genotypes and output

Description
    This wrapper function simulates genotypes with sim.geno.r for a given chromosome and range of positions, and then simulates output with sim.gbye.r.

Usage
    sim <- function(chromo.pos, popn="ceu", n=100,
                    outpath="./",
                    exepath="/home/fac/iversen/bin/linux/",
                    mappath="/local/recombrates/",
                    happath="/local/hapmap.phased/",
                    perlpath="/proj/design/simstudy/rcode/",
                    rr1=1.0, rr2=1.0, minmaf=0.05,
                    eniv, family="case.control", pop.char,
                    G=0, E=0, GbyG=0, GbyE=0,
                    dist.coef, outfile=NULL)

Arguments
    chromo.pos  Vector of three values: (1) an integer in c(1:22) giving the specific chromosome to sample from, (2) the nucleotide position of the start of the region on the given chromosome, and (3) the nucleotide position of the end of the region on the given chromosome. (Note: chr17: is the 40KB region centered on TP53)
    popn        The population in HapMap to simulate the genetic samples from: must be one of ("CEU","YRI","JPT+CHB")
    n           Number of individuals to simulate the genetic data on
    outpath     Path to directory where outfiles will be deposited
    exepath     Path to hapgen executable file
    mappath     Path to Oxford genetic map
    happath     Path to HapMap phased genotype data
    perlpath    Path to Perl script
    rr1         Heterozygote relative risk (should be 1.0 for population samples)
    rr2         Rare homozygote relative risk (should be 1.0 for population samples)
    minmaf      Smallest minor allele frequency for the disease locus
    eniv        Matrix of environmental data of dim (m X q)
    family      Character vector giving the family type: "case.control" assumes a binomial family and "survival" assumes a Weibull family
    pop.char    If family="case.control", the user must specify the fraction of cases in the simulated data set; if family="survival", the user must specify the median survival time and the scale and shape parameters of the baseline survival function, in that order.
    G           Integer giving the number of genetic main effects
    E           Integer giving the number of environmental main effects
    GbyG        Integer giving the number of GxG interactions
    GbyE        Integer giving the number of GxE interactions
    dist.coef   Discrete distribution on the coefficients of each effect, specified as a list of two vectors: the first vector gives the possible values for the coefficients and the second vector gives the support for each of these values.
    outfile     File path for the simulation files once they are created

Value
    If no file "outfile" is given, this function outputs a list of the following values:

    model   A list of two matrices. The first is a matrix of dimension (2 X (G+E+GbyG+GbyE)) that specifies the variables (genetic or environmental) involved in each effect: (1) genetic main effect: one SNP in the first row and "NA" in the second row; (2) environmental main effect: an environmental variable in the first row and "NA" in the second row; (3) GbyG: two SNPs, in rows one and two; (4) GbyE: a SNP in the first row and an environmental variable in the second row. The SNP variables are specified by their rs numbers. Each SNP genotype is normally parametrized as 0 for the rare homozygous genotype (aa), 1 for the heterozygous genotype (Aa), and 2 for the common homozygous genotype (AA). If the SNP rs number is specified with a ".c" after the name, the reverse parameterization is used (0 for AA, 1 for Aa, and 2 for aa). The second matrix is of dimension (9 X (G+E+GbyG+GbyE)) and specifies the genetic epistasis models of the genetic markers for the genetic main effects, GbyG interactions, and GbyE interactions. Each model is specified as a vector of length 9 giving the "penetrance" at each possible pair of genotypes for two markers (each of AA, Aa, aa at the first marker combined with each of BB, Bb, bb at the second); this is the same model specification as in pages of Li and Reich. For environmental main effects no model specification is given; for GbyE effects, the model specification given is that of the genetic main effect that interacts with the given environmental variable.
    design  The design matrix created from the models described above and used for the simulation
    coef    A vector of beta coefficients, one for each effect in the simulation (of length G+E+GbyG+GbyE)
    data    The simulated data set

    If a file is given, the above items are saved to the files outfile.model.out, outfile.design.out, outfile.coef.out, and outfile.data.out, respectively.

Author(s)
    Melanie Wilson <maw27@stat.duke.edu>

References
    Li W and Reich J (1999). A complete enumeration and classification of two-locus disease models. Human Heredity; 50:

Examples
    ##Simulated Genotypes for TP53 Region
    chromo = 17
    pos1 =
    pos2 =
    chromo.pos = c(chromo,pos1,pos2)
    popn = "CEU"
    n = 100
    outpath = "./"
    exepath = "/home/fac/iversen/bin/linux/"
    mappath = "/local/recombrates/"
    happath = "/local/hapmap.phased/"
    ##make the environment data
    ##AGE
    age = rnorm(n,mean=55,sd=11)
    eniv = cbind(age)
    colnames(eniv) = c("age")

    eniv = data.frame(eniv)
    pop.char = .4
    coef.dist = list(log(c(1.25,1.5,1.75)),c(1/3,1/3,1/3))
    p53.assoc = sim(chromo.pos,popn=popn,n=n,outpath=outpath,
                    exepath=exepath,mappath=mappath,happath=happath,
                    eniv=eniv,family="case.control",pop.char=pop.char,
                    G=3,E=1,GbyG=0,GbyE=0,dist.coef=coef.dist)


sim.gbye
Simulation Function for case/control or Survival Data sets with genetic and environmental interactions

Description
    This function takes genetic data and environmental data for a set of individuals and simulates either a case/control data set or a survival data set that assumes GbyG and GbyE interactions.

Usage
    genetic.sim = function(gene, eniv, family, n, pop.char,
                           G=0, E=0, GbyG=0, GbyE=0,
                           dist.coef, outpath="./")

Arguments
    gene        Matrix of genetic data of dim (m X p), where m >= n (the total number of individuals wanted in the final simulated data set)
    eniv        Matrix of environmental data of dim (m X q)
    family      Character vector giving the family type: "case.control" assumes a binomial family and "survival" assumes a Weibull family
    n           The total number of individuals requested in the final simulated data set, where n <= m
    pop.char    If family="case.control", the user must specify the fraction of cases in the simulated data set; if family="survival", the user must specify the median survival time and the scale and shape parameters of the baseline survival function, in that order.
    G           Integer giving the number of genetic main effects
    E           Integer giving the number of environmental main effects
    GbyG        Integer giving the number of GxG interactions
    GbyE        Integer giving the number of GxE interactions
    dist.coef   Discrete distribution on the coefficients of each effect, specified as a list of two vectors: the first vector gives the possible values for the coefficients and the second vector gives the support for each of these values.
    outpath     Path that the genetic data is written to in the sim.geno function
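The discrete coefficient distribution passed as dist.coef can be made concrete with a short sketch. It assumes the second vector is read as sampling weights for the values in the first vector; the helper name draw.coef is hypothetical and is not part of the package.

```r
# Minimal sketch, assuming the second vector of the list acts as sampling
# weights for the values in the first vector. draw.coef is a hypothetical
# helper written for this illustration.
coef.dist <- list(log(c(1.25, 1.5, 1.75)), c(1/3, 1/3, 1/3))

draw.coef <- function(n.effects, dist) {
  values <- dist[[1]]
  probs  <- dist[[2]]
  stopifnot(length(values) == length(probs), all(probs >= 0))
  # Sample indices rather than values so that a one-value distribution
  # is handled correctly as well
  idx <- sample.int(length(values), size = n.effects, replace = TRUE,
                    prob = probs)
  values[idx]
}

set.seed(42)
beta <- draw.coef(3, coef.dist)  # one coefficient per effect, e.g. G=2, E=1
exp(beta)                        # the same draws on the odds-ratio scale
```

Because the values in the example are logs of 1.25, 1.5 and 1.75, exponentiating the draws gives them back on the odds-ratio scale.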

Value
    This function outputs a list of the following values:

    model   A list of two matrices. The first is a matrix of dimension (2 X (G+E+GbyG+GbyE)) that specifies the variables (genetic or environmental) involved in each effect: (1) genetic main effect: one SNP in the first row and "NA" in the second row; (2) environmental main effect: an environmental variable in the first row and "NA" in the second row; (3) GbyG: two SNPs, in rows one and two; (4) GbyE: a SNP in the first row and an environmental variable in the second row. The SNP variables are specified by their rs numbers. Each SNP genotype is normally parametrized as 0 for the rare homozygous genotype (aa), 1 for the heterozygous genotype (Aa), and 2 for the common homozygous genotype (AA). If the SNP rs number is specified with a ".c" after the name, the reverse parameterization is used (0 for AA, 1 for Aa, and 2 for aa). The second matrix is of dimension (9 X (G+E+GbyG+GbyE)) and specifies the genetic epistasis models of the genetic markers for the genetic main effects, GbyG interactions, and GbyE interactions. Each model is specified as a vector of length 9 giving the "penetrance" at each possible pair of genotypes for two markers (each of AA, Aa, aa at the first marker combined with each of BB, Bb, bb at the second); this is the same model specification as in pages of Li and Reich. For environmental main effects no model specification is given; for GbyE effects, the model specification given is that of the genetic main effect that interacts with the given environmental variable.
    design  The design matrix created from the models described above and used for the simulation
    coef    A vector of beta coefficients, one for each effect in the simulation (of length G+E+GbyG+GbyE)
    data    The simulated data set

Author(s)
    Melanie Wilson <maw27@stat.duke.edu>

References
    Li W and Reich J (1999). A complete enumeration and classification of two-locus disease models. Human Heredity; 50:

Examples
    ##Load the genetic data (155 individuals and 5 SNPs)
    data(gene.data)
    gene = gene.data
    ##make the environment data
    ##AGE
    age = rnorm(155,mean=55,sd=11)
    eniv = cbind(age)
    colnames(eniv) = c("age")
    eniv = data.frame(eniv)
    ##Case control simulation parameters
    family = "case.control"
    n = 155
    pop.char = .4
    coef.dist = list(log(c(1.25,1.5,1.75)),c(1/3,1/3,1/3))
    ##create one simulation with only 2 genetic main effects and 1 environmental effect
    sim = genetic.sim(gene=gene,eniv=eniv,family=family,n=n,pop.char=pop.char,
                      G=2,E=1,GbyG=0,GbyE=0,dist.coef=coef.dist)


sim.geno
Simulation Function for Genetic data

Description
    This function creates genetic data for a population using R binary versions of the HapMap and recombination map input files.

Usage
    sim.geno <- function(regnum=1, chromo=17, pos1= , pos2= , popn="ceu", n=100,
                         outpath="./",
                         exepath="/home/fac/iversen/bin/linux/",
                         mappath="/local/recombrates/",
                         happath="/local/hapmap.phased/",
                         rr1=1.0, rr2=1.0, minmaf=0.05)

Arguments
    regnum      Integer specifying the index of the chromosomal region
    chromo      Integer in c(1:22) giving the specific chromosome to sample from
    pos1        Nucleotide position of the start of the region on the given chromosome
    pos2        Nucleotide position of the end of the region on the given chromosome. (Note: chr17: is the 40KB region centered on TP53)
    popn        The population in HapMap to simulate the genetic samples from: must be one of ("CEU","YRI","JPT+CHB")
    n           Number of individuals to simulate the genetic data on
    outpath     Path to directory where outfiles will be deposited
    exepath     Path to hapgen executable file
    mappath     Path to Oxford genetic map
    happath     Path to HapMap phased genotype data
    rr1         Heterozygote relative risk (should be 1.0 for population samples)
    rr2         Rare homozygote relative risk (should be 1.0 for population samples)
    minmaf      Smallest minor allele frequency for the disease locus
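To show what the relative-risk arguments rr1 and rr2 express, the sketch below computes genotype frequencies among cases for a single disease SNP. It is a textbook calculation under stated assumptions (Hardy-Weinberg proportions, the minor allele as the risk allele, risks measured relative to the common homozygote) and is not a description of what hapgen does internally. The function name geno.freq.cases is hypothetical.

```r
# Minimal sketch (assumptions: Hardy-Weinberg equilibrium, minor allele is
# the risk allele, relative risks taken against the common homozygote).
# geno.freq.cases is a hypothetical helper, not a SimGbyE or hapgen function.
geno.freq.cases <- function(maf, rr1 = 1, rr2 = 1) {
  stopifnot(maf > 0, maf <= 0.5, rr1 > 0, rr2 > 0)
  # Population genotype frequencies by number of minor alleles (0, 1, 2)
  pop <- c("0" = (1 - maf)^2, "1" = 2 * maf * (1 - maf), "2" = maf^2)
  rr  <- c(1, rr1, rr2)
  # Bayes' rule: P(genotype | case) is proportional to P(genotype) * risk
  cases <- pop * rr / sum(pop * rr)
  rbind(population = pop, cases = cases)
}

geno.freq.cases(maf = 0.2, rr1 = 1.5, rr2 = 2.25)
geno.freq.cases(maf = 0.2)   # rr1 = rr2 = 1: cases match the population
```

With both relative risks equal to 1.0 the two rows coincide, which is why that setting corresponds to a population-based sample.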

Details
    The function calls hapgen to simulate one replicate from a specified chromosomal region given data from one of the HapMap II populations. The code generates samples of genotypes in a contiguous range of DNA using HapMap release 21 (NCBI build 35) data (it drops all evidently monomorphic variants). The default (rr1=1.0, rr2=1.0) is to generate population-based samples. If either rr1 or rr2 is >1.0, hapgen will randomly choose a variant with MAF in (minmaf,0.5) as the disease allele and generate a case-control sample (I've forced n.case=n.control=(n/2)). So this code can be used for generating main-effects simulations that have 1 or 0 associated SNPs per region generated. The position range may encompass an entire chromosome or simply bracket a gene or locus of interest. Generating a candidate gene/pathway sample or a genome-wide sample will require multiple calls to this function, one for each independent chromosomal range (if there is LD between variants in two ranges, they should be combined into one).

Value
    This function outputs a list of the following values:

    geno    An n*p matrix of genotypes coded as the number of minor alleles present (0, 1 or 2), where p is the number of non-monomorphic SNPs in the chosen chromosomal range
    y       An n-vector of zeros and ones denoting case/control status (1/0), returned even when both rr1 and rr2 are set to 1.0 (in that case, the case/control designation is meaningless)
    legend  The HapMap legend file for the in-range SNPs; columns are rs number, chromosomal position, allele 0 and allele 1
    assoc   Data from the hapgen.aux file when one or more of rr1 and rr2 are not 1.0: rows are the index of the disease SNP in the geno matrix, disease SNP position, minor allele (allele 0 or 1), estimated MAF, and assumed rr's (specified rr1 and rr2)

Author(s)
    Melanie Wilson <maw27@stat.duke.edu>

Examples
    ##Simulated Genotypes for TP53 Region
    chromo = 17
    pos1 =
    pos2 =
    popn = "CEU"
    n = 100
    outpath = "./"
    exepath = "/home/fac/iversen/bin/linux/"
    mappath = "/local/recombrates/"
    happath = "/local/hapmap.phased/"
    p53.null = sim.geno(n=10000)
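The geno coding described above (number of minor alleles, monomorphic variants dropped) can be tried out on a toy matrix without hapgen or the HapMap files. The genotypes below are random draws with made-up allele frequencies and column names; they only stand in for the geno component and are not output of sim.geno.

```r
# Minimal sketch on a toy genotype matrix (random data, hypothetical SNP
# names); it mimics the 0/1/2 minor-allele coding of the geno component.
set.seed(7)
n    <- 200
mafs <- c(0.30, 0.10, 0.00, 0.45)        # third SNP is monomorphic by design
geno <- sapply(mafs, function(p) rbinom(n, 2, p))
colnames(geno) <- paste0("snp", seq_along(mafs))

# Each individual carries two alleles, so the sample minor allele
# frequency is the column mean divided by 2
maf.hat <- colMeans(geno) / 2

# Keep only SNPs that vary in the sample
keep      <- maf.hat > 0 & maf.hat < 1
geno.poly <- geno[, keep, drop = FALSE]

round(maf.hat, 3)
dim(geno.poly)
```

The same column-mean calculation could be applied to the geno matrix returned by sim.geno to check simulated allele frequencies against minmaf.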

Index

Topic datasets: gene.data
Topic models: sim, sim.gbye, sim.geno

Association Analysis of Sequence Data using PLINK/SEQ (PSEQ)

Association Analysis of Sequence Data using PLINK/SEQ (PSEQ) Association Analysis of Sequence Data using PLINK/SEQ (PSEQ) Copyright (c) 2018 Stanley Hooker, Biao Li, Di Zhang and Suzanne M. Leal Purpose PLINK/SEQ (PSEQ) is an open-source C/C++ library for working

More information

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017 BICF Nano Course: GWAS GWAS Workflow Development using PLINK Julia Kozlitina Julia.Kozlitina@UTSouthwestern.edu April 28, 2017 Getting started Open the Terminal (Search -> Applications -> Terminal), and

More information

BioBin User Guide Current version: BioBin 2.3

BioBin User Guide Current version: BioBin 2.3 BioBin User Guide Current version: BioBin 2.3 Last modified: April 2017 Ritchie Lab Geisinger Health System URL: http://www.ritchielab.com/software/biobin-download Email: software@ritchielab.psu.edu 1

More information

SNP HiTLink Manual. Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1

SNP HiTLink Manual. Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1 SNP HiTLink Manual Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1 1 Department of Neurology, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan 2 Dynacom Co., Ltd, Kanagawa,

More information

Package lodgwas. R topics documented: November 30, Type Package

Package lodgwas. R topics documented: November 30, Type Package Type Package Package lodgwas November 30, 2015 Title Genome-Wide Association Analysis of a Biomarker Accounting for Limit of Detection Version 1.0-7 Date 2015-11-10 Author Ahmad Vaez, Ilja M. Nolte, Peter

More information

Step-by-Step Guide to Basic Genetic Analysis

Step-by-Step Guide to Basic Genetic Analysis Step-by-Step Guide to Basic Genetic Analysis Page 1 Introduction This document shows you how to clean up your genetic data, assess its statistical properties and perform simple analyses such as case-control

More information

Spotter Documentation Version 0.5, Released 4/12/2010

Spotter Documentation Version 0.5, Released 4/12/2010 Spotter Documentation Version 0.5, Released 4/12/2010 Purpose Spotter is a program for delineating an association signal from a genome wide association study using features such as recombination rates,

More information

Package GEM. R topics documented: January 31, Type Package

Package GEM. R topics documented: January 31, Type Package Type Package Package GEM January 31, 2018 Title GEM: fast association study for the interplay of Gene, Environment and Methylation Version 1.5.0 Date 2015-12-05 Author Hong Pan, Joanna D Holbrook, Neerja

More information

Package SMAT. January 29, 2013

Package SMAT. January 29, 2013 Package SMAT January 29, 2013 Type Package Title Scaled Multiple-phenotype Association Test Version 0.98 Date 2013-01-26 Author Lin Li, Ph.D.; Elizabeth D. Schifano, Ph.D. Maintainer Lin Li ;

More information

Package REGENT. R topics documented: August 19, 2015

Package REGENT. R topics documented: August 19, 2015 Package REGENT August 19, 2015 Title Risk Estimation for Genetic and Environmental Traits Version 1.0.6 Date 2015-08-18 Author Daniel J.M. Crouch, Graham H.M. Goddard & Cathryn M. Lewis Maintainer Daniel

More information

MAGA: Meta-Analysis of Gene-level Associations

MAGA: Meta-Analysis of Gene-level Associations MAGA: Meta-Analysis of Gene-level Associations SYNOPSIS MAGA [--sfile] [--chr] OPTIONS Option Default Description --sfile specification.txt Select a specification file --chr Select a chromosome DESCRIPTION

More information

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li

KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual Miao-Xin Li, Jiang Li Department of Psychiatry Centre for Genomic Sciences Department

More information

Polymorphism and Variant Analysis Lab

Polymorphism and Variant Analysis Lab Polymorphism and Variant Analysis Lab Arian Avalos PowerPoint by Casey Hanson Polymorphism and Variant Analysis Matt Hudson 2018 1 Exercise In this exercise, we will do the following:. 1. Gain familiarity

More information

SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data

SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data SYNOPSIS SEQGWAS [--sfile] [--chr] OPTIONS Option Default Description --sfile specification.txt Select a specification file --chr Select a chromosome

More information

Package EBglmnet. January 30, 2016

Package EBglmnet. January 30, 2016 Type Package Package EBglmnet January 30, 2016 Title Empirical Bayesian Lasso and Elastic Net Methods for Generalized Linear Models Version 4.1 Date 2016-01-15 Author Anhui Huang, Dianting Liu Maintainer

More information

ToCatchAThief c ryan campbell & jenn coughlan 7/23/2018

ToCatchAThief c ryan campbell & jenn coughlan 7/23/2018 ToCatchAThief c ryan campbell & jenn coughlan 7/23/2018 Welcome to the To Catch a Thief: With Data! walkthrough! https://bioconductor.org/packages/devel/ bioc/vignettes/snprelate/inst/doc/snprelatetutorial.html

More information

Package GWAF. March 12, 2015

Package GWAF. March 12, 2015 Type Package Package GWAF March 12, 2015 Title Genome-Wide Association/Interaction Analysis and Rare Variant Analysis with Family Data Version 2.2 Date 2015-03-12 Author Ming-Huei Chen

More information

GWAsimulator: A rapid whole-genome simulation program

GWAsimulator: A rapid whole-genome simulation program GWAsimulator: A rapid whole-genome simulation program Version 1.1 Chun Li and Mingyao Li September 21, 2007 (revised October 9, 2007) 1. Introduction...1 2. Download and compile the program...2 3. Input

More information

Bayesian Multiple QTL Mapping

Bayesian Multiple QTL Mapping Bayesian Multiple QTL Mapping Samprit Banerjee, Brian S. Yandell, Nengjun Yi April 28, 2006 1 Overview Bayesian multiple mapping of QTL library R/bmqtl provides Bayesian analysis of multiple quantitative

More information

PRSice: Polygenic Risk Score software - Vignette

PRSice: Polygenic Risk Score software - Vignette PRSice: Polygenic Risk Score software - Vignette Jack Euesden, Paul O Reilly March 22, 2016 1 The Polygenic Risk Score process PRSice ( precise ) implements a pipeline that has become standard in Polygenic

More information

BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14)

BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14) BGGN-213: FOUNDATIONS OF BIOINFORMATICS (Lecture 14) Genome Informatics (Part 1) https://bioboot.github.io/bggn213_f17/lectures/#14 Dr. Barry Grant Nov 2017 Overview: The purpose of this lab session is

More information

Package RVS0.0 Jiafen Gong, Zeynep Baskurt, Andriy Derkach, Angelina Pesevski and Lisa Strug October, 2016

Package RVS0.0 Jiafen Gong, Zeynep Baskurt, Andriy Derkach, Angelina Pesevski and Lisa Strug October, 2016 Package RVS0.0 Jiafen Gong, Zeynep Baskurt, Andriy Derkach, Angelina Pesevski and Lisa Strug October, 2016 The Robust Variance Score (RVS) test is designed for association analysis for next generation

More information

Importing and Merging Data Tutorial

Importing and Merging Data Tutorial Importing and Merging Data Tutorial Release 1.0 Golden Helix, Inc. February 17, 2012 Contents 1. Overview 2 2. Import Pedigree Data 4 3. Import Phenotypic Data 6 4. Import Genetic Data 8 5. Import and

More information

Estimating. Local Ancestry in admixed Populations (LAMP)

Estimating. Local Ancestry in admixed Populations (LAMP) Estimating Local Ancestry in admixed Populations (LAMP) QIAN ZHANG 572 6/05/2014 Outline 1) Sketch Method 2) Algorithm 3) Simulated Data: Accuracy Varying Pop1-Pop2 Ancestries r 2 pruning threshold Number

More information

Genetic type 1 Error Calculator (GEC)

Genetic type 1 Error Calculator (GEC) Genetic type 1 Error Calculator (GEC) (Version 0.2) User Manual Miao-Xin Li Department of Psychiatry and State Key Laboratory for Cognitive and Brain Sciences; the Centre for Reproduction, Development

More information

Network Based Models For Analysis of SNPs Yalta Opt

Network Based Models For Analysis of SNPs Yalta Opt Outline Network Based Models For Analysis of Yalta Optimization Conference 2010 Network Science Zeynep Ertem*, Sergiy Butenko*, Clare Gill** *Department of Industrial and Systems Engineering, **Department

More information

Package inversion. R topics documented: July 18, Type Package. Title Inversions in genotype data. Version

Package inversion. R topics documented: July 18, Type Package. Title Inversions in genotype data. Version Package inversion July 18, 2013 Type Package Title Inversions in genotype data Version 1.8.0 Date 2011-05-12 Author Alejandro Caceres Maintainer Package to find genetic inversions in genotype (SNP array)

More information

RAD Population Genomics Programs Paul Hohenlohe 6/2014

RAD Population Genomics Programs Paul Hohenlohe 6/2014 RAD Population Genomics Programs Paul Hohenlohe (hohenlohe@uidaho.edu) 6/2014 I. Overview These programs are designed to conduct population genomic analysis on RAD sequencing data. They were designed for

More information

Package globalgsa. February 19, 2015

Package globalgsa. February 19, 2015 Type Package Package globalgsa February 19, 2015 Title Global -Set Analysis for Association Studies. Version 1.0 Date 2013-10-22 Author Natalia Vilor, M.Luz Calle Maintainer Natalia Vilor

More information

The Lander-Green Algorithm in Practice. Biostatistics 666

The Lander-Green Algorithm in Practice. Biostatistics 666 The Lander-Green Algorithm in Practice Biostatistics 666 Last Lecture: Lander-Green Algorithm More general definition for I, the "IBD vector" Probability of genotypes given IBD vector Transition probabilities

More information

Optimising PLINK. Weronika Filinger. September 2, 2013

Optimising PLINK. Weronika Filinger. September 2, 2013 Optimising PLINK Weronika Filinger September 2, 2013 MSc in High Performance Computing The University of Edinburgh Year of Presentation: 2013 Abstract Every year the amount of genetic data increases greatly,

More information

Genetic Analysis. Page 1

Genetic Analysis. Page 1 Genetic Analysis Page 1 Genetic Analysis Objectives: 1) Set up Case-Control Association analysis and the Basic Genetics Workflow 2) Use JMP tools to interact with and explore results 3) Learn advanced

More information

MACAU User Manual. Xiang Zhou. March 15, 2017

MACAU User Manual. Xiang Zhou. March 15, 2017 MACAU User Manual Xiang Zhou March 15, 2017 Contents 1 Introduction 2 1.1 What is MACAU...................................... 2 1.2 How to Cite MACAU................................... 2 1.3 The Model.........................................

More information

survsnp: Power and Sample Size Calculations for SNP Association Studies with Censored Time to Event Outcomes

survsnp: Power and Sample Size Calculations for SNP Association Studies with Censored Time to Event Outcomes survsnp: Power and Sample Size Calculations for SNP Association Studies with Censored Time to Event Outcomes Kouros Owzar Zhiguo Li Nancy Cox Sin-Ho Jung Chanhee Yi June 29, 2016 1 Introduction This vignette

More information

GBS Bioinformatics Pipeline(s) Overview

GBS Bioinformatics Pipeline(s) Overview GBS Bioinformatics Pipeline(s) Overview Getting from sequence files to genotypes. Pipeline Coding: Ed Buckler Jeff Glaubitz James Harriman Presentation: Terry Casstevens With supporting information from

More information

Package RobustSNP. January 1, 2011

Package RobustSNP. January 1, 2011 Package RobustSNP January 1, 2011 Type Package Title Robust SNP association tests under different genetic models, allowing for covariates Version 1.0 Depends mvtnorm,car,snpmatrix Date 2010-07-11 Author

More information

called Hadoop Distribution file System (HDFS). HDFS is designed to run on clusters of commodity hardware and is capable of handling large files. A fil

called Hadoop Distribution file System (HDFS). HDFS is designed to run on clusters of commodity hardware and is capable of handling large files. A fil Parallel Genome-Wide Analysis With Central And Graphic Processing Units Muhamad Fitra Kacamarga mkacamarga@binus.edu James W. Baurley baurley@binus.edu Bens Pardamean bpardamean@binus.edu Abstract The

More information

Tutorial. Identification of Variants Using GATK. Sample to Insight. November 21, 2017

Tutorial. Identification of Variants Using GATK. Sample to Insight. November 21, 2017 Identification of Variants Using GATK November 21, 2017 Sample to Insight QIAGEN Aarhus Silkeborgvej 2 Prismet 8000 Aarhus C Denmark Telephone: +45 70 22 32 44 www.qiagenbioinformatics.com AdvancedGenomicsSupport@qiagen.com

More information

Download PLINK from

Download PLINK from PLINK tutorial Amended from two tutorials that the PLINK author Shaun Purcell wrote, see http://pngu.mgh.harvard.edu/~purcell/plink/tutorial.shtml and 'Teaching materials and example dataset' at http://pngu.mgh.harvard.edu/~purcell/plink/res.shtml

More information

Intro to NGS Tutorial

Intro to NGS Tutorial Intro to NGS Tutorial Release 8.6.0 Golden Helix, Inc. October 31, 2016 Contents 1. Overview 2 2. Import Variants and Quality Fields 3 3. Quality Filters 10 Generate Alternate Read Ratio.........................................

More information

GWAS Exercises 3 - GWAS with a Quantiative Trait

GWAS Exercises 3 - GWAS with a Quantiative Trait GWAS Exercises 3 - GWAS with a Quantiative Trait Peter Castaldi January 28, 2013 PLINK can also test for genetic associations with a quantitative trait (i.e. a continuous variable). In this exercise, we

More information

The LDheatmap Package

The LDheatmap Package The LDheatmap Package May 6, 2006 Title Graphical display of pairwise linkage disequilibria between SNPs Version 0.2-1 Author Ji-Hyung Shin , Sigal Blay , Nicholas Lewin-Koh

More information

Vignette for the package rehh (version 2+) Mathieu Gautier, Alexander Klassmann and Renaud Vitalis 24/10/2016

Vignette for the package rehh (version 2+) Mathieu Gautier, Alexander Klassmann and Renaud Vitalis 24/10/2016 Vignette for the package rehh (version 2+) Mathieu Gautier, Alexander Klassmann and Renaud Vitalis 24/10/2016 Contents 1 Input Files 2 1.1 Haplotype data file..........................................

More information

Assignment 7: Single-cell genomics. Bio /02/2018

Assignment 7: Single-cell genomics. Bio /02/2018 Assignment 7: Single-cell genomics Bio5488 03/02/2018 Assignment 7: Single-cell genomics Input Genotypes called from several exome-sequencing datasets derived from either bulk or small pools of cells (VCF

More information

Convert Dosages to Genotypes Author: Autumn Laughbaum, Golden Helix, Inc.

Convert Dosages to Genotypes Author: Autumn Laughbaum, Golden Helix, Inc. Convert Dosages to Genotypes Author: Autumn Laughbaum, Golden Helix, Inc. Overview This script converts allelic dosage values to genotypes based on user-specified thresholds. The dosage data may be in

More information

Tutorial on gene-c ancestry es-ma-on: How to use LASER. Chaolong Wang Sequence Analysis Workshop June University of Michigan

Tutorial on gene-c ancestry es-ma-on: How to use LASER. Chaolong Wang Sequence Analysis Workshop June University of Michigan Tutorial on gene-c ancestry es-ma-on: How to use LASER Chaolong Wang Sequence Analysis Workshop June 2014 @ University of Michigan LASER: Loca-ng Ancestry from SEquence Reads Main func:ons of the so

More information

PBAP Version 1 User Manual

PBAP Version 1 User Manual PBAP Version 1 User Manual Alejandro Q. Nato, Jr. 1, Nicola H. Chapman 1, Harkirat K. Sohi 1, Hiep D. Nguyen 1, Zoran Brkanac 2, and Ellen M. Wijsman 1,3,4,* 1 Division of Medical Genetics, Department

More information

