GemTools Documentation

Bert Klei and Brian P. Kent
February 2011

Literature

This software is described in:

Klei L, Kent BP, Melhem N, Devlin B, Roeder K. GemTools: a fast and efficient approach to estimating genetic ancestry (in preparation).

The GemTools functions are primarily based on the methods described in:

Lee AB, Luca D, Klei L, Devlin B, Roeder K. Discovering genetic ancestry using spectral graph theory. Genet Epidemiol 2010 Jan;34(1):51-9.

The projection methods are described in:

Crossett A, Kent BP, Klei L, Ringquist S, Trucco M, Roeder K, Devlin B. Using ancestry matching to combine family-based and unrelated samples for genome-wide association studies. Stat Med 2010 Dec 10;29(28).

GEM uses the spectral graph methods described in Lee et al. (2010) to find a low-dimensional representation of the genetic similarities between individuals, which is referred to as an eigenmap. A key feature of the eigenmap is D, the number of eigenvectors required to represent the variability in the data. For instance, to separate 3 major ancestry groups we usually need D=2 dimensions. D=1 models a cline. D=0 suggests that the sample is genetically homogeneous. D is determined using a test of significance (Lee et al. 2010). Assuming an eigenmap is constructed using a representative base sample, additional individuals can be projected onto the map using the Nystrom approximation (Crossett et al. 2010).

Description

GemTools is a package of functions that helps the user account for the genetic ancestry of a large number of individuals using spectral graph theory. The package has three components:

1) dacgem
This function organizes a large number of individuals into smaller clusters of individuals with similar genetic ancestry. The approach draws a representative base sample to create an eigenmap of the genotype information. The remaining non-base individuals are then projected into that eigenmap using a Nystrom projection. Clusters are formed from the base sample, and each non-base individual is assigned to the cluster of its genetically closest base neighbor.

2) clustergem
After the population has been divided into clusters of manageable size, this function further subdivides the dacgem clusters until each subcluster is genetically homogeneous (D=0). For relatively small data sets (<2000 individuals) this function can be used on the original data to generate traditional eigenvectors to account for genetic ancestry.

3) ccmatchgem
This function finds the best matches among cases and controls based on ancestry within the clusters generated by dacgem. Matches are determined with the function fullmatch from the R library optmatch. fullmatch creates strata that include 1 case matched to 1 or more controls, or 1 control matched to 1 or more cases. This function can be used on the subdivided data, or on the complete data if it is relatively small. The results of this function can be used as strata in conditional logistic regression or in other genetic analyses.

In addition to the three main functions there are three helper functions: plotclusterspdf plots the results from dacgem and clustergem, saveclusterstxt saves those results, and savematchestxt saves the results from ccmatchgem.

Part of the GEM package relies on the library optmatch. This library should be downloaded from the R repository and loaded (library("optmatch")) before the function ccmatchgem is used.

dacgem

Usage

    dacgem(gnt, id, n.ind.base = 500, max.ind.cluster = 1000, min.dim = 2, max.dim = 15, method = c("homogeneous", "quick"), verbose = c(TRUE, FALSE))

Arguments

gnt: the genotype matrix. Rows are individuals and columns are SNPs.
id: a vector of unique ID strings for each person in the 'gnt' matrix.
n.ind.base: the number of individuals who are chosen at random to be in the base set. This is also the number used in the base for all levels of subclustering.

max.ind.cluster: the maximum number of individuals allowed in each final cluster. If a cluster still contains more than max.ind.cluster individuals after the initial clustering, it is broken up further into subclusters until each subcluster is small enough.
min.dim: the minimum number of dimensions to be utilized in spectral decompositions.
max.dim: the maximum number of dimensions to be considered significant in spectral decompositions.
method: the desired method of clustering. The "homogeneous" method is the default and creates clusters that have no significant spectral dimensions for the current (sub) data set. The "quick" method creates a number of clusters equal to the number of significant dimensions plus one in the spectral decomposition of the base set.
verbose: toggles the amount of output, both written to the screen and in the values returned. TRUE is the default.

Details

min.dim influences the dimension of the eigenvectors in the elements of frames (the list of data frames with results from each round of clustering). Even if the base set for a particular cluster has D=0, min.dim dimensions will be used in the eigenmap of that cluster. When calculating the genetic distance between individuals, max(D, min.dim) dimensions are used.

The chosen method of clustering is used consistently at each level of the algorithm. When the homogeneous clustering method is applied to a cluster that already has D=0, the algorithm continues to produce subclusters until the number of individuals in each cluster is less than max.ind.cluster.

The quick method of clustering splits each cluster into max(D+1, 2) subclusters. When D=0, it splits the cluster into the minimum number of groups necessary for each group to have fewer than max.ind.cluster members.

The quick method is recommended when the homogeneous method creates many small subsets even though there were few significant dimensions.
One reason for many small subsets is the presence of family members in the data, in particular twins, full sibs, and parent-offspring pairs. These should be removed before clustering.

Value

clusters: a vector with final cluster labels (strings) for each individual. The names of the vector are the same as 'id'. The top-level clusters are given by the first digit. Clusters that are broken up further are indicated by a "_" symbol, i.e., 3_1_2 indicates main cluster 3, subcluster 1, and sub-subcluster 2.
frames: only returned if verbose is set to TRUE. A list of data frames with detailed results of each level of clustering and subclustering. In each data frame, the rownames are individual ID strings. Column 1 is "cluster", the cluster labeling for that round of clustering. Column 2 is "is.base", which is 1 if the individual was selected to be in the base set for that round of clustering and 0 if the individual was projected into the eigenmap. The remaining columns are the eigenvectors from that round of clustering and projection.
dictionary: only returned if verbose is set to TRUE. A data frame that describes which cluster was broken up in each round. The first column is the index of the 'frames' list and the round of clustering, the second column is the cluster that was broken into subclusters in that round, and the third column is D, the number of significant spectral dimensions for that round.
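The label format described for clusters can be unpacked with base R. The following is a minimal sketch on hypothetical labels (the vector below is made up for illustration and is not GemTools output); it takes the first "_"-separated component as the top-level cluster, which is an assumption that also covers labels whose first component has more than one digit.

```r
# Hypothetical cluster labels, named by individual ID, in the "3_1_2" format
clusters <- c(ind1 = "3_1_2", ind2 = "3_1_1", ind3 = "1", ind4 = "2_4")

# Split each label on "_" into its levels of (sub)clustering
label_parts <- strsplit(clusters, "_", fixed = TRUE)

# Top-level cluster: the first component of each label
top_level <- vapply(label_parts, `[`, character(1), 1)

# Depth: 1 = never subdivided, 2 = subcluster, 3 = sub-subcluster, ...
depth <- lengths(label_parts)

# Number of individuals per top-level cluster
print(table(top_level))
```

Grouping on top_level gives a quick check of how individuals are spread over the main clusters before looking at the finer subdivisions.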

Example

A worked example is provided at the end of this file. Here we provide a sketch of a genotype input file and a bit of R code to show the use of dacgem.

This genotype input file covers the first 5 individuals and 10 SNPs; assume the name of the file is gnt.in.txt:

    [one row per individual: the ID in the first column, followed by the 10 SNP codes; the values are not preserved in this copy]
    etc.

SNP genotypes are coded as an allele count, i.e., 0 for the 1/1 genotype, 1 for the 1/2 genotype, 2 for the 2/2 genotype, and anything else for all others (in the example, 3 denotes 0/0).

The following R code would ready these data for processing:

    gnt = read.table("gnt.in.txt", header = FALSE)
    id = as.matrix(gnt[, 1])
    gnt = as.matrix(gnt[, -1])
    gnt[gnt < 0 | gnt > 2] = NA
    example.out = dacgem(id = id, gnt = gnt)   ### this example is too small to work

clustergem

Usage

    clustergem(gnt, id, pre.clusters = NULL, min.dim = 2, max.dim = 15, verbose = c(TRUE, FALSE))

Arguments

gnt: the genotype matrix. Rows are individuals and columns are SNPs.
id: a vector of unique ID strings for each person in the 'gnt' matrix.
pre.clusters: a vector with information on the clusters to process.
min.dim: the minimum number of dimensions to be considered significant in spectral decompositions.
max.dim: the maximum number of dimensions to be considered significant in spectral decompositions.
verbose: toggles the amount of output, both written to the screen and in the values returned. TRUE is the default.
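The gnt and id inputs described in the arguments above, together with the allele-count coding from the dacgem example, can be mocked up with simulated data. This is only a sketch under stated assumptions: the sizes, the genotype frequencies, and the "Ind<number>" ID pattern are made up, and no GemTools function is called.

```r
set.seed(42)
n_ind <- 20   # hypothetical number of individuals
n_snp <- 10   # hypothetical number of SNPs

# Raw codes: 0, 1, 2 are allele counts; 3 stands for a missing genotype (0/0)
raw <- matrix(
  sample(0:3, n_ind * n_snp, replace = TRUE, prob = c(0.45, 0.35, 0.15, 0.05)),
  nrow = n_ind, ncol = n_snp
)

# One unique ID string per row of the genotype matrix
id <- sprintf("Ind%d", seq_len(n_ind))

# Same recoding as in the manual: anything outside 0..2 becomes NA
gnt <- raw
gnt[gnt < 0 | gnt > 2] <- NA

# Basic shape checks before passing gnt and id on
stopifnot(nrow(gnt) == length(id), !anyDuplicated(id))
print(table(gnt, useNA = "ifany"))
```

Checking that the number of rows equals the number of IDs, and that the IDs are unique, catches the most common input mismatches early.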

Details

pre.clusters is usually the output from dacgem. When there is no need to create manageable clusters with dacgem, pre.clusters does not need to be specified and all the data will be treated as coming from one cluster.

min.dim influences the dimension of the eigenvectors in the elements of frames (the list of data frames with results from each round of clustering). Even if a particular cluster has 0 significant spectral dimensions, there will still be min.dim + 1 dimensions in the results for the branching of that cluster. When calculating the genetic distance between individuals, at least min.dim dimensions are used even though there might be fewer significant dimensions.

Value

clusters: a vector with final cluster labels (strings) for each individual. The names of the vector are the same as 'id'. The top-level clusters are given by the first digit. Clusters that are broken up further are indicated by a "_" symbol.
frames: only returned if verbose is set to TRUE. A list of data frames with detailed results of each level of clustering and subclustering. In each data frame, the rownames are individual ID strings. Column 1 is "cluster", the cluster labeling for that round of clustering. Column 2 is "is.base", which is 1 if the individual was selected to be in the base set for that round of clustering and 0 if the individual was projected into the base eigenspace. The remaining columns are the eigenvectors from that round of clustering and projection.
dictionary: only returned if verbose is set to TRUE. A data frame that describes which cluster was broken up in each round. The first column is the index of the 'frames' list and the round of clustering, the second column is the cluster that was broken into subclusters in that round, and the third column is the number of significant spectral dimensions in the base set for that round.

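The dictionary described above links each round of clustering to an entry of frames. A minimal sketch of looking up the round in which a given cluster was split follows. The data frame is hypothetical (its values are invented for illustration); the column names frames.index, trunk, and base.sig.dims follow the dictionary header printed in the plotting example of this manual, and step_for_trunk is a helper defined here, not a GemTools function.

```r
# Hypothetical dictionary: one row per round of clustering
dictionary <- data.frame(
  frames.index  = 1:4,
  trunk         = c("0", "3", "3_1", "3_2"),   # "0" denotes the full data
  base.sig.dims = c(3, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Return the frames index (the 'step') of the round that split a given trunk
step_for_trunk <- function(dictionary, trunk) {
  step <- dictionary$frames.index[dictionary$trunk == trunk]
  if (length(step) != 1) {
    stop("trunk not found or not unique: ", trunk)
  }
  step
}

step <- step_for_trunk(dictionary, "3_2")
print(step)
```

With real output, the returned index could then be used to select the matching data frame from frames or passed as the step argument of the plotting and saving functions.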
Example

When the data (id and gnt) are pre-clustered using dacgem and the resulting output of dacgem is named example.out:

    example.cluster = clustergem(gnt = gnt, id = id, pre.clusters = example.out$clusters)

When no pre-clustering is available:

    example.cluster = clustergem(gnt = gnt, id = id)

ccmatchgem

Usage Restriction

ccmatchgem relies on the optmatch package. Because of the usage restrictions of optmatch, ccmatchgem should only be used for academic purposes.

Usage

    ccmatchgem(gnt, id, dx, cdx = NULL, pre.clusters = NULL, min.dim = 2, max.dim = 15, verbose = c(TRUE, FALSE))

Arguments

gnt: the genotype matrix. Rows are individuals and columns are SNPs.
id: a vector of unique ID strings for each person in the 'gnt' matrix.
dx: a vector or matrix with case-control status for the individuals in id.
cdx: a string with the name of the disease information to use.
pre.clusters: a vector with information on the clusters to process.
min.dim: the minimum number of dimensions to be considered significant in spectral decompositions.
max.dim: the maximum number of dimensions to be considered significant in spectral decompositions.
verbose: toggles the amount of output, both written to the screen and in the values returned. TRUE is the default.

Details

pre.clusters is usually the output from dacgem. When there is no need to create manageable clusters with dacgem, pre.clusters does not need to be specified and all the data will be clustered as one group.

dx can be either a vector or a matrix with disease diagnosis information. Individuals coded 2 are considered to be cases and those coded 1 are controls; all others are considered to have an unknown diagnosis and will not be used for matching. The diagnosis information is matched to id using the names attribute when dx is a vector, or the rownames when dx is a matrix.

cdx is only required when more than one diagnosis is specified in dx. cdx is then matched to the colnames of dx to determine which column of dx to use as the case-control status information.

min.dim influences the dimension of the eigenvectors in the elements of frames (the list of data frames with results from each round of clustering). Even if the base set for a particular cluster has 0 significant spectral dimensions, there will still be min.dim + 1 dimensions in the results for the branching of that cluster.
When calculating the genetic distance between individuals, at least min.dim dimensions are used even though there might be fewer significant dimensions.

Value

strata: a vector with final case-control strata labels (strings) for each individual. The names of the vector are the same as 'id'.
dx: a vector with the diagnosis status for each individual.
dist: only when verbose = TRUE, a vector with the distance of the individual to its closest genetic neighbor of the opposite diagnosis (case to closest control, control to closest case).

closest: only when verbose = TRUE, a vector of ids of the closest neighbors of the opposite diagnosis.

Example

Assume the following information is stored in the diagnosis file example.dx:

             DX1  DX2
    Ind1     2    2
    Ind2     2    0
    Ind4     2    2
    Ind6     1    1
    etc.

Read this information using the following R command:

    Dx = read.table("example.dx", header = TRUE)

When the data (id and gnt) are pre-clustered using dacgem with the resulting output in example.out, the command to match cases to controls for DX1 using ccmatchgem is:

    example.match = ccmatchgem(id = id, gnt = gnt, pre.clusters = example.out$clusters, dx = Dx, cdx = "DX1")

When no pre-clustering is needed and the diagnosis file example2.dx has the following layout:

             DXalt
    Ind1     2
    Ind2     2
    Ind4     0
    Ind6     1
    Ind23    1
    etc.

    Dx = read.table("example2.dx", header = TRUE)
    Example2.match = ccmatchgem(id = id, gnt = gnt, dx = Dx)

plotclusterspdf

Usage

    plotclusterspdf(out, step, root.pdf.file = "anc_cluster")

Arguments

out: data frame in the format produced by dacgem or clustergem when called with the option verbose = TRUE.
step: a numeric value indicating the frames.index in out$dictionary.
root.pdf.file: the root of the filename to use for the .pdf file. This name will be augmented with the trunk from out$dictionary$trunk[step]. The default is "anc_cluster".

Details

The symbols used for plotting are A for cluster 1, B for cluster 2, etc. When more than 26 clusters are formed in one step, a will be used for cluster 27, b for cluster 28, etc.

Ancestry plots created from dacgem output will show the base individuals plotted over the projected ones. In general the projected individuals will be concentrated in the center of the plots, with the base individuals spread out and filling the complete space. This is typical when using projections.

EV.0 is never plotted. This eigenvector represents an overall mean and is used in calculating distances between individuals.

Value

No values are generated by this function.

Example

Assume the following is the information stored in example.out$dictionary from dacgem:

    frames.index  trunk  base.sig.dims
    [values not preserved in this copy]

To plot the initial ancestry clusters for the full data (trunk = 0) in a pdf file whose name starts with example, issue the following command:

    plotclusterspdf(out = example.out, step = 1, root.pdf.file = "example")

To plot the subclusters in trunk 3, issue the following command:

    plotclusterspdf(out = example.out, step = 2, root.pdf.file = "example")

saveclusterstxt

Usage

    saveclusterstxt(out, step, root.txt.file = "anc_cluster")

Arguments

out: data frame in the format produced by dacgem or clustergem when called with the option verbose = TRUE.
step: a numeric value matching the values in frames.index in out$dictionary.
root.txt.file: the root of the filename to use for the .txt file. This name will be augmented with the trunk from out$dictionary$trunk[step]. The default is "anc_cluster".

Value

No values are generated by this function.

Example

Assume the following information is stored in example.cluster$dictionary from clustergem:

    frames.index  trunk  base.sig.dims
    [values not preserved in this copy]

To save the ancestry information from trunk 3_2 to a txt file whose name starts with example, issue the following command:

    saveclusterstxt(out = example.cluster, step = 4, root.txt.file = "example")

savematchestxt

Usage

    savematchestxt(results, root.txt.file = "matches")

Arguments

results: data frame in the format produced by ccmatchgem.
root.txt.file: the root of the filename to use for the .txt file.

Value

No values are generated by this function.

Example

Write the results from ccmatchgem that were stored in ccmatchgemresults to a file with the name matching.example.txt:

    savematchestxt(results = ccmatchgemresults, root.txt.file = "matching.example")

PRACTICAL NOTES

Computer Requirements

The method has been used successfully with a dataset of ~20,000 individuals and 12,000 selected SNPs. Memory requirements for these data were ~5 GB, and it took ~40 minutes to run the function dacgem on our computer (AMD Dual Core Opteron processor running at 2.6 GHz with 32 GB of RAM). When more memory is available, larger datasets can be used. Memory requirements and computing time are approximately linear in both the number of individuals and the number of SNPs. Large datasets require a computer with a 64-bit operating system and an adequate amount of RAM (8 GB or more).

Data Quality

When using GemTools it is advisable to use a set of ~5K to ~20K high-quality SNPs. This means a high completion rate (> 99.9%) and a minor allele frequency > 0.01 for each SNP. It is also suggested to select SNPs that are in low LD with each other (r^2 < 0.01). For individuals, the data should be screened to remove duplicates and close relatives (full sibs, parent-offspring pairs). When these quality checks have not been applied, GEM tends to find spurious dimensions of ancestry as well as many small homogeneous clusters with fewer than 5 individuals.

Typical Results

When starting with a global population it is typical to find 3 or 4 dimensions of ancestry on the first pass. This will break the global population into groups of roughly African, Asian (East), Asian (South), European, and Latin ancestry. Depending on the sizes of the subclusters, they will then be broken up into smaller ancestry groups. For Africans one typically sees 3 or 4 subgroups; for Europeans a North-South and an East-West cline can typically be found, etc. Keep in mind that dacgem will keep dividing clusters until all the clusters have fewer than max.ind.cluster individuals in them.
Some of the later splits might be made only to satisfy that requirement, even though there is no real reason to split as far as ancestry is concerned.

EXTENDED HGDP EXAMPLE

Genomic DNA samples from 1,043 individuals from around the world were collected by the Human Genome Diversity Project (HGDP), in a collaboration with the Centre d'Etude du Polymorphisme Humain (CEPH) in Paris. They represent 51 different populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania and the Americas. For details on the individuals in this collection, see H. Cann et al. Science 296 (2002) and its Supplemental Data; Rosenberg et al. Science 298 (2002); and Rosenberg et al. PLoS Genetics 1 (2005).

In this example we focus on individuals from two continents (Africa and Europe), with 4 and 7 tribes representing each continent, respectively. The African tribes are Biaka Pygmies (102), Mandenka (103), Mbuti Pygmies (104), and Yoruba (106). The tribes representing Europe are Adygei (538), French (539), French Basques (540), Italian (541), Orcadian (542), Russian (543), and Sardinian (544). The numbers in parentheses are the last three digits of the id used in the example, i.e., HGDP123456_103 is an individual from the Mandenka tribe.

In the file HGDP_example.R we provide a worked example that fully utilizes GemTools on these data, which are stored in the gzipped file HGDP.sub.gnt.gz. The example R code includes extensive comments. In addition to the analysis of the population structure, three approaches are provided that show how to use the output from GemTools to control for structure in an analysis of association between genotype and phenotype.
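One way to use matched strata in an association analysis is conditional logistic regression, which this manual names as a use for the ccmatchgem strata. The following is a minimal sketch on simulated data and is not the code from HGDP_example.R: the strata labels, the single SNP, and the data frame layout are assumptions made for illustration. It uses clogit and strata from the survival package, and recodes the 2 = case, 1 = control diagnosis coding to the 0/1 outcome that clogit expects.

```r
library(survival)  # provides clogit() and strata()

set.seed(7)
n_strata <- 60

# Hypothetical matched strata: one case (dx = 2) and two controls (dx = 1) each
dat <- data.frame(
  stratum = rep(sprintf("S%02d", seq_len(n_strata)), each = 3),
  dx      = rep(c(2, 1, 1), times = n_strata)
)

# Simulated allele counts (0, 1, 2) for one SNP; no true effect is built in
dat$snp <- rbinom(nrow(dat), size = 2, prob = 0.3)

# Recode the diagnosis: 2 -> 1 (case), 1 -> 0 (control)
dat$case <- as.integer(dat$dx == 2)

# Condition on the matched strata when testing the SNP
fit <- clogit(case ~ snp + strata(stratum), data = dat)
print(summary(fit)$coefficients)
```

Because the comparison is made within strata of individuals with similar ancestry, differences in allele frequency between ancestry groups do not drive the estimated SNP effect.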

Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation

Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation The American Journal of Human Genetics Supplemental Data Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation Chaolong Wang,

More information

User Manual for TreeMix v1.1. Joseph K. Pickrell, Jonathan K. Pritchard

User Manual for TreeMix v1.1. Joseph K. Pickrell, Jonathan K. Pritchard User Manual for TreeMix v1.1 Joseph K. Pickrell, Jonathan K. Pritchard October 1, 2012 Contents 1 Introduction 2 2 Installation 2 3 Input file format 2 3.1 SNP data..........................................

More information

Step-by-Step Guide to Advanced Genetic Analysis

Step-by-Step Guide to Advanced Genetic Analysis Step-by-Step Guide to Advanced Genetic Analysis Page 1 Introduction In the previous document, 1 we covered the standard genetic analyses available in JMP Genomics. Here, we cover the more advanced options

More information

Applications of admixture models

Applications of admixture models Applications of admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price Applications of admixture models 1 / 27

More information

A short manual for LFMM (command-line version)

A short manual for LFMM (command-line version) A short manual for LFMM (command-line version) Eric Frichot efrichot@gmail.com April 16, 2013 Please, print this reference manual only if it is necessary. This short manual aims to help users to run LFMM

More information

Package ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2.

Package ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2. Package ridge February 15, 2013 Title Ridge Regression with automatic selection of the penalty parameter Version 2.1-2 Date 2012-25-09 Author Erika Cule Linear and logistic ridge regression for small data

More information

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie

SOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing

More information

Package GWAF. March 12, 2015

Package GWAF. March 12, 2015 Type Package Package GWAF March 12, 2015 Title Genome-Wide Association/Interaction Analysis and Rare Variant Analysis with Family Data Version 2.2 Date 2015-03-12 Author Ming-Huei Chen

More information

Package snpstatswriter

Package snpstatswriter Type Package Package snpstatswriter February 20, 2015 Title Flexible writing of snpstats objects to flat files Version 1.5-6 Date 2013-12-05 Author Maintainer Write snpstats

More information

Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

More information

Package GEM. R topics documented: January 31, Type Package

Package GEM. R topics documented: January 31, Type Package Type Package Package GEM January 31, 2018 Title GEM: fast association study for the interplay of Gene, Environment and Methylation Version 1.5.0 Date 2015-12-05 Author Hong Pan, Joanna D Holbrook, Neerja

More information

Network Based Models For Analysis of SNPs Yalta Opt

Network Based Models For Analysis of SNPs Yalta Opt Outline Network Based Models For Analysis of Yalta Optimization Conference 2010 Network Science Zeynep Ertem*, Sergiy Butenko*, Clare Gill** *Department of Industrial and Systems Engineering, **Department

More information

Bayesian analysis of genetic population structure using BAPS: Exercises

Bayesian analysis of genetic population structure using BAPS: Exercises Bayesian analysis of genetic population structure using BAPS: Exercises p S u k S u p u,s S, Jukka Corander Department of Mathematics, Åbo Akademi University, Finland Exercise 1: Clustering of groups of

More information

LFMM version Reference Manual (Graphical User Interface version)

LFMM version Reference Manual (Graphical User Interface version) LFMM version 1.2 - Reference Manual (Graphical User Interface version) Eric Frichot 1, Sean Schoville 1, Guillaume Bouchard 2, Olivier François 1 * 1. Université Joseph Fourier Grenoble, Centre National

More information

called Hadoop Distribution file System (HDFS). HDFS is designed to run on clusters of commodity hardware and is capable of handling large files. A fil

called Hadoop Distribution file System (HDFS). HDFS is designed to run on clusters of commodity hardware and is capable of handling large files. A fil Parallel Genome-Wide Analysis With Central And Graphic Processing Units Muhamad Fitra Kacamarga mkacamarga@binus.edu James W. Baurley baurley@binus.edu Bens Pardamean bpardamean@binus.edu Abstract The

More information

Package REGENT. R topics documented: August 19, 2015

Package REGENT. R topics documented: August 19, 2015 Package REGENT August 19, 2015 Title Risk Estimation for Genetic and Environmental Traits Version 1.0.6 Date 2015-08-18 Author Daniel J.M. Crouch, Graham H.M. Goddard & Cathryn M. Lewis Maintainer Daniel

More information

Package allehap. August 19, 2017

Package allehap. August 19, 2017 Package allehap August 19, 2017 Type Package Title Allele Imputation and Haplotype Reconstruction from Pedigree Databases Version 0.9.9 Date 2017-08-19 Author Nathan Medina-Rodriguez and Angelo Santana

More information

GMDR User Manual. GMDR software Beta 0.9. Updated March 2011

GMDR User Manual. GMDR software Beta 0.9. Updated March 2011 GMDR User Manual GMDR software Beta 0.9 Updated March 2011 1 As an open source project, the source code of GMDR is published and made available to the public, enabling anyone to copy, modify and redistribute

More information

Package lodgwas. R topics documented: November 30, Type Package

Package lodgwas. R topics documented: November 30, Type Package Type Package Package lodgwas November 30, 2015 Title Genome-Wide Association Analysis of a Biomarker Accounting for Limit of Detection Version 1.0-7 Date 2015-11-10 Author Ahmad Vaez, Ilja M. Nolte, Peter

More information

STAT 3304/5304 Introduction to Statistical Computing. Introduction to SAS

STAT 3304/5304 Introduction to Statistical Computing. Introduction to SAS STAT 3304/5304 Introduction to Statistical Computing Introduction to SAS What is SAS? SAS (originally an acronym for Statistical Analysis System, now it is not an acronym for anything) is a program designed

More information

Package SMAT. January 29, 2013

Package SMAT. January 29, 2013 Package SMAT January 29, 2013 Type Package Title Scaled Multiple-phenotype Association Test Version 0.98 Date 2013-01-26 Author Lin Li, Ph.D.; Elizabeth D. Schifano, Ph.D. Maintainer Lin Li ;

More information

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1

CTL mapping in R. Danny Arends, Pjotr Prins, and Ritsert C. Jansen. University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 CTL mapping in R Danny Arends, Pjotr Prins, and Ritsert C. Jansen University of Groningen Groningen Bioinformatics Centre & GCC Revision # 1 First written: Oct 2011 Last modified: Jan 2018 Abstract: Tutorial

More information

Package globalgsa. February 19, 2015

Package globalgsa. February 19, 2015 Type Package Package globalgsa February 19, 2015 Title Global -Set Analysis for Association Studies. Version 1.0 Date 2013-10-22 Author Natalia Vilor, M.Luz Calle Maintainer Natalia Vilor

More information

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017

BICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017 BICF Nano Course: GWAS GWAS Workflow Development using PLINK Julia Kozlitina Julia.Kozlitina@UTSouthwestern.edu April 28, 2017 Getting started Open the Terminal (Search -> Applications -> Terminal), and

More information

4/4/16 Comp 555 Spring

4/4/16 Comp 555 Spring 4/4/16 Comp 555 Spring 2016 1 A clique is a graph where every vertex is connected via an edge to every other vertex A clique graph is a graph where each connected component is a clique The concept of clustering

More information

Estimating. Local Ancestry in admixed Populations (LAMP)

Estimating. Local Ancestry in admixed Populations (LAMP) Estimating Local Ancestry in admixed Populations (LAMP) QIAN ZHANG 572 6/05/2014 Outline 1) Sketch Method 2) Algorithm 3) Simulated Data: Accuracy Varying Pop1-Pop2 Ancestries r 2 pruning threshold Number

More information

REAP Software Documentation
