Estimating Local Ancestry in admixed Populations (LAMP)


1 Estimating Local Ancestry in admixed Populations (LAMP) QIAN ZHANG 572 6/05/2014

2 Outline
1) Sketch Method
2) Algorithm
3) Simulated Data: Accuracy Varying
   - Pop1-Pop2 Ancestries
   - $r^2$ pruning threshold
   - Number of SNPs to be analyzed

3 In one person's genome: [Figure: SNP positions along the two chromosomes, Chr (Mom) and Chr (Dad).] Ordered genotype 01 at this SNP (possibilities: 00, 01, 10, 11). 1. Sketch Method

4 [Figure legend: one color = chromosome from Population 1 (e.g. Africans); the other = chromosome from Population 2 (e.g. Europeans).] [Figure: admixed person (e.g. African American) -> LAMP -> plot of fraction of Pop 2 ancestry against position.]

5 In each window, for each person, say that the person has 0, 1, or 2 alleles from Pop 2. For each SNP position, take a majority vote over windows to get the number of Pop 2 alleles. [Figure: a window of L SNPs slides along the genome; example window calls for persons i = 1, 2, 3; the highlighted call is 1 allele from Pop 2.]
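A minimal Python sketch of the majority vote over windows described on this slide. The window representation `(start, end, call)`, the function name `majority_vote`, and the tie-breaking rule (prefer the smaller call) are assumptions made for illustration; they are not taken from the slides or from LAMP itself.

```python
from collections import Counter


def majority_vote(window_calls, n_snps):
    """Combine per-window ancestry calls into one call per SNP.

    window_calls: list of (start, end, call) tuples; the window covers SNP
                  indices start..end-1 and call is 0, 1, or 2 Pop 2 alleles.
    n_snps:       total number of SNP positions.
    Returns a list with one call per SNP (None if no window covers the SNP).
    """
    votes = [Counter() for _ in range(n_snps)]
    for start, end, call in window_calls:
        for j in range(start, min(end, n_snps)):
            votes[j][call] += 1

    result = []
    for counter in votes:
        if not counter:
            result.append(None)
            continue
        top = max(counter.values())
        # Assumption: break ties by taking the smallest call.
        result.append(min(c for c, v in counter.items() if v == top))
    return result


if __name__ == "__main__":
    # Three overlapping windows of 4 SNPs each, shifted by 2 SNPs.
    windows = [(0, 4, 0), (2, 6, 1), (4, 8, 1)]
    print(majority_vote(windows, 8))
```

With overlapping windows, each SNP is covered by several calls, so a single wrong window call can be outvoted by its neighbors.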

6 INPUTS
- SNP positions
- Unphased genotypes (0/0, 0/1, or 1/1) of admixed people
- K = 2 populations
- g = 7 generations of admixing
- Fractions $(\alpha_1, \alpha_2) = (0.8, 0.2)$ of all genomes in the sample from Pop 1, Pop 2
- r: recombination rate (number of breakpoints per meiosis per generation)
- Offset: fraction of the window's bp length by which to shift the window
OUTPUT
- Fraction of Pop 2 alleles at each SNP position in each person

7 Outline
1) Sketch Method
2) Algorithm
3) Simulated Data: Accuracy Varying
   - Pop1-Pop2 Ancestries
   - $r^2$ pruning threshold
   - Number of SNPs to be analyzed
2. Algorithm

8 1) Choose the window length L
- Want big: enough SNPs to tell ancestries apart
- Want small: no breakpoint in the window
- Error measure: the expected fraction of bp positions in a window of 2 chromosomes that is incorrectly called due to a breakpoint. [Figure: call vs. truth for one window; fraction = 1/2 in the example.]
Pick the largest L such that $\epsilon$ bounds the expected fraction of errors in a window:
$$L \le \frac{\epsilon}{(g-1)\, r\, \alpha_1 \alpha_2}$$

9 For each window:
2) Prune SNPs to get independence
- PLINK: window of 50, slide 5, remove SNP if $r^2 > 0.1$ (what I did)
- Greedy removal of SNPs with $r^2 > 0.1$ (paper)
3) MAXVAR algorithm to get initial ancestry estimates 0, 1, or 2 for a window
4) ICM iterates Eqns (1) and (2), with EM to solve Eqn (2)
Get a window's call 0, 1, or 2 for the number of Pop 2 alleles for each person.
For person i, $\theta^i = (\theta^i_1, \theta^i_2) \in \{1, 2\} \times \{1, 2\}$, like an unordered genotype:
- 0 Pop 2 alleles: $\theta^i = (1, 1)$
- 1 Pop 2 allele: $\theta^i = (1, 2) = (2, 1)$
- 2 Pop 2 alleles: $\theta^i = (2, 2)$

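10 If we know:
- MAFs $f_1, \dots, f_K$ of the K ancestral populations, with $f_K = (f_{K1}, \dots, f_{Kn})$
- Genotypes $G_1, \dots, G_m$ of m individuals at n SNPs
then estimate individual i's ancestries $\theta^i = (\theta^i_1, \theta^i_2) \in \{1, \dots, K\} \times \{1, \dots, K\}$ in the window as
$$\hat{\theta}^i = \arg\max_{\theta^i} P(\theta^i \mid f_1, \dots, f_K, G_i) = \arg\max_{\theta^i} P(G_i \mid f_1, \dots, f_K, \theta^i)\, P(\theta^i \mid f_1, \dots, f_K) \quad (1)$$
This plays the role of the E-step, but it takes the mode, not the mean, so the procedure is called ICM, not EM.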

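11 E-step:
$$\hat{\theta}^i = \arg\max_{\theta^i} P(\theta^i \mid f_1, \dots, f_K, G_i) = \arg\max_{\theta^i} P(G_i \mid f_1, \dots, f_K, \theta^i)\, P(\theta^i \mid f_1, \dots, f_K) \quad (1)$$
For all populations s, t, the prior is
$$P(\theta^i = (s, t) \mid f_1, \dots, f_K) = \alpha_s \alpha_t\, \mathbf{1}[s = t] + 2 \alpha_s \alpha_t\, \mathbf{1}[s \ne t]$$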
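The ancestry update of Eq. (1) can be sketched in Python as a search over unordered ancestry pairs. This is an illustrative sketch only: the function names are hypothetical, populations are indexed from 0 rather than 1, and the per-SNP genotype likelihood (two independent alleles drawn with the minor allele frequencies of the two ancestries, with SNPs treated as independent after pruning) is an assumption of this sketch rather than something stated on the slide.

```python
import math
from itertools import combinations_with_replacement


def genotype_prob(g, fs, ft):
    """P(unphased genotype g | one allele from MAF fs, the other from MAF ft)."""
    if g == 0:
        return (1 - fs) * (1 - ft)
    if g == 1:
        return fs * (1 - ft) + (1 - fs) * ft
    if g == 2:
        return fs * ft
    raise ValueError("genotype must be 0, 1, or 2")


def map_ancestry(genotypes, mafs, alphas, floor=1e-12):
    """Return the unordered ancestry pair (s, t), s <= t, maximizing Eq. (1).

    genotypes: minor-allele counts (0/1/2) of one individual at n SNPs.
    mafs:      mafs[k][j] = minor allele frequency of population k at SNP j.
    alphas:    alphas[k] = admixture fraction of population k.
    """
    best_pair, best_logp = None, -math.inf
    for s, t in combinations_with_replacement(range(len(alphas)), 2):
        # Prior from the slide: alpha_s^2 if s == t, 2 * alpha_s * alpha_t otherwise.
        prior = alphas[s] * alphas[t] * (1 if s == t else 2)
        logp = math.log(max(prior, floor))
        for j, g in enumerate(genotypes):
            logp += math.log(max(genotype_prob(g, mafs[s][j], mafs[t][j]), floor))
        if logp > best_logp:
            best_pair, best_logp = (s, t), logp
    return best_pair


if __name__ == "__main__":
    mafs = [[0.9, 0.9, 0.9], [0.1, 0.1, 0.1]]
    print(map_ancestry([1, 1, 1], mafs, [0.8, 0.2]))
```

Working in log space avoids underflow when a window contains many SNPs; the small `floor` only guards against taking the log of zero.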

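12 If we know:
- Individual i's ancestries $\theta^i = (\theta^i_1, \theta^i_2) \in \{1, \dots, K\} \times \{1, \dots, K\}$
- Genotypes $G_1, \dots, G_m$ of m individuals at n SNPs
then estimate the ancestral MAFs $f_1, \dots, f_K$ as
$$\arg\max_{f_1, \dots, f_K} \prod_{i=1}^{m} P(G_i \mid f_1, \dots, f_K, \theta^i) \quad (2)$$
2. How LAMP Works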

13 If the genotypes $G_{ij}$ are ordered, then the MLEs of $f_1, \dots, f_K$ that maximize Equation (2) come from counting: count the minor alleles from each population and divide by the number of alleles from that population.
[Figure: example of ordered genotypes with ancestries, for individuals i at SNPs j.]
MLE of $f_1$ = (1/3, 1/2, 1/2, 0)
MLE of $f_2$ = (1, 1/2, 1, 1/3)

14 If the genotypes are unordered, we cannot simply count to solve Equation (2). [Figure: what is observed vs. what is not.] Genotype 0/1 with ancestry 1/2: count the minor allele 1 toward Pop 1 or Pop 2?
$\lambda^i_j$ is a K-long vector with 1 in the s-th entry if the minor allele gets assigned to Pop s, and
$$P(\lambda^i_j = e_{\theta^i_1} \mid f_{1j}, \dots, f_{Kj}, \theta^i) = f_{\theta^i_1 j}\,(1 - f_{\theta^i_2 j})$$
$$P(\lambda^i_j = e_{\theta^i_2} \mid f_{1j}, \dots, f_{Kj}, \theta^i) = f_{\theta^i_2 j}\,(1 - f_{\theta^i_1 j})$$
$$P(\lambda^i_j \notin \{e_{\theta^i_1}, e_{\theta^i_2}\} \mid f_{1j}, \dots, f_{Kj}, \theta^i) = 0$$

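15 If the genotypes are unordered, write Equation (2) in terms of $\lambda^i_j$ and solve via EM:
$$\arg\max_{f_1, \dots, f_K} \prod_{i=1}^{m} \Big( \prod_{j \text{ not amb}} P(G_{ij} \mid f_1, \dots, f_K, \theta^i) \Big) \Big( \prod_{j \text{ amb}} P(G_{ij} \mid f_1, \dots, f_K, \theta^i) \Big) \quad (2)$$
For ambiguous j,
$$P(G_{ij} \mid f_1, \dots, f_K, \theta^i) = P(\lambda^i_j = e_{\theta^i_1} \mid f_{1j}, \dots, f_{Kj}, \theta^i) + P(\lambda^i_j = e_{\theta^i_2} \mid f_{1j}, \dots, f_{Kj}, \theta^i)$$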

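16 Let $\lambda^i_{j,s}$ = s-th coordinate of $\lambda^i_j$, indicating whether the minor allele at position j in individual i gets assigned ancestry s. Start $\theta^i$ from MAXVAR.
E-step:
$$\lambda^i_{j,s} = E[\lambda^i_{j,s} \mid f_{\theta^i_1 j}, f_{\theta^i_2 j}, \theta^i, G_{ij} = 1] = \begin{cases} \dfrac{f_{\theta^i_1 j}(1 - f_{\theta^i_2 j})}{f_{\theta^i_1 j}(1 - f_{\theta^i_2 j}) + (1 - f_{\theta^i_1 j}) f_{\theta^i_2 j}}, & \text{if } s = \theta^i_1 \\[2ex] \dfrac{f_{\theta^i_2 j}(1 - f_{\theta^i_1 j})}{f_{\theta^i_1 j}(1 - f_{\theta^i_2 j}) + (1 - f_{\theta^i_1 j}) f_{\theta^i_2 j}}, & \text{if } s = \theta^i_2 \end{cases}$$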

17 M-step:
$$f_{sj} = \frac{2 n^{sj}_{2,2} + n^{sj}_{2,1} + n^{sj}_{1,2} + \sum_{i:\ j \text{ ambiguous}} \lambda^i_{j,s}}{2 n^{sj}_{2,2} + 2 n^{sj}_{2,1} + 2 n^{sj}_{2,0} + n^{sj}_{1,2} + n^{sj}_{1,1} + n^{sj}_{1,0}}$$
- $n^{sj}_{k,u}$ = number of individuals with $u \in \{0, 1, 2\}$ minor alleles and $k \in \{1, 2\}$ copies of alleles from population s at site j
- Iterate EM to get the MLEs of $f_1, \dots, f_K$ solving Eq (2)
- Plug the MLEs of $f_1, \dots, f_K$ into Eq (1)
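The E-step and M-step for the allele frequencies can be sketched together in Python. This is a hedged illustration, not LAMP's implementation: the function name `em_mafs`, the 0-based population indices, the fixed iteration count, and the starting value 0.5 are all assumptions. A site is treated as ambiguous when the individual is heterozygous (genotype 1) and its two alleles come from different populations.

```python
def em_mafs(genotypes, ancestries, n_pops=2, n_iter=50, init=0.5):
    """Estimate per-population minor allele frequencies from unphased data.

    genotypes:  genotypes[i][j] in {0, 1, 2}, minor-allele count of person i at SNP j.
    ancestries: ancestries[i] = (s, t), the two ancestral populations of person i
                in the current window (0-based population indices).
    Returns f with f[k][j] = estimated MAF of population k at SNP j.
    """
    n_snps = len(genotypes[0])
    f = [[init] * n_snps for _ in range(n_pops)]

    for _ in range(n_iter):
        minor = [[0.0] * n_snps for _ in range(n_pops)]  # expected minor alleles
        total = [[0.0] * n_snps for _ in range(n_pops)]  # alleles contributed
        for geno, (s, t) in zip(genotypes, ancestries):
            for j, g in enumerate(geno):
                total[s][j] += 1
                total[t][j] += 1
                if s == t:
                    # Both alleles come from s: simple counting.
                    minor[s][j] += g
                elif g == 2:
                    # One minor allele from each population.
                    minor[s][j] += 1
                    minor[t][j] += 1
                elif g == 1:
                    # Ambiguous site. E-step: posterior weight that the
                    # minor allele came from population s.
                    a = f[s][j] * (1 - f[t][j])
                    b = (1 - f[s][j]) * f[t][j]
                    w = a / (a + b) if a + b > 0 else 0.5
                    minor[s][j] += w
                    minor[t][j] += 1 - w
        # M-step: expected minor alleles divided by alleles from each population.
        for k in range(n_pops):
            for j in range(n_snps):
                if total[k][j] > 0:
                    f[k][j] = minor[k][j] / total[k][j]
    return f


if __name__ == "__main__":
    G = [[2], [0], [1]]
    theta = [(0, 0), (1, 1), (0, 1)]
    print(em_mafs(G, theta))
```

When no individual is ambiguous at a SNP, the update reduces to the counting estimate and converges after one iteration.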

18 Outline
1) Sketch Method
2) Algorithm
3) Simulated Data: Accuracy Varying
   - Pop1-Pop2 Ancestries
   - $r^2$ pruning threshold
   - Number of SNPs to be analyzed

19 Simulate Data
- Chr 1 SNP genotypes from HapMap
- YRI: African. CEU: European. JPT: Japanese. CHB: Chinese.
- Repeat for g = 7 generations:
  - Let $(\alpha_1, \alpha_2) = (0.8, 0.2)$
  - Combined set = {$\alpha_1 m$ people from Pop 1, $\alpha_2 m$ people from Pop 2}
  - Create a child Chr 1: randomly pick parents from the combined set, generate recombinations with $r = 10^{-8}$, and randomly transmit a chromosome from each parent
  - Create children until there are m children
3. Simulated Data: Accuracy Varying

20 Analyze the bottom generation of admixed data with LAMP under different parameters (Pop1-Pop2, $r^2$ pruning threshold, number n of SNPs). Under each setting, obtain a matrix of ancestry estimates indexed by individual i and SNP j, where each entry is 0, 1, or 2 for the number of alleles from Pop 2.

21 Vary Pop 1-Pop 2
[Table columns: Setting; Pop1-Pop2; Number pruned SNPs (my LAMP); Number pruned SNPs (real LAMP); Overall accuracy (my LAMP); Overall accuracy (real LAMP). Rows: yri-ceu, ceu-jpt, jpt-chb. Numeric entries not preserved.]
Common across settings: $r^2$ = 0.1, m = 10 people, n = 4000 SNPs.
- Real LAMP: accuracy goes down. It is harder to distinguish similar populations.
- My LAMP: accuracy about the same
- Real LAMP is doing much worse than the paper's > 90% accuracies:
  - Paper used m = 500 => estimates the f's better
  - Despite same $r^2$, different numbers of SNPs pruned in

22 [Figure: Accuracy by Person (YRI-CEU). x-axis: Individual ID (1-10); y-axis: Accuracy (mean over SNPs). Red: my LAMP; black: real LAMP.]

23 [Figure: Accuracy by Person (CEU-JPT). x-axis: Individual ID (1-10); y-axis: Accuracy (mean over SNPs). Red: my LAMP; black: real LAMP.]

24 [Figure: Accuracy by Person (JPT-CHB). x-axis: Individual ID (1-10); y-axis: Accuracy (mean over SNPs). Red: my LAMP; black: real LAMP.]

25 Vary $r^2$ pruning threshold
[Table columns: Setting; $r^2$; Number pruned SNPs (my LAMP); Overall accuracy (my LAMP). Numeric entries not preserved.]
Common to settings: P1-P2 = YRI-CEU, m = 10 people, n = 4000 SNPs
- Accuracy goes down very slightly
- Consistent with the paper's slight change in accuracy when varying $r^2$ (m = 500, n = 38,864)

26 Vary number n of SNPs
[Table columns: Setting; n SNPs; Number pruned SNPs (my LAMP); Overall accuracy (my LAMP). Numeric entries not preserved.]
Common to settings: P1-P2 = YRI-CEU, m = 10 people, $r^2$ = [value not preserved]
- Reduce n => lower accuracy
- Intuition: n = 4000 SNPs has 9 windows, while n = 1000 SNPs has 1 window, so no majority vote

27 Questions?

28 Similarity score between individuals $i_1$ and $i_2$:
$$S(i_1, i_2) = \sum_{j=1}^{n} \frac{(G_{i_1,j} - u_j)(G_{i_2,j} - u_j)}{\sigma_j^2}$$
which is like a sample correlation($G_{i_1}$, $G_{i_2}$). Higher score, more similar.
- $G_{i,j}$ is the genotype (number of minor alleles) at position j in individual i.
- $u_j = \frac{1}{m} \sum_{i=1}^{m} G_{ij}$ is the genotype at position j averaged over the m individuals i.
- $\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (G_{ij} - u_j)^2$ is the variance in genotype at position j.
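A short Python sketch of this similarity score, reading the slide as a sum over SNPs of the product of mean-centered genotypes divided by the per-SNP variance. The function name `similarity_matrix` is hypothetical, and skipping SNPs with zero variance (to avoid division by zero) is an assumption of this sketch.

```python
def similarity_matrix(G):
    """Pairwise similarity scores S[a][b] between individuals.

    G: G[i][j] = number of minor alleles (0/1/2) of individual i at SNP j.
    """
    m, n = len(G), len(G[0])
    # Per-SNP mean and (population) variance over the m individuals.
    u = [sum(G[i][j] for i in range(m)) / m for j in range(n)]
    var = [sum((G[i][j] - u[j]) ** 2 for i in range(m)) / m for j in range(n)]
    # Assumption: monomorphic SNPs carry no information and are skipped.
    keep = [j for j in range(n) if var[j] > 0]

    return [
        [
            sum((G[a][j] - u[j]) * (G[b][j] - u[j]) / var[j] for j in keep)
            for b in range(m)
        ]
        for a in range(m)
    ]


if __name__ == "__main__":
    G = [[0, 2], [2, 0], [0, 2]]
    for row in similarity_matrix(G):
        print([round(x, 3) for x in row])
```

In the example, individuals 0 and 2 have identical genotypes and get a positive score, while individual 1 differs from both and gets a negative score against them.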

29 A) For each individual i,
$$\mathrm{Var}(i) = \sum_{i' : i' \ne i} S(i, i')^2$$
measures how genetically similar i is to everyone else (the other m - 1 people).
B) Find the person i* who is most genetically similar to everyone else: $i^* = \arg\max_i \mathrm{Var}(i)$.
C) Assign initial ancestry estimates $\theta^i = (\theta^i_1, \theta^i_2)$ based on how similar i is to i*:
- Assume i* has ancestry (2, 2)
- For each individual i, assign ancestry according to where S(i, i*) falls between 0 and S(i*, i*):
  - Lowest $(1 - \alpha)^2 n$ scores S(i, i*): (1, 1) = (Pop 1, Pop 1)
  - Highest $\alpha^2 n$ scores S(i, i*): (2, 2) = (Pop 2, Pop 2)
  - Everyone else: (1, 2) = (Pop 1, Pop 2)
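One possible Python reading of the MAXVAR initialization is sketched below. Several points are assumptions, because the slide text is incomplete: Var(i) is taken to be the sum of squared similarity scores between i and everyone else, the lowest-scoring fraction assigned (1, 1) is taken to be (1 - α)² and the highest-scoring fraction assigned (2, 2) to be α², with α the Pop 2 admixture fraction, and counts are rounded to whole individuals. The function name `maxvar_init` is hypothetical.

```python
def maxvar_init(S, alpha2):
    """Initial ancestry pairs from a similarity matrix.

    S:      S[a][b] = similarity score between individuals a and b.
    alpha2: assumed admixture fraction of Pop 2.
    Returns (i_star, theta) where theta[i] is (1, 1), (1, 2), or (2, 2).
    """
    m = len(S)
    # A) Var(i): squared similarity of i to everyone else (assumed reading).
    var = [sum(S[i][k] ** 2 for k in range(m) if k != i) for i in range(m)]
    # B) Reference individual with the largest Var(i).
    i_star = max(range(m), key=lambda i: var[i])

    # C) Rank individuals by similarity to i_star, lowest first.
    order = sorted(range(m), key=lambda i: S[i][i_star])
    n_low = round((1 - alpha2) ** 2 * m)   # assumed count of (1, 1) calls
    n_high = round(alpha2 ** 2 * m)        # assumed count of (2, 2) calls

    theta = [(1, 2)] * m
    for i in order[:n_low]:
        theta[i] = (1, 1)
    for i in order[m - n_high:]:
        theta[i] = (2, 2)
    theta[i_star] = (2, 2)  # the slide assumes i* has ancestry (2, 2)
    return i_star, theta


if __name__ == "__main__":
    S = [
        [1.0, 0.9, 0.1, -2.0],
        [0.9, 1.0, 0.2, -1.5],
        [0.1, 0.2, 1.0, 0.5],
        [-2.0, -1.5, 0.5, 4.0],
    ]
    print(maxvar_init(S, 0.5))
```

These initial calls only seed the ICM iterations; they are refined by alternating the ancestry and allele-frequency updates.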


More information

Genetic Algorithms. PHY 604: Computational Methods in Physics and Astrophysics II

Genetic Algorithms. PHY 604: Computational Methods in Physics and Astrophysics II Genetic Algorithms Genetic Algorithms Iterative method for doing optimization Inspiration from biology General idea (see Pang or Wikipedia for more details): Create a collection of organisms/individuals

More information

MAGMA manual (version 1.05)

MAGMA manual (version 1.05) MAGMA manual (version 1.05) TABLE OF CONTENTS OVERVIEW 3 QUICKSTART 4 ANNOTATION 6 OVERVIEW 6 RUNNING THE ANNOTATION 6 ADDING AN ANNOTATION WINDOW AROUND GENES 7 RESTRICTING THE ANNOTATION TO A SUBSET

More information

RAD Population Genomics Programs Paul Hohenlohe 6/2014

RAD Population Genomics Programs Paul Hohenlohe 6/2014 RAD Population Genomics Programs Paul Hohenlohe (hohenlohe@uidaho.edu) 6/2014 I. Overview These programs are designed to conduct population genomic analysis on RAD sequencing data. They were designed for

More information

More on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.

More on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5. More on Neural Networks Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.6 Recall the MLP Training Example From Last Lecture log likelihood

More information

Classification. 1 o Semestre 2007/2008

Classification. 1 o Semestre 2007/2008 Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class

More information

Missing Data and Imputation

Missing Data and Imputation Missing Data and Imputation NINA ORWITZ OCTOBER 30 TH, 2017 Outline Types of missing data Simple methods for dealing with missing data Single and multiple imputation R example Missing data is a complex

More information

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra

1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation

More information

A Course on Meta-Heuristic Search Methods for Combinatorial Optimization Problems

A Course on Meta-Heuristic Search Methods for Combinatorial Optimization Problems A Course on Meta-Heuristic Search Methods for Combinatorial Optimization Problems AutOrI LAB, DIA, Roma Tre Email: mandal@dia.uniroma3.it January 16, 2014 Outline 1 An example Assignment-I Tips Variants

More information

Genetic Programming. Modern optimization methods 1

Genetic Programming. Modern optimization methods 1 Genetic Programming Developed in USA during 90 s Patented by J. Koza Solves typical problems: Prediction, classification, approximation, programming Properties Competitor of neural networks Need for huge

More information

Statistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes. Outliers Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Concepts What is an outlier? The set of data points that are considerably different than the remainder of the

More information

Leaning Graphical Model Structures using L1-Regularization Paths (addendum)

Leaning Graphical Model Structures using L1-Regularization Paths (addendum) Leaning Graphical Model Structures using -Regularization Paths (addendum) Mark Schmidt and Kevin Murphy Computer Science Dept. University of British Columbia {schmidtm,murphyk}@cs.ubc.ca 1 Introduction

More information

LD vignette Measures of linkage disequilibrium

LD vignette Measures of linkage disequilibrium LD vignette Measures of linkage disequilibrium David Clayton June 13, 2018 Calculating linkage disequilibrium statistics We shall first load some illustrative data. > data(ld.example) The data are drawn

More information

Regularization and model selection

Regularization and model selection CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial

More information

Estimating Variance Components in MMAP

Estimating Variance Components in MMAP Last update: 6/1/2014 Estimating Variance Components in MMAP MMAP implements routines to estimate variance components within the mixed model. These estimates can be used for likelihood ratio tests to compare

More information

Chapter 6: Linear Model Selection and Regularization

Chapter 6: Linear Model Selection and Regularization Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) comes close to or exceeds n (the sample size) standard linear regression is faced with problems. The variance of the

More information

Outline Purpose How to analyze algorithms Examples. Algorithm Analysis. Seth Long. January 15, 2010

Outline Purpose How to analyze algorithms Examples. Algorithm Analysis. Seth Long. January 15, 2010 January 15, 2010 Intuitive Definitions Common Runtimes Final Notes Compare space and time requirements for algorithms Understand how an algorithm scales with larger datasets Intuitive Definitions Outline

More information

Latent Variable Models and Expectation Maximization

Latent Variable Models and Expectation Maximization Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15

More information

Introduction to Hail. Cotton Seed, Technical Lead Tim Poterba, Software Engineer Hail Team, Neale Lab Broad Institute and MGH

Introduction to Hail. Cotton Seed, Technical Lead Tim Poterba, Software Engineer Hail Team, Neale Lab Broad Institute and MGH Introduction to Hail Cotton Seed, Technical Lead Tim Poterba, Software Engineer Hail Team, Neale Lab Broad Institute and MGH Why Hail? Genetic data is becoming absolutely massive Broad Genomics, by the

More information

GENETIC ALGORITHM with Hands-On exercise

GENETIC ALGORITHM with Hands-On exercise GENETIC ALGORITHM with Hands-On exercise Adopted From Lecture by Michael Negnevitsky, Electrical Engineering & Computer Science University of Tasmania 1 Objective To understand the processes ie. GAs Basic

More information

Tuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017

Tuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017 Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax

More information

Lecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions

Lecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions U.C. Berkeley CS70 : Algorithms Midterm 2 Solutions Lecturers: Sanjam Garg and Prasad aghavra March 20, 207 Midterm 2 Solutions. (0 points) True/False Clearly put your answers in the answer box in front

More information

Chapter 11 Search Algorithms for Discrete Optimization Problems

Chapter 11 Search Algorithms for Discrete Optimization Problems Chapter Search Algorithms for Discrete Optimization Problems (Selected slides) A. Grama, A. Gupta, G. Karypis, and V. Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003.

More information

Approximation Techniques for Utilitarian Mechanism Design

Approximation Techniques for Utilitarian Mechanism Design Approximation Techniques for Utilitarian Mechanism Design Department of Computer Science RWTH Aachen Germany joint work with Patrick Briest and Piotr Krysta 05/16/2006 1 Introduction to Utilitarian Mechanism

More information

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can

More information

Workload Characterization Techniques

Workload Characterization Techniques Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

A Genetic Algorithm-Based Approach for Energy- Efficient Clustering of Wireless Sensor Networks

A Genetic Algorithm-Based Approach for Energy- Efficient Clustering of Wireless Sensor Networks A Genetic Algorithm-Based Approach for Energy- Efficient Clustering of Wireless Sensor Networks A. Zahmatkesh and M. H. Yaghmaee Abstract In this paper, we propose a Genetic Algorithm (GA) to optimize

More information

Evolutionary tree reconstruction (Chapter 10)

Evolutionary tree reconstruction (Chapter 10) Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early

More information

HybridCheck User Manual

HybridCheck User Manual HybridCheck User Manual Ben J. Ward February 2015 HybridCheck is a software package to visualise the recombination signal in assembled next generation sequence data, and it can be used to detect recombination,

More information