Estimating. Local Ancestry in admixed Populations (LAMP)
|
|
- Miles Waters
- 5 years ago
- Views:
Transcription
1 Estimating Local Ancestry in admixed Populations (LAMP) QIAN ZHANG 572 6/05/2014
2 Outline 1) Sketch Method 2) Algorithm 3) Simulated Data: Accuracy Varying Pop1-Pop2 Ancestries r 2 pruning threshold Number of SNPs to be analyzed
3 Ordered genotype 01 at this SNP (Possibilities: 00, 01, 10, 11) In one person s genome Position Chr (Mom) Chr (Dad) 1. Sketch Method
4 = Chromosome from Population 1 (e.g. Africans) = Chromosome from Population 2 (e.g. Europeans) Admixed Person (e.g. African American) LAMP Fraction Pop 2 Ancestry Position 1. Sketch Method
5 In each window, for each person, say that person has 0, 1, or 2 alleles from Pop 2 For each SNP position, majority vote over windows to get number of Pop 2 alleles: i=1 i=2 0 1 i=3 1 2 Window (L SNPs) Slide 1 allele from Pop 2 1. Sketch Method
6 INPUTS - SNP positions - Unphased genotypes (0/0, 0/1, or 1/1) of admixed people - K = 2 populations - Generations g = 7 of admixing - Fractions (α 1, α 2 ) = (0.8, 0.2) of all genomes in sample from Pop 1, Pop 2 - r: Recombination rate (# of breakpoints per meiosis per generation) - Offset: Fraction of window s bp length by which to shift window OUTPUT: - Fraction of Pop 2 alleles at each SNP position in each person 1. Sketch Method
7 Outline 1) Sketch Method 2) Algorithm 3) Simulated Data: Accuracy Varying Pop1-Pop2 Ancestries r 2 pruning threshold Number of SNPs to be analyzed 2. Algorithm
8 1) Choose the window length L - Want big: Enough SNPs to tell apart ancestries - Want small: No breakpoint in the window - Expected fraction of bp positions in a window of 2 Chr s that is incorrectly called due to a breakpoint. Fraction = 1/2 here: Call: Truth: Pick the largest L such that epsilon bounds the expected fraction of errors in a window L e (g -1)r(a1)(a2) 2. Algorithm
9 For each window: 2) Prune SNPS to get independence - Plink: Window of 50, slide 5, remove SNP if r 2 > 0.1 (what I did) - Greedy removal of SNPs with r 2 > 0.1 (paper) 3) MAXVAR algorithm to get initial ancestry estimates 0, 1, or 2 for a window 4) ICM iterates Eqn s (1) and (2), with EM to solve Eqn (2) Get a window s call 0, 1, or 2, for the number of Pop 2 alleles for each person For person i, θ i = θ 1 i, θ 2 i 1, 2 x 1, 2, like an unordered genotype # of Pop 2 alleles θ i (1,1) (1,2) = (2,1) (2,2) 2. Algorithm
10 If know: - MAFs f 1,, f K f K1,, f Kn of K ancestral populations - Genotypes G 1,, G m of m individuals at n SNPs Then estimate individual i s ancestries θ i = θ 1 i, θ 2 i 1,, K x 1,, K in the window as θ i = argmax θ(i) P θ(i) f 1,, f K, G i ) (1) E-Step = argmax θ i P G i f 1,, f K, θ i ) P(θ i f 1,, f K ) Mode, not mean, so called ICM not EM. 2. Algorithm
11 θ i = argmax θ(i) P θ(i) f 1,, f K, G i ) (1) E-Step = argmax θ i P G i f 1,, f K, θ i ) P(θ i f 1,, f K ) s, t P(θ i = s, t f 1,, f K ) = α s α t 1[s = t] + 2α s α t 1[s t] 2. Algorithm
12 If know: - Individual i s ancestries θ i = θ 1 i, θ 2 i 1,, K x 1,, K - Genotypes G 1,, G m of m individuals at n SNPs Then estimate ancestral MAFs f 1,, f K as argmax f1,,f K m i=1 P(G i f 1,, f K, θ(i)) (2) 2. How LAMP Works
13 If ordered genotype G ij, then MLEs of f 1,, f K that maximize Equation (2): Count minor alleles from each population. Divide by number of alleles from the population. i = MLE of f 1 = ( 1 3, 1 2, 1 2, 0) i = j MLE of f 2 = (1, 1 2, 1, 1 3 ) 2. Algorithm
14 If unordered genotype, cannot simply count to solve equation (2), because see: not Genotype 0/1 with ancestry 1/2: Count minor allele 1 to Pop 1 or Pop 2? λ j i is a K-long vector with 1 in the s-th entry if the minor allele gets assigned to Pop s: 2. Algorithm and P λ j i = e θ1 (i) f 1j,, f Kj, θ i ) = f θ1 i j(1 f θ2 i j) P λ j i = e θ2 (i) f 1j,, f Kj, θ i ) = f θ2 i j(1 f θ1 i j) P λ j i = s e θ1 (i), e θ2 (i) f 1j,, f Kj, θ i ) = 0
15 If unordered genotypes, write equation (2) in terms of λ j i and solve via EM: m i=1 j not amb ) (2) argmax f1,,f ( P(G K j amb ij f 1,, f K, θ i ) ( P(G ij f 1,, f K, θ i ) For ambiguous j, P(G ij f 1,, f K, θ i ) =P λ j i = e θ1 (i) f i1,, f Kj, θ i ) + P λ j i = e θ2 (i) f i1,, f Kj, θ i ) 2. Algorithm
16 Let λ j,s i = s-th coordinate of λ j i, indicating whether the minor allele at position j in individual i gets assigned ancestry s. Start θ(i) from MAXVAR E-Step: λ j,s i = E λ j,s i f θ1 i j, f θ2 i j, θ i, G ij = 1) = f θ1 i j 1 f θ2 i j f θ1 i j 1 f θ2 i j + (1 f θ1 i j)f θ2 i j, if s = θ 1 i f θ2 i j 1 f θ1 i j f θ1 i j 1 f θ2 i j + (1 f θ1 i j)f θ2 i j, if s = θ 2 i 2. Algorithm
17 M-Step: 2n sj 2,2 + n sj 2,1 + n sj 1,2 + j ambiguous λ j,s i f sj = 2n sj 2,2 + 2n sj 2,1 + 2n sj 2,0 + n sj 1,2 + n sj sj 1,1 + n 1,0 - n sj k,u = Number of individuals with u 0,1,2 minor alleles and k 1,2 copies of alleles from population s at site j - Iterate EM to get MLEs of f 1,, f K solving Eq (2) - Plug MLEs of f 1,, f K into Eq (1) 2. Algorithm
18 Outline 1) Sketch Method 2) Algorithm 3) Simulated Data: Accuracy Varying Pop1-Pop2 Ancestries r 2 pruning threshold Number of SNPs to be analyzed
19 Simulate Data - Chr 1 SNP genotypes from HapMap - YRI: African. CEU: European. JPT: Japanese. CHB: Chinese. - Repeat for g = 7 generations: - Let (α 1, α 2 ) = (0.8, 0.2) - {α 1 m people from Pop 1, α 2 m people from Pop 2} = Combined set - Create child Chr 1: Randomly pick parents from combined set, generate recombinations with r = 10 8, randomly transmit a chromosome from each parent - Create children until there are m children 3. Simulated Data: Accuracy Varying
20 Analyze bottom generation of admixed data with LAMP, given different parameters (Pop1Pop2, r 2 pruning threshold, number n of SNPs) Under each setting, obtain matrix of ancestry estimates: Individual i 0, 1, or 2 for the number of alleles from Pop 2 SNP j 3. Simulated Data: Accuracy Varying
21 Vary Pop 1-Pop 2 Setting Number pruned SNPs (my LAMP) Number pruned SNPs (real LAMP) Overall accuracy Pop1-Pop2 (my LAMP) 1 yri-ceu ceu-jpt jpt-chb Common across settings: r 2 = 0.1, m = 10 people, n = 4000 SNPs. Overall accuracy (real LAMP) - Real LAMP: Accuracy goes down. Harder to distinguish similar populations. - My LAMP: Accuracy about same - Real LAMP is doing much worse than paper s > 90% accuracies: - Paper used m = 500 => Estimate f s better - Despite same r 2, different numbers of SNPs pruned in 3. Simulated Data: Accuracy Varying
22 Accuracy by Person (for YRI CEU) Accuracy (Avg over SNPs) Accuracy (Mean over SNPs) Accuracy By Person (YRI-CEU) Red: My LAMP Black: Real LAMP ID Individual ID (1-10)
23 Accuracy by Person (for CEU JPT) Accuracy (Mean over SNPs) Accuracy (Mean over SNPs) Accuracy By Person (CEU-JPT) Red: My LAMP Black: Real LAMP ID Individual ID (1-10)
24 Accuracy by Person (for JPT CHB) Accuracy (Mean over SNPs) Accuracy (Mean over SNPs) Accuracy By Person (JPT-CHB) Red: My LAMP Black: Real LAMP ID Individual ID (1-10)
25 Vary r 2 pruning threshold Number pruned Overall accuracy (my Setting r 2 SNPs (my LAMP) LAMP) Common to settings: P1-P2 = YRI-CEU, m = 10 people, n = 4000 SNPs - Accuracy goes down very slightly - Consistent with paper s slight change in accuracy when varying r 2 (m = 500, n = 38,864) 3. Simulated Data: Accuracy Varying
26 Vary number n of SNPs Setting n snps Number pruned SNPs (my LAMP) Overall accuracy (my LAMP) Common to settings: P1-P2 = YRI-CEU, m = 10 people, r 2 = Reduce n Lower accuracy - Intuition: n = 4000 SNPs has 9 windows, while n = 1000 SNPs has 1 window, so no majority vote 3. Simulated Data: Accuracy Varying
27 Questions?
28 Similarity score between individuals i1 and i2 S i 1, i 2 = n j=1 G i1,j u j σ j 2 G i2,j u j which is like a sample correlation(g i1, G i2 ). Higher score, more similar. G i,j is the Genotype (# of minor alleles) at position j in individual i. u j = σ 2 j = m G ij i=1 is the Genotype at position j averaged over m individuals i. m m 2 G ij u j i=1 is the variance in Genotype m 2. How LAMP Works
29 A) For each individual i, Var(i) = S i, i 2 i :i i measures how genetically similar i is to everyone else (the other m - 1 people). B) Find the person i* who is most genetically similar to everyone else: i* = argmax i Var i. C) Assign initial ancestry estimates of θ i based on how similar i is to i*: - Assume - - i* has ancestry (2, 2) 0 S(i*, i*) For each individual i, assign ancestry to i: S(i, i*) - Lowest 1 α 2 n scores S i, i (1, 1) - Highest α 2 n scores S i, i (2, 2) - Everyone else (1, 2) (Pop1, Pop 1) (Pop1, Pop 2) (Pop2, Pop 2) θ i = (θ 1 i, θ 2 (i)) 2. How LAMP Works
Applications of admixture models
Applications of admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price Applications of admixture models 1 / 27
More informationToCatchAThief c ryan campbell & jenn coughlan 7/23/2018
ToCatchAThief c ryan campbell & jenn coughlan 7/23/2018 Welcome to the To Catch a Thief: With Data! walkthrough! https://bioconductor.org/packages/devel/ bioc/vignettes/snprelate/inst/doc/snprelatetutorial.html
More informationELAI user manual. Yongtao Guan Baylor College of Medicine. Version June Copyright 2. 3 A simple example 2
ELAI user manual Yongtao Guan Baylor College of Medicine Version 1.0 25 June 2015 Contents 1 Copyright 2 2 What ELAI Can Do 2 3 A simple example 2 4 Input file formats 3 4.1 Genotype file format....................................
More informationPolymorphism and Variant Analysis Lab
Polymorphism and Variant Analysis Lab Arian Avalos PowerPoint by Casey Hanson Polymorphism and Variant Analysis Matt Hudson 2018 1 Exercise In this exercise, we will do the following:. 1. Gain familiarity
More informationLab Manual for Anth/Biol c Alan Rogers and Jon Seger
Lab Manual for Anth/Biol 5221 c Alan Rogers and Jon Seger August 20, 2015 Contents 4 Simulating Genetic Drift and Mutation 3 4.1 The easy way to sample from an urn........................... 3 4.2 Iterating
More informationThe Continuous Genetic Algorithm. Universidad de los Andes-CODENSA
The Continuous Genetic Algorithm Universidad de los Andes-CODENSA 1. Components of a Continuous Genetic Algorithm The flowchart in figure1 provides a big picture overview of a continuous GA.. Figure 1.
More informationREAP Software Documentation
REAP Software Documentation Version 1.2 Timothy Thornton 1 Department of Biostatistics 1 The University of Washington 1 REAP A C program for estimating kinship coefficients and IBD sharing probabilities
More informationGenetic Programming. Charles Chilaka. Department of Computational Science Memorial University of Newfoundland
Genetic Programming Charles Chilaka Department of Computational Science Memorial University of Newfoundland Class Project for Bio 4241 March 27, 2014 Charles Chilaka (MUN) Genetic algorithms and programming
More informationImproved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation
The American Journal of Human Genetics Supplemental Data Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation Chaolong Wang,
More informationBICF Nano Course: GWAS GWAS Workflow Development using PLINK. Julia Kozlitina April 28, 2017
BICF Nano Course: GWAS GWAS Workflow Development using PLINK Julia Kozlitina Julia.Kozlitina@UTSouthwestern.edu April 28, 2017 Getting started Open the Terminal (Search -> Applications -> Terminal), and
More informationNetwork Based Models For Analysis of SNPs Yalta Opt
Outline Network Based Models For Analysis of Yalta Optimization Conference 2010 Network Science Zeynep Ertem*, Sergiy Butenko*, Clare Gill** *Department of Industrial and Systems Engineering, **Department
More informationEvolutionary Computation. Chao Lan
Evolutionary Computation Chao Lan Outline Introduction Genetic Algorithm Evolutionary Strategy Genetic Programming Introduction Evolutionary strategy can jointly optimize multiple variables. - e.g., max
More informationPriVar documentation
PriVar documentation PriVar is a cross-platform Java application toolkit to prioritize variants (SNVs and InDels) from exome or whole genome sequencing data by using different filtering strategies and
More informationPackage SimGbyE. July 20, 2009
Package SimGbyE July 20, 2009 Type Package Title Simulated case/control or survival data sets with genetic and environmental interactions. Author Melanie Wilson Maintainer Melanie
More informationAssociation Analysis of Sequence Data using PLINK/SEQ (PSEQ)
Association Analysis of Sequence Data using PLINK/SEQ (PSEQ) Copyright (c) 2018 Stanley Hooker, Biao Li, Di Zhang and Suzanne M. Leal Purpose PLINK/SEQ (PSEQ) is an open-source C/C++ library for working
More informationClustering and The Expectation-Maximization Algorithm
Clustering and The Expectation-Maximization Algorithm Unsupervised Learning Marek Petrik 3/7 Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with applications
More informationHaplotype Analysis. 02 November 2003 Mendel Short IGES Slide 1
Haplotype Analysis Specifies the genetic information descending through a pedigree Useful visualization of the gene flow through a pedigree A haplotype for a given individual and set of loci is defined
More informationEstimation of haplotypes
Estimation of haplotypes Cavan Reilly October 4, 2013 Table of contents Estimating haplotypes with the EM algorithm Individual level haplotypes Testing for differences in haplotype frequency Using the
More informationStep-by-Step Guide to Relatedness and Association Mapping Contents
Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...
More informationStep-by-Step Guide to Basic Genetic Analysis
Step-by-Step Guide to Basic Genetic Analysis Page 1 Introduction This document shows you how to clean up your genetic data, assess its statistical properties and perform simple analyses such as case-control
More informationMutations for Permutations
Mutations for Permutations Insert mutation: Pick two allele values at random Move the second to follow the first, shifting the rest along to accommodate Note: this preserves most of the order and adjacency
More informationPopulation structure Jerome Goudet and Bruce Weir
Population structure Jerome Goudet and Bruce Weir 2018-07-13 Contents Individual populations Betas....................................... 1 Hapmap data 6 Sliding windows...............................................
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2015 1 Sequence Alignment Dannie Durand Pairwise Sequence Alignment The goal of pairwise sequence alignment is to establish a correspondence between the
More informationData Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University
Data Mining Chapter 8: Search and Optimization Methods Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Search & Optimization Search and Optimization method deals with
More informationSOLOMON: Parentage Analysis 1. Corresponding author: Mark Christie
SOLOMON: Parentage Analysis 1 Corresponding author: Mark Christie christim@science.oregonstate.edu SOLOMON: Parentage Analysis 2 Table of Contents: Installing SOLOMON on Windows/Linux Pg. 3 Installing
More informationThe k-means Algorithm and Genetic Algorithm
The k-means Algorithm and Genetic Algorithm k-means algorithm Genetic algorithm Rough set approach Fuzzy set approaches Chapter 8 2 The K-Means Algorithm The K-Means algorithm is a simple yet effective
More informationMAGA: Meta-Analysis of Gene-level Associations
MAGA: Meta-Analysis of Gene-level Associations SYNOPSIS MAGA [--sfile] [--chr] OPTIONS Option Default Description --sfile specification.txt Select a specification file --chr Select a chromosome DESCRIPTION
More informationCover Page. The handle holds various files of this Leiden University dissertation.
Cover Page The handle http://hdl.handle.net/1887/32015 holds various files of this Leiden University dissertation. Author: Akker, Erik Ben van den Title: Computational biology in human aging : an omics
More informationBLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio. 1990. CS 466 Saurabh Sinha Motivation Sequence homology to a known protein suggest function of newly sequenced protein Bioinformatics
More informationGenome Assembly Using de Bruijn Graphs. Biostatistics 666
Genome Assembly Using de Bruijn Graphs Biostatistics 666 Previously: Reference Based Analyses Individual short reads are aligned to reference Genotypes generated by examining reads overlapping each position
More informationHidden Markov Models in the context of genetic analysis
Hidden Markov Models in the context of genetic analysis Vincent Plagnol UCL Genetics Institute November 22, 2012 Outline 1 Introduction 2 Two basic problems Forward/backward Baum-Welch algorithm Viterbi
More informationUser manual for Trendsetter
User manual for Trendsetter Mehreen R. Mughal and Michael DeGiorgio May 11, 2018 Contents 1 Introduction 2 2 Operation 2 2.1 Installation.............................................. 2 2.2 Calculating
More informationForensic Resource/Reference On Genetics knowledge base: FROG-kb User s Manual. Updated June, 2017
Forensic Resource/Reference On Genetics knowledge base: FROG-kb User s Manual Updated June, 2017 Table of Contents 1. Introduction... 1 2. Accessing FROG-kb Home Page and Features... 1 3. Home Page and
More informationSpotter Documentation Version 0.5, Released 4/12/2010
Spotter Documentation Version 0.5, Released 4/12/2010 Purpose Spotter is a program for delineating an association signal from a genome wide association study using features such as recombination rates,
More informationGWAsimulator: A rapid whole-genome simulation program
GWAsimulator: A rapid whole-genome simulation program Version 1.1 Chun Li and Mingyao Li September 21, 2007 (revised October 9, 2007) 1. Introduction...1 2. Download and compile the program...2 3. Input
More informationGrouping preprocess for haplotype inference from SNP and CNV data
Journal of Physics: Conference Series Grouping preprocess for haplotype inference from SNP and CNV data Recent citations - A haplotype inference method based on sparsely connected multi-body ising model
More informationOn a Divide and Conquer Approach for Haplotype Inference with Pure Parsimony
On a Divide and Conquer Approach for Haplotype Inference with Pure Parsimony Konstantinos Kalpakis, and Parag Namjoshi Department of Computer Science and Electrical Engineering University of Maryland Baltimore
More informationGSNAP: Fast and SNP-tolerant detection of complex variants and splicing in short reads by Thomas D. Wu and Serban Nacu
GSNAP: Fast and SNP-tolerant detection of complex variants and splicing in short reads by Thomas D. Wu and Serban Nacu Matt Huska Freie Universität Berlin Computational Methods for High-Throughput Omics
More informationMAGMA manual (version 1.06)
MAGMA manual (version 1.06) TABLE OF CONTENTS OVERVIEW 3 QUICKSTART 4 ANNOTATION 6 OVERVIEW 6 RUNNING THE ANNOTATION 6 ADDING AN ANNOTATION WINDOW AROUND GENES 7 RESTRICTING THE ANNOTATION TO A SUBSET
More informationUser s Guide. Version 2.2. Semex Alliance, Ontario and Centre for Genetic Improvement of Livestock University of Guelph, Ontario
User s Guide Version 2.2 Semex Alliance, Ontario and Centre for Genetic Improvement of Livestock University of Guelph, Ontario Mehdi Sargolzaei, Jacques Chesnais and Flavio Schenkel Jan 2014 Disclaimer
More informationNote Set 4: Finite Mixture Models and the EM Algorithm
Note Set 4: Finite Mixture Models and the EM Algorithm Padhraic Smyth, Department of Computer Science University of California, Irvine Finite Mixture Models A finite mixture model with K components, for
More informationDocumentation for BayesAss 1.3
Documentation for BayesAss 1.3 Program Description BayesAss is a program that estimates recent migration rates between populations using MCMC. It also estimates each individual s immigrant ancestry, the
More informationData Analysis & Probability
Unit 5 Probability Distributions Name: Date: Hour: Section 7.2: The Standard Normal Distribution (Area under the curve) Notes By the end of this lesson, you will be able to Find the area under the standard
More informationInformation Fusion Dr. B. K. Panigrahi
Information Fusion By Dr. B. K. Panigrahi Asst. Professor Department of Electrical Engineering IIT Delhi, New Delhi-110016 01/12/2007 1 Introduction Classification OUTLINE K-fold cross Validation Feature
More informationPackage GHap. R topics documented: February 18, 2017
Type Package Title Genome-Wide Haplotyping Version 1.2.2 Date 2017-02-18 Author Yuri Tani Utsunomiya, Marco Milanesi Package GHap February 18, 2017 Maintainer Yuri Tani Utsunomiya
More informationGenetic type 1 Error Calculator (GEC)
Genetic type 1 Error Calculator (GEC) (Version 0.2) User Manual Miao-Xin Li Department of Psychiatry and State Key Laboratory for Cognitive and Brain Sciences; the Centre for Reproduction, Development
More informationOutline. Motivation. Introduction of GAs. Genetic Algorithm 9/7/2017. Motivation Genetic algorithms An illustrative example Hypothesis space search
Outline Genetic Algorithm Motivation Genetic algorithms An illustrative example Hypothesis space search Motivation Evolution is known to be a successful, robust method for adaptation within biological
More informationLASER: Locating Ancestry from SEquence Reads version 2.04
LASER: Locating Ancestry from SEquence Reads version 2.04 Chaolong Wang 1 Computational and Systems Biology Genome Institute of Singapore A*STAR, Singapore 138672, Singapore Xiaowei Zhan 2 Department of
More informationPRSice: Polygenic Risk Score software - Vignette
PRSice: Polygenic Risk Score software - Vignette Jack Euesden, Paul O Reilly March 22, 2016 1 The Polygenic Risk Score process PRSice ( precise ) implements a pipeline that has become standard in Polygenic
More informationPackage EMLRT. August 7, 2014
Package EMLRT August 7, 2014 Type Package Title Association Studies with Imputed SNPs Using Expectation-Maximization-Likelihood-Ratio Test LazyData yes Version 1.0 Date 2014-08-01 Author Maintainer
More informationRandom Forest in Genomic Selection
Random Forest in genomic selection 1 Dpto Mejora Genética Animal, INIA, Madrid; Universidad Politécnica de Valencia, 20-24 September, 2010. Outline 1 Remind 2 Random Forest Introduction Classification
More informationAnalysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis
Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis Barbara E. Engelhardt 1 *, Matthew Stephens 2 1 Computer Science Department, University of Chicago,
More informationIntroduction to Genetic Algorithms. Genetic Algorithms
Introduction to Genetic Algorithms Genetic Algorithms We ve covered enough material that we can write programs that use genetic algorithms! More advanced example of using arrays Could be better written
More informationPackage gpart. November 19, 2018
Package gpart November 19, 2018 Title Human genome partitioning of dense sequencing data by identifying haplotype blocks Version 1.0.0 Depends R (>= 3.5.0), grid, Homo.sapiens, TxDb.Hsapiens.UCSC.hg38.knownGene,
More informationCPSC 340: Machine Learning and Data Mining. Feature Selection Fall 2016
CPSC 34: Machine Learning and Data Mining Feature Selection Fall 26 Assignment 3: Admin Solutions will be posted after class Wednesday. Extra office hours Thursday: :3-2 and 4:3-6 in X836. Midterm Friday:
More informationSNP HiTLink Manual. Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1
SNP HiTLink Manual Yoko Fukuda 1, Hiroki Adachi 2, Eiji Nakamura 2, and Shoji Tsuji 1 1 Department of Neurology, Graduate School of Medicine, the University of Tokyo, Tokyo, Japan 2 Dynacom Co., Ltd, Kanagawa,
More informationThe Lander-Green Algorithm in Practice. Biostatistics 666
The Lander-Green Algorithm in Practice Biostatistics 666 Last Lecture: Lander-Green Algorithm More general definition for I, the "IBD vector" Probability of genotypes given IBD vector Transition probabilities
More informationGenetic Analysis. Page 1
Genetic Analysis Page 1 Genetic Analysis Objectives: 1) Set up Case-Control Association analysis and the Basic Genetics Workflow 2) Use JMP tools to interact with and explore results 3) Learn advanced
More informationWe extend SVM s in order to support multi-class classification problems. Consider the training dataset
p. / One-versus-the-Rest We extend SVM s in order to support multi-class classification problems. Consider the training dataset D = {(x, y ),(x, y ),..., (x l, y l )} R n {,..., M}, where the label y i
More informationEnsemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar
Ensemble Learning: An Introduction Adapted from Slides by Tan, Steinbach, Kumar 1 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers
More informationCPSC 340: Machine Learning and Data Mining. Feature Selection Fall 2017
CPSC 340: Machine Learning and Data Mining Feature Selection Fall 2017 Assignment 2: Admin 1 late day to hand in tonight, 2 for Wednesday, answers posted Thursday. Extra office hours Thursday at 4pm (ICICS
More informationUninformed Search Methods. Informed Search Methods. Midterm Exam 3/13/18. Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall
Midterm Exam Thursday, March 15, 7:30 9:30 p.m. room 125 Ag Hall Covers topics through Decision Trees and Random Forests (does not include constraint satisfaction) Closed book 8.5 x 11 sheet with notes
More informationBayesian analysis of genetic population structure using BAPS: Exercises
Bayesian analysis of genetic population structure using BAPS: Exercises p S u k S u p u,s S, Jukka Corander Department of Mathematics, Åbo Akademi University, Finland Exercise 1: Clustering of groups of
More informationThe fgwas Package. Version 1.0. Pennsylvannia State University
The fgwas Package Version 1.0 Zhong Wang 1 and Jiahan Li 2 1 Department of Public Health Science, 2 Department of Statistics, Pennsylvannia State University 1. Introduction The fgwas Package (Functional
More informationClustering: Classic Methods and Modern Views
Clustering: Classic Methods and Modern Views Marina Meilă University of Washington mmp@stat.washington.edu June 22, 2015 Lorentz Center Workshop on Clusters, Games and Axioms Outline Paradigms for clustering
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:
More informationPackage inversion. R topics documented: July 18, Type Package. Title Inversions in genotype data. Version
Package inversion July 18, 2013 Type Package Title Inversions in genotype data Version 1.8.0 Date 2011-05-12 Author Alejandro Caceres Maintainer Package to find genetic inversions in genotype (SNP array)
More informationA Computational Theory of Clustering
A Computational Theory of Clustering Avrim Blum Carnegie Mellon University Based on work joint with Nina Balcan, Anupam Gupta, and Santosh Vempala Point of this talk A new way to theoretically analyze
More informationVignette for the package rehh (version 2+) Mathieu Gautier, Alexander Klassmann and Renaud Vitalis 24/10/2016
Vignette for the package rehh (version 2+) Mathieu Gautier, Alexander Klassmann and Renaud Vitalis 24/10/2016 Contents 1 Input Files 2 1.1 Haplotype data file..........................................
More informationKGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual. Miao-Xin Li, Jiang Li
KGG: A systematic biological Knowledge-based mining system for Genomewide Genetic studies (Version 3.5) User Manual Miao-Xin Li, Jiang Li Department of Psychiatry Centre for Genomic Sciences Department
More informationA Comparison of Heuristic Algorithms to Solve the View Selection Problem
A A Comparison of Heuristic Algorithms to Solve the View Selection Problem Ray Hylock Management Sciences Department, University of Iowa Iowa City, IA 52242, ray-hylock@uiowa.edu vital component to the
More informationEnsembles. An ensemble is a set of classifiers whose combined results give the final decision. test feature vector
Ensembles An ensemble is a set of classifiers whose combined results give the final decision. test feature vector classifier 1 classifier 2 classifier 3 super classifier result 1 * *A model is the learned
More informationGenetic Algorithms. PHY 604: Computational Methods in Physics and Astrophysics II
Genetic Algorithms Genetic Algorithms Iterative method for doing optimization Inspiration from biology General idea (see Pang or Wikipedia for more details): Create a collection of organisms/individuals
More informationMAGMA manual (version 1.05)
MAGMA manual (version 1.05) TABLE OF CONTENTS OVERVIEW 3 QUICKSTART 4 ANNOTATION 6 OVERVIEW 6 RUNNING THE ANNOTATION 6 ADDING AN ANNOTATION WINDOW AROUND GENES 7 RESTRICTING THE ANNOTATION TO A SUBSET
More informationRAD Population Genomics Programs Paul Hohenlohe 6/2014
RAD Population Genomics Programs Paul Hohenlohe (hohenlohe@uidaho.edu) 6/2014 I. Overview These programs are designed to conduct population genomic analysis on RAD sequencing data. They were designed for
More informationMore on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.
More on Neural Networks Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.6 Recall the MLP Training Example From Last Lecture log likelihood
More informationClassification. 1 o Semestre 2007/2008
Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class
More informationMissing Data and Imputation
Missing Data and Imputation NINA ORWITZ OCTOBER 30 TH, 2017 Outline Types of missing data Simple methods for dealing with missing data Single and multiple imputation R example Missing data is a complex
More information1. Introduction. 2. Motivation and Problem Definition. Volume 8 Issue 2, February Susmita Mohapatra
Pattern Recall Analysis of the Hopfield Neural Network with a Genetic Algorithm Susmita Mohapatra Department of Computer Science, Utkal University, India Abstract: This paper is focused on the implementation
More informationA Course on Meta-Heuristic Search Methods for Combinatorial Optimization Problems
A Course on Meta-Heuristic Search Methods for Combinatorial Optimization Problems AutOrI LAB, DIA, Roma Tre Email: mandal@dia.uniroma3.it January 16, 2014 Outline 1 An example Assignment-I Tips Variants
More informationGenetic Programming. Modern optimization methods 1
Genetic Programming Developed in USA during 90 s Patented by J. Koza Solves typical problems: Prediction, classification, approximation, programming Properties Competitor of neural networks Need for huge
More informationStatistics 202: Data Mining. c Jonathan Taylor. Outliers Based in part on slides from textbook, slides of Susan Holmes.
Outliers Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Concepts What is an outlier? The set of data points that are considerably different than the remainder of the
More informationLeaning Graphical Model Structures using L1-Regularization Paths (addendum)
Leaning Graphical Model Structures using -Regularization Paths (addendum) Mark Schmidt and Kevin Murphy Computer Science Dept. University of British Columbia {schmidtm,murphyk}@cs.ubc.ca 1 Introduction
More informationLD vignette Measures of linkage disequilibrium
LD vignette Measures of linkage disequilibrium David Clayton June 13, 2018 Calculating linkage disequilibrium statistics We shall first load some illustrative data. > data(ld.example) The data are drawn
More informationRegularization and model selection
CS229 Lecture notes Andrew Ng Part VI Regularization and model selection Suppose we are trying select among several different models for a learning problem. For instance, we might be using a polynomial
More informationEstimating Variance Components in MMAP
Last update: 6/1/2014 Estimating Variance Components in MMAP MMAP implements routines to estimate variance components within the mixed model. These estimates can be used for likelihood ratio tests to compare
More informationChapter 6: Linear Model Selection and Regularization
Chapter 6: Linear Model Selection and Regularization As p (the number of predictors) comes close to or exceeds n (the sample size) standard linear regression is faced with problems. The variance of the
More informationOutline Purpose How to analyze algorithms Examples. Algorithm Analysis. Seth Long. January 15, 2010
January 15, 2010 Intuitive Definitions Common Runtimes Final Notes Compare space and time requirements for algorithms Understand how an algorithm scales with larger datasets Intuitive Definitions Outline
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationIntroduction to Hail. Cotton Seed, Technical Lead Tim Poterba, Software Engineer Hail Team, Neale Lab Broad Institute and MGH
Introduction to Hail Cotton Seed, Technical Lead Tim Poterba, Software Engineer Hail Team, Neale Lab Broad Institute and MGH Why Hail? Genetic data is becoming absolutely massive Broad Genomics, by the
More informationGENETIC ALGORITHM with Hands-On exercise
GENETIC ALGORITHM with Hands-On exercise Adopted From Lecture by Michael Negnevitsky, Electrical Engineering & Computer Science University of Tasmania 1 Objective To understand the processes ie. GAs Basic
More informationTuning. Philipp Koehn presented by Gaurav Kumar. 28 September 2017
Tuning Philipp Koehn presented by Gaurav Kumar 28 September 2017 The Story so Far: Generative Models 1 The definition of translation probability follows a mathematical derivation argmax e p(e f) = argmax
More informationLecturers: Sanjam Garg and Prasad Raghavendra March 20, Midterm 2 Solutions
U.C. Berkeley CS70 : Algorithms Midterm 2 Solutions Lecturers: Sanjam Garg and Prasad aghavra March 20, 207 Midterm 2 Solutions. (0 points) True/False Clearly put your answers in the answer box in front
More informationChapter 11 Search Algorithms for Discrete Optimization Problems
Chapter Search Algorithms for Discrete Optimization Problems (Selected slides) A. Grama, A. Gupta, G. Karypis, and V. Kumar To accompany the text Introduction to Parallel Computing, Addison Wesley, 2003.
More informationApproximation Techniques for Utilitarian Mechanism Design
Approximation Techniques for Utilitarian Mechanism Design Department of Computer Science RWTH Aachen Germany joint work with Patrick Briest and Piotr Krysta 05/16/2006 1 Introduction to Utilitarian Mechanism
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can
More informationWorkload Characterization Techniques
Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/
More informationA Genetic Algorithm-Based Approach for Energy- Efficient Clustering of Wireless Sensor Networks
A Genetic Algorithm-Based Approach for Energy- Efficient Clustering of Wireless Sensor Networks A. Zahmatkesh and M. H. Yaghmaee Abstract In this paper, we propose a Genetic Algorithm (GA) to optimize
More informationEvolutionary tree reconstruction (Chapter 10)
Evolutionary tree reconstruction (Chapter 10) Early Evolutionary Studies Anatomical features were the dominant criteria used to derive evolutionary relationships between species since Darwin till early
More informationHybridCheck User Manual
HybridCheck User Manual Ben J. Ward February 2015 HybridCheck is a software package to visualise the recombination signal in assembled next generation sequence data, and it can be used to detect recombination,
More information