Applications of admixture models
- Chastity Grant
- 6 years ago
Transcription
1 Applications of admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price Applications of admixture models 1 / 27
2 Outline
Admixture models
Population structure and GWAS
3 Mixture model for genetic data
$K$: the number of populations (mixture components).
$\pi_k$: mixture weights; they represent how much each population contributes to the final distribution.
$f_{m,k}$: allele frequency of SNP $m$ in each of the $K$ populations.
$$p(x, z) = p(z)\, p(x \mid z)$$
4 Mixture model for genetic data
Denote
$$p(z \mid \pi) = \prod_{k=1}^{K} \pi_k^{\mathbb{1}\{z = k\}}$$
Now, assume the conditional distributions are independent Binomial:
$$p\big(x \mid z = k, (f_{m,k})_{m=1}^{M}\big) = \prod_{m} p(x_m \mid z = k) = \prod_{m} \mathrm{Bin}(2, f_{m,k})$$
Then, the marginal distribution of $x$ is
$$p(x \mid \theta) = \sum_{k=1}^{K} p(z = k \mid \pi)\, p(x \mid z = k, f_k)$$
The parameters are $\theta = (\pi_k, f_k)_{k=1}^{K}$.
5 Mixture model for genetic data
Given $N$ individuals genotyped at $M$ SNPs, $x_n$, $n \in \{1, \dots, N\}$, write the log likelihood $LL(\theta)$.
Estimate the maximum likelihood parameters $\theta$ using EM.
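As a minimal sketch of the log likelihood the slide asks for, the function below evaluates $LL(\theta) = \sum_n \log \sum_k \pi_k \prod_m \mathrm{Bin}(x_{n,m}; 2, f_{m,k})$ directly. The function names and the nested-list layout of `F` are illustrative choices, not part of the lecture, and the code assumes every allele frequency lies strictly between 0 and 1.

```python
import math

def binom2_logpmf(x, f):
    """log Bin(x; 2, f) for a genotype x in {0, 1, 2}; requires 0 < f < 1."""
    log_coef = math.log(2.0) if x == 1 else 0.0  # C(2, x) is 2 for x = 1, else 1
    return log_coef + x * math.log(f) + (2 - x) * math.log(1.0 - f)

def mixture_log_likelihood(X, pi, F):
    """LL(theta) for the mixture model.

    X  : list of N genotype vectors, each of length M, entries in {0, 1, 2}
    pi : list of K mixture weights summing to 1
    F  : M x K nested list, F[m][k] = allele frequency of SNP m in population k
    """
    total = 0.0
    for x in X:
        # log(pi_k * p(x | z = k)) for every population k
        log_terms = [
            math.log(pi[k]) + sum(binom2_logpmf(x[m], F[m][k]) for m in range(len(x)))
            for k in range(len(pi))
        ]
        # log-sum-exp keeps the sum over k numerically stable when M is large
        top = max(log_terms)
        total += top + math.log(sum(math.exp(t - top) for t in log_terms))
    return total
```

Working in log space matters here because a product over thousands of SNPs underflows double precision long before the sum over populations is taken.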
6 Mixture model for genetic data: Example
Supervised mixture models
(Table: SNPs; allele frequency in POP 1 and POP 2; genotype of individual x.)
Does individual x belong to population 1 or 2?
$$P(\text{Data} \mid x \text{ is in population 1}) = (0.25)^2 (0.75)^0 (0.57)^0 (0.43)^2 \cdots$$
$$P(\text{Data} \mid x \text{ is in population 2}) = (0.40)^2 (0.60)^0 (0.32)^0 (0.68)^2 \cdots$$
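The comparison in this example can be sketched in a few lines. Only the two SNPs whose factors are visible on the slide are used, so the numbers below cover those two factors and not the full product; the function name is a hypothetical helper.

```python
def genotype_likelihood(genotypes, freqs):
    """Product over SNPs of f^x (1 - f)^(2 - x).

    The binomial coefficient is omitted, as on the slide: it is the same
    for both populations and so does not change which one is more likely.
    """
    p = 1.0
    for x, f in zip(genotypes, freqs):
        p *= f ** x * (1.0 - f) ** (2 - x)
    return p

x = [2, 0]           # genotypes of individual x at the two visible SNPs
pop1 = [0.25, 0.57]  # allele frequencies in population 1
pop2 = [0.40, 0.32]  # allele frequencies in population 2

lik1 = genotype_likelihood(x, pop1)  # 0.25^2 * 0.43^2
lik2 = genotype_likelihood(x, pop2)  # 0.40^2 * 0.68^2
assigned = 1 if lik1 > lik2 else 2
print(lik1, lik2, assigned)
```

On these two SNPs alone, population 2 gives the larger likelihood; the slide's conclusion depends on the remaining SNPs as well.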
8 Mixture models for genetic data
Unsupervised mixture models
What if allele frequencies are not known? Use EM to infer parameters (HW problem).
9 Admixture models / latent Dirichlet allocation
Clustering: each sample belongs to exactly one cluster.
In genetics: cluster = population. Individuals could belong to more than one population.
10 Admixture models
An individual can now have fractional memberships in each population.
Each SNP can have a different ancestry.
11 Population admixture
An admixed population is one that has ancestry from multiple distinct populations.
12 Admixture better reflects human biology
(Figure from Hellenthal et al., Science 2014.)
14 Examples of admixed populations
African-Americans: African and European ancestry. 10% of US population.
Latino Americans (Hispanics): European, Native American and African ancestry. 15% of US population. Mexican Americans, Puerto Ricans.
Hawaiians
South Asians
Middle Easterners
15 Admixture and ancestry
16 PCA on genetic data
17 Admixture leads to variation in proportions of genome-wide ancestry
18 PCA on HapMap Phase 3
20 Admixture model
Each individual $n$ has a parameter $g_n = (g_{n,1}, \dots, g_{n,K})$ where $g_{n,k} \ge 0$ and $\sum_k g_{n,k} = 1$.
Each population $k$ has a parameter for each SNP, $f_k = (f_{1,k}, \dots, f_{M,k})$.
$$z_{n,m,l} \mid g_n \sim \mathrm{Mult}(g_n), \quad l \in \{1, 2\}$$
$$x_{n,m} \mid z_{n,m,1}, z_{n,m,2}, f_k \sim \mathrm{Ber}\big(f_{m, z_{n,m,1}}\big) + \mathrm{Ber}\big(f_{m, z_{n,m,2}}\big)$$
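To make the generative process concrete, here is a minimal NumPy sketch that samples genotypes from the admixture model: an ancestry is drawn for each of the two allele copies, then each copy is a Bernoulli draw at the allele frequency of its ancestral population. The function name, array layout, and the toy `G` and `F` values are assumptions for illustration.

```python
import numpy as np

def sample_admixture(G, F, seed=0):
    """Sample genotypes from the admixture model.

    G : N x K array, G[n, k] = ancestry proportion g_{n,k} (rows sum to 1)
    F : M x K array, F[m, k] = allele frequency f_{m,k}
    Returns X (N x M genotypes in {0, 1, 2}) and Z (N x M x 2 ancestries).
    """
    rng = np.random.default_rng(seed)
    N, K = G.shape
    M = F.shape[0]
    X = np.zeros((N, M), dtype=int)
    Z = np.zeros((N, M, 2), dtype=int)
    for n in range(N):
        # z_{n,m,l} ~ Mult(g_n) for both allele copies l at every SNP m
        Z[n] = rng.choice(K, size=(M, 2), p=G[n])
        # frequency that applies to each copy: f_{m, z_{n,m,l}}
        freqs = F[np.arange(M)[:, None], Z[n]]  # M x 2
        # x_{n,m} is the sum of two Bernoulli draws
        X[n] = (rng.random((M, 2)) < freqs).sum(axis=1)
    return X, Z

# Toy check with extreme frequencies so the outcome is deterministic:
# population 0 never carries the allele, population 1 always does.
G_toy = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
F_toy = np.tile(np.array([0.0, 1.0]), (5, 1))
X_toy, Z_toy = sample_admixture(G_toy, F_toy)
```

In the toy check, the first individual is entirely population 0 and the second entirely population 1, while the third mixes ancestries across SNPs and allele copies.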
21 Inference in the admixture model
Parameters: $\theta = (g_n, f_k)$. Use EM to estimate parameters.
E-step: compute $r^{(t)}_{n,m,a,b} = p\big(z_{n,m} = (a, b) \mid x_{n,m}, g^{(t)}_n, f^{(t)}_{m,k}\big)$.
M-step: update estimates of the parameters. Work out the updates!
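One way to work out the updates is sketched below. Rather than storing the pairwise posterior $r_{n,m,a,b}$, this version tracks expected allele counts per population, which is an equivalent but more compact bookkeeping and is an assumption of this sketch rather than the slide's notation. With $p_{n,m} = \sum_k g_{n,k} f_{m,k}$, the expected number of alternate alleles from population $k$ is $x_{n,m}\, g_{n,k} f_{m,k} / p_{n,m}$, and the expected number of reference alleles is $(2 - x_{n,m})\, g_{n,k} (1 - f_{m,k}) / (1 - p_{n,m})$. The simulated data, sizes, and the clipping constant `eps` are illustrative.

```python
import numpy as np

def log_likelihood(X, G, F):
    """Genotype log likelihood, dropping the binomial coefficient (constant in theta)."""
    P = G @ F.T  # P[n, m] = sum_k g_{n,k} f_{m,k}
    return float(np.sum(X * np.log(P) + (2 - X) * np.log(1 - P)))

def em_step(X, G, F, eps=1e-6):
    """One EM iteration for the admixture model (X: N x M, G: N x K, F: M x K)."""
    N, M = X.shape
    P = G @ F.T
    # E-step: expected alternate (A) and reference (B) allele counts from population k
    A = (X / P)[:, :, None] * G[:, None, :] * F[None, :, :]
    B = ((2 - X) / (1 - P))[:, :, None] * G[:, None, :] * (1 - F)[None, :, :]
    # M-step: ancestry proportions are the share of the 2M allele copies from each population
    G_new = (A + B).sum(axis=1) / (2.0 * M)
    # M-step: allele frequency is the share of alternate alleles among copies from population k
    F_new = A.sum(axis=0) / (A + B).sum(axis=0)
    # keep frequencies away from 0 and 1 so the logs stay finite
    return G_new, np.clip(F_new, eps, 1 - eps)

# Simulated data (hypothetical sizes) to exercise the updates
rng = np.random.default_rng(0)
N, M, K = 30, 100, 2
G_true = rng.dirichlet(np.ones(K), size=N)
F_true = rng.uniform(0.05, 0.95, size=(M, K))
X = rng.binomial(2, G_true @ F_true.T)

G = rng.dirichlet(np.ones(K), size=N)
F = rng.uniform(0.2, 0.8, size=(M, K))
ll_start = log_likelihood(X, G, F)
for _ in range(50):
    G, F = em_step(X, G, F)
ll_end = log_likelihood(X, G, F)
```

Each row of `G_new` sums to 1 automatically, because the expected counts for an individual add up to exactly $2M$ allele copies. EM only finds a local optimum, so in practice several random restarts are a sensible precaution.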
23 Admixture model for genetic data: Example
Supervised admixture models
(Table: SNPs; allele frequency in POP 1 and POP 2; genotype of individual x.)
Individual x has ancestry $\alpha$ from population 1 and $(1 - \alpha)$ from population 2. Find $\alpha$.
$$P(\text{Data} \mid \alpha) = [0.25\alpha + 0.40(1-\alpha)]^2\, [(1-0.25)\alpha + (1-0.40)(1-\alpha)]^0\, [0.57\alpha + 0.32(1-\alpha)]^0\, [(1-0.57)\alpha + (1-0.32)(1-\alpha)]^2 \cdots$$
Maximum value of $P$ attained at $\alpha = \ldots$
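A simple way to find $\alpha$ is a grid search over the log likelihood, sketched below. It assumes each SNP's allele frequency for the admixed individual is $\alpha f_1 + (1 - \alpha) f_2$, with population frequencies 0.25 and 0.57 in population 1 and 0.40 and 0.32 in population 2, and it uses only the two SNPs whose factors are visible in the example, so the result is not the value from the full data set. The function name and grid resolution are illustrative.

```python
import math

def admixture_log_likelihood(alpha, genotypes, freqs_pop1, freqs_pop2):
    """log P(Data | alpha), dropping binomial coefficients (constant in alpha)."""
    ll = 0.0
    for x, f1, f2 in zip(genotypes, freqs_pop1, freqs_pop2):
        p = alpha * f1 + (1.0 - alpha) * f2  # admixed allele frequency
        ll += x * math.log(p) + (2 - x) * math.log(1.0 - p)
    return ll

x = [2, 0]           # genotypes at the two visible SNPs
pop1 = [0.25, 0.57]
pop2 = [0.40, 0.32]

grid = [i / 100 for i in range(101)]  # alpha = 0.00, 0.01, ..., 1.00
alpha_hat = max(grid, key=lambda a: admixture_log_likelihood(a, x, pop1, pop2))
print(alpha_hat)
```

On these two SNPs the log likelihood is $2\log(0.40 - 0.15\alpha) + 2\log(0.68 - 0.25\alpha)$, which decreases in $\alpha$, so the grid search returns the boundary value 0. With many SNPs the maximizer is typically interior, and a one-dimensional optimizer could replace the grid.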
25 Applying admixture models to HGDP
Human Genome Diversity Project
(Figure from Li et al., Science 2008.)
27 Admixture models outside of genetics
Also known as topic models or LDA (latent Dirichlet allocation). Used to model topics in documents.
Genotypes = words
Individual = document
Population = topic
Each document has a different distribution over topics. Each topic specifies a distribution over words.
(Figure from Griffiths and Steyvers, PNAS 2004.)
29 Outline
Admixture models
Population structure and GWAS
30 Population structure can lead to false discoveries
32 Approaches to deal with population stratification
Structured association: cluster individuals into populations, do GWAS in each population, then combine results.
Principal components: include principal components in the model.
34 Example
$n = 200$, $m = 1000$, $Z_n \in \{1, 2\}$
$Z_n = 1$ for $n \le 100$; $Z_n = 2$ for $n > 100$
$$Y_n \mid Z_n \sim \begin{cases} \mathcal{N}(10, 1), & Z_n = 1 \\ \mathcal{N}(0, 1), & Z_n = 2 \end{cases}$$
$$X_{n,m} \mid Z_n \sim \mathrm{Ber}(f_{Z_n, m})$$
35 How well does the model fit?
True ancestry $Z$ known.
True ancestry $Z$ unknown: we find 222 SNPs that are statistically significant (p-value < 0.05/1000).
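The uncorrected analysis can be reproduced in spirit with a short simulation. The sketch below follows the example's setup ($n = 200$, $m = 1000$, phenotype mean 10 in population 1 and 0 in population 2) and tests each SNP with a simple linear regression that ignores ancestry. Two things are assumptions: the per-population allele frequencies, drawn uniformly from [0.1, 0.9] because the lecture's values are not given, and the p-values, which use a normal approximation to the t statistic to avoid extra dependencies. The count of significant SNPs will therefore differ from the 222 on the slide.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 1000
Z = np.where(np.arange(n) < 100, 1, 2)            # first 100 individuals in population 1
Y = rng.normal(np.where(Z == 1, 10.0, 0.0), 1.0)  # phenotype depends on ancestry only
# ASSUMPTION: allele frequencies are not given in the lecture; drawn at random here
f = rng.uniform(0.1, 0.9, size=(2, m))
X = (rng.random((n, m)) < f[Z - 1]).astype(float)  # X[n, m] ~ Ber(f[Z_n, m])

def marginal_pvalues(X, Y):
    """Two-sided p-value for the slope of Y ~ X_m, one SNP at a time."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean()
    r = (Xc.T @ Yc) / np.sqrt((Xc ** 2).sum(axis=0) * (Yc ** 2).sum())
    t = r * np.sqrt((len(Y) - 2) / (1.0 - r ** 2))
    # normal approximation to the t distribution (reasonable at ~200 degrees of freedom)
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])

pvals = marginal_pvalues(X, Y)
n_significant = int((pvals < 0.05 / m).sum())  # Bonferroni threshold from the slide
print(n_significant)
```

No SNP affects the phenotype in this simulation, so every SNP that passes the threshold is a false discovery caused by allele frequency differences between the two populations.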
37 How well does the model fit?
Visualize these associations.
Visualize these associations in each population.
Infer PCs (PC scores for first PC).
Infer PCs (PC1 vs PC2).
Fraction of variance explained: about 6% of the variance is explained by PC1.
Correct for PCs: no association is significant!
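The effect of correcting for PCs can be sketched as follows. The code simulates data as in the example, computes PC1 from the centered genotype matrix by SVD, and then tests each SNP after projecting the intercept and PC1 out of both the phenotype and the genotypes. The allele frequencies (uniform on [0.1, 0.9]) and the normal approximation for p-values are assumptions of this sketch, so the counts and the variance explained by PC1 need not match the lecture's figures.

```python
import math
import numpy as np

def simulate(seed=0, n=200, m=1000):
    """Two populations of n/2 each; phenotype depends only on ancestry."""
    rng = np.random.default_rng(seed)
    Z = np.where(np.arange(n) < n // 2, 1, 2)
    Y = rng.normal(np.where(Z == 1, 10.0, 0.0), 1.0)
    f = rng.uniform(0.1, 0.9, size=(2, m))  # ASSUMPTION: frequencies not given in the lecture
    X = (rng.random((n, m)) < f[Z - 1]).astype(float)
    return X, Y

def association_pvalues(X, Y, covariates=None):
    """Per-SNP test of Y ~ X_m + covariates, via residualization."""
    n = len(Y)
    cols = [np.ones(n)] if covariates is None else [np.ones(n), covariates]
    C = np.column_stack(cols)
    Q, _ = np.linalg.qr(C)       # orthonormal basis for intercept + covariates
    Yr = Y - Q @ (Q.T @ Y)       # remove covariate effects from the phenotype
    Xr = X - Q @ (Q.T @ X)       # and from every SNP
    r = (Xr.T @ Yr) / np.sqrt((Xr ** 2).sum(axis=0) * (Yr ** 2).sum())
    dof = n - C.shape[1] - 1
    t = r * np.sqrt(dof / (1.0 - r ** 2))
    # normal approximation to the t distribution
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])

X, Y = simulate()
m = X.shape[1]

# PCA via SVD of the column-centered genotype matrix
U, S, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pc1 = U[:, 0] * S[0]                          # PC scores for the first PC
var_explained = S[0] ** 2 / (S ** 2).sum()    # fraction of variance explained by PC1

n_sig_naive = int((association_pvalues(X, Y) < 0.05 / m).sum())
n_sig_pc = int((association_pvalues(X, Y, covariates=pc1) < 0.05 / m).sum())
print(n_sig_naive, n_sig_pc, var_explained)
```

Because PC1 separates the two populations almost perfectly in this setup, adjusting for it removes nearly all of the spurious signal; any SNPs that remain significant are chance false positives at the Bonferroni level.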
43 Summary
PCA is an example of a latent variable model with a continuous latent variable, unlike clustering, where the latent variable is discrete. There is a probabilistic model corresponding to PCA.
Admixture models (also called topic models or LDA) are generalizations of clustering.
Applications: inferring ancestry and correcting for population structure.
Question: When do we include PCs in our regression?
Preface to the Second Edition Preface to the First Edition vii xi 1 Introduction 1 2 Overview of Supervised Learning 9 2.1 Introduction... 9 2.2 Variable Types and Terminology... 9 2.3 Two Simple Approaches
More informationLast week. Multi-Frame Structure from Motion: Multi-View Stereo. Unknown camera viewpoints
Last week Multi-Frame Structure from Motion: Multi-View Stereo Unknown camera viewpoints Last week PCA Today Recognition Today Recognition Recognition problems What is it? Object detection Who is it? Recognizing
More informationLatent Variable Models for the Analysis, Visualization and Prediction of Network and Nodal Attribute Data. Isabella Gollini.
z i! z j Latent Variable Models for the Analysis, Visualization and Prediction of etwork and odal Attribute Data School of Engineering University of Bristol isabella.gollini@bristol.ac.uk January 4th,
More informationCluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University
Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Kinds of Clustering Sequential Fast Cost Optimization Fixed number of clusters Hierarchical
More informationMultiple cosegmentation
Armand Joulin, Francis Bach and Jean Ponce. INRIA -Ecole Normale Supérieure April 25, 2012 Segmentation Introduction Segmentation Supervised and weakly-supervised segmentation Cosegmentation Segmentation
More informationQuiz Section Week 8 May 17, Machine learning and Support Vector Machines
Quiz Section Week 8 May 17, 2016 Machine learning and Support Vector Machines Another definition of supervised machine learning Given N training examples (objects) {(x 1,y 1 ), (x 2,y 2 ),, (x N,y N )}
More informationLFMM version Reference Manual (Graphical User Interface version)
LFMM version 1.2 - Reference Manual (Graphical User Interface version) Eric Frichot 1, Sean Schoville 1, Guillaume Bouchard 2, Olivier François 1 * 1. Université Joseph Fourier Grenoble, Centre National
More informationKernels for Structured Data
T-122.102 Special Course in Information Science VI: Co-occurence methods in analysis of discrete data Kernels for Structured Data Based on article: A Survey of Kernels for Structured Data by Thomas Gärtner
More informationFisher vector image representation
Fisher vector image representation Jakob Verbeek January 13, 2012 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.11.12.php Fisher vector representation Alternative to bag-of-words image representation
More informationGPU Data Mining in Neuroimaging Genomics
GPU Data Mining in Neuroimaging Genomics Bob Zigon Beckman Coulter Indianapolis, Indiana May 10, 2017 1 / 20 Outline Background ANOVA for Voxels and SNPs VEGAS for Voxels and Genes High Speed GPU Monte-Carlo
More informationLASER: Locating Ancestry from SEquence Reads version 2.04
LASER: Locating Ancestry from SEquence Reads version 2.04 Chaolong Wang 1 Computational and Systems Biology Genome Institute of Singapore A*STAR, Singapore 138672, Singapore Xiaowei Zhan 2 Department of
More informationSEQGWAS: Integrative Analysis of SEQuencing and GWAS Data
SEQGWAS: Integrative Analysis of SEQuencing and GWAS Data SYNOPSIS SEQGWAS [--sfile] [--chr] OPTIONS Option Default Description --sfile specification.txt Select a specification file --chr Select a chromosome
More information