SWIFT: SCALABLE WEIGHTED ITERATIVE FLOW-CLUSTERING TECHNIQUE


SWIFT: Scalable Weighted Iterative Flow-Clustering Technique
Iftekhar Naim, Gaurav Sharma, Suprakash Datta, James S. Cavenaugh, Jyh-Chiang E. Wang, Jonathan A. Rebhahn, Sally A. Quataert, and Tim R. Mosmann
University of Rochester, Rochester, NY; York University, Toronto, ON
FlowCAP Summit

OUTLINE
1 INTRODUCTION: Flow Cytometry (FC) Data Analysis; Automated Multivariate Clustering of FC Data
2 SWIFT METHOD FOR FC DATA ANALYSIS: SWIFT Algorithm; Weighted Iterative Sampling Based EM; Bimodality Splitting; Graph-Based Merging
3 DOES IT WORK? Does It Work? How Do We Know It Works?
4 FLOWCAP CONTEST: Results on FlowCAP Datasets; A Few Thoughts for FlowCAP II
5 CONCLUSION


FLOW CYTOMETRY (FC) OVERVIEW
Rapid multivariate analysis of individual cells.
High-throughput data generation (descriptions of ~1 million cells).
High dimensionality (~20 measurements per cell).
FIGURE: Flow cytometry system (fluorochrome, antibody, antigen, cell).

FC DATA ANALYSIS
Traditionally, FC data are analyzed by manual gating:
- Subjective; scales poorly with increasing dimensions
- 1D/2D projections may not represent the full picture
- Inaccurate for overlapping clusters
FIGURE: Manual gating for overlapping clusters: (a) two overlapping clusters, (b) combined view, (c) manual gating.
Automated multivariate clustering is desirable for FC data analysis: repeatable, nonsubjective, and comprehends multivariate structure.

CHALLENGES OF AUTOMATED CLUSTERING OF FC DATA
- Large FC datasets (~1 million events)
- High dimensionality (20 or more dimensions)
- Very small clusters that are important in immunological analysis (~100 cells out of millions)
- Overlapping clusters and background noise
Our goal: design an automated clustering method capable of addressing these challenges.

MANY DIFFERENT CLUSTERING METHODS
Partitional clustering splits into soft methods (mixture models, fuzzy clustering) and hard methods (k-means); other families include grid-based clustering, spectral clustering, ...

MODEL-BASED CLUSTERING FOR FC DATA
Model-based clustering offers several advantages:
- Soft clustering: comprehends overlapping clusters and background noise
- BUT computationally expensive, and the choice of model imposes limitations
Recent proposals for statistical model-based FC clustering: Chan et al. [2008], Lo et al. [2008], Finak et al. [2009], Pyne et al. [2009].
We propose a computationally efficient model-based clustering method, SWIFT (Naim et al. [2010]), that offers two advantages:
- Scalability: faster computation + less memory usage
- Detection of small populations: 100 cells out of 1 million


SWIFT ALGORITHM FOR FC DATA CLUSTERING
SWIFT is a three-stage algorithm:
1 Weighted Iterative Sampling Based EM: Gaussian mixture model clustering + novel weighted iterative sampling; the Bayesian Information Criterion (BIC) selects the number of components.
2 Bimodality Splitting: split any cluster that is bimodal in any dimension or any principal component. Useful for clustering high-dimensional data.
3 Graph-Based Merging: merge overlapping Gaussians (Hennig [2009], Finak et al. [2009], Baudry et al. [2010]). Allows representation of non-Gaussian clusters.

CLUSTERING STRATEGY: SWIFT
1. GMM clustering with sampling for k ∈ [Kmin, Kmax]; BIC decides the number of Gaussians (K̂).
2. Split bimodal clusters until unimodal, yielding K_split clusters.
3. Graph-based merging using overlap/entropy criteria, yielding K_entropy clusters.
4. Soft clustering for the K_entropy clusters.


STAGE 1: GAUSSIAN MIXTURE MODEL CLUSTERING
Gaussian mixture model (GMM) clustering is chosen among the model-based methods:
- Faster than other model-based clustering methods
- Closed-form update equations
Expectation Maximization (EM) algorithm for parameter estimation. Computational complexity of each iteration: O(Nkd²), where
- N = the number of data vectors in the dataset
- k = the number of Gaussian components
- d = the dimension of each data vector
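The closed-form EM updates can be sketched in NumPy. This is an illustrative single iteration, not the SWIFT implementation; it shows where the O(Nkd²) per-iteration cost comes from.

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for a full-covariance Gaussian mixture model.

    X: (N, d) data, weights: (k,), means: (k, d), covs: (k, d, d).
    Cost is O(N k d^2), dominated by the Mahalanobis distances in the
    E-step and the covariance updates in the M-step.
    """
    N, d = X.shape
    k = len(weights)
    log_resp = np.empty((N, k))
    for j in range(k):                                   # E-step
        diff = X - means[j]
        prec = np.linalg.inv(covs[j])
        maha = np.einsum('ni,ij,nj->n', diff, prec, diff)  # O(N d^2)
        _, logdet = np.linalg.slogdet(covs[j])
        log_resp[:, j] = (np.log(weights[j])
                          - 0.5 * (maha + logdet + d * np.log(2 * np.pi)))
    log_resp -= log_resp.max(axis=1, keepdims=True)      # numerical stability
    gamma = np.exp(log_resp)
    gamma /= gamma.sum(axis=1, keepdims=True)            # responsibilities
    Nj = gamma.sum(axis=0)                               # M-step (closed form)
    new_weights = Nj / N
    new_means = (gamma.T @ X) / Nj[:, None]
    new_covs = np.empty_like(covs)
    for j in range(k):
        diff = X - new_means[j]
        new_covs[j] = (gamma[:, j, None] * diff).T @ diff / Nj[j]
        new_covs[j] += 1e-6 * np.eye(d)                  # keep invertible
    return new_weights, new_means, new_covs, gamma
```

Iterating this step to convergence for each candidate k, and scoring each fit with BIC, recovers the model-selection loop of Stage 1.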

STAGE 1: SAMPLING FOR SCALABILITY
Operate on a smaller subsample of the dataset for better computational performance.
Challenge: poor representation of smaller clusters.
FIGURE: (a) 4 Gaussians with 150K, 100K, 50K and 150 data points; (b) after 10% sampling.
Solution: weighted iterative sampling
- Faster computation
- Better detection of small clusters

STAGE 1: WEIGHTED ITERATIVE SAMPLING BASED EM
Flowchart:
1. FCS dataset X; subsample S from X.
2. GMM fitting to S using EM.
3. Fix the p largest clusters and add them to F, the set of clusters whose parameters are fixed (initially F = ∅).
4. Resample S from X with probability P(X^(i) is selected in S) ∝ 1 − Σ_{l∈F} γ_l^(i).
5. If not all clusters are fixed, repeat from step 2; otherwise perform a few EM iterations on the full X and output the model parameters θ.
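The reweighted resampling step in the flowchart can be sketched as follows; the function name and fixed sample size are illustrative assumptions, not SWIFT internals.

```python
import numpy as np

def resample_indices(gamma_fixed, sample_size, rng):
    """Draw the next subsample for weighted iterative sampling.

    gamma_fixed: (N, |F|) posterior responsibilities of the clusters in F
    whose parameters have already been fixed.  An event that is well
    explained by the fixed clusters (responsibility mass near 1) is
    unlikely to be drawn again, so each round of resampling concentrates
    on the remaining, smaller clusters.
    """
    p = 1.0 - gamma_fixed.sum(axis=1)     # residual responsibility per event
    p = np.clip(p, 0.0, None)             # guard against rounding below zero
    p = p / p.sum()
    return rng.choice(len(p), size=sample_size, replace=False, p=p)
```

With an empty F the residuals are all 1, so the first round reduces to uniform random sampling, matching the first-sample slide below.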

Example: 4 Gaussian clusters with 150K, 100K, 50K and 150 data points.

WEIGHTED ITERATIVE SAMPLING: FIRST SAMPLE
FIGURE: (a) first sample; (b) clustering the first sample. Uniform random sampling.

WEIGHTED ITERATIVE SAMPLING: SECOND SAMPLE
FIGURE: (c) second sample; (d) clustering the second sample. Sampling probability: 1 − Σ_{l∈{1}} γ_l^(i).

WEIGHTED ITERATIVE SAMPLING: THIRD SAMPLE
FIGURE: (e) third sample; (f) clustering the third sample. Sampling probability: 1 − Σ_{l∈{1,2}} γ_l^(i).

WEIGHTED ITERATIVE SAMPLING: LAST SAMPLE
FIGURE: (g) last sample; (h) final clustering. Sampling probability: 1 − Σ_{l∈{1,2,3}} γ_l^(i).


STAGE 2: BIMODALITY SPLITTING
Motivated by biology: separation along only one dimension can be significant.
Clustering is challenging for high-dimensional data. Problem: the curse of dimensionality. Discrimination in one dimension can be obfuscated by strong similarity in the other dimensions.
Gaussian mixture modeling of high-dimensional data sometimes yields small clusters that are bimodal in one or two dimensions.
Solution: detect bimodal clusters and split them.

STAGE 2: BIMODALITY SPLITTING
Bimodality detection: detect clusters that are bimodal in
- any given dimension, or
- any principal component.
Perform 1-D kernel density estimation and compute the number of modes. Split each bimodal cluster until all subclusters are unimodal.
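The 1-D mode-counting test can be sketched with a simple Gaussian kernel density estimate; the fixed bandwidth here is a simplifying assumption, not the bandwidth rule used by SWIFT.

```python
import numpy as np

def count_modes(x, bandwidth=1.0, grid_size=256):
    """Count modes of a 1-D sample via a Gaussian kernel density estimate.

    Bimodality splitting applies a test like this to each dimension (and
    each principal component) of a candidate cluster: more than one mode
    means the cluster should be split.
    """
    grid = np.linspace(x.min() - 3 * bandwidth,
                       x.max() + 3 * bandwidth, grid_size)
    # KDE: sum of Gaussian kernels centred on the data points
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    inner = dens[1:-1]
    # a mode is a grid point strictly higher than both of its neighbours
    return int(np.sum((inner > dens[:-2]) & (inner > dens[2:])))
```

Applying `count_modes` to every dimension of a cluster, and to its principal-component projections, and splitting whenever the count exceeds one, mirrors the splitting loop described above.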


STAGE 3: GRAPH-BASED MERGING
Merging of overlapping Gaussian components allows representing non-Gaussian clusters.
FIGURE: (a) after fitting 10 Gaussians; (b) after merging down to 2 clusters.

STAGE 3: GRAPH-BASED MERGING
Merging criterion: Normalized Overlap measure (NO), a Jaccard index. Let E_i be the ellipsoid approximating the i-th Gaussian. Then

    NO(i, j) = Vol(E_i ∩ E_j) / Vol(E_i ∪ E_j)    (1)

Merge the pair (i*, j*) such that

    (i*, j*) = argmax_{(i,j)} NO(i, j)    (2)

Stopping criterion: merge until there are no significant changes in entropy (Finak et al. [2009], Baudry et al. [2010]):

    Ent(K) = − Σ_{i=1}^{n} Σ_{j=1}^{K} γ_j^(i) log(γ_j^(i))    (3)
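The entropy stopping rule of Eq. (3) can be illustrated with a small sketch: merging two components sums their responsibility columns, and the soft-assignment entropy drops sharply when the merged components were heavily overlapping. Function names are illustrative, not from the SWIFT code.

```python
import numpy as np

def soft_entropy(gamma):
    """Ent(K) = -sum_i sum_j gamma_j^(i) log gamma_j^(i), as in Eq. (3)."""
    g = np.clip(gamma, 1e-12, 1.0)        # avoid log(0)
    return float(-(g * np.log(g)).sum())

def merge_pair(gamma, i, j):
    """Merge components i < j by summing their responsibility columns."""
    merged = np.delete(gamma, j, axis=1)  # copy without column j
    merged[:, i] += gamma[:, j]
    return merged
```

Two fully overlapping components give each event responsibilities (0.5, 0.5), so merging them removes n·log 2 of entropy; well-separated components contribute almost no entropy, so merging them changes Ent(K) very little, which is why the entropy plot flattens at the right number of clusters.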

STAGE 3: GRAPH-BASED MERGING
FIGURE: Merging sequence for 5 Gaussian clusters.


DOES IT WORK?
Experiment: cluster high-dimensional FC data.
Dataset: 544,000 events, 21 dimensions.
SWIFT output: 191 Gaussians (Gaussian fitting + bimodality splitting); 143 clusters (post merging).

FIGURE: 544,000 events, 21 dimensions, 143 clusters.

HOW DO WE KNOW IT WORKS?
Experiments to produce datasets with ground truth (Rochester Human Immunology Center).
Electronic mixture of human cells and mouse cells:
- Two data files: human cells only and mouse cells only
- Human data file: 276,418 events; mouse data file: 267,582 events; total: 544,000 events
- Stained using both human and mouse antibodies
- Human/mouse label is known for every cell
Examine every cluster: human only? mouse only? or both?

FIGURE: Fractional membership (human or mouse) for each cluster; x-axis: cluster numbers.

SMALL CLUSTER DETECTION
Electronic human-mouse mixtures with varying proportions of human cells. Five datasets: 50%, 25%, 10%, 1%, and 0.1% human cells.
Sensitivity analysis:
- Precision = TP / (TP + FP)
- Recall = TP / (TP + FN)
TABLE: Precision, recall, and number of detected human clusters for each mixing proportion (50%: 84 human clusters; 25%: 59; 10%: 38; 0.1%: 4).
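With the human/mouse label known for every event, the two measures above can be computed directly from the predicted and true event sets; a minimal sketch (the set-based bookkeeping is an assumption, not the paper's code):

```python
def precision_recall(predicted, actual):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN) over sets of event ids.

    predicted: event ids assigned to human clusters by the algorithm.
    actual: event ids that are truly human (known from the electronic mix).
    """
    tp = len(predicted & actual)   # human events correctly called human
    fp = len(predicted - actual)   # mouse events mislabeled as human
    fn = len(actual - predicted)   # human events that were missed
    return tp / (tp + fp), tp / (tp + fn)
```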

ADVANTAGES OF SWIFT
- Scalable memory and computation time: complexity of each EM iteration reduced from O(Nkd²) to O(nkd²), where n = sample size
- Better resolution of small clusters
- Capable of detecting non-Gaussian clusters
- Works well for overlapping clusters (true for all model-based methods)


CLUSTERING RESULTS: GVHD DATASET
GvHD dataset: Data Sample 001.fcs, 6 dimensions.
13 Gaussians using BIC; 11 clusters after merging.

CLUSTERING RESULTS: GVHD DATASET
FIGURES: GvHD clustering results shown in FL2.H vs. FL1.H and SSC.H vs. FSC.H projections.

FEW THOUGHTS FOR FLOWCAP II
FlowCAP-I datasets were relatively small:
- Fewer than 100,000 events
- Maximum 12 dimensions
- Number of clusters usually smaller than 25
Suggestions for FlowCAP II:
- Introduce larger datasets: 1 million events and 20 dimensions are common
- Introduce different tasks and corresponding performance measures: detection of very small clusters; detection of overlapping populations
- Gold standard for validation? Manual gating is focused rather than exhaustive, and does not comprehend overlapping populations
- Electronically mixed datasets (e.g., the human/mouse dataset) for objective evaluation


CONCLUSION
SWIFT: a scalable algorithm for FC data clustering.
Posterior-sampling-based EM + bimodality splitting + graph-based merging.
Advantages: lower computational complexity + better small-cluster resolution.
Extensible to other soft clustering methods: mixtures of t or skewed-t distributions, or fuzzy clustering.
Further speed-up can be achieved by combining with parallelization, e.g., on GPUs (Suchard et al. [2010], Espenshade et al. [2009]).
Future work: improve stability and robustness; cross-sample cluster matching for biological inference.

COLLABORATORS
SWIFT: Naim, Sharma, Datta, Cavenaugh, Rebhahn, Wang, Mosmann
Acceleration: Pangborn, Cavenaugh, von Laszewski
GAFF: Rebhahn, Cavenaugh, Naim, Sharma, Mosmann
Rochester Human Immunology Center: Quataert, Mosmann
Other projects (NYICE Influenza, Lymphoma, RPBIP, Immunocompromised, Asthma, ACE, Autoimmunity): Bernstein, Quataert; Treanor, Topham, Sant, Kim, Whittaker, Mosmann; Sanz, Looney, Mosmann, Ritchlin, Anolik, Quataert; Georas, Looney, Mosmann; Sanz, Fowell, Looney, Quataert, Mosmann

REFERENCES
J.-P. Baudry, A. E. Raftery, G. Celeux, K. Lo, and R. Gottardo. Combining mixture components for clustering. Journal of Computational and Graphical Statistics, 19(2), 2010.
C. Chan, F. Feng, J. Ottinger, D. Foster, M. West, and T. B. Kepler. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytometry Part A, 2008.
J. Espenshade, A. Pangborn, G. von Laszewski, D. Roberts, and J. S. Cavenaugh. Accelerating partitional algorithms for flow cytometry on GPUs. 2009.
G. Finak, R. Gottardo, R. Brinkman, et al. Merging mixture components for cell population identification in flow cytometry. Advances in Bioinformatics, 2009.
C. Hennig. Methods for merging Gaussian mixture components. Advances in Data Analysis and Classification, 2009.
K. Lo, R. R. Brinkman, and R. Gottardo. Automated gating of flow cytometry data via robust model-based clustering. Cytometry Part A, 73, 2008.
I. Naim, S. Datta, G. Sharma, J. Cavenaugh, and T. Mosmann. SWIFT: Scalable weighted iterative sampling for flow cytometry clustering. In Proc. IEEE Intl. Conf. Acoustics, Speech and Signal Processing, Dallas, TX, USA, Mar. 2010.
S. Pyne et al. Automated high-dimensional flow cytometric data analysis. PNAS, 106(21):8519, 2009.


More information

Tri-modal Human Body Segmentation

Tri-modal Human Body Segmentation Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4

More information

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering

Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms

More information

The K-modes and Laplacian K-modes algorithms for clustering

The K-modes and Laplacian K-modes algorithms for clustering The K-modes and Laplacian K-modes algorithms for clustering Miguel Á. Carreira-Perpiñán Electrical Engineering and Computer Science University of California, Merced http://faculty.ucmerced.edu/mcarreira-perpinan

More information

Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis. Ying Shen, SSE, Tongji University Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

More information

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms.

Keywords Clustering, Goals of clustering, clustering techniques, clustering algorithms. Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Survey of Clustering

More information

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors

10/14/2017. Dejan Sarka. Anomaly Detection. Sponsors Dejan Sarka Anomaly Detection Sponsors About me SQL Server MVP (17 years) and MCT (20 years) 25 years working with SQL Server Authoring 16 th book Authoring many courses, articles Agenda Introduction Simple

More information

Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate

Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,

More information

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin

Clustering K-means. Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, Carlos Guestrin Clustering K-means Machine Learning CSEP546 Carlos Guestrin University of Washington February 18, 2014 Carlos Guestrin 2005-2014 1 Clustering images Set of Images [Goldberger et al.] Carlos Guestrin 2005-2014

More information

The Projected Dip-means Clustering Algorithm

The Projected Dip-means Clustering Algorithm Theofilos Chamalis Department of Computer Science & Engineering University of Ioannina GR 45110, Ioannina, Greece thchama@cs.uoi.gr ABSTRACT One of the major research issues in data clustering concerns

More information

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University

TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION. Prateek Verma, Yang-Kai Lin, Li-Fan Yu. Stanford University TWO-STEP SEMI-SUPERVISED APPROACH FOR MUSIC STRUCTURAL CLASSIFICATION Prateek Verma, Yang-Kai Lin, Li-Fan Yu Stanford University ABSTRACT Structural segmentation involves finding hoogeneous sections appearing

More information

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning

Unsupervised Learning. Clustering and the EM Algorithm. Unsupervised Learning is Model Learning Unsupervised Learning Clustering and the EM Algorithm Susanna Ricco Supervised Learning Given data in the form < x, y >, y is the target to learn. Good news: Easy to tell if our algorithm is giving the

More information

Bioimage Informatics

Bioimage Informatics Bioimage Informatics Lecture 14, Spring 2012 Bioimage Data Analysis (IV) Image Segmentation (part 3) Lecture 14 March 07, 2012 1 Outline Review: intensity thresholding based image segmentation Morphological

More information

Introduction to Mobile Robotics

Introduction to Mobile Robotics Introduction to Mobile Robotics Clustering Wolfram Burgard Cyrill Stachniss Giorgio Grisetti Maren Bennewitz Christian Plagemann Clustering (1) Common technique for statistical data analysis (machine learning,

More information

Clustering and Dissimilarity Measures. Clustering. Dissimilarity Measures. Cluster Analysis. Perceptually-Inspired Measures

Clustering and Dissimilarity Measures. Clustering. Dissimilarity Measures. Cluster Analysis. Perceptually-Inspired Measures Clustering and Dissimilarity Measures Clustering APR Course, Delft, The Netherlands Marco Loog May 19, 2008 1 What salient structures exist in the data? How many clusters? May 19, 2008 2 Cluster Analysis

More information

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a

Part I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering

More information

Transfer Learning for Automatic Gating of Flow Cytometry Data

Transfer Learning for Automatic Gating of Flow Cytometry Data Transfer Learning for Automatic Gating of Flow Cytometry Data Gyemin Lee Lloyd Stoolman Department of Electrical Engineering and Computer Science Department of Pathology University of Michigan, Ann Arbor,

More information

Clustering. Supervised vs. Unsupervised Learning

Clustering. Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

Machine Learning (BSMC-GA 4439) Wenke Liu

Machine Learning (BSMC-GA 4439) Wenke Liu Machine Learning (BSMC-GA 4439) Wenke Liu 01-25-2018 Outline Background Defining proximity Clustering methods Determining number of clusters Other approaches Cluster analysis as unsupervised Learning Unsupervised

More information

Supervised vs. Unsupervised Learning

Supervised vs. Unsupervised Learning Clustering Supervised vs. Unsupervised Learning So far we have assumed that the training samples used to design the classifier were labeled by their class membership (supervised learning) We assume now

More information

COMP 465: Data Mining Still More on Clustering

COMP 465: Data Mining Still More on Clustering 3/4/015 Exercise COMP 465: Data Mining Still More on Clustering Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Describe each of the following

More information

Clustering. Shishir K. Shah

Clustering. Shishir K. Shah Clustering Shishir K. Shah Acknowledgement: Notes by Profs. M. Pollefeys, R. Jin, B. Liu, Y. Ukrainitz, B. Sarel, D. Forsyth, M. Shah, K. Grauman, and S. K. Shah Clustering l Clustering is a technique

More information

Clustering with Confidence: A Low-Dimensional Binning Approach

Clustering with Confidence: A Low-Dimensional Binning Approach Clustering with Confidence: A Low-Dimensional Binning Approach Rebecca Nugent and Werner Stuetzle Abstract We present a plug-in method for estimating the cluster tree of a density. The method takes advantage

More information

Parallel and Hierarchical Mode Association Clustering with an R Package Modalclust

Parallel and Hierarchical Mode Association Clustering with an R Package Modalclust Parallel and Hierarchical Mode Association Clustering with an R Package Modalclust Surajit Ray and Yansong Cheng Department of Mathematics and Statistics Boston University 111 Cummington Street, Boston,

More information

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering

SYDE Winter 2011 Introduction to Pattern Recognition. Clustering SYDE 372 - Winter 2011 Introduction to Pattern Recognition Clustering Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 5 All the approaches we have learned

More information

Client Dependent GMM-SVM Models for Speaker Verification

Client Dependent GMM-SVM Models for Speaker Verification Client Dependent GMM-SVM Models for Speaker Verification Quan Le, Samy Bengio IDIAP, P.O. Box 592, CH-1920 Martigny, Switzerland {quan,bengio}@idiap.ch Abstract. Generative Gaussian Mixture Models (GMMs)

More information

Machine Learning Department School of Computer Science Carnegie Mellon University. K- Means + GMMs

Machine Learning Department School of Computer Science Carnegie Mellon University. K- Means + GMMs 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University K- Means + GMMs Clustering Readings: Murphy 25.5 Bishop 12.1, 12.3 HTF 14.3.0 Mitchell

More information

Introduction to Machine Learning

Introduction to Machine Learning Department of Computer Science, University of Helsinki Autumn 2009, second term Session 8, November 27 th, 2009 1 2 3 Multiplicative Updates for L1-Regularized Linear and Logistic Last time I gave you

More information

CS Introduction to Data Mining Instructor: Abdullah Mueen

CS Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 8: ADVANCED CLUSTERING (FUZZY AND CO -CLUSTERING) Review: Basic Cluster Analysis Methods (Chap. 10) Cluster Analysis: Basic Concepts

More information

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 10: Learning with Partially Observed Data. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 10: Learning with Partially Observed Data Theo Rekatsinas 1 Partially Observed GMs Speech recognition 2 Partially Observed GMs Evolution 3 Partially Observed

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering K-means and Hierarchical Clustering Xiaohui Xie University of California, Irvine K-means and Hierarchical Clustering p.1/18 Clustering Given n data points X = {x 1, x 2,, x n }. Clustering is the partitioning

More information

Cluster analysis formalism, algorithms. Department of Cybernetics, Czech Technical University in Prague.

Cluster analysis formalism, algorithms. Department of Cybernetics, Czech Technical University in Prague. Cluster analysis formalism, algorithms Jiří Kléma Department of Cybernetics, Czech Technical University in Prague http://ida.felk.cvut.cz poutline motivation why clustering? applications, clustering as

More information

Model-Based Clustering for Online Crisis Identification in Distributed Computing

Model-Based Clustering for Online Crisis Identification in Distributed Computing Model-Based Clustering for Crisis Identification in Distributed Computing Dawn Woodard Operations Research and Information Engineering Cornell University with Moises Goldszmidt Microsoft Research 1 Outline

More information

Critical Assessment of Automated Flow Cytometry Data Analysis Techniques

Critical Assessment of Automated Flow Cytometry Data Analysis Techniques Critical Assessment of Automated Flow Cytometry Data Analysis Techniques Nima Aghaeepour, Greg Finak, The FlowCAP Consortium, The DREAM Consortium, Holger Hoos, Tim R. Mosmann, Raphael Gottardo, Ryan R.

More information

University of Florida CISE department Gator Engineering. Clustering Part 2

University of Florida CISE department Gator Engineering. Clustering Part 2 Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical

More information

Mondrian Processes for Flow Cytometry Analysis

Mondrian Processes for Flow Cytometry Analysis Mondrian Processes for Flow Cytometry Analysis Disi Ji, Eric Nalisnick, Padhraic Smyth Department of Computer Science University of California, Irvine {disij, enalisni, p.smyth}@uci.edu Abstract Analysis

More information

Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification

Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification Application of Principal Components Analysis and Gaussian Mixture Models to Printer Identification Gazi. Ali, Pei-Ju Chiang Aravind K. Mikkilineni, George T. Chiu Edward J. Delp, and Jan P. Allebach School

More information

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning

COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning COMP 551 Applied Machine Learning Lecture 13: Unsupervised learning Associate Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/comp551

More information

Clustering CS 550: Machine Learning

Clustering CS 550: Machine Learning Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf

More information

Expectation Maximization!

Expectation Maximization! Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Steps in Clustering Select Features

More information

Robust Event Boundary Detection in Sensor Networks A Mixture Model Based Approach

Robust Event Boundary Detection in Sensor Networks A Mixture Model Based Approach Robust Event Boundary Detection in Sensor Networks A Mixture Model Based Approach Min Ding Department of Computer Science The George Washington University Washington DC 20052, USA Email: minding@gwu.edu

More information

Warped Mixture Models

Warped Mixture Models Warped Mixture Models Tomoharu Iwata, David Duvenaud, Zoubin Ghahramani Cambridge University Computational and Biological Learning Lab March 11, 2013 OUTLINE Motivation Gaussian Process Latent Variable

More information

Hidden Markov Models. Gabriela Tavares and Juri Minxha Mentor: Taehwan Kim CS159 04/25/2017

Hidden Markov Models. Gabriela Tavares and Juri Minxha Mentor: Taehwan Kim CS159 04/25/2017 Hidden Markov Models Gabriela Tavares and Juri Minxha Mentor: Taehwan Kim CS159 04/25/2017 1 Outline 1. 2. 3. 4. Brief review of HMMs Hidden Markov Support Vector Machines Large Margin Hidden Markov Models

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15

More information

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme

Machine Learning. B. Unsupervised Learning B.1 Cluster Analysis. Lars Schmidt-Thieme Machine Learning B. Unsupervised Learning B.1 Cluster Analysis Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany

More information

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering

Hard clustering. Each object is assigned to one and only one cluster. Hierarchical clustering is usually hard. Soft (fuzzy) clustering An unsupervised machine learning problem Grouping a set of objects in such a way that objects in the same group (a cluster) are more similar (in some sense or another) to each other than to those in other

More information

COMS 4771 Clustering. Nakul Verma

COMS 4771 Clustering. Nakul Verma COMS 4771 Clustering Nakul Verma Supervised Learning Data: Supervised learning Assumption: there is a (relatively simple) function such that for most i Learning task: given n examples from the data, find

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Clustering and EM Barnabás Póczos & Aarti Singh Contents Clustering K-means Mixture of Gaussians Expectation Maximization Variational Methods 2 Clustering 3 K-

More information

Latent Variable Models and Expectation Maximization

Latent Variable Models and Expectation Maximization Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15

More information

Random projection for non-gaussian mixture models

Random projection for non-gaussian mixture models Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,

More information

Collective Entity Resolution in Relational Data

Collective Entity Resolution in Relational Data Collective Entity Resolution in Relational Data I. Bhattacharya, L. Getoor University of Maryland Presented by: Srikar Pyda, Brett Walenz CS590.01 - Duke University Parts of this presentation from: http://www.norc.org/pdfs/may%202011%20personal%20validation%20and%20entity%20resolution%20conference/getoorcollectiveentityresolution

More information

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering

INF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Erik Velldal & Stephan Oepen Language Technology Group (LTG) September 23, 2015 Agenda Last week Supervised vs unsupervised learning.

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster

More information

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification

Optimization of Observation Membership Function By Particle Swarm Method for Enhancing Performances of Speaker Identification Proceedings of the 6th WSEAS International Conference on SIGNAL PROCESSING, Dallas, Texas, USA, March 22-24, 2007 52 Optimization of Observation Membership Function By Particle Swarm Method for Enhancing

More information

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014

MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION. Steve Tjoa June 25, 2014 MACHINE LEARNING: CLUSTERING, AND CLASSIFICATION Steve Tjoa kiemyang@gmail.com June 25, 2014 Review from Day 2 Supervised vs. Unsupervised Unsupervised - clustering Supervised binary classifiers (2 classes)

More information

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06

Clustering. CS294 Practical Machine Learning Junming Yin 10/09/06 Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,

More information

COMP5318 Knowledge Management & Data Mining Assignment 1

COMP5318 Knowledge Management & Data Mining Assignment 1 COMP538 Knowledge Management & Data Mining Assignment Enoch Lau SID 20045765 7 May 2007 Abstract 5.5 Scalability............... 5 Clustering is a fundamental task in data mining that aims to place similar

More information

Clustering and The Expectation-Maximization Algorithm

Clustering and The Expectation-Maximization Algorithm Clustering and The Expectation-Maximization Algorithm Unsupervised Learning Marek Petrik 3/7 Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with applications

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Kinds of Clustering Sequential Fast Cost Optimization Fixed number of clusters Hierarchical

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Lec 08 Feature Aggregation II: Fisher Vector, Super Vector and AKULA

Lec 08 Feature Aggregation II: Fisher Vector, Super Vector and AKULA Image Analysis & Retrieval CS/EE 5590 Special Topics (Class Ids: 44873, 44874) Fall 2016, M/W 4-5:15pm@Bloch 0012 Lec 08 Feature Aggregation II: Fisher Vector, Super Vector and AKULA Zhu Li Dept of CSEE,

More information

Learning Generative Graph Prototypes using Simplified Von Neumann Entropy

Learning Generative Graph Prototypes using Simplified Von Neumann Entropy Learning Generative Graph Prototypes using Simplified Von Neumann Entropy Lin Han, Edwin R. Hancock and Richard C. Wilson Department of Computer Science The University of York YO10 5DD, UK Graph representations

More information

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

More information

Content-based image and video analysis. Machine learning

Content-based image and video analysis. Machine learning Content-based image and video analysis Machine learning for multimedia retrieval 04.05.2009 What is machine learning? Some problems are very hard to solve by writing a computer program by hand Almost all

More information

Association Rule Mining and Clustering

Association Rule Mining and Clustering Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:

More information

Chapter 6 Continued: Partitioning Methods

Chapter 6 Continued: Partitioning Methods Chapter 6 Continued: Partitioning Methods Partitioning methods fix the number of clusters k and seek the best possible partition for that k. The goal is to choose the partition which gives the optimal

More information