Tight Clustering: a method for extracting stable and tight patterns in expression profiles
- Patricia Webb
- 6 years ago
Transcription
1 Tight Clustering: a method for extracting stable and tight patterns in expression profiles
George C. Tseng, Dept. of Biostatistics & Human Genetics, University of Pittsburgh
Statistical issues in microarray analysis
- Experimental design
- Image analysis
- Normalization
- Identify differentially expressed genes
- Data visualization
- Clustering
- Regulatory network
- Classification
Data matrix
Heatmap (data visualization)
Data: $X = \{x_{ij}\}_{n \times d}$, an n (genes) × d (samples) matrix.
[Table: example data matrix. Columns: row.names, chromosome, sample1 to sample5, and time points; rows: probe set IDs (e.g. 96669_at, 16378_at, 98569_at, 93794_at, 95124_i_at, 99674_at); missing values shown as NA.]
2 Why clustering:
Cluster genes: a similar expression pattern implies co-regulation. Although many sophisticated methods exist for detecting regulatory interactions (e.g. Shortest-path and Liquid Association), cluster analysis remains a useful routine in array analysis.
Subsequent analysis:
- Identify novel genes participating in a known cellular process
- Enrichment of particular Gene Ontology (GO) terms in clusters
- Motif finding in clusters
Cluster samples: identify potential sub-classes of disease.
Clustering in microarray: an example
Gene expression during the life cycle of Drosophila melanogaster. (2002) Science 297:
- Genes are monitored against a reference sample pooled from all samples.
- 66 sequential time points spanning embryonic (E), larval (L), pupal (P) and adult (A) periods.
- Filter genes without significant pattern (11 genes) and standardize each gene to have mean 0 and stdev 1.
Example: Data from life cycle of Drosophila melanogaster. (2002) Science 297:
[Figure: K-means clustering of the life-cycle data at three values of k.]
Main challenges for clustering in microarray
Challenge 1: Lots of scattered genes, i.e. genes not belonging to any tight cluster of biological function.
K-means clustering looks informative. A closer look, however, finds lots of noise in each cluster.
3 Main challenges for clustering in microarray
Challenge 2: Microarray is an exploratory tool to guide further biological experiments.
- Hypothesis driven: hypothesis => experimental data.
- Data driven: high-throughput experiment => data mining => hypothesis => further validation experiment.
It is important to provide the most informative clusters instead of lots of loose clusters (reduce false positives).
Current Methods
Dimension reduction and data visualization:
- Principal Component Analysis (PCA) (Alter 2000)
- Multi-Dimensional Scaling (MDS)
Clustering methods:
- Hierarchical Clustering (Eisen 1998)
- K-means (Hartigan 1975)
- K-medoids
- Self-Organizing Map (SOM) (Tamayo 1999)
- CLICK (Ron Shamir 2001)
- Model-based approach (Fraley and Raftery 1998)
Model-based approach
Fraley and Raftery (1998) applied a Gaussian mixture model: (1) the EM algorithm to maximize the classification likelihood; (2) the Bayesian Information Criterion (BIC) for determining k and the complexity of the covariance matrix.
Advantages:
- A sound probabilistic model for inference: model selection and estimation
- Can easily be extended to model scattered genes
Problems:
- Local minimum
- Model selection is usually inapplicable in array data; BIC is approximate
4 K-means clustering
Procedures:
Step 1: estimate the number of clusters, k.
Step 2: minimize the within-cluster dispersion to the cluster centers,
$$W(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \bar{C}_j \rVert^2,$$
where $\bar{C}_j$ is the center of cluster $C_j$.
Note:
1. Points should be in Euclidean space.
2. Optimization is performed by iterative relocation algorithms. A local minimum is inevitable.
3. k has to be correctly estimated.
K-means is a special case of the model-based approach.
Problems:
- Local minimum
- Does not allow scattered genes
- Estimation of the number of clusters k
Estimate the number of clusters k:
Milligan & Cooper (1985) compared 30 published rules.
1. Calinski & Harabasz (1974): choose k to maximize
$$CH(k) = \frac{B(k)/(k-1)}{W(k)/(n-k)},$$
where B(k) is the between-cluster dispersion.
2. Hartigan (1975): with $H(k) = \left( \frac{W(k)}{W(k+1)} - 1 \right)(n - k - 1)$, stop when H(k) < 10.
3. Tibshirani, Walther & Hastie (2000): choose k to maximize
$$\mathrm{Gap}_n(k) = E_n^{*}\left[ \log W(k) \right] - \log W(k).$$
4. Tibshirani et al. (2001), Dudoit & Fridlyand (2002): prediction-based resampling approach.
Hierarchical clustering
Iteratively agglomerate the nearest nodes to form a bottom-up tree.
- Single linkage: shortest distance between points in the two nodes.
- Complete linkage: largest distance between points in the two nodes.
Note: Clusters can be obtained by cutting the hierarchical tree.
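The within-cluster dispersion W(k) and the Calinski & Harabasz index from this slide can be sketched in a few lines of Python. This is only an illustration: the function names `within_dispersion` and `calinski_harabasz` are made up here, NumPy is assumed to be available, and B(k) is taken to be the total sum of squares minus W(k), which is the usual decomposition but is not spelled out in the slides.

```python
import numpy as np

def within_dispersion(X, labels):
    """W(k): sum of squared Euclidean distances from each point to its cluster mean."""
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def calinski_harabasz(X, labels):
    """CH(k) = [B(k)/(k-1)] / [W(k)/(n-k)], assuming B(k) = total SS - W(k)."""
    n = X.shape[0]
    k = len(np.unique(labels))
    total_ss = ((X - X.mean(axis=0)) ** 2).sum()
    w = within_dispersion(X, labels)
    b = total_ss - w
    return (b / (k - 1)) / (w / (n - k))

# Toy data: two obvious groups along the first coordinate.
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])   # splits the two groups correctly
bad = np.array([0, 1, 0, 1])    # mixes the two groups

print(within_dispersion(X, good), within_dispersion(X, bad))
print(calinski_harabasz(X, good), calinski_harabasz(X, bad))
```

A partition that follows the real structure gives a small W(k) and a large CH(k); the mixed partition does the opposite, which is why CH(k) is maximized over k.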
5 Example of hierarchical clustering
[Figure: hierarchical clustering example, Eisen et al. 1998.]
Other Methods
Current methods that aim to find tight clusters:
1. CLICK: graph-theoretical techniques to find tight kernels. Several heuristic procedures are then used to expand the kernels into a full clustering.
2. Committee algorithm: a similar idea, finding tight committees and then expanding them to a full clustering.
Traditional:
- Estimate the number of clusters, k (except for hierarchical clustering).
- Perform clustering by assigning all genes into clusters.
Tight Clustering:
- Directly identify informative, tight and stable clusters of reasonable size, say, 20~60 genes.
- Need not estimate k!
- Need not assign all genes into clusters.
6 Tight Clustering
[Figure: the whole data set is sub-sampled repeatedly (subsample 1, subsample 2, ...); each subsample is clustered and the result is used to judge the co-membership of all points in the whole data.]
Original data X => sub-sample X' => K-means cluster centers C(X', k) = (C_1, ..., C_k) => co-membership matrix D[C(X', k), X]
- $X = \{x_{ij}\}_{n \times d}$: data to be clustered.
- $X' = \{x'_{ij}\}_{n/2 \times d}$: random sub-sample.
- $C(X', k) = (C_1, C_2, \ldots, C_k)$: the cluster centers obtained from clustering X' into k clusters.
- $D[C(X', k), X]$: an n × n matrix denoting the co-membership relations of X classified by C(X', k) (Tibshirani 2001): $D[C(X', k), X]_{ij} = 1$ if i and j are in the same cluster, and 0 otherwise.
- $s(V_i, V_j) = \dfrac{|V_i \cap V_j|}{|V_i \cup V_j|}$: a measure of similarity of two sets of genes.
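As a small illustration of the two definitions on this slide, the sketch below builds the co-membership matrix D[C(X', k), X] from a given set of cluster centers and computes the set similarity s(V_i, V_j). The function names are hypothetical, NumPy is assumed, and points are classified to the nearest center in Euclidean distance, which is how K-means centers would normally be used but is an assumption here.

```python
import numpy as np

def assign_to_centers(X, centers):
    """Classify every row of X to its nearest center (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def comembership(X, centers):
    """n x n 0/1 matrix: entry (i, j) is 1 when points i and j fall in the same cluster."""
    labels = assign_to_centers(X, centers)
    return (labels[:, None] == labels[None, :]).astype(int)

def set_similarity(v_i, v_j):
    """s(V_i, V_j) = |intersection| / |union|; both sets are assumed non-empty."""
    v_i, v_j = set(v_i), set(v_j)
    return len(v_i & v_j) / len(v_i | v_j)

# Three points, two centers: the first two points share a cluster.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.5], [5.0, 5.0]])
print(comembership(X, centers))
print(set_similarity({1, 2, 3}, {2, 3, 4}))
```

Note that the centers come from the sub-sample X' while the matrix is computed on the whole data X, so every point is judged, including those left out of the sub-sample.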
7 Algorithm 1 (when fixing k):
1. Fix k. Draw random sub-samples $X^{(1)}, \ldots, X^{(B)}$. Define the average co-membership matrix to be
$$\bar{D} = \mathrm{mean}\left( D[C(X^{(1)}, k), X], \ldots, D[C(X^{(B)}, k), X] \right).$$
Note:
a. $\bar{D}_{ij} = 1$ if and only if i and j are always clustered together in each sub-sampling judgment.
b. $\bar{D}_{ij} = 0$ if and only if i and j are never clustered together in any sub-sampling judgment.
c. $\bar{D}_{ii} = 1$ for all i.
2. Search for a large set of points $V = \{v_1, \ldots, v_m\} \subset \{1, \ldots, n\}$ such that $\bar{D}_{v_i v_j} \ge 1 - \alpha$ for all i, j, with α close to 0. Sets with this property are candidates for tight clusters. Order the sets with this property by their size to obtain $V_{k1}, V_{k2}, \ldots$
Tight Clustering Algorithm:
1. Start with a suitable $k_0$. Search for consecutive k's and choose the top 3 clusters for each k:
$$\{V_{k_0,1}, V_{k_0,2}, V_{k_0,3}\}, \; \{V_{(k_0+1),1}, V_{(k_0+1),2}, V_{(k_0+1),3}\}, \; \ldots$$
[Figure: top three candidate clusters for k_0, k_0+1, k_0+2 and k_0+3, with similarity values linking candidates at consecutive k.]
2. Stop when
$$s(V_{k',l}, V_{(k'+1),m}) \ge \beta, \quad s(V_{(k'+1),m}, V_{(k'+2),n}) \ge \beta$$
for some $k' \ge k_0$ and $l, m, n \in \{1, 2, 3\}$, with β close to 1. Select the candidate that has stabilized in this way to be the tightest cluster.
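Algorithm 1 can be sketched as follows. This is a rough illustration under several assumptions rather than the tightclust implementation: the K-means routine is a plain Lloyd iteration written here for self-containment, sub-samples have size n/2 as on the previous slide, and the search in step 2 uses a simple greedy pass (the slides do not say how the search is done). All function names and default parameter values are hypothetical.

```python
import numpy as np

def kmeans_centers(X, k, rng, n_iter=50):
    """Plain Lloyd iterations from k randomly chosen rows; returns the k centers."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def mean_comembership(X, k, B=20, seed=0):
    """Average co-membership matrix over B random half-size sub-samples."""
    rng = np.random.default_rng(seed)
    n = len(X)
    d_bar = np.zeros((n, n))
    for _ in range(B):
        sub = rng.choice(n, size=n // 2, replace=False)
        centers = kmeans_centers(X[sub], k, rng)
        # Classify the WHOLE data set with centers learned on the sub-sample.
        labels = ((X[:, None] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        d_bar += labels[:, None] == labels[None, :]
    return d_bar / B

def candidate_tight_sets(d_bar, alpha=0.1, min_size=2):
    """Greedy heuristic (an assumption): grow sets whose pairwise entries are all >= 1 - alpha."""
    unused = set(range(len(d_bar)))
    found = []
    while unused:
        first = min(unused)
        members = [first]
        for j in sorted(unused - {first}):
            if all(d_bar[j, m] >= 1 - alpha for m in members):
                members.append(j)
        unused -= set(members)
        if len(members) >= min_size:
            found.append(members)
    return sorted(found, key=len, reverse=True)   # largest candidates first

# Usage on two well-separated groups of six points each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (6, 2)), rng.normal(5.0, 0.1, (6, 2))])
d_bar = mean_comembership(X, k=2, B=10)
print(candidate_tight_sets(d_bar, alpha=0.1))
```

Points that are left in sets smaller than `min_size` are simply not reported, which mirrors the idea that scattered genes need not be assigned to any cluster.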
8 Tight Clustering Algorithm (cont'd):
3. Identify the tightest cluster and remove it from the whole data.
4. Decrease $k_0$ by 1. Repeat steps 1 to 3 to identify the next tight cluster.
Remark: α, β and $k_0$ determine the tightness and size of the resulting clusters.
Simulation
A simple simulation in 2-D: 14 normally distributed clusters (5 points each) plus 175 sporadic points.
[Table: tight clustering results on the simulated data compared with the truth, by k_0, with β = 0.7.]
[Figure: tight clustering result on the simulated data for k_0 = 25, β = 0.7.]
9 Example 1: Data from life cycle of Drosophila melanogaster. (2002) Science 297:
Tight Clustering with α = 0.1, β = 0.6, k_0 = 15: 11 clusters and 661 remaining scattered genes.
[Figure: K-means clustering of the same data at three values of k.]
K-means clustering looks informative. A closer look, however, finds lots of noise in each cluster.
Comparison: a corresponding cluster of K-means & Tight Clustering
- 22 common genes
- Tight Clustering: total of 28 genes
- K-means clustering: total of 18 genes
Example 2: Mouse embryonic experiment
Oligonucleotide array (U74Av2 mouse array from Affymetrix) containing probe sets for about 10,000 mouse genes. There are 126 samples in total. Half of them are from different stages of mouse embryonic development. The remaining half is a diverse collection of samples from various tissues, including several types of adult stem cells.
[Table: mean squared distance.]
10 Example 2: Mouse embryonic experiment
[Figure: comparison of various K-means and tight clustering results.]
Example 3: Simulated data
A. Simulated gene expression of 15 clusters and 5 scattered genes.
B. Randomly permuted from A.
Methods compared:
a. K-means
b. K-medoids
c. SOM
d. CLICK
e. Model-based clustering
f. Tight clustering
The adjusted Rand index is a measure of the similarity of two clustering results. We compare the clustering results from each method to the underlying truth.
Ongoing developments
- Theoretical foundation for the re-sampling approach.
- Multi-resolution tight clustering.
- Extend the idea to bi-clustering.
- Incorporating multiple tight clustering results.
- Other general and fundamental problems in clustering.
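For reference, the adjusted Rand index used in this comparison can be computed from the contingency table of two labelings. The sketch below uses only the Python standard library; the function name is hypothetical, at least two items are assumed, and the degenerate case where the denominator vanishes (for instance both labelings put everything in one cluster) is reported as 1.0, which is a convention chosen here. How scattered genes are labeled before the comparison is not stated in the slides and is left to the caller.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings of the same items (needs >= 2 items)."""
    n = len(labels_a)
    # Pairs of items placed together in both labelings (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling separately (row and column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:               # degenerate labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same partition, labels renamed
print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # partition that cuts across the truth
```

The index is 1 when the two partitions agree exactly, regardless of how clusters are named, and is close to 0 (or negative) when the agreement is no better than chance.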
11 tightclust: software for Tight Clustering
Acknowledgement:
Harvard: Wing H. Wong (Department of Statistics)
Inputs from: Chen Li (Department of Biostatistics), Rung Kim, Richard Zhong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. XX, NO. X, X X 1 Joint video frame set division and low-rank decomposition for background subtraction Jiajun Wen, Yong Xu, Member, IEEE,
More informationStatistically Analyzing the Impact of Automated ETL Testing on Data Quality
Chapter 5 Statisticall Analzing the Impact of Automated ETL Testing on Data Qualit 5.0 INTRODUCTION In the previous chapter some prime components of hand coded ETL prototpe were reinforced with automated
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationUnlabeled Data Classification by Support Vector Machines
Unlabeled Data Classification by Support Vector Machines Glenn Fung & Olvi L. Mangasarian University of Wisconsin Madison www.cs.wisc.edu/ olvi www.cs.wisc.edu/ gfung The General Problem Given: Points
More informationAPPLICATION OF RECIRCULATION NEURAL NETWORK AND PRINCIPAL COMPONENT ANALYSIS FOR FACE RECOGNITION
APPLICATION OF RECIRCULATION NEURAL NETWORK AND PRINCIPAL COMPONENT ANALYSIS FOR FACE RECOGNITION Dmitr Brliuk and Valer Starovoitov Institute of Engineering Cbernetics, Laborator of Image Processing and
More informationAssociation Rule Mining and Clustering
Association Rule Mining and Clustering Lecture Outline: Classification vs. Association Rule Mining vs. Clustering Association Rule Mining Clustering Types of Clusters Clustering Algorithms Hierarchical:
More informationChapter 3. Interpolation. 3.1 Introduction
Chapter 3 Interpolation 3 Introduction One of the fundamental problems in Numerical Methods is the problem of interpolation, that is given a set of data points ( k, k ) for k =,, n, how do we find a function
More informationCS 157: Assignment 6
CS 7: Assignment Douglas R. Lanman 8 Ma Problem : Evaluating Conve Polgons This write-up presents several simple algorithms for determining whether a given set of twodimensional points defines a conve
More informationMachine Learning 15/04/2015. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis
// Supervised learning vs unsupervised learning Machine Learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These
More informationClassification of High Dimensional Data By Two-way Mixture Models
Classification of High Dimensional Data By Two-way Mixture Models Jia Li Statistics Department The Pennsylvania State University 1 Outline Goals Two-way mixture model approach Background: mixture discriminant
More informationSemi-supervised learning
Semi-supervised Learning COMP 790-90 Seminar Spring 2011 The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Overview 2 Semi-supervised learning Semi-supervised classification Semi-supervised clustering Semi-supervised
More informationSEEK User Manual. Introduction
SEEK User Manual Introduction SEEK is a computational gene co-expression search engine. It utilizes a vast human gene expression compendium to deliver fast, integrative, cross-platform co-expression analyses.
More informationIBL and clustering. Relationship of IBL with CBR
IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed
More informationData Clustering Hierarchical Clustering, Density based clustering Grid based clustering
Data Clustering Hierarchical Clustering, Density based clustering Grid based clustering Team 2 Prof. Anita Wasilewska CSE 634 Data Mining All Sources Used for the Presentation Olson CF. Parallel algorithms
More information