
Advanced Algorithmics: Clustering
Jaak Vilo, 2009 Spring

Topics:
- What is clustering
- Hierarchical clustering
- K-means + K-medoids
- SOM
- Fuzzy, EM

Unsupervised vs. Supervised
- Clustering: find groups inherent to the data
- Classification: find a classifier for known classes
- An old problem; many methods; no single method best suits all needs

Vehicle Example / Vehicle Clusters
A table of vehicles (V1 to V9) with the features: top speed (km/h), colour (red, black, gray, blue, white, ...), air resistance, weight (kg). [Numeric values lost in transcription.] Plotted in the weight [kg] vs. top speed [km/h] plane, the vehicles fall into three groups: lorries (heavy, slow), medium market cars, and sports cars (light, fast).

Terminology
In the weight vs. top speed plot, each axis is a feature, the axes together span the feature space, each vehicle is an object (data point), and each group (lorries, medium market cars, sports cars) carries a cluster label.

Motivation: Why Clustering?
Problem: identify a (small number of) groups of similar objects in a given (large) set of objects.
Goals:
- Find representatives for homogeneous groups: data compression
- Find natural clusters and describe their properties: natural data types
- Find suitable and useful groupings: useful data classes
- Find unusual data objects: outlier detection

Clustering is easy (for humans): edge detection (advantage to smooth contours), texture clustering.

Clustering, cont.
Distance measures: which two profiles are similar to each other?
- Euclidean, Manhattan, etc.
- Rank correlation
- Correlation, angle, etc.
- Time warping

How do we formally describe which objects are close to each other and which are not? There is more than one way to define distances. A distance d is a metric if:
- d(x, x) = 0
- d(x, y) = d(y, x) >= 0
- d(A, B) <= d(A, C) + d(C, B) (triangle inequality)

Some standard distance measures, for profiles f and g over c coordinates:
- Euclidean distance: d(f, g) = \sqrt{\sum_{i=1}^{c} (f_i - g_i)^2}
- Euclidean squared: d(f, g) = \sum_{i=1}^{c} (f_i - g_i)^2
- Manhattan (city-block): d(f, g) = \sum_{i=1}^{c} |f_i - g_i|
- Average distance: d(f, g) = \sqrt{\frac{1}{c} \sum_{i=1}^{c} (f_i - g_i)^2}
- Pearson correlation: r(f, g) = \frac{\sum_{i=1}^{c} (f_i - \bar{f})(g_i - \bar{g})}{\sqrt{\sum_{i=1}^{c} (f_i - \bar{f})^2 \sum_{i=1}^{c} (g_i - \bar{g})^2}}. If the means of each column are 0, this becomes \cos\theta = \frac{\sum_{i=1}^{c} f_i g_i}{\sqrt{\sum_{i=1}^{c} f_i^2} \sqrt{\sum_{i=1}^{c} g_i^2}}
- Chord distance: d(f, g) = \sqrt{2 (1 - \cos\theta)}, the Euclidean distance between two vectors whose lengths have been normalized to 1
(Legendre & Legendre: Numerical Ecology, 2nd ed.)
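The standard measures listed above can be sketched in a few lines of Python. This is a minimal illustration; the function names are chosen here for clarity and are not from any particular library:

```python
import math

def euclidean(f, g):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((fi - gi) ** 2 for fi, gi in zip(f, g)))

def manhattan(f, g):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(fi - gi) for fi, gi in zip(f, g))

def pearson_distance(f, g):
    """1 - Pearson correlation between the two profiles."""
    c = len(f)
    mf, mg = sum(f) / c, sum(g) / c
    num = sum((fi - mf) * (gi - mg) for fi, gi in zip(f, g))
    den = math.sqrt(sum((fi - mf) ** 2 for fi in f)
                    * sum((gi - mg) ** 2 for gi in g))
    return 1.0 - num / den

def chord(f, g):
    """Chord distance: Euclidean distance after normalizing both
    vectors to length 1, equal to sqrt(2 * (1 - cos(theta)))."""
    nf = math.sqrt(sum(fi * fi for fi in f))
    ng = math.sqrt(sum(gi * gi for gi in g))
    cos_theta = sum(fi * gi for fi, gi in zip(f, g)) / (nf * ng)
    return math.sqrt(2.0 * (1.0 - cos_theta))
```

Note that Euclidean and Manhattan are metrics (they satisfy the triangle inequality above), while correlation-based distances in general are not; this matters for the pruning tricks used later in the lecture.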

Rank correlation
d(f, g) = \frac{6 \sum_{i=1}^{c} (\mathrm{rank}(f_i) - \mathrm{rank}(g_i))^2}{c (c^2 - 1)}
Rank: the smallest value has rank 1, the next rank 2, etc. Equal values get the average of their ranks.

Hierarchical clustering
1. Compute the all-against-all distance matrix
2. Linkage strategy: identify the closest clusters and merge them
Performance: O(d n^2) for the distances.

Calculate all pairwise distances and assign each object to a singleton cluster. Keep joining the two closest clusters, using:
- Minimum distance => single linkage
- Maximum distance => complete linkage
- Average distance => average linkage (UPGMA, WPGMA)

While more than 1 cluster remains:
- select the smallest distance
- merge the two clusters
- update the changed distances after the merger

Updating distances: merge Ca and Cb into C, then recalculate all distances D(C, Ci):
- Single link (minimal distance): D(C, Ci) = min{ D(Ci, Ca), D(Ci, Cb) }
- Complete link (maximum distance): D(C, Ci) = max{ D(Ci, Ca), D(Ci, Cb) }
- Average link (UPGMA, Unweighted Pair Group Method with Arithmetic mean): D(C, Ci) = n_a/(n_a + n_b) * D(Ci, Ca) + n_b/(n_a + n_b) * D(Ci, Cb)
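The merge loop and the three linkage update rules can be sketched as follows. This is a naive O(n^3) illustration with names chosen here, not an optimized implementation:

```python
import itertools

def agglomerate(points, distance, linkage="average"):
    """Naive agglomerative clustering; clusters are frozensets of
    point indices. Returns the merge history as (a, b, merge_distance)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    # 1. all-against-all distance matrix between singleton clusters
    d = {frozenset([ca, cb]): distance(points[min(ca)], points[min(cb)])
         for ca, cb in itertools.combinations(clusters, 2)}
    merges = []
    while len(clusters) > 1:
        # 2. identify the two closest clusters and merge them
        ca, cb = min(itertools.combinations(clusters, 2),
                     key=lambda pair: d[frozenset(pair)])
        merges.append((ca, cb, d[frozenset((ca, cb))]))
        merged = ca | cb
        clusters.remove(ca)
        clusters.remove(cb)
        # 3. update only the distances that changed after the merger
        for c in clusters:
            da, db = d[frozenset((c, ca))], d[frozenset((c, cb))]
            if linkage == "single":      # minimum distance
                dn = min(da, db)
            elif linkage == "complete":  # maximum distance
                dn = max(da, db)
            else:                        # UPGMA: size-weighted average
                dn = (len(ca) * da + len(cb) * db) / (len(ca) + len(cb))
            d[frozenset((c, merged))] = dn
        clusters.append(merged)
    return merges
```

The update step touches only the distances involving the new cluster, exactly as on the slide; the overall cost is still dominated by the initial O(n^2) distance matrix and the repeated minimum search.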

Running time for hierarchical clustering
[Figure residue: timing curves for distance computation (10 and 100 attributes) and for clustering over growing data sizes; the numeric labels were lost in transcription.]
O(n^2) distances; then n - 1 times: select the smallest distance, merge, and update all distances to the new cluster.
(Slide credit: Persistent Systems Pvt. Ltd., http://www.persistent.co.in)

Hierarchical clustering output
Design any heatmap colouring scheme; cut the dendrogram; zoom in. (Example: yeast genomes.)

Limits of standard clustering
- Hierarchical clustering is (very) good for visualization (first impression) and browsing
- Speed for modern data sets remains relatively slow (minutes or even hours); the ArrayExpress database needs faster analytical tools
- Hard to predict the number of clusters (=> unsupervised)
- The output can exceed screen resolution (thousands of genes and dozens of experiments vs. a typical monitor or laptop display), so subtrees are collapsed to fit
(Developed and implemented in Expression Profiler in October 2000.)

VisHC, 2009: Fast Approximate Hierarchical Clustering using Similarity Heuristics
Hierarchical clustering is applied in gene expression data analysis, where the number of genes can reach tens of thousands. Each subtree is a cluster; the hierarchy is built by iteratively joining the two most similar clusters into a larger one.

Fast hierarchical clustering: avoid calculating all O(n^2) distances:
- Estimate distances
- Use pivots
- Find close objects
- Cluster with partial information

[Figure residue: input data and its visualization; distances from one pivot; distances from two pivots vs. Euclidean distances; average linkage hierarchical clustering.]

Meelis Kull, Jaak Vilo. Fast Approximate Hierarchical Clustering using Similarity Heuristics. BioData Mining, 1:9, 2008. [HappieClust website] [doi:10.1186/1756-0381-1-9] [PubMed]
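The pivot idea can be illustrated with a small sketch (a simplification written for this text, not the HappieClust code): distances from every object to a few pivots give a cheap lower bound on every pairwise distance via the triangle inequality, |d(x, p) - d(y, p)| <= d(x, y), so pairs that are certainly far apart can be skipped without computing their true distance.

```python
import math
import random

def euclid(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def pivot_lower_bounds(points, n_pivots=2, seed=0):
    """Map each point to its vector of distances from a few random
    pivot objects; return a function giving a lower bound on d(x, y)."""
    rng = random.Random(seed)
    pivots = rng.sample(points, n_pivots)
    coords = [[euclid(x, p) for p in pivots] for x in points]

    def lower_bound(i, j):
        # Chebyshev distance in pivot space never exceeds the true
        # distance, by the triangle inequality applied per pivot.
        return max(abs(a - b) for a, b in zip(coords[i], coords[j]))

    return lower_bound
```

Only pairs whose lower bound falls below the similarity threshold of interest need their true distance computed; all other pairs are pruned.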

Epsilon Grid Order (EGO)
The distances from two pivots are laid out on an epsilon-grid. Here we use the Chebyshev distance (maximum of coordinate differences): by the triangle inequality, the Euclidean distance in the original space cannot be smaller than the Chebyshev distance in pivot space.
1) Data points are sorted according to the EGO order
2) Each point is compared with the later points until they are more than one hypercube away (i.e. a point is compared only with the points in the marked neighbouring hypercubes)

Major Clustering Approaches
- Partitioning algorithms (representative-based / prototype-based clustering): construct various partitions and then evaluate them by some criterion or fitness function (k-means)
- Hierarchical algorithms: create a hierarchical decomposition of the set of data (or objects) using some criterion
- Density-based: based on connectivity and density functions (DBSCAN, DENCLUE, ...)
- Grid-based: based on a multiple-level granularity structure
- Model-based: a model is hypothesized for each of the clusters, and the idea is to find the best fit of the models to the data (EM)

Representative-Based Clustering
Aims at finding a set of objects (called representatives) among all objects in the data set that best represent the objects in the data set. Each representative corresponds to a cluster; the remaining objects are clustered around these representatives by assigning each object to the cluster of the closest representative.
Remarks:
1. The popular k-medoid algorithm, also called PAM, is a representative-based clustering algorithm; k-means also shares the characteristics of representative-based clustering, except that the representatives used by k-means do not necessarily have to belong to the data set.
2. If the representatives do not need to belong to the data set, we call the algorithms prototype-based clustering.
K-means is a prototype-based clustering algorithm. K-means and K-medoids partition the data points into K groups, each centered around its mean or medoid: the mean is an abstract point, while the medoid is the most central object of the cluster.

K-means
1. Guess K centres
2. Assign objects to clusters
3. Move each centre to the gravity centre of its cluster

Representative-Based Supervised Clustering
Objective of RSC: find a subset O_R of O such that the clustering X obtained by using the objects in O_R as representatives minimizes q(X), where q is an objective/fitness function.

The K-Means Clustering Method
Given k, the k-means algorithm is implemented in four steps:
1. Partition the objects into k nonempty subsets
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e. the mean point, of the cluster)
3. Assign each object to the cluster with the nearest seed point
4. Go back to Step 2; stop when there are no more new assignments
[Figure residue: example scatter plots showing assignments and centroid moves over the iterations.]
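The four steps can be sketched as a minimal Lloyd-style k-means (illustrative only; the function and parameter names are chosen here):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means on tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # step 1: initial seed points
    labels = None
    for _ in range(iters):
        # step 3: assign each object to the cluster with the nearest seed
        new_labels = [min(range(k),
                          key=lambda c: sum((xi - ci) ** 2
                                            for xi, ci in zip(x, centroids[c])))
                      for x in points]
        if new_labels == labels:           # step 4: stop when nothing changes
            break
        labels = new_labels
        # step 2: recompute each centroid as the mean point of its cluster
        for c in range(k):
            members = [x for x, lab in zip(points, labels) if lab == c]
            if members:                    # an empty cluster keeps its centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels
```

The empty-cluster guard is one of several possible policies for the complication discussed below; another common choice is to re-seed an empty cluster with the point farthest from its centroid.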

Comments on K-Means
Strengths:
- Relatively efficient: O(t*k*n*d), where n is the number of objects, k the number of clusters, t the number of iterations, and d the number of dimensions. Usually d, k, t << n; in this case, k-means' runtime is O(n).
- Storage is only O(n): in contrast to other representative-based algorithms, it only computes distances between centroids and objects in the dataset, not between objects in the dataset; therefore, the distance matrix does not need to be stored.
- Easy to use; well studied; we know what to expect.
- Finds a local optimum of the SSE fitness function. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
- Implicitly uses a fitness function (finds a local minimum for SSE, see later) but does not waste time computing fitness values.
Weaknesses:
- Applicable only when the mean is defined; what about categorical data?
- Need to specify k, the number of clusters, in advance.
- Sensitive to outliers.
- Not suitable for discovering clusters with non-convex shapes.
- Sensitive to initialization; bad initialization might lead to bad results.

Complication: Empty Clusters (K=3)
[Figure residue.] Assume the k-means initialization assigns the green, blue, and brown points to a single cluster; after centroids are computed and objects are reassigned, it can easily be seen that the brown cluster becomes empty.

Convex Shape Cluster
- Convex shape: if we take two points belonging to a cluster, then all the points on a direct line connecting these two points must also be in the cluster.
- The shapes of k-means/k-medoids clusters are convex polygons.
- The shapes of the clusters of a representative-based clustering algorithm can be computed as a Voronoi diagram for the set of cluster representatives. Voronoi cells are always convex, but there are convex shapes that are different from those of Voronoi cells.

Voronoi Diagram for a Representative-Based Clustering
Each cell contains one representative (e.g. a medoid or centroid), and every location within the cell is closer to that representative than to any other. A Voronoi diagram divides the space into such cells.
Voronoi cells define the cluster boundaries!

K-means clustering
- A cluster: the objects closest to a center; new centers: the center of gravity of each cluster
- Start clustering by choosing K centers randomly (or the most distant centers, etc.)
- Iterate the clustering step until no cluster changes
- Deterministic for a fixed initialization; might get stuck in a local minimum
[K-means clustering output; URLMAP example.]

K-means finds a local optimum:
- vary: run many times with random starts
- make an educated guess to start with, e.g. sample the data, perform hierarchical clustering, select K centers

K-medoids
Choose the cluster center to be one of the existing objects. Why? With more complex data or distance measures, the real center cannot be found easily. What is the mean of categorical data such as yellow, red, pink? Instead of trying to invent one, use one of the existing objects, whatever the distance measure.

Self-Organising Maps (SOM)
- An M x N matrix of neurons, each representing a cluster
- Object X is assigned to the weight vector W to which it is most similar; W and its near surroundings are changed to resemble X more
- Train, train, train
- Problem: there is no clear objective function for mapping D-dimensional data onto a 2-dimensional map

Motivation: The Problem Statement
The problem is how to find semantic relationships among lots of information without manual labor. How do I know where to put my new data if I know nothing about the information's topology? When I have a topic, how can I get all the information about it if I don't know where to search?

Motivation: The Idea
The computer classifies information automatically and puts related items together: input patterns of text objects are placed onto a semantic map so that related topics end up close to each other.
(JASS, Information Visualization with SOMs)
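A minimal alternating k-medoids sketch (a simplification of PAM; the names are chosen here): the only operation it needs on the data is the distance function itself, which is exactly why it also works when no mean is definable.

```python
import random

def kmedoids(points, k, distance, iters=50, seed=0):
    """Alternating k-medoids: medoids are always existing objects,
    so any distance measure works. Returns (medoid_indices, labels)."""
    rng = random.Random(seed)
    medoid_idx = rng.sample(range(len(points)), k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assign every object to its closest medoid
        labels = [min(range(k),
                      key=lambda c: distance(x, points[medoid_idx[c]]))
                  for x in points]
        # re-pick each medoid as the most central member of its cluster
        new_idx = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:                    # keep the old medoid if empty
                new_idx.append(medoid_idx[c])
                continue
            new_idx.append(min(members,
                key=lambda i: sum(distance(points[i], points[j])
                                  for j in members)))
        if new_idx == medoid_idx:
            break
        medoid_idx = new_idx
    return medoid_idx, labels
```

Because the medoid update only compares existing objects, the same code runs unchanged on categorical data given a suitable distance (e.g. a mismatch count).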

Self-Organizing Maps: Origins
- Ideas first introduced by C. von der Malsburg (1973), developed and refined by T. Kohonen (1982)
- A neural network algorithm using unsupervised competitive learning
- Primarily used for organization and visualization of complex data
- Biological basis: brain maps (Teuvo Kohonen)

SOM Architecture
- A lattice of neurons (nodes) accepts and responds to a set of input signals
- Responses are compared; a winning neuron is selected from the lattice
- The selected neuron is activated together with its neighbourhood neurons
- An adaptive process changes the weights so that they more closely resemble the inputs
- A 2d array of neurons with weighted synapses w_j1, w_j2, ..., w_jn; the set of input signals x_1, x_2, ..., x_n is connected to all neurons in the lattice

Initialisation
(1) Randomly initialise the weight vectors w_j for all nodes j.

SOM Result Example: Classifying World Poverty (Helsinki University of Technology)
A poverty map based on 39 indicators from World Bank statistics (1992).

Input vector
(2) Choose an input vector x from the training set. For texts, a document is represented by the frequency distribution of its words. A text example: "Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen which reduce the dimensions of data through the use of self-organizing neural networks. The problem that data visualization attempts to solve is that humans simply cannot visualize high dimensional data, so techniques are created to help us understand this high dimensional data." (Word counts: self-organizing, maps, data, visualization, technique, invented, Professor, Teuvo Kohonen, dimensions, ...)
Finding a Winner
(3) Find the best-matching neuron i(x), usually the neuron whose weight vector has the smallest Euclidean distance from the input vector x. The winning node is the one that is, in some sense, closest to the input vector. Euclidean distance is the straight-line distance between the data points, as if they were plotted on a (multi-dimensional) graph: for two vectors a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n),
d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}

Weight Update
SOM weight update equation:
w_j(t+1) = w_j(t) + \alpha(t) \, \eta_{i(x)}(j, t) \, [x - w_j(t)]
The weights of every node are updated at each cycle by adding to the current weights the product of the current learning rate \alpha(t), the degree of neighbourhood with respect to the winner \eta_{i(x)}(j, t), and the difference between the input vector and the current weights.
- Example of \alpha(t): the learning rate decays with the number of cycles
- Example of \eta_{i(x)}(j, t): the x-axis shows the distance from the winning node, the y-axis the degree of neighbourhood (max. 1)

Example: animal names and their attributes
[Table residue: 16 animals (dove, hen, duck, goose, owl, hawk, eagle, fox, dog, wolf, cat, tiger, lion, horse, zebra, cow) described by binary attributes: size (small / medium / big), 2 legs / 4 legs, has hair / hooves / mane / feathers, likes to hunt / run / fly / swim.] On the trained map, a grouping according to similarity emerges: birds, peaceful species, hunters.
[Teuvo Kohonen, Self-Organizing Maps, Springer]

Clustering etc. algorithms
- Hierarchical clustering methods + visualisation
- K-means, Self-Organising Maps (SOM)
- SOTA trees (Self-Organising Maps + tree)
- Fuzzy, EM (an object can belong to several clusters)
- Graph theory (cliques, strongly connected components)
- Similarity search: find Y for X s.t. d(X, Y) is below a threshold
- Model-based (rediscover distributions)
- Planar embeddings, multidimensional scaling
- Principal Component Analysis
- Correspondence analysis
- Independent Component Analysis

Similarity searches
Query: cyc (cyc, an activator for cyc, a repressor for cyc) => the query genes plus the most similar ones for each = clusters. Expand a tight cluster by the other most similar genes.
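The update equation can be turned into a tiny SOM training loop. This is an illustrative sketch with a Gaussian neighbourhood and linearly decaying schedules; all names and schedule choices here are assumptions, not part of the original slides:

```python
import math
import random

def train_som(data, rows=5, cols=5, epochs=300, lr0=0.5, radius0=2.0, seed=0):
    """Tiny SOM on a rows x cols grid, applying
    w_j(t+1) = w_j(t) + alpha(t) * eta(j, winner, t) * (x - w_j(t))."""
    rng = random.Random(seed)
    dim = len(data[0])
    # (1) randomly initialise the weight vectors for all nodes
    w = {(r, c): [rng.random() for _ in range(dim)]
         for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        alpha = lr0 * (1 - frac)            # decaying learning rate
        radius = 1 + radius0 * (1 - frac)   # shrinking neighbourhood
        x = rng.choice(data)                # (2) choose an input vector
        # (3) winner: node whose weights are nearest in Euclidean distance
        win = min(w, key=lambda j: sum((wi - xi) ** 2
                                       for wi, xi in zip(w[j], x)))
        # (4) pull the winner and its grid neighbours towards x
        for j, wj in w.items():
            grid_d2 = (j[0] - win[0]) ** 2 + (j[1] - win[1]) ** 2
            eta = math.exp(-grid_d2 / (2 * radius ** 2))
            w[j] = [wi + alpha * eta * (xi - wi) for wi, xi in zip(wj, x)]
    return w
```

Because eta depends on grid distance rather than weight-space distance, nearby nodes on the map end up with similar weight vectors, which is exactly the topology-preserving effect behind the animal and poverty maps above.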

EM: Expectation Maximization
- A popular iterative refinement algorithm; an extension of k-means
- Assign each object to a cluster according to a weight (probability distribution)
- New means/covariances are computed based on the weighted measures
General idea:
- Start with an initial estimate of the parameter vector
- Iteratively rescore the patterns against the mixture density produced by the parameter vector
- The rescored patterns are used to update the parameter estimates
- Patterns belong to the same cluster if their scores place them in the same mixture component
- The algorithm converges fast but may not reach the global optimum

The EM (Expectation Maximization) Algorithm
- Initially, randomly assign k cluster centers
- Iteratively refine the clusters based on two steps:
  - Expectation step: assign each data point X_i to cluster C_j with a probability given by the current mixture model
  - Maximization step: estimate the model parameters

Other Clustering Methods
- PCA (Principal Component Analysis), commonly computed via SVD (Singular Value Decomposition): reduces the dimensionality of gene expression space; finds the best view that helps separate the data into groups
- Supervised methods: SVM (Support Vector Machine): previous knowledge of which genes are expected to cluster is used for training; a binary classifier uses a feature space and a kernel function to define an optimal hyperplane; also used for classification of samples, e.g. expression fingerprinting for disease classification
(Slide credit: Persistent Systems Pvt. Ltd., http://www.persistent.co.in; April 2009)
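The E- and M-steps can be sketched for a one-dimensional Gaussian mixture. This is an illustrative toy under simple assumptions (random data points as initial means, a variance floor to avoid collapse), not a production GMM:

```python
import math
import random

def em_gmm_1d(data, k=2, iters=100, seed=0):
    """EM for a 1-d Gaussian mixture: the E-step assigns each point to
    each component with a probability (its responsibility); the M-step
    re-estimates means, variances, and mixing weights from them."""
    rng = random.Random(seed)
    mu = rng.sample(data, k)        # initial cluster centers
    var = [1.0] * k
    pi = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of component c for each point x
        resp = []
        for x in data:
            dens = [pi[c] / math.sqrt(2 * math.pi * var[c])
                    * math.exp(-(x - mu[c]) ** 2 / (2 * var[c]))
                    for c in range(k)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: weighted re-estimation of the model parameters
        for c in range(k):
            nc = sum(r[c] for r in resp)
            mu[c] = sum(r[c] * x for r, x in zip(resp, data)) / nc
            var[c] = max(1e-6, sum(r[c] * (x - mu[c]) ** 2
                                   for r, x in zip(resp, data)) / nc)
            pi[c] = nc / len(data)
    return mu, var, pi
```

Replacing the soft responsibilities with a hard arg-max assignment and the variances with a fixed value recovers k-means, which is the sense in which EM extends it.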