Micro-array Image Analysis using Clustering Methods
Mrs Rekha A Kulkarni
PICT, Pune
kulkarni_rekha@hotmail.com

Abstract

Micro-array imaging is an emerging technology, and several experimental procedures have been developed that produce images with different characteristics. Micro-array images are processed using a multi-step procedure involving image segmentation, information extraction, and normalization. The images are structured as high-intensity spots located on a grid, and each spot corresponds to a gene. Spots are roughly circular, though some deviate significantly from this shape because of experimental variation. Clustering methods group a set of genes or arrays according to the similarity of the genes to each other. Clustering techniques such as K-means and Partitioning Around Medoids (PAM) have recently been used for micro-array image segmentation and classification. K-means is a partitioning algorithm with a prefixed number k of clusters; it tries to minimize the sum of within-cluster variances. PAM is a partitioning algorithm that generalizes K-means.

Key Words: Micro-Array, Gridding, Clustering, K-Means, PAM

1. Introduction

Micro-array technologies, which can measure the expression of thousands of genes in a single experiment, have developed over the past decade and now produce huge amounts of data. New techniques for looking at genetic variations in large human populations, and for identifying interactions between sets of proteins in cells, are pouring data onto file servers around the world. Bioinformatics is charged with managing and making sense of all of this data, keeping pace with both data production and technology development. There is plenty of work to go around.

A micro-array is a glass microscope slide carrying a large number of ordered target sequences. These target sequences normally consist of cDNA or RNA sequences. They are single-stranded, as opposed to genomic DNA, which is double-stranded.
There are thousands of target sequences on the micro-array. Micro-array technology, as a high-throughput approach to differential gene expression studies, efficiently generates massive amounts of gene regulation data and helps scientists quickly identify which gene candidates to follow up with functional characterization. Traditional techniques for the study of gene expression allow investigators to study only one or a few genes at a time.

Genomic projects aimed at cloning, mapping, and sequencing the genomes of various organisms generated large amounts of sequence data. However, the function, expression, and regulation of more than 80% of the genes identified were unknown. The next phase of the human genome project will place strong emphasis on assigning function to these genes. Of the methods available for assigning function to genes, DNA micro-array analysis is widely used to extract patterns of gene expression. Although both cDNA micro-arrays and oligonucleotide arrays are capable of analyzing patterns of gene expression, fundamental differences exist between the two methods.

2. Existing segmentation techniques

Fixed circle segmentation
- Fits a circle with a constant diameter to all spots in the image
- Easy to implement
- The spots need to be of the same shape and size

Adaptive circle segmentation
- The circle diameter is estimated separately for each spot

Adaptive shape segmentation
- Requires the specification of starting points, or seeds; the known geometry of the array helps here
- Regions grow outwards from the seed points, preferentially according to the difference between a pixel's value and the running mean of the values in an adjoining region
Histogram segmentation
- Uses a target mask chosen to be larger than any spot
- Foreground and background intensities are determined from the histogram of pixel values within the masked area

3. Intensity based segmentation

The images are structured as high-intensity spots, which correspond to the probes located on a grid. The spots are roughly circular, though some deviate significantly from this shape because of experimental variation in the spotting procedure. The underlying principle of micro-array image analysis is that the spot intensity is a measure of gene expression. This implicitly assumes that the gene expression of a spot is governed entirely by the distribution of its pixel intensities. Clustering-based segmentation is used to extract the intensity of the spots. The approximate boundaries of the spots in the micro-array are determined by adjusting rectilinear grids. K-means and Partitioning Around Medoids are then used to generate a binary partition of the pixel intensities.

The images show spots in four colors: red, green, yellow, and black. A red spot implies that the gene is expressed in the experimental channel. A green spot indicates high expression in the control channel. A yellow spot indicates that the gene is expressed in both channels, and a black spot that it is expressed in neither.

To estimate the differential gene expression, the high-intensity regions in the image corresponding to each probe have to be identified. This is done as part of image segmentation. The local background noise then has to be estimated and removed, which is called background correction. An example of a spot (gene) summary statistic for cDNA arrays is the ratio of background-corrected mean intensities. We denote by R_i the pixels from the red fluorescent scan and by G_i the pixels from the green fluorescent scan.
The differential expression level R/G is then calculated as the ratio of the background-corrected mean intensities over the region of interest (ROI):

$$\frac{R}{G} = \frac{\frac{1}{S}\sum_{i=1}^{S} R_i - \mu_R}{\frac{1}{S}\sum_{i=1}^{S} G_i - \mu_G}$$

where \mu_R and \mu_G are the estimates of the local background in the red and green channels and S is the number of ROI pixels.

4. Micro-array image analysis

Image analysis involves three stages. First, the arrayed genes must be distinguished from spurious signals that can arise from precipitated probe, other hybridization artifacts, or dust on the surface of the slide. After gridding, the spot intensity (real signal) and the background (noise) have to be calculated for each spot. It is always better to calculate the background locally for each spot rather than globally for the entire image. The next step in image processing is the extraction of signal, noise, and quality control measures for the spot.

Image processing steps:
1. Addressing/Gridding: find the areas in the image that belong to spots. The combined area of a spot and its background is called the target area.
2. Segmentation: partition the target area into foreground and background.
3. Reduction: extract two scalar values R and G for the red and green intensities and assign one value for the relative abundance.

4.1. Gridding

Gridding is the process of assigning coordinates to each of the spots. Automating this part of the procedure permits high-throughput analysis. Ideally, the spots should be equally spaced across the array. However, the robot arm that prints the spots often introduces deviations from these ideal positions. The approximate boundaries of the spots are determined by drawing a rectilinear grid and adjusting it, so that each spot is enclosed in a rectangular box.

4.2. Segmentation

Segmentation classifies pixels as belonging either to a spot of interest or to the background. It involves partitioning the image into disjoint sets. Consider an image I partitioned into L regions r_i, i = 1, ..., L. Then

$$I = \bigcup_{i=1}^{L} r_i, \qquad r_i \cap r_j = \emptyset \ \text{for} \ i \neq j,$$

and each r_i satisfies a predicate.
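To make the rectangular boxes produced by gridding and the partition of a target area into disjoint regions more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the grid is evenly spaced (no manual adjustment), the image is synthetic, the function names `grid_boxes` and `partition_target_area` are hypothetical, and the predicate "intensity above the mean of the box" is a placeholder, not the clustering-based predicate the paper uses.

```python
from statistics import mean


def grid_boxes(height, width, n_rows, n_cols):
    """Return {(row, col): (y0, y1, x0, x1)} for an evenly spaced rectilinear grid."""
    boxes = {}
    for r in range(n_rows):
        for c in range(n_cols):
            y0, y1 = r * height // n_rows, (r + 1) * height // n_rows
            x0, x1 = c * width // n_cols, (c + 1) * width // n_cols
            boxes[(r, c)] = (y0, y1, x0, x1)
    return boxes


def partition_target_area(image, box):
    """Split the pixels of one box into two disjoint regions (foreground, background)."""
    y0, y1, x0, x1 = box
    pixels = [(y, x) for y in range(y0, y1) for x in range(x0, x1)]
    # Assumed predicate for illustration only: brighter than the box mean.
    threshold = mean(image[y][x] for y, x in pixels)
    foreground = {p for p in pixels if image[p[0]][p[1]] > threshold}
    background = set(pixels) - foreground
    return foreground, background


# Synthetic 20 x 20 single-channel image: dark background with a 2 x 2 array
# of bright 4 x 4 spots, one spot per 10 x 10 grid box.
image = [[10.0] * 20 for _ in range(20)]
for spot_y, spot_x in [(3, 3), (3, 13), (13, 3), (13, 13)]:
    for y in range(spot_y, spot_y + 4):
        for x in range(spot_x, spot_x + 4):
            image[y][x] = 200.0

boxes = grid_boxes(20, 20, n_rows=2, n_cols=2)
regions = {rc: partition_target_area(image, box) for rc, box in boxes.items()}
```

Each entry of `regions` holds two sets of pixel coordinates whose union is the whole box and whose intersection is empty, which is the partition property stated above for L = 2.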
Image segmentation can be either texture-based or intensity-based. The former is governed by spatial characteristics, whereas the latter is governed entirely by the distribution of pixel intensities. The choice of segmentation technique depends on the problem at hand. Since the gene expression value is proportional to the intensity of the spot, intensity-based segmentation is appropriate. The objective is to separate the foreground and background pixel intensities inside each grid. The one-dimensional distributions of the foreground and background pixel intensities can be represented by f(f) and f(b), respectively. The distribution of pixel intensities in a grid can be assumed to be the superposition of f(f) and f(b). The following cases can arise:
1) f(f) and f(b) are narrowly distributed with no overlap.
2) f(f) is spread out whereas f(b) is narrowly distributed.
3) f(f) and f(b) exhibit significant overlap.

Fig. 1: Flow chart of microarray image segmentation. Steps shown: Start; read the microarray image file in TIFF format; enter the number of rows and columns, i.e. the total number of spots; manually align the row and column grids to determine the approximate boundary of each spot in the control channel (Cy3); retain the coordinates for the experimental channel (Cy5); set i = 1; map the image matrix I of the ith grid into a one-dimensional vector v; apply K-means clustering to obtain a binary partition of the vector v; compute f(i) = median of foreground pixels, b(i) = median of background pixels, and t(i) = f(i) - b(i); decision step; Stop.

4.3. Reduction

Rfg, Gfg and Rbg, Gbg are the cluster means with high and low intensities, respectively; (Rfg-Rbg)/(Gfg-Gbg) is the final relative abundance estimate.

5. Clustering methods

Clustering techniques such as K-means and PAM are useful for detecting patterns in data generated by unknown processes and have recently been used for micro-array segmentation. The input to the K-means and PAM clustering algorithms is the set of two-dimensional values (Ri, Gi), where Ri and Gi represent the ith pixel intensity of a given spot in the Cy5 (red) and Cy3 (green) channels, respectively.

Cluster algorithms: k-means

K-means is a partitioning algorithm with a prefixed number k of clusters. It tries to minimize the sum of within-cluster variances. The algorithm chooses a random sample of k different objects as initial cluster midpoints. Then it alternates between two steps until convergence:
1. Assign each object to the closest of the k midpoints with respect to Euclidean distance.
2. Calculate k new midpoints as the averages of all points assigned to the old midpoints, respectively.

K-means is a randomized algorithm, so two runs usually produce different results.
Thus it has to be applied several times to the same data set, and the result with the minimal sum of within-cluster variances should be chosen.

PxKmeans: pixel clustering with k-means
1. Construct initial representatives: the starting midpoints are m1 = (Rfg, Gfg) and m2 = (Rbg, Gbg), where Rfg, Gfg are the highest intensity values and Rbg, Gbg the lowest.
2. Find a local optimum of the cluster problem (k-means). Alternate until convergence:
- Assign each data point to the closer of the two midpoints.
- Calculate two new midpoints as the means of all points assigned to the old midpoints, respectively.
3. Reduction: Rfg, Gfg and Rbg, Gbg are the cluster means with high and low intensities, respectively; (Rfg-Rbg)/(Gfg-Gbg) is the final relative abundance estimate.

Cluster algorithms: PAM

PAM (Partitioning Around Medoids; Kaufman and Rousseeuw) is a partitioning algorithm and a generalization of k-means.
For an arbitrary dissimilarity matrix d, PAM tries to minimize the sum (over all objects) of the distances to the closest of k prototypes.

Objective function (d: Manhattan, correlation, etc.):

$$\sum_{i=1}^{n} \min_{j=1,\dots,k} d(x_i, m_j)$$

BUILD phase: select the initial medoids.
SWAP phase: repeat until convergence: consider all pairs of objects (i, j), where i is a medoid and j is not, and make the swap of i and j (if any) that decreases the objective function most.

PxPAM: pixel clustering with PAM
0. Calculation of the dissimilarity matrix of the spot pixels: calculate the Manhattan distances between all pairs of pixels: d_ij = d(x_i, x_j) = |R_i - R_j| + |G_i - G_j|.
1. Construct two initial representatives (PAM BUILD phase): define m1 as the object with the smallest \sum_{i=1}^{n} d(x_i, m1) and m2 as the object that decreases the objective as much as possible.
2. Find a local optimum of the cluster problem (PAM SWAP phase).
3. Reduction: Rfg and Gfg are the values of the medoid pixel with the higher intensities; Rbg and Gbg are those of the other one. (Rfg-Rbg)/(Gfg-Gbg) is the final relative abundance estimate.

6. Background correction and normalization

Estimating the background intensity is generally considered necessary for performing background correction. The motivation for background correction is that a spot's measured fluorescence intensity includes a contribution that is not specifically due to the hybridization of the mRNA samples to the spotted DNA. Background correction of the spot intensities is usually performed by subtracting the background estimate from the red and green foreground values, with the aim of improving accuracy, that is, reducing bias. Spot quality scores may include measures of spot size or shape, or measures of background intensity relative to foreground intensity. In some cases background adjustment can substantially reduce precision, that is, increase the variability of low spot values. The ratio of the fluorescence intensities for each spot is indicative of the relative abundance of the corresponding DNA sequence in the two nucleic acid samples.
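The PxPAM steps and the background subtraction described above can be sketched together in a few lines of Python. This is a rough illustration, not the author's implementation: the pixel values are synthetic, the function names are hypothetical, the dissimilarities are recomputed on the fly instead of stored in a matrix, and choosing the foreground medoid as the one with the larger R + G is an assumption about what "higher intensities" means.

```python
def manhattan(p, q):
    """Manhattan distance between two (R, G) pixels."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def objective(pixels, medoids):
    """Sum over all pixels of the distance to the closest medoid (by index)."""
    return sum(min(manhattan(p, pixels[m]) for m in medoids) for p in pixels)


def pam_two_clusters(pixels):
    """PAM with k = 2: BUILD two medoids, then SWAP until nothing improves."""
    n = len(pixels)
    # BUILD: m1 has the smallest total distance to all pixels;
    # m2 is the pixel whose addition lowers the objective the most.
    m1 = min(range(n), key=lambda j: objective(pixels, [j]))
    m2 = min((j for j in range(n) if j != m1),
             key=lambda j: objective(pixels, [m1, j]))
    medoids = [m1, m2]
    best = objective(pixels, medoids)
    # SWAP: try every (medoid i, non-medoid j) pair and apply the best swap.
    while True:
        candidates = [
            (objective(pixels, [j if m == i else m for m in medoids]), i, j)
            for i in medoids for j in range(n) if j not in medoids
        ]
        cost, i, j = min(candidates)
        if cost >= best:
            return medoids
        medoids = [j if m == i else m for m in medoids]
        best = cost


def relative_abundance(pixels):
    """Reduction step: (Rfg - Rbg) / (Gfg - Gbg) from the two medoid pixels."""
    a, b = (pixels[m] for m in pam_two_clusters(pixels))
    # Assumption: the foreground medoid is the one with the larger R + G.
    fg, bg = (a, b) if sum(a) >= sum(b) else (b, a)
    # Note: undefined if Gfg == Gbg; a real pipeline would have to guard this.
    return (fg[0] - bg[0]) / (fg[1] - bg[1])


# Synthetic (R, G) pixels of one spot: five dim background pixels
# followed by five bright foreground pixels.
spot_pixels = [(20, 22), (22, 20), (21, 21), (19, 23), (18, 22),
               (220, 120), (218, 122), (222, 118), (221, 121), (219, 119)]

medoid_indices = pam_two_clusters(spot_pixels)
ratio = relative_abundance(spot_pixels)
```

Subtracting the background medoid from the foreground medoid is the background correction step; a ratio above 1 would then suggest higher expression in the red (experimental) channel for this synthetic spot.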
Various types of noise can affect the final signal produced by the scanner. They can be divided into two categories: source noise and detector noise. Examples of source noise are photon noise, dust on the slides, and the treatment of the glass slides. Detector noise includes features of the amplification and digitization process. A perfect image would reflect only the fluorescence intensities of the dye of interest. In practice, however, the imaging system is not perfect, and images usually contain the signal of interest combined with undesired signals.

7. Gene clustering

The need for gene clustering:
- Genes clustered together have the same pattern of expression, which indicates co-regulated genes
- Co-regulated genes are functionally correlated and might be involved in the same metabolic pathways

Steps in gene clustering:
- For each gene i, look at x_i. (the expression of the ith gene across the m experiments)
- Measure the distances between the vectors x_i. of the different genes
- Clustering is based on a similarity or distance metric: Euclidean distance, vector angle, or correlation coefficient
- Clustering groups a set of genes or arrays into a tree; the branches of the tree represent the similarity of the genes to each other

Methods/Algorithms
- Hierarchical clustering
  - Single linkage: nearest distance
  - Complete linkage: maximum distance
  - Average linkage: average distance between all points in the clusters
- K-means clustering

Hierarchical clustering
- The clustering solution is represented by a dendrogram, which is a rooted weighted tree with leaves corresponding to the objects
- An edge's length reflects the dissimilarity between that cluster and the remaining clusters

Hierarchical clustering steps
1. Filter data: remove genes with many missing values or low-quality spots.
2. Pre-process data: 1) log transformation, 2) mean/median centering, 3) normalization.
3. Create similarity metrics based on the distance measure.
4. Create distance matrices.
5. Scan the distance matrix and find the smallest distance (for single linkage).
6. Create a node/branch of the tree linking the two genes with the smallest distance. Set the length of the branch to the distance between the two genes.
7. Average the two values and replace the two genes with a new item.
8. Calculate the distance of the new item to the other genes.
9. Repeat the process n-1 times (n is the number of genes or arrays).

K-means clustering
- Requires prior knowledge of k (the number of clusters)
- Initialize the cluster centroids by one of: data-centroid-based search, evenly spaced profiles, or randomly generated profiles
- Calculate the cluster centroids
- Distance measures: Euclidean distance (the distance between two data points in N dimensions) or correlation

K-means clustering steps:
1. Randomly assign the genes to K clusters.
2. Compute the mean vector of all genes in each cluster and use it as the center of the cluster.
3. Assign each gene to the cluster whose center is closest to the gene's expression vector.
4. Repeat steps 2 and 3 until the maximum number of cycles is reached or the assignment reaches a steady state.

8. Conclusion

DNA micro-array analysis allows comparisons to be made between the expression levels of certain genes across different tissues and pathological conditions. Micro-array technology can give us an understanding of the temporal and spatial patterns of expression of all the genes involved in the developmental processes of an organism. Micro-array analysis will improve our understanding of diagnosis and prognosis. Transcription profiling using DNA micro-arrays has great potential as a systematic approach for discovering new classes of tumors, for assigning known tumors to classes, and for predicting response to therapy. Thus, gene expression monitoring could provide new insights into many aspects of tumor pathology, including cell of origin, stage, grade, clinical course, and response to treatment.
Other applications include identification of targets for drug development, diagnosis and prognosis, number detection, risk assessment, and the study of mutation.

Clustering large, high-dimensional data collections is a challenging task. The problem is more complex if, in addition to clustering, one is also interested in learning cluster-dependent feature relevance weights. One possible way to alleviate this problem is to use partial supervision to guide the search process and narrow down the space of possible solutions. Recently, semi-supervised learning has emerged as a new research direction in machine learning that improves the performance of unsupervised learning using some supervised information.

9. References

[1] G. A. Baxes, Digital Image Processing: Principles and Applications.
[2] E. Gose, R. Johnsonbaugh, S. Jost, Pattern Recognition and Image Analysis.
[3] Radhakrishnan Nagarajan, Charlotte A. Peterson, Identifying spots in micro-array images, IEEE Transactions on NanoBioscience, vol. 1, no. 2, June 2002
[4] Radhakrishnan Nagarajan, Intensity based segmentation of microarray images, IEEE Transactions on Medical Imaging, vol. 22, no. 2, July 2003
[5] Mathias Katzer, Franz Kummert, Methods for automatic micro-array image segmentation, IEEE Transactions on NanoBioscience, vol. 2, no. 4, Dec
[6] R. Krishnapuram and J. M. Keller, A possibilistic approach to clustering, IEEE Transactions on Fuzzy Systems, vol. 1, no. 2, May 1993, pp
[7] Hichem Frigui, Fuzzy clustering and aggregation of relational data with instance level constraints, IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, Dec 2008, pp
MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS
Mathematical and Computational Applications, Vol. 5, No. 2, pp. 240-247, 200. Association for Scientific Research MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Volkan Uslan and Đhsan Ömür Bucak
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationHow do microarrays work
Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid
More informationCluster Analysis for Microarray Data
Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that
More informationIntroduction to GE Microarray data analysis Practical Course MolBio 2012
Introduction to GE Microarray data analysis Practical Course MolBio 2012 Claudia Pommerenke Nov-2012 Transkriptomanalyselabor TAL Microarray and Deep Sequencing Core Facility Göttingen University Medical
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 3/3/08 CAP5510 1 Gene g Probe 1 Probe 2 Probe N 3/3/08 CAP5510
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationCLUSTERING IN BIOINFORMATICS
CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of
More informationData Mining: Concepts and Techniques. Chapter March 8, 2007 Data Mining: Concepts and Techniques 1
Data Mining: Concepts and Techniques Chapter 7.1-4 March 8, 2007 Data Mining: Concepts and Techniques 1 1. What is Cluster Analysis? 2. Types of Data in Cluster Analysis Chapter 7 Cluster Analysis 3. A
More informationIncorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data
Incorporating Known Pathways into Gene Clustering Algorithms for Genetic Expression Data Ryan Atallah, John Ryan, David Aeschlimann December 14, 2013 Abstract In this project, we study the problem of classifying
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationEECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science
EECS 730 Introduction to Bioinformatics Microarray Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ GeneChip 2011/11/29 EECS 730 2 Hybridization to the Chip 2011/11/29
More information/ Computational Genomics. Normalization
10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10. Cluster
More informationRegion-based Segmentation
Region-based Segmentation Image Segmentation Group similar components (such as, pixels in an image, image frames in a video) to obtain a compact representation. Applications: Finding tumors, veins, etc.
More informationGene expression & Clustering (Chapter 10)
Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching
More informationUnsupervised Learning. Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team
Unsupervised Learning Andrea G. B. Tettamanzi I3S Laboratory SPARKS Team Table of Contents 1)Clustering: Introduction and Basic Concepts 2)An Overview of Popular Clustering Methods 3)Other Unsupervised
More informationHigh throughput Data Analysis 2. Cluster Analysis
High throughput Data Analysis 2 Cluster Analysis Overview Why clustering? Hierarchical clustering K means clustering Issues with above two Other methods Quality of clustering results Introduction WHY DO
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationCLUSTERING. CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16
CLUSTERING CSE 634 Data Mining Prof. Anita Wasilewska TEAM 16 1. K-medoids: REFERENCES https://www.coursera.org/learn/cluster-analysis/lecture/nj0sb/3-4-the-k-medoids-clustering-method https://anuradhasrinivas.files.wordpress.com/2013/04/lesson8-clustering.pdf
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.
More informationECLT 5810 Clustering
ECLT 5810 Clustering What is Cluster Analysis? Cluster: a collection of data objects Similar to one another within the same cluster Dissimilar to the objects in other clusters Cluster analysis Grouping
More informationClustering. Chapter 10 in Introduction to statistical learning
Clustering Chapter 10 in Introduction to statistical learning 16 14 12 10 8 6 4 2 0 2 4 6 8 10 12 14 1 Clustering ² Clustering is the art of finding groups in data (Kaufman and Rousseeuw, 1990). ² What
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationFuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image
www.ijcsi.org 316 Fuzzy C-means with Bi-dimensional Empirical Mode Decomposition for Segmentation of Microarray Image J.Harikiran 1, D.RamaKrishna 2, M.L.Phanendra 3, Dr.P.V.Lakshmi 4, Dr.R.Kiran Kumar
More information10701 Machine Learning. Clustering
171 Machine Learning Clustering What is Clustering? Organizing data into clusters such that there is high intra-cluster similarity low inter-cluster similarity Informally, finding natural groupings among
More informationClass Discovery and Prediction of Tumor with Microarray Data
Minnesota State University, Mankato Cornerstone: A Collection of Scholarly and Creative Works for Minnesota State University, Mankato Theses, Dissertations, and Other Capstone Projects 2011 Class Discovery
More informationComparisons and validation of statistical clustering techniques for microarray gene expression data. Outline. Microarrays.
Comparisons and validation of statistical clustering techniques for microarray gene expression data Susmita Datta and Somnath Datta Presented by: Jenni Dietrich Assisted by: Jeffrey Kidd and Kristin Wheeler
More informationContents. ! Data sets. ! Distance and similarity metrics. ! K-means clustering. ! Hierarchical clustering. ! Evaluation of clustering results
Statistical Analysis of Microarray Data Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation of clustering results Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be
More informationUnsupervised Learning
Outline Unsupervised Learning Basic concepts K-means algorithm Representation of clusters Hierarchical clustering Distance functions Which clustering algorithm to use? NN Supervised learning vs. unsupervised
More informationClustering Techniques
Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,
More informationClustering Jacques van Helden
Statistical Analysis of Microarray Data Clustering Jacques van Helden Jacques.van.Helden@ulb.ac.be Contents Data sets Distance and similarity metrics K-means clustering Hierarchical clustering Evaluation
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationUnsupervised Learning Partitioning Methods
Unsupervised Learning Partitioning Methods Road Map 1. Basic Concepts 2. K-Means 3. K-Medoids 4. CLARA & CLARANS Cluster Analysis Unsupervised learning (i.e., Class label is unknown) Group data to form
More informationUnsupervised Learning
Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005 Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo 6.873/HST.951 Medical Decision
More informationClustering. RNA-seq: What is it good for? Finding Similarly Expressed Genes. Data... And Lots of It!
RNA-seq: What is it good for? Clustering High-throughput RNA sequencing experiments (RNA-seq) offer the ability to measure simultaneously the expression level of thousands of genes in a single experiment!
More informationCHAPTER 4: CLUSTER ANALYSIS
CHAPTER 4: CLUSTER ANALYSIS WHAT IS CLUSTER ANALYSIS? A cluster is a collection of data-objects similar to one another within the same group & dissimilar to the objects in other groups. Cluster analysis
More informationUnsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi
Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which