How do microarrays work
- Bruno Warren Chase
- 6 years ago
1 Lecture 3 (continued)
Alvis Brazma, European Bioinformatics Institute
How do microarrays work?
[Figure: two conditions, each with mRNA and cDNA, hybridised to the microarray.]
2 A microarray experiment
[Figure: workflow from sample, RNA extract and labelled nucleic acid through hybridisation to the array (array design), then normalization and integration into the gene expression data matrix; each step is governed by a protocol.]
Steps in microarray data processing
[Figure: array scans, quantitations (spots), genes; panel labels A, B, C, D.]
3 Microarray expression measurements in the cell cycle for over 400 periodic genes in yeast (Rustici et al., Nature Genetics, 2004)
The goal of data normalisation: the gene expression data matrix
Rows i are genes and columns j are samples; X(i,j) is the amount of the RNA of the i-th gene in the j-th sample.
4 What are we actually measuring?
Fluorescence intensity is determined by RNA abundance, probe efficiency, hybridisation conditions and error.
How do we know probe efficiency and hybridisation conditions?
5 Lecture 4: expression profiles and their analysis
The goal of data normalisation: the gene expression data matrix
Rows i are genes and columns j are samples; X(i,j) is the amount of the RNA of the i-th gene in the j-th sample.
6 Gene expression profile
A gene expression profile (row i of the matrix): (x(i,1), x(i,2), ..., x(i,m)), a vector of real numbers.
A sample expression profile (column j of the matrix): probe effects are large.
7 Gene expression profile
Find genes with similar expression profiles.
8 Gene expression profile
A gene expression profile: (x(i,1), x(i,2), ..., x(i,m)), a vector of real numbers.
A sample expression profile.
[Figure 4.: points A, B and C plotted against two condition axes.]
9 Gene expression profile
A gene expression profile: (x(i,1), x(i,2), ..., x(i,m)), a vector of real numbers.
How do we measure the distance between two gene (or sample) expression profiles?
[Plot: log ratios of A = (-1, 0, -2, 1, -4) and B = (0, 2, -1, 4, -5).]
10 Euclidean distance
A = (-1, 0, -2, 1, -4), B = (0, 2, -1, 4, -5)
Euclidean distance = (add up the squares of all arrows and take a square root) = (1+4+1+9+1)^{1/2} = 4
$D_{Eucl}(A,B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
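The Euclidean distance formula on this slide can be sketched in a few lines of Python. The vectors A and B below are the slide's example profiles as reconstructed from the partly lost digits, so treat the exact values as an assumption; the function name `euclidean_distance` is illustrative.

```python
import math

def euclidean_distance(a, b):
    """Square root of the summed squared coordinate differences."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Example profiles (reconstructed values, an assumption).
A = (-1, 0, -2, 1, -4)
B = (0, 2, -1, 4, -5)

# Squared differences are 1, 4, 1, 9 and 1, which add up to 16.
d_eucl = euclidean_distance(A, B)
print(d_eucl)
```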
11 A = (-1, 0, -2, 1, -4), B = (0, 2, -1, 4, -5)
The absolute values are not very meaningful (remember that sequence effects are large), so the Euclidean distance may not be the best choice.
How do we measure similarities in trends?
1. Center all vectors around 0.
12 Chord distance
After centering: A = (0, 1, -1, 2, -3), B = (0, 2, -1, 4, -5)
2. Make the length of both equal to 1.
|A| = (0+1+1+4+9)^{1/2} ≈ 4, |B| = (0+4+1+16+25)^{1/2} ≈ 7
13 The length of a vector
Given a vector A = (a_1, ..., a_k), we define its length |A| as $|A| = \sqrt{a_1^2 + \dots + a_k^2}$.
|A| = (0+1+1+4+9)^{1/2} ≈ 4, |B| = (0+4+1+16+25)^{1/2} ≈ 7
Before scaling: A = (0, 1, -1, 2, -3), B = (0, 2, -1, 4, -5)
After scaling: A = (0, 1/4, -1/4, 1/2, -3/4), B = (0, 2/7, -1/7, 4/7, -5/7)
14 Chord distance
1. Center all vectors around 0.
2. Make the length of both equal to 1. With |A| ≈ 4 and |B| ≈ 7 this gives A = (0, 1/4, -1/4, 1/2, -3/4), B = (0, 2/7, -1/7, 4/7, -5/7).
3. Calculate the Euclidean distance between the centered and scaled vectors (note that the chord distance in this case is close to 0).
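The three chord-distance steps (center, scale to length 1, take the Euclidean distance) can be sketched as follows. This sketch assumes that "centering" means subtracting the mean of each vector and uses exact lengths rather than rounded ones, so its result will not match a hand calculation digit for digit; the example vectors are reconstructed values and the function names are illustrative.

```python
import math

def center(v):
    """Subtract the mean so the values are centered around 0 (assumed meaning)."""
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def unit_length(v):
    """Scale a vector so that its length becomes 1."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("a constant profile cannot be scaled to length 1")
    return [x / norm for x in v]

def chord_distance(a, b):
    """Euclidean distance between the centered, unit-length vectors."""
    a1 = unit_length(center(a))
    b1 = unit_length(center(b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a1, b1)))

A = (-1, 0, -2, 1, -4)   # reconstructed example values (assumption)
B = (0, 2, -1, 4, -5)

d_chord = chord_distance(A, B)
# Shifting and rescaling a profile does not change its trend.
d_same_trend = chord_distance(A, [2 * x + 3 for x in A])
print(d_chord, d_same_trend)
```

Because both vectors end up on the unit sphere, the result always lies between 0 and 2, which makes profiles with very different absolute levels comparable.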
15 [Figure: vectors a and b with points A and B and angles α, β, γ, illustrating the Euclidean distance, the angle distance and the chord distance.]
Correlation distance
Very similar to the chord distance: calculate the cosine of the angle between the two vectors.
A = (0, 1/4, -1/4, 1/2, -3/4), B = (0, 2/7, -1/7, 4/7, -5/7)
cos(A,B) = 0*0 + (1/4)*(2/7) + (-1/4)*(-1/7) + (1/2)*(4/7) + (-3/4)*(-5/7) = 26/28 = 13/14
Cor_dist = 1 - cos(A,B) = 1/14
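A minimal sketch of the correlation distance, 1 minus the cosine of the angle between the centered vectors. It assumes mean-centering and exact vector lengths, so the value differs from a hand calculation with rounded lengths; the example vectors are reconstructed values and `correlation_distance` is an illustrative name.

```python
import math

def correlation_distance(a, b):
    """1 - cos of the angle between the mean-centered profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ca = [x - mean_a for x in a]
    cb = [y - mean_b for y in b]
    dot = sum(x * y for x, y in zip(ca, cb))
    norm = math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb))
    if norm == 0:
        raise ValueError("a constant profile has no defined correlation")
    return 1.0 - dot / norm

A = (-1, 0, -2, 1, -4)   # reconstructed example values (assumption)
B = (0, 2, -1, 4, -5)

d_cor = correlation_distance(A, B)                   # similar trends: near 0
d_anti = correlation_distance(A, [-x for x in A])    # opposite trends: near 2
print(d_cor, d_anti)
```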
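16 Relationships between chord and correlation distances
For mean-centered vectors normalised to length 1, A' = (a'_1, a'_2) and B' = (b'_1, b'_2), with α the angle between them:
$D_{chord}(A,B) = \sqrt{(a'_1 - b'_1)^2 + (a'_2 - b'_2)^2} = \sqrt{|A'|^2 + |B'|^2 - 2(a'_1 b'_1 + a'_2 b'_2)} = \sqrt{2(1 - \cos\alpha)}$
$D_{chord}(A,B) = 2\sin(\alpha/2)$, since $2\sin^2(\alpha/2) = 1 - \cos\alpha$
Cor(A,B) = cos(A,B), if A and B are mean-centered, normalised (length 1) vectors.
[Figure: vectors a and b with points A and B and angles α, β, γ, showing the chord distance and the Euclidean distance.]
Correlation and anticorrelation
1 - cos x: perfect correlation has distance 0, anticorrelation has the maximum distance.
A distance based on the magnitude of cos x (for example 1 - |cos x|): both perfect correlation and perfect anticorrelation have distance 0.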
17 Rank correlation
A = (0, 1/4, -1/4, 1/2, -3/4), B = (0, 2/7, -1/7, 4/7, -5/7)
Transform the values to ranks: A = (0, 1, -1, 2, -2), B = (0, 1, -1, 2, -2)
Compute the (correlation) distance between them (for that, first normalise them to length 1).
Advantages and disadvantages of rank correlation based distances
Advantages: does not depend on the precise values.
Disadvantages: ranks depend on the precise values in the high-density areas, e.g., when the expression values are very close to each other (closer than the error bars), their relative order (ranks) is very prone to error.
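The rank-based distance can be sketched by replacing each value with its rank and then computing the correlation distance on the ranks. This sketch uses ranks 1..n (not ranks centered on 0) and gives tied values their average rank; both are assumptions, as are the reconstructed example vectors and the function names.

```python
import math

def ranks(values):
    """Rank 1 for the smallest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            result[i] = average_rank
        pos = end + 1
    return result

def rank_correlation_distance(a, b):
    """1 - correlation of the rank-transformed profiles."""
    ra, rb = ranks(a), ranks(b)
    mean_a, mean_b = sum(ra) / len(ra), sum(rb) / len(rb)
    ca = [x - mean_a for x in ra]
    cb = [y - mean_b for y in rb]
    dot = sum(x * y for x, y in zip(ca, cb))
    norm = math.sqrt(sum(x * x for x in ca)) * math.sqrt(sum(y * y for y in cb))
    if norm == 0:
        raise ValueError("all values tied: rank correlation is undefined")
    return 1.0 - dot / norm

A = (-1, 0, -2, 1, -4)   # reconstructed example values (assumption)
B = (0, 2, -1, 4, -5)

# Both profiles order their five values identically, so the distance is 0.
d_rank = rank_correlation_distance(A, B)
print(ranks(A), ranks(B), d_rank)
```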
18 Distance measures
A distance measure D(A,B) is said to be a metric if it satisfies the following properties:
- if A = B, then D(A,B) = 0, i.e., the distance of an object to itself is 0;
- if A ≠ B, then D(A,B) > 0, i.e., the distance is never negative and is 0 only for identical objects;
- D(A,B) = D(B,A), i.e., it does not matter in which order we measure the distance;
- D(A,B) + D(B,C) ≥ D(A,C), i.e., given three objects, the length of a direct path from the first to the third object cannot be greater than the length of the path through the second object.
Missing data points
Why do they arise?
- Bad quality spot, e.g., flagged as bad by the image analysis software (so-called half-moon spots, empty circles, ...)
- Very low intensity signal in one or both channels (the ratio may be 0 or infinity)
- Inconsistency between replicates (on the same or different arrays)
19 Missing data points
Why are they a nuisance?
- How do we compute the distance between vectors with missing data points? One option is to ignore the dimension.
- If many comparisons have to be made, missing dimensions may start to accumulate.
How to deal with them?
- If replicates are available, they can be used
- Replace missing values by 0
- Replace by the row average value
- K nearest neighbour imputation (KNN imputation)
KNN imputation
- We are given a gene expression matrix M.
- Let X = (X_1, X_2, ..., X_i, ..., X_n) be a vector in the matrix M with a missing value X_i at dimension i.
- Find in M the vectors X^1, X^2, ..., X^k that are the k closest vectors to X (in the sense of a chosen distance measure) among the vectors that do not have a missing value at dimension i.
- Replace the missing value X_i with the mean (or median) of X^1_i, X^2_i, ..., X^k_i, i.e., the mean (median) of the values at dimension i of the vectors X^1, X^2, ..., X^k.
20 Gene expression profile
A gene expression profile: X = (X_1, ..., X_i, ..., X_n), a vector of real numbers, with X_i a missing data point.
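The KNN imputation steps can be sketched as below. The sketch makes several assumptions that the slides leave open: missing values are stored as `None`, the distance is Euclidean over the dimensions present in both vectors, and the replacement is the mean rather than the median. The matrix `M` is made-up toy data and the function names are illustrative.

```python
import math

def distance_ignoring_missing(u, v):
    """Euclidean distance over the dimensions where both vectors have a value."""
    pairs = [(x, y) for x, y in zip(u, v) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))

def knn_impute(matrix, row, col, k):
    """Mean of column `col` over the k rows closest to `row` that have a value there."""
    target = matrix[row]
    candidates = [r for i, r in enumerate(matrix)
                  if i != row and r[col] is not None]
    if not candidates:
        raise ValueError("no row has a value at this dimension")
    candidates.sort(key=lambda r: distance_ignoring_missing(target, r))
    nearest = candidates[:k]
    return sum(r[col] for r in nearest) / len(nearest)

# Toy expression matrix (made-up values); None marks a missing data point.
M = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
    [1.0, None, None],
]

# The two closest usable rows are rows 1 and 2, so the estimate is (3 + 5) / 2.
imputed = knn_impute(M, row=0, col=2, k=2)
print(imputed)
```

The last row is skipped as a neighbour because it is itself missing a value at the dimension being imputed.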
21 [Figure 4.: points A, B and C plotted against two condition axes.]
Supervised vs. unsupervised analysis: class discovery vs. clustering
22 What is a cluster?
- In a set of elements, subsets of elements that are in some sense closer to each other than average
- Closeness can be defined by a distance measure
- Distance by itself is not sufficient:
  - How do we measure distance between more than 2 points?
  - What is the shape of the cluster?
  - What thresholds of closeness decide which elements are in the same cluster and which are not?
What is a cluster?
- The definition of what a cluster is is difficult
- In practice it is defined by an algorithm that finds clusters
23 Clustering algorithms: hierarchical vs flat
- Hierarchical clustering builds a hierarchical tree (also called a dendrogram) showing the relationships among the elements
- Flat clustering partitions the set of elements into subsets (non-overlapping or overlapping)
[Figure: clusters labelled c1 to c5.]
Hierarchical clustering: how does it work?
[Figure: elements 1 to 5 joined step by step.]
24 Different linkages
Keep joining together the two closest clusters, measuring the distance between clusters by the:
- Minimum distance => single linkage
- Maximum distance => complete linkage
- Average distance => average linkage
Alternative: maintain a centroid in each cluster and use it for linking.
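The effect of the three linkages can be illustrated with a small agglomerative sketch: start with every element as its own cluster and keep joining the two closest clusters. The points are made-up toy data, the function names are illustrative, and the quadratic search is written for clarity rather than speed (`math.dist` needs Python 3.8 or later).

```python
import math

def linkage_distance(c1, c2, points, linkage):
    """Distance between two clusters (tuples of point indices)."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "single":
        return min(dists)
    if linkage == "complete":
        return max(dists)
    if linkage == "average":
        return sum(dists) / len(dists)
    raise ValueError("unknown linkage: " + linkage)

def agglomerate(points, linkage="single"):
    """Return the list of merges as (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        d, a, b = min(
            ((linkage_distance(a, b, points, linkage), a, b)
             for idx, a in enumerate(clusters) for b in clusters[idx + 1:]),
            key=lambda t: t[0],
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

# Toy data: two tight pairs and one far-away point.
points = [(0, 0), (0, 1), (4, 0), (4, 1.5), (10, 0)]
single = agglomerate(points, "single")
complete = agglomerate(points, "complete")
print(single[-1][2], complete[-1][2])
```

The merge order is the same for both linkages on this data, but the height of the final join differs: single linkage reports the closest pair across the two clusters, complete linkage the farthest.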
25 [Plot: points A = (2, 5), B = (4, 2), C = (3, -4) in the x-y plane.]
X = (2+4+3)/3 = 3
Y = (5+2-4)/3 = 1
26 [Plot: the same points with their gravity centre.]
G = (3, 1)
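The gravity centre used for centroid linking is just the coordinate-wise mean of the points in a cluster. In this sketch the coordinates of A, B and C are reconstructed from the slide's arithmetic and should be read as an assumption; `centroid` is an illustrative name.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    if not points:
        raise ValueError("cannot take the centroid of an empty cluster")
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

# Reconstructed example points (assumption).
A, B, C = (2, 5), (4, 2), (3, -4)
G = centroid([A, B, C])
print(G)
```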
27 K-means clustering
1. Select K points (vectors), called centers, in the space somehow (at random, or more intelligently so that they are far away from each other).
2. For each vector in the universe that you want to cluster, calculate the distance between it and all the K centers, and assign it to the closest center. In this way K clusters are defined.
3. In each cluster, define the new center as its gravity center.
4. Repeat steps 2-3 until the gravity centers do not move any more, or after some fixed number of steps.
[Figure panels: 1. Guess K centres; 2. Assign to clusters; 3. Move to gravity centres.]
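The four K-means steps can be sketched as follows. To keep the example deterministic, the initial centers are passed in by the caller instead of being chosen at random; that choice, the Euclidean distance, the toy points and the function names are all assumptions of this sketch.

```python
import math

def centroid(points):
    """Gravity center: coordinate-wise mean of the points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(points, centers, max_steps=100):
    """Alternate between assigning points and moving centers."""
    centers = list(centers)
    clusters = []
    for _ in range(max_steps):
        # Step 2: assign every point to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Step 3: move each center to the gravity center of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Step 4: stop when the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

points = [(0, 0), (0, 2), (2, 0), (10, 10), (10, 12), (12, 10)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)
```

The result depends on the starting centers, which is why the slides suggest choosing them far apart; running the procedure from several starting points is a common precaution.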
28 Other clustering methods
- Kohonen's self-organising maps
- Self-organising trees (Dopazo)
- Probability distribution based clustering
- Two-way clustering
- Fuzzy clustering
Cluster comparison
29 Clustering genes and samples
When does it make sense to cluster samples?
Ordination methods: principal components analysis (PCA)
30 Principal Component Analysis (PCA)
Also known as ordination or SVD (each version having a slightly different meaning).
A fairly nontrivial mathematical apparatus, but quite a simple idea.
[Table: rows Gene 1 ... Gene n with columns Condition 1, Condition 2, Condition 3; by analogy, rows Measurement 1 ... Measurement n with columns Temperature, Altitude, Latitude.]
31 [Figure: temperature plotted against altitude and latitude (South); altitude and latitude are then combined into a single "alti-latitude" axis.]
32 [Figure: temperature against alti-latitude, with the first and second principal components drawn as new axes.]
PCA in a nutshell
- The main idea: in the original n-dimensional space, find the direction of most data variability (i.e., the direction in which the data points are most stretched).
- Orient a new coordinate axis in this direction. This will be the first principal component; the relative stretch is the first eigenvalue, and the direction is the first eigenvector.
- Then find the direction of the next highest variability orthogonal to the first eigenvector; this is the second component.
- And so on.
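The "direction of most variability" can be found numerically in several ways. The sketch below uses power iteration on the covariance matrix, which is one simple option and not necessarily what any particular PCA package does; the toy data, the starting vector and the function name are assumptions.

```python
import math

def first_principal_component(data, iterations=200):
    """Return (eigenvalue, eigenvector) of the largest covariance eigenpair."""
    n, dim = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    # Sample covariance matrix of the columns.
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(dim)]
           for a in range(dim)]
    v = [1.0] * dim                      # starting direction (assumption)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have no variability along the start vector")
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance of the data along direction v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(dim))
                     for a in range(dim))
    return eigenvalue, v

# Toy data stretched along the diagonal, with a little scatter.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
eigenvalue, direction = first_principal_component(data)
print(eigenvalue, direction)
```

For this data nearly all the variability lies along one diagonal direction, so a single new axis describes the points almost as well as the two original ones.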
33 [Figure: first 5 eigenvalues; axes (X), (Y), (Z).]
34 Supervised vs unsupervised analysis
37 Classifiers: applications
- Training on known data: find a classifier that can separate one experimental factor value from the other based only on the data.
- Apply to new data: this will tell us where the new sample belongs (e.g., diseased or normal; diagnostics).
38 K nearest neighbours classifier
[Figure: samples plotted on two axes.]
39 Linear discriminants
[Figure: samples plotted on axes x1 and x2, separated by the discrimination line x2 = a*x1 + b.]
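A K nearest neighbours classifier labels a new sample by a majority vote among the K closest training samples. The sketch below assumes Euclidean distance and k = 3; the training points and the labels "normal" and "diseased" are made-up illustrations, not data from the lecture.

```python
import math
from collections import Counter

def knn_classify(training, sample, k=3):
    """Majority label among the k training samples closest to `sample`."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up training data: (coordinates, known class).
training = [
    ((1.0, 1.0), "normal"),
    ((1.5, 2.0), "normal"),
    ((2.0, 1.0), "normal"),
    ((6.0, 6.0), "diseased"),
    ((7.0, 5.5), "diseased"),
    ((6.5, 7.0), "diseased"),
]

label_a = knn_classify(training, (1.2, 1.4))
label_b = knn_classify(training, (6.4, 6.1))
print(label_a, label_b)
```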
Dimension reduction : PCA and Clustering
Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU The DNA Array Analysis Pipeline Array design Probe design Question Experimental
More informationNetwork Traffic Measurements and Analysis
DEIB - Politecnico di Milano Fall, 2017 Introduction Often, we have only a set of features x = x 1, x 2,, x n, but no associated response y. Therefore we are not interested in prediction nor classification,
More informationGene Clustering & Classification
BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering
More informationMICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS
Mathematical and Computational Applications, Vol. 5, No. 2, pp. 240-247, 200. Association for Scientific Research MICROARRAY IMAGE SEGMENTATION USING CLUSTERING METHODS Volkan Uslan and Đhsan Ömür Bucak
More informationExploratory data analysis for microarrays
Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA
More informationHierarchical Clustering
What is clustering Partitioning of a data set into subsets. A cluster is a group of relatively homogeneous cases or observations Hierarchical Clustering Mikhail Dozmorov Fall 2016 2/61 What is clustering
More informationMachine Learning and Data Mining. Clustering (1): Basics. Kalev Kask
Machine Learning and Data Mining Clustering (1): Basics Kalev Kask Unsupervised learning Supervised learning Predict target value ( y ) given features ( x ) Unsupervised learning Understand patterns of
More informationCluster Analysis for Microarray Data
Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that
More informationCluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1
Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)
More informationClustering CS 550: Machine Learning
Clustering CS 550: Machine Learning This slide set mainly uses the slides given in the following links: http://www-users.cs.umn.edu/~kumar/dmbook/ch8.pdf http://www-users.cs.umn.edu/~kumar/dmbook/dmslides/chap8_basic_cluster_analysis.pdf
More informationUnsupervised learning in Vision
Chapter 7 Unsupervised learning in Vision The fields of Computer Vision and Machine Learning complement each other in a very natural way: the aim of the former is to extract useful information from visual
More informationComparisons and validation of statistical clustering techniques for microarray gene expression data. Outline. Microarrays.
Comparisons and validation of statistical clustering techniques for microarray gene expression data Susmita Datta and Somnath Datta Presented by: Jenni Dietrich Assisted by: Jeffrey Kidd and Kristin Wheeler
More informationUnsupervised Learning and Clustering
Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)
More informationECS 234: Data Analysis: Clustering ECS 234
: Data Analysis: Clustering What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationCLUSTERING IN BIOINFORMATICS
CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of
More informationCluster Analysis: Agglomerate Hierarchical Clustering
Cluster Analysis: Agglomerate Hierarchical Clustering Yonghee Lee Department of Statistics, The University of Seoul Oct 29, 2015 Contents 1 Cluster Analysis Introduction Distance matrix Agglomerative Hierarchical
More informationMicro-array Image Analysis using Clustering Methods
Micro-array Image Analysis using Clustering Methods Mrs Rekha A Kulkarni PICT PUNE kulkarni_rekha@hotmail.com Abstract Micro-array imaging is an emerging technology and several experimental procedures
More informationUniversity of Florida CISE department Gator Engineering. Clustering Part 2
Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical
More information9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology
9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example
More informationGiri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748
CAP 5510: Introduction to Bioinformatics Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs07.html 3/3/08 CAP5510 1 Gene g Probe 1 Probe 2 Probe N 3/3/08 CAP5510
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationClassification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University
Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate
More informationStatistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte
Statistical Analysis of Metabolomics Data Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte Outline Introduction Data pre-treatment 1. Normalization 2. Centering,
More informationClustering. Lecture 6, 1/24/03 ECS289A
Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed
More informationClustering. CE-717: Machine Learning Sharif University of Technology Spring Soleymani
Clustering CE-717: Machine Learning Sharif University of Technology Spring 2016 Soleymani Outline Clustering Definition Clustering main approaches Partitional (flat) Hierarchical Clustering validation
More informationChapter VIII.3: Hierarchical Clustering
Chapter VIII.3: Hierarchical Clustering 1. Basic idea 1.1. Dendrograms 1.2. Agglomerative and divisive 2. Cluster distances 2.1. Single link 2.2. Complete link 2.3. Group average and Mean distance 2.4.
More informationOutline. Multivariate analysis: Least-squares linear regression Curve fitting
DATA ANALYSIS Outline Multivariate analysis: principal component analysis (PCA) visualization of high-dimensional data clustering Least-squares linear regression Curve fitting e.g. for time-course data
More informationCSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo
CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..
More informationCourse on Microarray Gene Expression Analysis
Course on Microarray Gene Expression Analysis ::: Normalization methods and data preprocessing Madrid, April 27th, 2011. Gonzalo Gómez ggomez@cnio.es Bioinformatics Unit CNIO ::: Introduction. The probe-level
More informationGene expression & Clustering (Chapter 10)
Gene expression & Clustering (Chapter 10) Determining gene function Sequence comparison tells us if a gene is similar to another gene, e.g., in a new species Dynamic programming Approximate pattern matching
More informationClustering and Dimensionality Reduction. Stony Brook University CSE545, Fall 2017
Clustering and Dimensionality Reduction Stony Brook University CSE545, Fall 2017 Goal: Generalize to new data Model New Data? Original Data Does the model accurately reflect new data? Supervised vs. Unsupervised
More informationRedefining and Enhancing K-means Algorithm
Redefining and Enhancing K-means Algorithm Nimrat Kaur Sidhu 1, Rajneet kaur 2 Research Scholar, Department of Computer Science Engineering, SGGSWU, Fatehgarh Sahib, Punjab, India 1 Assistant Professor,
More informationOlmo S. Zavala Romero. Clustering Hierarchical Distance Group Dist. K-means. Center of Atmospheric Sciences, UNAM.
Center of Atmospheric Sciences, UNAM November 16, 2016 Cluster Analisis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster)
More informationClustering Techniques
Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,
More informationClustering and Visualisation of Data
Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some
More informationMSA220 - Statistical Learning for Big Data
MSA220 - Statistical Learning for Big Data Lecture 13 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Clustering Explorative analysis - finding groups
More informationClustering, cont. Genome 373 Genomic Informatics Elhanan Borenstein. Some slides adapted from Jacques van Helden
Clustering, cont Genome 373 Genomic Informatics Elhanan Borenstein Some slides adapted from Jacques van Helden Improving the search heuristic: Multiple starting points Simulated annealing Genetic algorithms
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More information/ Computational Genomics. Normalization
10-810 /02-710 Computational Genomics Normalization Genes and Gene Expression Technology Display of Expression Information Yeast cell cycle expression Experiments (over time) baseline expression program
More informationCLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi
CLUSTER ANALYSIS V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi-110 012 In multivariate situation, the primary interest of the experimenter is to examine and understand the relationship amongst the
More informationUnderstanding Clustering Supervising the unsupervised
Understanding Clustering Supervising the unsupervised Janu Verma IBM T.J. Watson Research Center, New York http://jverma.github.io/ jverma@us.ibm.com @januverma Clustering Grouping together similar data
More informationCS 1675 Introduction to Machine Learning Lecture 18. Clustering. Clustering. Groups together similar instances in the data sample
CS 1675 Introduction to Machine Learning Lecture 18 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem:
More informationCSE 158. Web Mining and Recommender Systems. Midterm recap
CSE 158 Web Mining and Recommender Systems Midterm recap Midterm on Wednesday! 5:10 pm 6:10 pm Closed book but I ll provide a similar level of basic info as in the last page of previous midterms CSE 158
More informationClustering. CS294 Practical Machine Learning Junming Yin 10/09/06
Clustering CS294 Practical Machine Learning Junming Yin 10/09/06 Outline Introduction Unsupervised learning What is clustering? Application Dissimilarity (similarity) of objects Clustering algorithm K-means,
More informationMeasure of Distance. We wish to define the distance between two objects Distance metric between points:
Measure of Distance We wish to define the distance between two objects Distance metric between points: Euclidean distance (EUC) Manhattan distance (MAN) Pearson sample correlation (COR) Angle distance
More informationAN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS
AN IMPROVED HYBRIDIZED K- MEANS CLUSTERING ALGORITHM (IHKMCA) FOR HIGHDIMENSIONAL DATASET & IT S PERFORMANCE ANALYSIS H.S Behera Department of Computer Science and Engineering, Veer Surendra Sai University
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 15: Microarray clustering http://compbio.pbworks.com/f/wood2.gif Some slides were adapted from Dr. Shaojie Zhang (University of Central Florida) Microarray
More informationROTS: Reproducibility Optimized Test Statistic
ROTS: Reproducibility Optimized Test Statistic Fatemeh Seyednasrollah, Tomi Suomi, Laura L. Elo fatsey (at) utu.fi March 3, 2016 Contents 1 Introduction 2 2 Algorithm overview 3 3 Input data 3 4 Preprocessing
More informationCSE 258 Lecture 5. Web Mining and Recommender Systems. Dimensionality Reduction
CSE 258 Lecture 5 Web Mining and Recommender Systems Dimensionality Reduction This week How can we build low dimensional representations of high dimensional data? e.g. how might we (compactly!) represent
More informationMining di Dati Web. Lezione 3 - Clustering and Classification
Mining di Dati Web Lezione 3 - Clustering and Classification Introduction Clustering and classification are both learning techniques They learn functions describing data Clustering is also known as Unsupervised
More informationUnsupervised Learning : Clustering
Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex
More informationMUSI-6201 Computational Music Analysis
MUSI-6201 Computational Music Analysis Part 4.3: Feature Post-Processing alexander lerch November 4, 2015 instantaneous features overview text book Chapter 3: Instantaneous Features (pp. 63 69) sources:
More information3. Cluster analysis Overview
Université Laval Analyse multivariable - mars-avril 2008 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as
More informationCS 2750 Machine Learning. Lecture 19. Clustering. CS 2750 Machine Learning. Clustering. Groups together similar instances in the data sample
Lecture 9 Clustering Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square Clustering Groups together similar instances in the data sample Basic clustering problem: distribute data into k different groups
More informationClustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York
Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity
More informationDocument Clustering using Feature Selection Based on Multiviewpoint and Link Similarity Measure
Document Clustering using Feature Selection Based on Multiviewpoint and Link Similarity Measure Neelam Singh neelamjain.jain@gmail.com Neha Garg nehagarg.february@gmail.com Janmejay Pant geujay2010@gmail.com
More informationMachine Learning for Signal Processing Clustering. Bhiksha Raj Class Oct 2016
Machine Learning for Signal Processing Clustering Bhiksha Raj Class 11. 13 Oct 2016 1 Statistical Modelling and Latent Structure Much of statistical modelling attempts to identify latent structure in the
More informationSpectral Classification
Spectral Classification Spectral Classification Supervised versus Unsupervised Classification n Unsupervised Classes are determined by the computer. Also referred to as clustering n Supervised Classes
More informationINF4820, Algorithms for AI and NLP: Hierarchical Clustering
INF4820, Algorithms for AI and NLP: Hierarchical Clustering Erik Velldal University of Oslo Sept. 25, 2012 Agenda Topics we covered last week Evaluating classifiers Accuracy, precision, recall and F-score
More informationCSE 255 Lecture 5. Data Mining and Predictive Analytics. Dimensionality Reduction
CSE 255 Lecture 5 Data Mining and Predictive Analytics Dimensionality Reduction Course outline Week 4: I ll cover homework 1, and get started on Recommender Systems Week 5: I ll cover homework 2 (at the
More informationWhy MultiLayer Perceptron/Neural Network? Objective: Attributes:
Why MultiLayer Perceptron/Neural Network? Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are
More informationHigh throughput Data Analysis 2. Cluster Analysis
High throughput Data Analysis 2 Cluster Analysis Overview Why clustering? Hierarchical clustering K means clustering Issues with above two Other methods Quality of clustering results Introduction WHY DO
More informationINF4820 Algorithms for AI and NLP. Evaluating Classifiers Clustering
INF4820 Algorithms for AI and NLP Evaluating Classifiers Clustering Murhaf Fares & Stephan Oepen Language Technology Group (LTG) September 27, 2017 Today 2 Recap Evaluation of classifiers Unsupervised
More informationSVM Classification in -Arrays