# Dimension reduction : PCA and Clustering

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Dimension reduction : PCA and Clustering By Hanne Jarmer Slides by Christopher Workman Center for Biological Sequence Analysis DTU

2 The DNA Array Analysis Pipeline Array design Probe design Question Experimental Design Sample Preparation Hybridization Buy Chip/Array Image analysis Expression Index Calculation Normalization Comparable Gene Expression Data Statistical Analysis Fit to Model (time series) Advanced Data Analysis Clustering PCA Classification Promoter Analysis Meta analysis Survival analysis Regulatory Network

3 The DNA Array Analysis Pipeline Array design Probe design Question Experimental Design Sample Preparation Hybridization Buy Chip/Array Image analysis Expression Index Calculation Normalization Comparable Gene Expression Data Statistical Analysis Fit to Model (time series) Advanced Data Analysis Clustering PCA Classification Promoter Analysis Meta analysis Survival analysis Regulatory Network

4 What is Principal Component Analysis (PCA)? Numerical method Dimensionality reduction technique Primarily for visualization of arrays/samples Unsupervised method used to explore the intrinsic variability of the data

5 PCA Performs a rotation of the data that maximizes the variance in the new axes Projects high dimensional data into a low dimensional sub-space (visualized in 2-3 dims) Often captures much of the total data variation in a few dimensions (< 5) Exact solutions require a fully determined system (matrix with full rank) i.e. A square matrix with independent rows

6 Principal components 1 st Principal component (PC1) Direction along which there is greatest variation 2 nd Principal component (PC2) Direction with maximum variation left in data, orthogonal to PC1

7 Singular Value Decomposition An implementation of PCA Defined in terms of matrices: X is the expression data matrix U are the left eigenvectors V are the right eigenvectors S are the singular values (S 2 = Λ)

8 Singular Value Decomposition

9 Singular Value Decomposition Requirements: No missing values Centered observations, i.e. normalize data such that each gene has mean = 0

10 PCA projections (as XY-plot)

11 Related methods Factor Analysis* Multidimensional scaling (MDS) Generalized multidimensional scaling (GMDS) Semantic mapping Isomap Independent component analysis (ICA) * Factor analysis is often confused with PCA though the two methods are related but distinct. Factor analysis is equivalent to PCA if the error terms in the factor analysis model are assumed to all have the same variance.

12 Why do we cluster? Organize observed data into meaningful structures Summarize large data sets Used when we have no a priori hypotheses Optimization: Minimize within cluster distances Maximize between cluster distances

13 Many types of clustering methods Method: K-class Hierarchical, e.g. UPGMA Agglomerative (bottom-up)... all alone... join... Divisive (top-down)... all together... split... Graph theoretic Information used: Supervised vs unsupervised Final description of the items: Partitioning vs non-partitioning fuzzy, multi-class

14 Hierarchical clustering Representation of all pair-wise distances Parameters: none (distance measure) Results: One large cluster Hierarchical tree (dendrogram) Deterministic

15 Hierarchical clustering UPGMA Algorithm Unweighted Pair Group Method with Arithmetic Mean Assign each item to its own cluster Join the nearest clusters Re-estimate the distance between clusters Repeat for 1 to n

16 Hierarchical clustering

17 Hierarchical clustering

18 Hierarchical Clustering Data with clustering order and distances Dendrogram representation 2D data is a special (simple) case!

19 Hierarchical Clustering Original data space Merging steps define a dendrogram

20 K-means - Algorithm J. B. MacQueen (1967): "Some Methods for classification and Analysis of Multivariate Observations", Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:

21 K-mean clustering, K=3

22 K-mean clustering, K=3

23 K-mean clustering, K=3

24 K-Means Iteration i Iteration i+1 Circles: prototypes (parameters to fit) Squares: data points

25 K-means clustering Cell Cycle data

26 Self Organizing Maps (SOM) Partitioning method (similar to the K-means method) Clusters are organized in a two-dimensional grid Size of grid must be specified (eg. 2x2 or 3x3) SOM algorithm finds the optimal organization of data in the grid

27 SOM - example

28 SOM - example

29 SOM - example

30 SOM - example

31 SOM - example

32 Comparison of clustering methods Hierarchical clustering Distances between all variables Time consuming with a large number of gene Advantage to cluster on selected genes K-means clustering Faster algorithm Does only show relations between all variables SOM Machine learning algorithm

33 Distance measures Euclidian distance Vector angle distance Pearsons distance

34 Comparison of distance measures

35 Summary Dimension reduction important to visualize data Methods: Principal Component Analysis Clustering Hierarchical K-means Self organizing maps (distance measure important)

36 DNA Microarray Analysis Overview/Review Normalization PCA (using SVD) Cluster analysis Before After

### Exploratory data analysis for microarrays

Exploratory data analysis for microarrays Jörg Rahnenführer Computational Biology and Applied Algorithmics Max Planck Institute for Informatics D-66123 Saarbrücken Germany NGFN - Courses in Practical DNA

### 9/29/13. Outline Data mining tasks. Clustering algorithms. Applications of clustering in biology

9/9/ I9 Introduction to Bioinformatics, Clustering algorithms Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Outline Data mining tasks Predictive tasks vs descriptive tasks Example

### How do microarrays work

Lecture 3 (continued) Alvis Brazma European Bioinformatics Institute How do microarrays work condition mrna cdna hybridise to microarray condition Sample RNA extract labelled acid acid acid nucleic acid

### Cluster Analysis for Microarray Data

Cluster Analysis for Microarray Data Seventh International Long Oligonucleotide Microarray Workshop Tucson, Arizona January 7-12, 2007 Dan Nettleton IOWA STATE UNIVERSITY 1 Clustering Group objects that

### CSE 6242 A / CS 4803 DVA. Feb 12, Dimension Reduction. Guest Lecturer: Jaegul Choo

CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo CSE 6242 A / CS 4803 DVA Feb 12, 2013 Dimension Reduction Guest Lecturer: Jaegul Choo Data is Too Big To Do Something..

### Clustering k-mean clustering

Clustering k-mean clustering Genome 373 Genomic Informatics Elhanan Borenstein The clustering problem: partition genes into distinct sets with high homogeneity and high separation Clustering (unsupervised)

### Dimension Reduction CS534

Dimension Reduction CS534 Why dimension reduction? High dimensionality large number of features E.g., documents represented by thousands of words, millions of bigrams Images represented by thousands of

### Clustering and Visualisation of Data

Clustering and Visualisation of Data Hiroshi Shimodaira January-March 28 Cluster analysis aims to partition a data set into meaningful or useful groups, based on distances between data points. In some

### Cluster Analysis. Mu-Chun Su. Department of Computer Science and Information Engineering National Central University 2003/3/11 1

Cluster Analysis Mu-Chun Su Department of Computer Science and Information Engineering National Central University 2003/3/11 1 Introduction Cluster analysis is the formal study of algorithms and methods

### CLUSTERING IN BIOINFORMATICS

CLUSTERING IN BIOINFORMATICS CSE/BIMM/BENG 8 MAY 4, 0 OVERVIEW Define the clustering problem Motivation: gene expression and microarrays Types of clustering Clustering algorithms Other applications of

### 10601 Machine Learning. Hierarchical clustering. Reading: Bishop: 9-9.2

161 Machine Learning Hierarchical clustering Reading: Bishop: 9-9.2 Second half: Overview Clustering - Hierarchical, semi-supervised learning Graphical models - Bayesian networks, HMMs, Reasoning under

### Cluster Analysis and Visualization. Workshop on Statistics and Machine Learning 2004/2/6

Cluster Analysis and Visualization Workshop on Statistics and Machine Learning 2004/2/6 Outlines Introduction Stages in Clustering Clustering Analysis and Visualization One/two-dimensional Data Histogram,

### Clustering. Bruno Martins. 1 st Semester 2012/2013

Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts

### 5/15/16. Computational Methods for Data Analysis. Massimo Poesio UNSUPERVISED LEARNING. Clustering. Unsupervised learning introduction

Computational Methods for Data Analysis Massimo Poesio UNSUPERVISED LEARNING Clustering Unsupervised learning introduction 1 Supervised learning Training set: Unsupervised learning Training set: 2 Clustering

### Spectral Classification

Spectral Classification Spectral Classification Supervised versus Unsupervised Classification n Unsupervised Classes are determined by the computer. Also referred to as clustering n Supervised Classes

### Slides adapted from Marshall Tappen and Bryan Russell. Algorithms in Nature. Non-negative matrix factorization

Slides adapted from Marshall Tappen and Bryan Russell Algorithms in Nature Non-negative matrix factorization Dimensionality Reduction The curse of dimensionality: Too many features makes it difficult to

### Clustering Techniques

Clustering Techniques Bioinformatics: Issues and Algorithms CSE 308-408 Fall 2007 Lecture 16 Lopresti Fall 2007 Lecture 16-1 - Administrative notes Your final project / paper proposal is due on Friday,

### Kapitel 4: Clustering

Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases WiSe 2017/18 Kapitel 4: Clustering Vorlesung: Prof. Dr.

### Cluster Analysis. Ying Shen, SSE, Tongji University

Cluster Analysis Ying Shen, SSE, Tongji University Cluster analysis Cluster analysis groups data objects based only on the attributes in the data. The main objective is that The objects within a group

### Unsupervised Learning

Unsupervised Learning Chapter 14: The Elements of Statistical Learning Presented for 540 by Len Tanaka Objectives Introduction Techniques: Association Rules Cluster Analysis Self-Organizing Maps Projective

### Objective of clustering

Objective of clustering Discover structures and patterns in high-dimensional data. Group data with similar patterns together. This reduces the complexity and facilitates interpretation. Expression level

### Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

World Journal of Computer Application and Technology 5(2): 24-29, 2017 DOI: 10.13189/wjcat.2017.050202 http://www.hrpub.org Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

### CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING Cluster Analysis: Basic Concepts and Methods Huan Sun, CSE@The Ohio State University 09/25/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han 2 Chapter 10.

### CIE L*a*b* color model

CIE L*a*b* color model To further strengthen the correlation between the color model and human perception, we apply the following non-linear transformation: with where (X n,y n,z n ) are the tristimulus

### Indexing in Search Engines based on Pipelining Architecture using Single Link HAC

Indexing in Search Engines based on Pipelining Architecture using Single Link HAC Anuradha Tyagi S. V. Subharti University Haridwar Bypass Road NH-58, Meerut, India ABSTRACT Search on the web is a daily

### General Instructions. Questions

CS246: Mining Massive Data Sets Winter 2018 Problem Set 2 Due 11:59pm February 8, 2018 Only one late period is allowed for this homework (11:59pm 2/13). General Instructions Submission instructions: These

### Clustering. Lecture 6, 1/24/03 ECS289A

Clustering Lecture 6, 1/24/03 What is Clustering? Given n objects, assign them to groups (clusters) based on their similarity Unsupervised Machine Learning Class Discovery Difficult, and maybe ill-posed

### Statistics 202: Data Mining. c Jonathan Taylor. Clustering Based in part on slides from textbook, slides of Susan Holmes.

Clustering Based in part on slides from textbook, slides of Susan Holmes December 2, 2012 1 / 1 Clustering Clustering Goal: Finding groups of objects such that the objects in a group will be similar (or

### University of Florida CISE department Gator Engineering. Clustering Part 2

Clustering Part 2 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Partitional Clustering Original Points A Partitional Clustering Hierarchical

### Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

### Statistical Methods for Data Mining

Statistical Methods for Data Mining Kuangnan Fang Xiamen University Email: xmufkn@xmu.edu.cn Unsupervised Learning Unsupervised vs Supervised Learning: Most of this course focuses on supervised learning

### K-Means Clustering 3/3/17

K-Means Clustering 3/3/17 Unsupervised Learning We have a collection of unlabeled data points. We want to find underlying structure in the data. Examples: Identify groups of similar data points. Clustering

### Multidimensional Visualization and Clustering

Multidimensional Visualization and Clustering Presentation for Visual Analytics of Professor Klaus Mueller Xiaotian (Tim) Yin 04-26 26-20072007 Paper List HD-Eye: Visual Mining of High-Dimensional Data

### Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University

Data Mining Chapter 3: Visualizing and Exploring Data Fall 2011 Ming Li Department of Computer Science and Technology Nanjing University Exploratory data analysis tasks Examine the data, in search of structures

### Unsupervised Learning

Unsupervised Learning Unsupervised learning Until now, we have assumed our training samples are labeled by their category membership. Methods that use labeled samples are said to be supervised. However,

### Spatializing GIS Commands with Self-Organizing Maps. Jochen Wendel Barbara P. Buttenfield Roland J. Viger Jeremy M. Smith

Spatializing GIS Commands with Self-Organizing Maps Jochen Wendel Barbara P. Buttenfield Roland J. Viger Jeremy M. Smith Outline Introduction Characterizing GIS Commands Implementation Interpretation of

### 3. Cluster analysis Overview

Université Laval Analyse multivariable - mars-avril 2008 1 3.1. Overview 3. Cluster analysis Clustering requires the recognition of discontinuous subsets in an environment that is sometimes discrete (as

### Unsupervised Learning : Clustering

Unsupervised Learning : Clustering Things to be Addressed Traditional Learning Models. Cluster Analysis K-means Clustering Algorithm Drawbacks of traditional clustering algorithms. Clustering as a complex

### Oliver Dürr. Statistisches Data Mining (StDM) Woche 5. Institut für Datenanalyse und Prozessdesign Zürcher Hochschule für Angewandte Wissenschaften

Statistisches Data Mining (StDM) Woche 5 Oliver Dürr Institut für Datenanalyse und Prozessdesign Zürcher Hochschule für Angewandte Wissenschaften oliver.duerr@zhaw.ch Winterthur, 17 Oktober 2017 1 Multitasking

### Supervised vs unsupervised clustering

Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful

### Machine Learning. Unsupervised Learning. Manfred Huber

Machine Learning Unsupervised Learning Manfred Huber 2015 1 Unsupervised Learning In supervised learning the training data provides desired target output for learning In unsupervised learning the training

### COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS

COMBINED METHOD TO VISUALISE AND REDUCE DIMENSIONALITY OF THE FINANCIAL DATA SETS Toomas Kirt Supervisor: Leo Võhandu Tallinn Technical University Toomas.Kirt@mail.ee Abstract: Key words: For the visualisation

### Unsupervised Learning: Clustering

Unsupervised Learning: Clustering Vibhav Gogate The University of Texas at Dallas Slides adapted from Carlos Guestrin, Dan Klein & Luke Zettlemoyer Machine Learning Supervised Learning Unsupervised Learning

### Stats fest Multivariate analysis. Multivariate analyses. Aims. Multivariate analyses. Objects. Variables

Stats fest 7 Multivariate analysis murray.logan@sci.monash.edu.au Multivariate analyses ims Data reduction Reduce large numbers of variables into a smaller number that adequately summarize the patterns

### Unsupervised Learning

Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support, Fall 2005 Instructors: Professor Lucila Ohno-Machado and Professor Staal Vinterbo 6.873/HST.951 Medical Decision

### PATTERN CLASSIFICATION AND SCENE ANALYSIS

PATTERN CLASSIFICATION AND SCENE ANALYSIS RICHARD O. DUDA PETER E. HART Stanford Research Institute, Menlo Park, California A WILEY-INTERSCIENCE PUBLICATION JOHN WILEY & SONS New York Chichester Brisbane

### Unsupervised Learning. Supervised learning vs. unsupervised learning. What is Cluster Analysis? Applications of Cluster Analysis

7 Supervised learning vs unsupervised learning Unsupervised Learning Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute These patterns are then

### Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010

Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,

### Introduction to R and Statistical Data Analysis

Microarray Center Introduction to R and Statistical Data Analysis PART II Petr Nazarov petr.nazarov@crp-sante.lu 22-11-2010 OUTLINE PART II Descriptive statistics in R (8) sum, mean, median, sd, var, cor,

### Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data. By S. Bergmann, J. Ihmels, N. Barkai

Iterative Signature Algorithm for the Analysis of Large-Scale Gene Expression Data By S. Bergmann, J. Ihmels, N. Barkai Reasoning Both clustering and Singular Value Decomposition(SVD) are useful tools

### Applications Video Surveillance (On-line or off-line)

Face Face Recognition: Dimensionality Reduction Biometrics CSE 190-a Lecture 12 CSE190a Fall 06 CSE190a Fall 06 Face Recognition Face is the most common biometric used by humans Applications range from

### Chapter 6: Cluster Analysis

Chapter 6: Cluster Analysis The major goal of cluster analysis is to separate individual observations, or items, into groups, or clusters, on the basis of the values for the q variables measured on each

### Algorithms for Bioinformatics

Adapted from slides by Leena Salmena and Veli Mäkinen, which are partly from http: //bix.ucsd.edu/bioalgorithms/slides.php. 582670 Algorithms for Bioinformatics Lecture 6: Distance based clustering and

### Lecture 3: Camera Calibration, DLT, SVD

Computer Vision Lecture 3 23--28 Lecture 3: Camera Calibration, DL, SVD he Inner Parameters In this section we will introduce the inner parameters of the cameras Recall from the camera equations λx = P

### 4. Cluster Analysis. Francesc J. Ferri. Dept. d Informàtica. Universitat de València. Febrer F.J. Ferri (Univ. València) AIRF 2/ / 1

Anàlisi d Imatges i Reconeixement de Formes Image Analysis and Pattern Recognition:. Cluster Analysis Francesc J. Ferri Dept. d Informàtica. Universitat de València Febrer 8 F.J. Ferri (Univ. València)

### UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING. Daniela Joiţa Titu Maiorescu University, Bucharest, Romania

UNSUPERVISED STATIC DISCRETIZATION METHODS IN DATA MINING Daniela Joiţa Titu Maiorescu University, Bucharest, Romania danielajoita@utmro Abstract Discretization of real-valued data is often used as a pre-processing

### DATA MINING LECTURE 7. Hierarchical Clustering, DBSCAN The EM Algorithm

DATA MINING LECTURE 7 Hierarchical Clustering, DBSCAN The EM Algorithm CLUSTERING What is a Clustering? In general a grouping of objects such that the objects in a group (cluster) are similar (or related)

### INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

### Hierarchical Clustering

Hierarchical Clustering Produces a set of nested clusters organized as a hierarchical tree Can be visualized as a dendrogram A tree like diagram that records the sequences of merges or splits 0 0 0 00

### Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

### Clustering (COSC 416) Nazli Goharian. Document Clustering.

Clustering (COSC 416) Nazli Goharian nazli@cs.georgetown.edu 1 Document Clustering. Cluster Hypothesis : By clustering, documents relevant to the same topics tend to be grouped together. C. J. van Rijsbergen,

### Dimension reduction for hyperspectral imaging using laplacian eigenmaps and randomized principal component analysis:midyear Report

Dimension reduction for hyperspectral imaging using laplacian eigenmaps and randomized principal component analysis:midyear Report Yiran Li yl534@math.umd.edu Advisor: Wojtek Czaja wojtek@math.umd.edu

### Information Retrieval and Web Search Engines

Information Retrieval and Web Search Engines Lecture 7: Document Clustering May 25, 2011 Wolf-Tilo Balke and Joachim Selke Institut für Informationssysteme Technische Universität Braunschweig Homework

### Iterated Consensus Clustering: A Technique We Can All Agree On

Iterated Consensus Clustering: A Technique We Can All Agree On Mindy Hong, Robert Pearce, Kevin Valakuzhy, Carl Meyer, Shaina Race Abstract Cluster Analysis is a field of Data Mining used to extract underlying

### Introduction to Cancer Genomics

Introduction to Cancer Genomics Gene expression data analysis part I David Gfeller Computational Cancer Biology Ludwig Center for Cancer research david.gfeller@unil.ch 1 Overview 1. Basic understanding

Workload Characterization Techniques Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

### 21 The Singular Value Decomposition; Clustering

The Singular Value Decomposition; Clustering 125 21 The Singular Value Decomposition; Clustering The Singular Value Decomposition (SVD) [and its Application to PCA] Problems: Computing X > X takes (nd

### Image Processing. Image Features

Image Processing Image Features Preliminaries 2 What are Image Features? Anything. What they are used for? Some statements about image fragments (patches) recognition Search for similar patches matching

### Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your

### Clustering & Bootstrapping

Clustering & Bootstrapping Jelena Prokić University of Groningen The Netherlands March 25, 2009 Groningen Overview What is clustering? Various clustering algorithms Bootstrapping Application in dialectometry

### Machine Tool Volumetric Error Feature Extraction and Classification Using Principal Component Analysis and K- means Methods

7th International Conference on Virtual Machining Process Technology (VMPT), Hamilton, May 7-9, 2018 Machine Tool Volumetric Error Feature Extraction and Classification Using Principal Component Analysis

### SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM. Olga Kouropteva, Oleg Okun and Matti Pietikäinen

SELECTION OF THE OPTIMAL PARAMETER VALUE FOR THE LOCALLY LINEAR EMBEDDING ALGORITHM Olga Kouropteva, Oleg Okun and Matti Pietikäinen Machine Vision Group, Infotech Oulu and Department of Electrical and

### Facial Expression Recognition Using Non-negative Matrix Factorization

Facial Expression Recognition Using Non-negative Matrix Factorization Symeon Nikitidis, Anastasios Tefas and Ioannis Pitas Artificial Intelligence & Information Analysis Lab Department of Informatics Aristotle,

### Unsupervised Learning

Networks for Pattern Recognition, 2014 Networks for Single Linkage K-Means Soft DBSCAN PCA Networks for Kohonen Maps Linear Vector Quantization Networks for Problems/Approaches in Machine Learning Supervised

### Robust Pose Estimation using the SwissRanger SR-3000 Camera

Robust Pose Estimation using the SwissRanger SR- Camera Sigurjón Árni Guðmundsson, Rasmus Larsen and Bjarne K. Ersbøll Technical University of Denmark, Informatics and Mathematical Modelling. Building,

### SOM+EOF for Finding Missing Values

SOM+EOF for Finding Missing Values Antti Sorjamaa 1, Paul Merlin 2, Bertrand Maillet 2 and Amaury Lendasse 1 1- Helsinki University of Technology - CIS P.O. Box 5400, 02015 HUT - Finland 2- Variances and

### Visual Representations for Machine Learning

Visual Representations for Machine Learning Spectral Clustering and Channel Representations Lecture 1 Spectral Clustering: introduction and confusion Michael Felsberg Klas Nordberg The Spectral Clustering

### Vector Semantics. Dense Vectors

Vector Semantics Dense Vectors Sparse versus dense vectors PPMI vectors are long (length V = 20,000 to 50,000) sparse (most elements are zero) Alternative: learn vectors which are short (length 200-1000)

### Classification. Vladimir Curic. Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University

Classification Vladimir Curic Centre for Image Analysis Swedish University of Agricultural Sciences Uppsala University Outline An overview on classification Basics of classification How to choose appropriate

### Clustering Part 4 DBSCAN

Clustering Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville DBSCAN DBSCAN is a density based clustering algorithm Density = number of

### Texture Image Segmentation using FCM

Proceedings of 2012 4th International Conference on Machine Learning and Computing IPCSIT vol. 25 (2012) (2012) IACSIT Press, Singapore Texture Image Segmentation using FCM Kanchan S. Deshmukh + M.G.M

### Anomaly Detection on Data Streams with High Dimensional Data Environment

Anomaly Detection on Data Streams with High Dimensional Data Environment Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant

### Unsupervised Learning

Unsupervised Learning A review of clustering and other exploratory data analysis methods HST.951J: Medical Decision Support Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision

### Clustering Gene Expression Data: Acknowledgement: Elizabeth Garrett-Mayer; Shirley Liu; Robert Tibshirani; Guenther Walther; Trevor Hastie

Clustering Gene Expression Data: Acknowledgement: Elizabeth Garrett-Mayer; Shirley Liu; Robert Tibshirani; Guenther Walther; Trevor Hastie Data from Garber et al. PNAS (98), 2001. Clustering Clustering

### Clustering & Classification (chapter 15)

Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical

### CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA. By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr.

CANCER PREDICTION USING PATTERN CLASSIFICATION OF MICROARRAY DATA By: Sudhir Madhav Rao &Vinod Jayakumar Instructor: Dr. Michael Nechyba 1. Abstract The objective of this project is to apply well known

### CS573 Data Privacy and Security. Li Xiong

CS573 Data Privacy and Security Anonymizationmethods Li Xiong Today Clustering based anonymization(cont) Permutation based anonymization Other privacy principles Microaggregation/Clustering Two steps:

### A Dendrogram. Bioinformatics (Lec 17)

A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and

### Document Clustering using Concept Space and Cosine Similarity Measurement

29 International Conference on Computer Technology and Development Document Clustering using Concept Space and Cosine Similarity Measurement Lailil Muflikhah Department of Computer and Information Science

### EECS 730 Introduction to Bioinformatics Microarray. Luke Huan Electrical Engineering and Computer Science

EECS 730 Introduction to Bioinformatics Microarray Luke Huan Electrical Engineering and Computer Science http://people.eecs.ku.edu/~jhuan/ GeneChip 2011/11/29 EECS 730 2 Hybridization to the Chip 2011/11/29

### IMAGE MATCHING - ALOK TALEKAR - SAIRAM SUNDARESAN 11/23/2010 1

IMAGE MATCHING - ALOK TALEKAR - SAIRAM SUNDARESAN 11/23/2010 1 : Presentation structure : 1. Brief overview of talk 2. What does Object Recognition involve? 3. The Recognition Problem 4. Mathematical background:

### Data Fusion. Merging data from multiple sources to optimize data or create value added data

Data Fusion Jeffrey S. Evans - Landscape Ecologist USDA Forest Service Rocky Mountain Research Station Forestry Sciences Lab - Moscow, Idaho Data Fusion Data Fusion is a formal framework in which are expressed

### Mathematical Operations

CHAPTER 10 Mathematical Operations The basic approach for the problems of this type is more or less similar to that of coding and decoding. One has to study the symbols or the geometrical figures and their

PCOMP http://127.0.0.1:55825/help/topic/com.rsi.idl.doc.core/pcomp... IDL API Reference Guides > IDL Reference Guide > Part I: IDL Command Reference > Routines: P PCOMP Syntax Return Value Arguments Keywords

### RDRToolbox A package for nonlinear dimension reduction with Isomap and LLE.

RDRToolbox A package for nonlinear dimension reduction with Isomap and LLE. Christoph Bartenhagen October 30, 2017 Contents 1 Introduction 1 1.1 Loading the package......................................

### Step-by-Step Guide to Relatedness and Association Mapping Contents

Step-by-Step Guide to Relatedness and Association Mapping Contents OBJECTIVES... 2 INTRODUCTION... 2 RELATEDNESS MEASURES... 2 POPULATION STRUCTURE... 6 Q-K ASSOCIATION ANALYSIS... 10 K MATRIX COMPRESSION...

### CPSC 340: Machine Learning and Data Mining. Finding Similar Items Fall 2017

CPSC 340: Machine Learning and Data Mining Finding Similar Items Fall 2017 Assignment 1 is due tonight. Admin 1 late day to hand in Monday, 2 late days for Wednesday. Assignment 2 will be up soon. Start

### Package ArrayBin. February 19, 2015

Package ArrayBin February 19, 2015 Version 0.2 Date 2013-02-01 Title Binarization of numeric data arrays Author Ed Curry Maintainer Depends R (>= 2.15.0), SAGx Fast adaptive binarization for numeric data