A Dendrogram. Bioinformatics (Lec 17)

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748



A Dendrogram

Hierarchical Clustering [Johnson, SC, 1967]. Given n points in R^d, compute the distance between every pair of points. While (not done): pick the closest pair of points s_i and s_j and make them part of the same cluster, then replace the pair by their average s_ij. Try the applet at: http://www.cs.mcgill.ca/~papou/#applet
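A minimal Python sketch of this merge loop, assuming Euclidean distance and the centroid-style merge described above (the function name and toy data are illustrative, not from the slides):

import numpy as np

def hierarchical_cluster(points):
    """Agglomerative clustering as on the slide: repeatedly merge the closest
    pair and replace them with their average. Returns the merge history."""
    # Each active item is (centroid, member_indices)
    active = [(np.asarray(p, dtype=float), [i]) for i, p in enumerate(points)]
    merges = []
    while len(active) > 1:
        # Distance between every pair of active centroids; keep the closest pair
        best = None
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                d = np.linalg.norm(active[i][0] - active[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, mi), (cj, mj) = active[i], active[j]
        merges.append((mi, mj))
        # Replace the pair s_i, s_j by their average s_ij
        merged = ((ci + cj) / 2.0, mi + mj)
        active = [a for k, a in enumerate(active) if k not in (i, j)] + [merged]
    return merges

# Example: four 2-D points; the two nearby pairs merge first
print(hierarchical_cluster([[0, 0], [0, 1], [5, 5], [5, 6]]))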

Distance Metrics. For clustering, define a distance function, e.g. the distance metrics $D_k(X, Y) = \left( \sum_{i=1}^{d} |X_i - Y_i|^k \right)^{1/k}$ (k = 2: Euclidean distance), or the Pearson correlation coefficient $\rho_{xy} = \frac{1}{d} \sum_{i=1}^{d} \left( \frac{X_i - \bar{X}}{\sigma_x} \right) \left( \frac{Y_i - \bar{Y}}{\sigma_y} \right)$, with $-1 \le \rho_{xy} \le 1$.
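As a concrete reading of the two formulas, a small Python sketch (function names are illustrative; NumPy is assumed):

import numpy as np

def minkowski_distance(x, y, k=2):
    """D_k(X, Y) = (sum_i |X_i - Y_i|^k)^(1/k); k = 2 gives Euclidean distance."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** k) ** (1.0 / k)

def pearson_correlation(x, y):
    """rho_xy = (1/d) * sum_i ((X_i - mean(X)) / std(X)) * ((Y_i - mean(Y)) / std(Y))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = len(x)
    return np.sum((x - x.mean()) / x.std() * (y - y.mean()) / y.std()) / d

print(minkowski_distance([0, 0], [3, 4]))          # 5.0 (Euclidean)
print(pearson_correlation([1, 2, 3], [2, 4, 6]))   # 1.0 (perfectly correlated)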

(Figure: hierarchical clustering of a point set, Start and End states.)

K-Means Clustering [MacQueen 67]. Start with randomly chosen cluster centers. Repeat: assign points to clusters so as to give the greatest increase in score; recompute cluster centers; reassign points; until no changes. Try the applet at: http://www.cs.mcgill.ca/~bonnef/project.html
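A minimal Python sketch of this loop (it uses the common nearest-center assignment rather than the greatest-increase-in-score phrasing above; names and toy data are illustrative):

import numpy as np

def k_means(points, k, rng=np.random.default_rng(0)):
    """Start from randomly chosen centers, then alternate assigning points to
    the nearest center and recomputing centers until assignments stop changing."""
    X = np.asarray(points, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    assignment = None
    while True:
        # Assign each point to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assignment = dists.argmin(axis=1)
        if np.array_equal(new_assignment, assignment):
            return centers, new_assignment
        assignment = new_assignment
        # Recompute each cluster center as the mean of its assigned points
        for c in range(k):
            if np.any(assignment == c):
                centers[c] = X[assignment == c].mean(axis=0)

centers, labels = k_means([[0, 0], [0, 1], [5, 5], [5, 6]], k=2)
print(centers, labels)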

Self-Organizing Maps [Kohonen]. A kind of neural network. Clusters data and finds complex relationships between clusters. Helps reduce the dimensionality of the data. Produces a map of 1 or 2 dimensions. Unsupervised clustering, like K-Means, but geared towards visualization.

SOM Algorithm. Select the SOM architecture, and initialize weight vectors and other parameters. While (stopping condition not satisfied) do: for each input point x, the winning node q is the one whose weight vector is closest to x; update the weight vector of q and its neighbors; reduce the neighborhood size and learning rate.

SOM Algorithm Details. Distance between x and weight vector $w_i$: $\|x - w_i\|$. Winning node: $q(x) = \arg\min_i \|x - w_i\|$. Weight update function (for neighbors): $w_i(k+1) = w_i(k) + \mu(k, x, i)\,[x(k) - w_i(k)]$. Learning rate: $\mu(k, x, i) = \eta_0(k)\, \exp\!\left( -\frac{\|r_i - r_{q(x)}\|^2}{2\sigma^2} \right)$, where $r_i$ is the grid position of node i.
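Putting the two SOM slides together, a rough Python sketch of the training loop might look as follows (the grid size, decay schedules, and parameter names are assumptions, not from the slides):

import numpy as np

def train_som(data, grid_shape=(10, 10), epochs=20, eta0=0.5, sigma0=3.0,
              rng=np.random.default_rng(0)):
    """One pass per epoch: find the winning node (weight vector closest to x),
    then pull the winner and its grid neighbors towards x, shrinking the
    learning rate and neighborhood over time."""
    rows, cols = grid_shape
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))
    # Grid coordinates r_i of every node, used for the neighborhood term
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for k in range(epochs):
        eta = eta0 * np.exp(-k / epochs)        # decaying learning rate eta_0(k)
        sigma = sigma0 * np.exp(-k / epochs)    # shrinking neighborhood width
        for x in data:
            # Winning node q(x): weight vector closest to x
            dists = np.linalg.norm(weights - x, axis=2)
            q = np.unravel_index(dists.argmin(), dists.shape)
            # mu(k, x, i) = eta_0(k) * exp(-||r_i - r_q||^2 / (2 sigma^2))
            grid_d2 = np.sum((coords - np.array(q)) ** 2, axis=2)
            mu = eta * np.exp(-grid_d2 / (2 * sigma ** 2))
            # w_i(k+1) = w_i(k) + mu(k, x, i) * (x - w_i(k))
            weights += mu[:, :, None] * (x - weights)
    return weights

som = train_som(np.random.default_rng(1).random((200, 3)))  # e.g. 200 RGB colors
print(som.shape)  # (10, 10, 3)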

World Poverty SOM

World Poverty Map

Neural Networks. (Figure: a single neuron; input X, synaptic weights W, bias θ, summation Σ, activation function f(·), output y.)

Learning NN Weights. (Figure: input X passes through weights W to a summation Σ; the output is compared with the desired response, and the resulting error drives an adaptive algorithm that updates W.)

Types of NNs: feed-forward NN (layered); recurrent NN. Other issues: hidden layers possible; different activation functions possible.

Application: Secondary Structure Prediction

Support Vector Machines. A supervised statistical learning method for classification and regression. Simplest version: Training: present a series of labeled examples (e.g., gene expressions of tumor vs. normal cells). Prediction: predict the labels of new examples.

Learning Problems. (Figure: example data sets with points labeled A and B.)

SVM Binary Classification. Partition the feature space with a surface. The surface is implied by a subset of the training points (vectors) near it; these vectors are referred to as Support Vectors. Efficient with high-dimensional data. Solid statistical theory. Subsumes several other methods.

Learning Problems: binary classification, multi-class classification, regression.


SVM General Principles. SVMs perform binary classification by partitioning the feature space with a surface implied by a subset of the training points (vectors) near the separating surface; these vectors are referred to as Support Vectors. Efficient with high-dimensional data. Solid statistical theory. Subsumes several other methods.

SVM Example (Radial Basis Function)

SVM Ingredients: support vectors; a mapping from input space to feature space; a dot-product (kernel) function; weights.

Classification of 2-D (Separable) data

Classification of (Separable) 2-D data

Classification of (Separable) 2-D data: margin of a point; margin of a point set. (Figure: separating line with the +1 and -1 regions labeled.)

Classification using the Separator. Points $x_i$ with $w \cdot x_i + b > 0$ fall on one side; points $x_j$ with $w \cdot x_j + b < 0$ fall on the other. Separator: $w \cdot x + b = 0$.

Perceptron Algorithm (Primal) [Rosenblatt, 1956]. Given a separable training set S and learning rate η > 0:
w_0 = 0 (weight); b_0 = 0 (bias); k = 0; R = max_i ||x_i||
repeat
  for i = 1 to N
    if y_i (w_k · x_i + b_k) ≤ 0 then
      w_{k+1} = w_k + η y_i x_i
      b_{k+1} = b_k + η y_i R^2
      k = k + 1
until no mistakes made within the loop
Return k and (w_k, b_k), where k = number of mistakes. (The final weight vector can be written as w = Σ_i a_i y_i x_i, a sum over the training points, which motivates the dual form below.)
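A runnable Python sketch of the primal algorithm above (the epoch cap and toy data are illustrative additions, not from the slides):

import numpy as np

def perceptron_primal(X, y, eta=0.1, max_epochs=100):
    """Rosenblatt's primal perceptron as sketched on the slide.
    X: (N, d) array of points, y: (N,) array of labels in {-1, +1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    w = np.zeros(X.shape[1])
    b = 0.0
    k = 0                                        # number of mistakes / updates
    R = np.max(np.linalg.norm(X, axis=1))
    for _ in range(max_epochs):
        mistakes = False
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:    # misclassified (or on the boundary)
                w = w + eta * yi * xi
                b = b + eta * yi * R ** 2
                k += 1
                mistakes = True
        if not mistakes:                         # converged: a full pass with no mistakes
            break
    return w, b, k

# Separable toy data: class +1 above the line x2 = x1, class -1 below it
w, b, k = perceptron_primal([[0, 1], [1, 2], [1, 0], [2, 1]], [1, 1, -1, -1])
print(w, b, k)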

Performance for Separable Data. Theorem: if the margin m of S is positive, then k ≤ (2R/m)^2, i.e., the algorithm will always converge, and will converge quickly.

Perceptron Algorithm (Dual). Given a separable training set S:
a = 0; b = 0; R = max_i ||x_i||
repeat
  for i = 1 to N
    if y_i (Σ_j a_j y_j (x_j · x_i) + b) ≤ 0 then
      a_i = a_i + 1
      b = b + y_i R^2
    endif
until no mistakes made within the loop
Return (a, b)

Non-linear Separators

Main idea: Map into feature space

Non-linear Separators. (Figure: a mapping from input space X to feature space F.)

Useful URLs: http://www.support-vector.net

Perceptron Algorithm (Dual, kernelized). Given a separable training set S:
a = 0; b = 0; R = max_i ||x_i||
repeat
  for i = 1 to N
    if y_i (Σ_j a_j y_j k(x_i, x_j) + b) ≤ 0 then
      a_i = a_i + 1
      b = b + y_i R^2
until no mistakes made within the loop
Return (a, b)
where k(x_i, x_j) = Φ(x_i) · Φ(x_j)
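A Python sketch of this dual form: passing np.dot as the kernel recovers the plain dual algorithm from the earlier slide, while the radial-basis kernel used here lets it separate XOR-style data; the parameters and toy data are illustrative, not from the slides:

import numpy as np

def perceptron_dual(X, y, kernel, max_epochs=1000):
    """Dual perceptron: one coefficient a_i per training point; the dot product
    x_i . x_j is replaced by kernel(x_i, x_j) = Phi(x_i) . Phi(x_j)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    N = len(X)
    a = np.zeros(N)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))        # input-space radius, as on the slide
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])   # Gram matrix
    for _ in range(max_epochs):
        mistakes = False
        for i in range(N):
            if y[i] * (np.sum(a * y * K[:, i]) + b) <= 0:
                a[i] += 1
                b += y[i] * R ** 2
                mistakes = True
        if not mistakes:
            break
    return a, b

# A radial basis kernel makes this XOR-style data separable in feature space
rbf = lambda x, z, sigma=1.0: np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
a, b = perceptron_dual([[0, 0], [1, 1], [0, 1], [1, 0]], [1, 1, -1, -1], rbf)
print(a, b)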

Different Kernel Functions. Polynomial kernel: $\kappa(X, Y) = (X \cdot Y)^d$. Radial basis kernel: $\kappa(X, Y) = \exp\!\left( -\frac{\|X - Y\|^2}{2\sigma^2} \right)$. Sigmoid kernel: $\kappa(X, Y) = \tanh(\omega (X \cdot Y) + \theta)$.
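The three kernels written out in Python (the default parameter values are arbitrary); any of them could be passed as the kernel argument of the dual perceptron sketch above:

import numpy as np

def polynomial_kernel(x, y, d=2):
    """kappa(X, Y) = (X . Y)^d"""
    return np.dot(x, y) ** d

def rbf_kernel(x, y, sigma=1.0):
    """kappa(X, Y) = exp(-||X - Y||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, omega=1.0, theta=0.0):
    """kappa(X, Y) = tanh(omega * (X . Y) + theta)"""
    return np.tanh(omega * np.dot(x, y) + theta)

x, y = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(polynomial_kernel(x, y), rbf_kernel(x, y), sigmoid_kernel(x, y))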

SVM Ingredients: support vectors; a mapping from input space to feature space; a dot-product (kernel) function.

Generalizations. How to deal with more than 2 classes? Idea: associate a weight and bias with each class. How to deal with a non-linear separator? Idea: Support Vector Machines. How to deal with linear regression? How to deal with non-separable data?
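One common reading of the "weight and bias per class" idea is to score every class and predict the argmax; a tiny hypothetical sketch (the weights here are hand-set, not learned):

import numpy as np

def predict_multiclass(x, W, b):
    """Score each class with its own weight vector and bias, then pick the
    highest-scoring class: argmax_c (W[c] . x + b[c])."""
    scores = W @ x + b
    return int(np.argmax(scores))

# Hypothetical 3-class example with hand-set per-class weights and biases
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, 0.5])
print(predict_multiclass(np.array([2.0, -1.0]), W, b))  # class 0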

Applications. Text categorization & information filtering: 12,902 Reuters stories, 118 categories (91%!!). Image recognition: face detection, tumor anomalies, defective parts on an assembly line, etc. Gene expression analysis. Protein homology detection.


SVM Example (Radial Basis Function)