CSE 40171: Artificial Intelligence. Learning from Data: Unsupervised Learning


Homework #6 has been released. It is due at 11:59PM on 11/7.

CSE Seminar: 11/1, Amy Reibman (Purdue University), 3:30pm, DBART 138. "Video Quality in an Age of Universal Cameras"

Recap: Classification. Classification systems use supervised learning: make a prediction given evidence. Useful when you have labeled data. Slide credit: Dan Klein and Pieter Abbeel, UC Berkeley CS 188. Image credit: http://yann.lecun.com/exdb/mnist/

Clustering

Clustering. Basic idea: group together similar instances. Example: 2D point patterns. What could "similar" mean? One option: small (squared) Euclidean distance.

Clustering: K-means

K-Means. An iterative clustering algorithm: Pick K random points as cluster centers (means). Alternate: (1) assign each data instance to the closest mean; (2) move each mean to the average of its assigned points. Stop when no point assignments change.
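The two alternating phases above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation; the function name and signature are my own.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """K-means as described above: pick k random points as initial means,
    then alternate assignment and mean-update phases until no point
    assignments change."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    prev = None
    for _ in range(iters):
        # Phase I: assign each point to its closest mean (squared Euclidean)
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        if prev is not None and np.array_equal(assign, prev):
            break  # no assignments changed -> converged
        prev = assign
        # Phase II: move each mean to the average of its assigned points
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # leave an empty cluster's mean in place
                means[j] = members.mean(axis=0)
    return means, assign
```

Note the guard for empty clusters: if no point is assigned to a mean, this sketch simply leaves that mean where it is, which is one of several possible policies.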

K-means Example

K-Means as Optimization. Consider the total distance to the means: phi(a, c) = sum_i dist(x_i, c_{a(i)}), where the x_i are the points, a(i) is the assignment of point i, and the c_j are the means. Each iteration reduces phi. Two stages each iteration: Update assignments: fix means c, change assignments a. Update means: fix assignments a, change means c.

Phase I: Update Assignments. For each point, re-assign it to the closest mean. This can only decrease the total distance phi!

Phase II: Update Means. Move each mean to the average of its assigned points. This also can only decrease the total distance. (Why?) Fun fact: the point y with minimum squared Euclidean distance to a set of points {x_i} is their mean.
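The "fun fact" has a one-line justification: the total squared distance to a candidate point y is a convex quadratic, so setting its gradient to zero gives the minimizer, which is exactly the mean.

```latex
f(y) = \sum_{i=1}^{n} \lVert x_i - y \rVert^2,
\qquad
\nabla_y f = \sum_{i=1}^{n} 2\,(y - x_i) = 0
\;\Longrightarrow\;
y = \frac{1}{n} \sum_{i=1}^{n} x_i .
```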

Initialization. K-means is non-deterministic: it requires initial means, and it does matter which you pick! What can go wrong? Various schemes exist for preventing bad outcomes: variance-based split/merge, initialization heuristics.

K-Means Getting Stuck. A local optimum: why doesn't this work out like the earlier example, with the purple taking over half the blue?

K-Means Questions. Will K-means converge? To a global optimum? Will it always find the true patterns in the data? What if the patterns are very, very clear? Will it find something interesting? Do people ever use it? How many clusters should you pick?

Clustering: PCA

Principal Components Analysis. A technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easier to explore and visualize, and can also be used for data clustering. Intuitive visual explanation: http://setosa.io/ev/principal-component-analysis/

Eigenvalue Decomposition. Define a transformation matrix W that maps each n-dimensional point x_j to an m-dimensional point y_j: y_j = W^T x_j, j = 1, 2, ..., N, where W (in R^{n x m}) has orthonormal columns. The data scatter matrix is S_T = sum_{j=1}^{N} (x_j - xbar)(x_j - xbar)^T.

Eigenvalue Decomposition. The scatter matrix of the transformed data is S~_T = sum_{j=1}^{N} (y_j - ybar)(y_j - ybar)^T = W^T S_T W. The optimal projection maximizes the transformed scatter: W_opt = argmax_W |W^T S_T W| = [v_1, v_2, ..., v_k], the top-k eigenvectors of S_T.
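The scatter-matrix eigendecomposition above can be sketched directly in numpy. This is an illustrative sketch (function name and signature are my own), assuming rows of X are the data points:

```python
import numpy as np

def pca(X, k):
    """Project n-dimensional points onto the top-k eigenvectors of the
    data scatter matrix S_T, as defined above."""
    xbar = X.mean(axis=0)
    Xc = X - xbar                       # center the data
    S_T = Xc.T @ Xc                     # scatter: sum (x_j - xbar)(x_j - xbar)^T
    evals, evecs = np.linalg.eigh(S_T)  # eigh: symmetric case, ascending eigenvalues
    W = evecs[:, ::-1][:, :k]           # W_opt = [v_1 ... v_k], top-k eigenvectors
    Y = Xc @ W                          # y_j = W^T (x_j - xbar)
    return Y, W
```

Because eigh returns eigenvalues in ascending order, the columns are reversed before taking the top k; the projected data then has its largest variance along the first coordinate.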


Clustering with PCA

Clustering: Agglomerative Clustering

Agglomerative Clustering. Agglomerative (hierarchical) clustering first merges very similar instances, then incrementally builds larger clusters out of smaller ones. Algorithm: Maintain a set of clusters. Initially, each instance is in its own cluster. Repeat: pick the two closest clusters and merge them into a new cluster. Stop when there's only one cluster left. This produces not one clustering, but a family of clusterings, represented by a dendrogram.

Agglomerative Clustering. How should we define "closest" for clusters with multiple elements? Many options: Closest pair (single-link clustering). Farthest pair (complete-link clustering). Average of all pairs. Ward's method (minimum variance, like k-means). Different choices create different clustering behaviors.
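The merge loop can be sketched with the single-link (closest-pair) definition. This is a small illustrative version (O(n^3), names my own) that stops at k clusters instead of building the full dendrogram:

```python
import numpy as np

def single_link_clusters(X, k):
    """Agglomerative clustering: start with each point in its own cluster,
    repeatedly merge the two closest clusters (single link), stop at k."""
    clusters = [[i] for i in range(len(X))]
    # pairwise squared Euclidean distances between individual points
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    while len(clusters) > k:
        best_pair, best_dist = None, np.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single link: distance between the closest pair of members
                link = min(d[i, j] for i in clusters[a] for j in clusters[b])
                if link < best_dist:
                    best_pair, best_dist = (a, b), link
        a, b = best_pair
        clusters[a] += clusters.pop(b)  # merge cluster b into cluster a
    return clusters
```

Swapping min for max in the inner comparison gives complete-link clustering; averaging over all member pairs gives average-link.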

Example: Google News. Story groupings: unsupervised clustering. Top-level categories: supervised classification.