ADVANCED CLASSIFICATION TECHNIQUES


Admin
ML lab next Monday. Project proposals: Sunday at 11:59pm. Project proposal presentations.

David Kauchak, CS 159, Fall 2014

Machine Learning: A Geometric View

Apples vs. Bananas

Original data:
Weight  Color   Label
4       Red     Apple
5       Yellow  Apple
6       Yellow  Banana
3       Red     Apple
7       Yellow  Banana
8       Yellow  Banana
6       Yellow  Apple

Turn features into numerical values (a short code sketch follows at the end of this page):
Weight  Color  Label
4       0      Apple
5       1      Apple
6       1      Banana
3       0      Apple
7       1      Banana
8       1      Banana
6       1      Apple

Can we visualize this data? (Plot of Color, 0 to 1, against Weight, 0 to 10, with apples marked A and bananas marked B.)

We can view examples as points in an n-dimensional space, where n is the number of features, called the feature space.

Examples in a feature space: points with label 1, label 2, or label 3, plotted along feature 1 and feature 2.
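To make the encoding step concrete, here is a minimal sketch (not from the slides) of turning the table above into numeric feature vectors in Python; the color_to_number mapping is just one illustrative choice.

data = [
    (4, "Red", "Apple"),
    (5, "Yellow", "Apple"),
    (6, "Yellow", "Banana"),
    (3, "Red", "Apple"),
    (7, "Yellow", "Banana"),
    (8, "Yellow", "Banana"),
    (6, "Yellow", "Apple"),
]

# Map the categorical Color feature to a number so every example
# becomes a point (weight, color) in a 2-dimensional feature space.
color_to_number = {"Red": 0, "Yellow": 1}

examples = [((weight, color_to_number[color]), label)
            for weight, color, label in data]

for features, label in examples:
    print(features, label)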

Test example: what class?

Another classification algorithm? To classify an example d: label d with the label of the closest example to d in the training set. (In the plot, the test example is closest to a red, label-1 point.)

What about this example? And what about this one, which is closest to a red point but surrounded by points of another class?

What about this example? It is closest to a red point, but most of the next closest points are blue.

k-nearest Neighbor (k-NN)
To classify an example d:
- Find the k nearest neighbors of d
- Choose as the label the majority label within the k nearest neighbors

How do we measure "nearest"? Euclidean distance (or L1, or another distance). For a = (a_1, a_2, ..., a_n) and b = (b_1, b_2, ..., b_n):

D(a, b) = sqrt( (a_1 - b_1)^2 + (a_2 - b_2)^2 + ... + (a_n - b_n)^2 )
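As a concrete illustration of the rule above, here is a small sketch of k-NN in Python; the function names and the reuse of the apples/bananas feature vectors are ours, not from the slides.

import math
from collections import Counter

def euclidean_distance(a, b):
    # D(a, b) = sqrt((a_1 - b_1)^2 + ... + (a_n - b_n)^2)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(d, training_set, k=3):
    # training_set is a list of (feature_vector, label) pairs.
    # Keep the k examples closest to d, then take the majority label.
    neighbors = sorted(training_set,
                       key=lambda ex: euclidean_distance(d, ex[0]))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

training_set = [((4, 0), "Apple"), ((5, 1), "Apple"), ((6, 1), "Banana"),
                ((3, 0), "Apple"), ((7, 1), "Banana"), ((8, 1), "Banana"),
                ((6, 1), "Apple")]
print(knn_classify((5.5, 1), training_set, k=3))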

Decision boundaries
The decision boundaries are places in the feature space where the classification of a point/example changes. Where are the decision boundaries for k-NN?

k-NN decision boundaries
k-NN gives locally defined decision boundaries between classes. (Plots of a k-nearest neighbour classifier with K = 1.) What is the decision boundary for k-NN for this one?

Machine learning models
Some machine learning approaches make strong assumptions about the data:
- If the assumptions are true, this can often lead to better performance.
- If the assumptions aren't true, they can fail miserably.
Other approaches don't make many assumptions about the data:
- This can allow us to learn from more varied data.
- But they are more prone to overfitting and generally require more training data.

Actual model

Model assumptions
If you don't have strong assumptions about the model, it can take longer to learn. Assume now that our model of the blue class is two circles.

Actual model
Knowing the model beforehand can drastically improve learning and reduce the number of examples required.

Make sure your assumption is correct, though!

Machine learning models
What were the model assumptions (if any) that k-NN and Naïve Bayes made about the data? Are there training data sets that could never be learned correctly by these algorithms?

k-NN model (K = 1)

Linear models
A strong assumption is linear separability:
- In 2 dimensions, you can separate labels/classes by a line.
- In higher dimensions, you need hyperplanes.
A linear model is a model that assumes the data is linearly separable.

Hyperplanes
A hyperplane is a line/plane in a high-dimensional space. What defines a line? What defines a hyperplane?

Defining a line
Any pair of values (w_1, w_2) defines a line through the origin:
0 = w_1 f_1 + w_2 f_2
For example, w = (1, 2) gives 0 = 1 f_1 + 2 f_2. What does this line look like?

Defining a line
Any pair of values (w_1, w_2) defines a line through the origin: 0 = w_1 f_1 + w_2 f_2. For w = (1, 2), the line 0 = 1 f_1 + 2 f_2 passes through the points:
f_1:  -2    -1    0     1     2
f_2:   1   0.5    0  -0.5    -1
We can also view it as the line perpendicular to the weight vector w = (1, 2).

Classifying with a line
Mathematically, how can we classify points based on a line? For 0 = 1 f_1 + 2 f_2 with w = (1, 2), the point (1, 1) falls on the BLUE side and the point (1, -1) on the RED side.

Classifying with a line
Mathematically, how can we classify points based on a line? For 0 = 1 f_1 + 2 f_2 with w = (1, 2):
(1, 1):  1*1 + 2*1   =  3   BLUE
(1, -1): 1*1 + 2*(-1) = -1   RED
The sign indicates which side of the line the point is on. How do we move the line off of the origin?

Defining a line
Any pair of values (w_1, w_2) also defines a line away from the origin: a = w_1 f_1 + w_2 f_2. For example, -1 = 1 f_1 + 2 f_2 now intersects the f_1-axis at -1 and passes through:
f_1:  -2    -1    0     1     2
f_2:  0.5    0  -0.5   -1  -1.5

Linear models
A linear model in n-dimensional space (i.e. n features) is defined by n+1 weights:
- In two dimensions, a line: 0 = w_1 f_1 + w_2 f_2 + b (where b = -a)
- In three dimensions, a plane: 0 = w_1 f_1 + w_2 f_2 + w_3 f_3 + b
- In n dimensions, a hyperplane: 0 = b + sum_{i=1..n} w_i f_i

Classifying with a linear model
We can classify with a linear model by checking the sign. Given an example f_1, ..., f_m, the classifier outputs (see the short sketch at the end of this page):
- positive example if b + sum_{j=1..m} w_j f_j > 0
- negative example if b + sum_{j=1..m} w_j f_j < 0

Learning a linear model
Geometrically, we know what a linear model represents, and given a linear model (i.e. a set of weights and b) we can classify examples. How do we learn a linear model from training data (data with labels)? Which hyperplane would you choose?
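A minimal sketch of classifying by checking the sign, reusing the weight vector w = (1, 2) from the earlier slides; the function name is just illustrative.

def linear_classify(features, weights, b):
    # Classify by the sign of b + sum_j w_j * f_j.
    score = b + sum(w * f for w, f in zip(weights, features))
    return "positive" if score > 0 else "negative"

print(linear_classify((1, 1), weights=(1, 2), b=0))   # 1*1 + 2*1   =  3 > 0 -> positive (BLUE)
print(linear_classify((1, -1), weights=(1, 2), b=0))  # 1*1 + 2*(-1) = -1 < 0 -> negative (RED)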

Large margin classifiers
Choose the line where the distance to the nearest point(s) is as large as possible. The margin of a classifier is the distance to the closest points of either class; large margin classifiers attempt to maximize this.

Large margin classifier setup
Select the hyperplane with the largest margin where the points are classified correctly!

Measuring the margin
How do we calculate the margin? Set it up as a constrained optimization problem:
max_{w,b} margin(w, b)
subject to: y_i (w · x_i + b) > 0 for all i
What does this say?
- y_i: the label for example i, either 1 (positive) or -1 (negative)
- x_i: the feature vector for example i
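To make "measuring the margin" concrete, here is a hedged sketch that evaluates the geometric margin of a fixed hyperplane (w, b) on a tiny invented data set. It only computes the quantity being maximized; it does not solve the optimization problem.

import math

def geometric_margin(w, b, examples):
    # Margin of the hyperplane (w, b): min_i y_i * (w . x_i + b) / ||w||.
    # A non-positive value means some example is misclassified.
    norm = math.sqrt(sum(wj ** 2 for wj in w))
    return min(y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
               for x, y in examples) / norm

examples = [((1, 1), 1), ((2, 2), 1), ((-1, -1), -1), ((-2, -1), -1)]
print(geometric_margin(w=(1, 1), b=0, examples=examples))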

Support vectors
For any separating hyperplane, there exists some set of closest points. These are called the support vectors.

Measuring the margin
The margin is the distance to the support vectors, i.e. the closest points, on either side of the hyperplane.

Support vector machine problem
Posed as a quadratic optimization problem: maximize/minimize a quadratic function subject to a set of linear constraints. There are many, many variants of solving this problem (a library-based sketch follows below). It is one of the most successful classification approaches.

Support vector machines
One of the most successful (if not the most successful) classification approaches:
- decision tree
- support vector machine
- k-nearest neighbor
- Naïve Bayes
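For reference, a short sketch of letting a library solve this quadratic optimization problem, assuming scikit-learn is available; the toy data is invented purely for illustration.

from sklearn.svm import SVC

X = [[1, 1], [2, 2], [2, 0], [-1, -1], [-2, -2], [0, -2]]
y = [1, 1, 1, -1, -1, -1]

# Linear kernel: fit a maximum-margin separating hyperplane.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)      # the closest points that define the margin
print(clf.predict([[1.5, 0.5]]))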

Trends over time

Soft Margin Classification
max_{w,b} margin(w, b)
subject to: y_i (w · x_i + b) > 0 for all i
What about this problem? We'd like to learn something like this, but our constraints won't allow it.

Slack variables
max_{w,b} margin(w, b) - C Σ_i ξ_i
subject to: y_i (w · x_i + b) ≥ -ξ_i for all i, ξ_i ≥ 0
The ξ_i are slack variables (one for each example). What effect does this have?

Slack variables
max_{w,b} margin(w, b) - C Σ_i ξ_i
subject to: y_i (w · x_i + b) ≥ -ξ_i for all i, ξ_i ≥ 0
The slack variables allow the classifier to make a mistake on an example, penalized by how far from correct it is; the slack penalty term C sets the trade-off between margin maximization and penalization.

Soft margin SVM
Still a quadratic optimization problem!

Other successful classifiers in NLP
Perceptron algorithm:
- Linear classifier
- Trains online
- Fast and easy to implement
- Often used for tuning parameters (not necessarily for classifying)
Logistic regression classifier (aka maximum entropy classifier):
- Probabilistic classifier
- Doesn't have the Naïve Bayes independence constraints
- Performs very well
- More computationally intensive to train than Naïve Bayes
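A hedged sketch, assuming scikit-learn is available, of the trade-off that C controls in the soft margin SVM, along with the perceptron and logistic regression classifiers mentioned above; the toy data is invented.

from sklearn.svm import LinearSVC
from sklearn.linear_model import Perceptron, LogisticRegression

X = [[1, 2], [2, 3], [3, 3], [2, 1], [3, 2], [4, 1]]
y = [1, 1, 1, -1, -1, -1]

# Small C: slack is cheap, wider margin with more mistakes allowed.
# Large C: slack is expensive, fewer mistakes tolerated.
for C in (0.01, 1.0, 100.0):
    svm = LinearSVC(C=C).fit(X, y)
    print(C, svm.coef_, svm.intercept_)

print(Perceptron().fit(X, y).predict([[2, 2]]))
print(LogisticRegression().fit(X, y).predict_proba([[2, 2]]))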

Quiz 3
Mean: 23 (80%). Median: 23.5 (81%).

Resources
SVM:
- SVM light: http://svmlight.joachims.org/
- Others exist, but this one is awesome!
Maximum entropy classifier:
- http://nlp.stanford.edu/software/classifier.shtml
General ML frameworks:
- Python: scikit-learn, MLpy
- Java: Weka (http://www.cs.waikato.ac.nz/ml/weka/)
- Many others
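As a pointer for getting started with one of the frameworks above, here is a minimal, hedged end-to-end sketch with scikit-learn: bag-of-words features fed into a linear SVM. The tiny corpus is invented purely for illustration.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

docs = ["great movie, loved it",
        "terrible film, boring",
        "fantastic and fun",
        "dull and bad"]
labels = ["positive", "negative", "positive", "negative"]

# Turn text into numeric feature vectors, then train a soft margin linear SVM.
model = make_pipeline(CountVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["boring movie"]))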