VECTOR SPACE CLASSIFICATION

VECTOR SPACE CLASSIFICATION Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, Chapter 14. Wei Wei, wwei@idi.ntnu.no. Lecture series.

RECALL: NAÏVE BAYES CLASSIFIERS Classify based on the prior weight of the class and the conditional parameters for what each word says:
c_NB = argmax_{c_j ∈ C} [ log P(c_j) + Σ_{i ∈ positions} log P(x_i | c_j) ]
Training is done by counting and dividing:
P(c_j) = N_{c_j} / N
Don't forget to smooth:
P(x_k | c_j) = (T_{c_j, x_k} + 1) / Σ_{x_i ∈ V} (T_{c_j, x_i} + 1)
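
A minimal sketch of this counting-and-dividing scheme, assuming documents are given as lists of tokens and using add-one smoothing; the function names (train_nb, classify_nb) are illustrative assumptions, not from the slides.

import math
from collections import Counter

def train_nb(docs, labels, vocab):
    """Estimate the prior P(c_j) = N_cj / N and the add-one smoothed
    conditionals P(x_k | c_j) by counting and dividing."""
    n = len(docs)
    prior = {c: labels.count(c) / n for c in set(labels)}
    cond = {}
    for c in prior:
        counts = Counter()                     # pooled term counts T_{c, x}
        for doc, label in zip(docs, labels):
            if label == c:
                counts.update(doc)
        total = sum(counts.values())
        cond[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return prior, cond

def classify_nb(doc, prior, cond):
    """Return argmax_{c_j} [ log P(c_j) + sum_i log P(x_i | c_j) ],
    ignoring tokens that are not in the vocabulary."""
    def score(c):
        return math.log(prior[c]) + sum(
            math.log(cond[c][t]) for t in doc if t in cond[c])
    return max(prior, key=score)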

VECTOR SPACE TEXT CLASSIFICATION Today: vector space methods for text classification: Rocchio classification, k nearest neighbors, linear and non-linear classifiers, classification with more than two classes.

VECTOR SPACE TEXT CLASSIFICATION Vector space methods for text classification.

VECTOR SPACE CLASSIFICATION Vector space representation: each document is a vector, one component for each term (= word). Normally we normalize vectors to unit length. This is a high-dimensional vector space: terms are axes, with 10,000+ dimensions, or even 100,000+. Docs are vectors in this space. How can we do classification in this space?

VECTOR SPACE CLASSIFICATION As before, the training set is a set of documents, each labeled with its class (e.g., topic). In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space. Hypothesis 1: documents in the same class form a contiguous region of space. Hypothesis 2: documents from different classes don't overlap. We define surfaces to delineate classes in the space.

DOCUMENTS IN A VECTOR SPACE (figure: document vectors for the classes Government, Science, Arts)

TEST DOCUMENT: WHICH CLASS? (figure: a test document among the Government, Science, and Arts regions)

TEST DOCUMENT = GOVERNMENT (figure: the test document falls in the Government region). Our main topic today is how to find good separators.

VECTOR SPACE TEXT CLASSIFICATION Rocchio text classification.

ROCCHIO TEXT CLASSIFICATION Use standard tf-idf weighted vectors to represent text documents. For the training documents in each category, compute a prototype vector by summing the vectors of the training documents in the category; the prototype is the centroid of the members of the class. Assign test documents to the category with the closest prototype vector, based on cosine similarity.

DEFINITION OF CENTROID μ(c) = (1/|D_c|) Σ_{d ∈ D_c} v(d), where D_c is the set of all documents that belong to class c and v(d) is the vector space representation of d. Note that the centroid will in general not be a unit vector even when the inputs are unit vectors.
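
A minimal sketch of Rocchio training and classification under this definition, assuming each document is already a (tf-idf) numpy vector; the names train_rocchio and classify_rocchio are hypothetical.

import numpy as np

def train_rocchio(doc_vectors, labels):
    """Prototype per class: mu(c) = (1/|D_c|) * sum of v(d) over d in D_c."""
    centroids = {}
    for c in set(labels):
        members = np.array([v for v, l in zip(doc_vectors, labels) if l == c])
        centroids[c] = members.mean(axis=0)
    return centroids

def classify_rocchio(x, centroids):
    """Assign x to the class whose centroid is closest by cosine similarity."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(centroids, key=lambda c: cos(x, centroids[c]))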

ROCCHIO TEXT CLASSIFICATION (figure: training points r1, r2, r3 in class r and b1, b2 in class b; the test point t is assigned to class b)

ROCCHIO PROPERTIES Forms a simple generalization of the examples in each class (a prototype). The prototype vector does not need to be averaged or otherwise normalized for length, since cosine similarity is insensitive to vector length. Classification is based on similarity to class prototypes. Does not guarantee that classifications are consistent with the given training data.

ROCCHIO ANOMALY Prototype models have problems with polymorphic (disjunctive) categories. (Figure: training points r1, r2, r3, r4 in class r and b1, b2 in class b; the test point t is assigned to class r.)

VECTOR SPACE TEXT CLASSIFICATION k nearest neighbor classification.

K NEAREST NEIGHBOR CLASSIFICATION kNN = k nearest neighbor. To classify document d into class c: define the k-neighborhood N as the k nearest neighbors of d; count the number of documents i in N that belong to c; estimate P(c|d) as i/k; choose as class argmax_c P(c|d) [= majority class].

K NEAREST NEIGHBOR CLASSIFICATION Unlike Rocchio, kNN classification determines the decision boundary locally. For 1NN (k=1), we assign each document to the class of its closest neighbor. For kNN, we assign each document to the majority class of its k closest neighbors; k is a parameter. The rationale of kNN is the contiguity hypothesis: we expect a test document d to have the same label as the training documents located nearby.
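
A minimal sketch of this procedure, assuming tf-idf numpy vectors and cosine similarity as the notion of "nearest"; the name knn_classify and the default k=3 are illustrative choices.

import numpy as np
from collections import Counter

def knn_classify(x, doc_vectors, labels, k=3):
    """Assign x the majority class among its k nearest training documents,
    using cosine similarity; the fraction of neighbors in class c serves as
    the estimate of P(c|x)."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    sims = sorted(((cos(x, v), l) for v, l in zip(doc_vectors, labels)),
                  key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]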

KNN: k = 1 (figure)

KNN: k = 1, 5, 10 (figure)

KNN: WEIGHTED-SUM VOTING (figure)

K NEAREST NEIGHBOR CLASSIFICATION (figure: a test document among the Government, Science, and Arts classes)

K NEAREST NEIGHBOR CLASSIFICATION Learning is just storing the representations of the training examples in D. Testing instance x (under 1NN): compute the similarity between x and all examples in D; assign x the category of the most similar example in D. Does not explicitly compute a generalization or category prototypes. Also called case-based learning, memory-based learning, or lazy learning. Rationale of kNN: the contiguity hypothesis.

SIMILARITY METRICS The nearest neighbor method depends on a similarity (or distance) metric. The simplest for a continuous m-dimensional instance space is Euclidean distance. The simplest for an m-dimensional binary instance space is Hamming distance (the number of feature values that differ). For text, cosine similarity of tf-idf weighted vectors is typically most effective.
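
The three metrics mentioned above, as a small sketch in Python with numpy; the function names are illustrative.

import numpy as np

def euclidean(a, b):
    """Euclidean distance for continuous m-dimensional vectors."""
    return float(np.linalg.norm(a - b))

def hamming(a, b):
    """Hamming distance for binary vectors: number of positions that differ."""
    return int(np.sum(a != b))

def cosine(a, b):
    """Cosine similarity, typically used on tf-idf weighted document vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)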

AN EXAMPLE: COSINE SIMILARITY (figure: training points r1, r2, r3, b1, b2; the test point t is assigned to class b)

KNN DISCUSSION Functional definition of similarity, e.g. cosine, Euclidean, kernel functions, ... How many neighbors do we consider? The value of k is determined empirically (normally 3 or 5). Does each neighbor get the same weight? Weighted-sum or not.

KNN DISCUSSION (CONT.) No feature selection necessary. Scales well with a large number of classes: no need to train n classifiers for n classes. Classes can influence each other: small changes to one class can have a ripple effect. Scores can be hard to convert to probabilities. No training necessary; actually, perhaps not true (data editing, etc.). May be more expensive at test time.

TEXT CLASSIFICATION Linear classifiers and non-linear classifiers.

LINEAR CLASSIFIER Many common text classifiers are linear classifiers: Naïve Bayes, perceptron, Rocchio, logistic regression, support vector machines (with linear kernel), linear regression.

LINEAR CLASSIFIER: 2D In two dimensions, a linear classifier is a line. These lines have functional form w_1 x_1 + w_2 x_2 = b. The classification rule: if w_1 x_1 + w_2 x_2 > b, assign class c; if w_1 x_1 + w_2 x_2 ≤ b, assign not-c. Here (x_1, x_2)^T is the 2D vector representation of the document and (w_1, w_2)^T is the parameter vector that defines the boundary.

LINEAR CLASSIFIER We can generalize this 2D linear classifier to higher dimensions by defining a hyperplane w^T x = b. The classification rule is then: assign c if w^T x > b, and not-c otherwise. Why the Rocchio and Naive Bayes classifiers are linear classifiers.
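
A minimal sketch of this decision rule with numpy; the weight vector and bias below are arbitrary numbers chosen purely for illustration, not values from the slides.

import numpy as np

def linear_classify(x, w, b):
    """Decision rule for a linear classifier: c if w.x > b, otherwise not-c."""
    return "c" if float(w @ x) > b else "not-c"

# 2D illustration: the decision boundary is the line w1*x1 + w2*x2 = b.
w, b = np.array([2.0, 1.0]), 3.0
print(linear_classify(np.array([2.0, 2.0]), w, b))  # 2*2 + 1*2 = 6 > 3 -> "c"
print(linear_classify(np.array([0.5, 1.0]), w, b))  # 2*0.5 + 1*1 = 2 <= 3 -> "not-c"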

NON-LINEAR CLASSIFIER Non-linear classifier: kNN. A linear classifier, e.g. Naive Bayes, does badly on a task like this (figure); kNN will do very well (assuming enough training data).

TEXT CLASSIFICATION Classification with more than two classes.

CLASSIFICATION WITH MORE THAN TWO CLASSES Classification for classes that are not mutually exclusive is called an any-of classification problem. Classification for classes that are mutually exclusive is called a one-of classification problem. We have learned two-class linear classifiers: a linear classifier that can classify d as c or not-c. How can we extend two-class linear classifiers to J > 2 classes, to classify a document d into one of, or any of, the classes c1, c2, c3?

CLASSIFICATION WITH MORE THAN TWO CLASSES For one-of classification tasks: 1. Build a classifier for each class, where the training set consists of the set of documents in the class and its complement. 2. Given the test document, apply each classifier separately. 3. Assign the document to the class with the maximum score, the maximum confidence value, or the maximum probability.

CLASSIFICATION WITH MORE THAN TWO CLASSES For any-of classification tasks: 1. Build a classifier for each class, where the training set consists of the set of documents in the class and its complement. 2. Given the test document, apply each classifier separately. The decision of one classifier has no influence on the decisions of the other classifiers.
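
A minimal sketch of both procedures (one-of via argmax, any-of via per-class thresholds), assuming each per-class classifier is given as a scoring function built on the class-versus-complement training set; the scorers, thresholds, and class names here are purely illustrative.

import numpy as np

def one_of(x, classifiers):
    """One-of (mutually exclusive classes): apply every per-class classifier
    and assign the class with the maximum score."""
    return max(classifiers, key=lambda c: classifiers[c](x))

def any_of(x, classifiers, thresholds):
    """Any-of (overlapping classes): each classifier decides independently."""
    return [c for c, score in classifiers.items() if score(x) > thresholds[c]]

# Hypothetical per-class linear scorers (each trained on class vs. complement).
classifiers = {
    "UK":     lambda x: float(np.array([1.0, 0.0]) @ x),
    "sports": lambda x: float(np.array([0.0, 1.0]) @ x),
}
thresholds = {"UK": 0.5, "sports": 0.5}
x = np.array([0.9, 0.8])
print(one_of(x, classifiers))                # -> "UK"
print(any_of(x, classifiers, thresholds))    # -> ["UK", "sports"]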

THE TEXT CLASSIFICATION PROBLEM An example: document d contains only the sentence "London is planning to organize the 2012 Olympics." We have six classes: <UK>, <China>, <car>, <coffee>, <elections>, <sports>. Determined: <UK> and <sports>, since P(UK)P(d|UK) > t1 and P(sports)P(d|sports) > t2 (Naive Bayes text classification).

SUMMARY Vector space methods for text classification: Rocchio classification, k nearest neighbors, linear and non-linear classifiers, classification with more than two classes.