# VECTOR SPACE CLASSIFICATION

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 VECTOR SPACE CLASSIFICATION Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. Chapter 14 Wei Wei Lecture series 1

2 RecALL: Naïve Bayes classifiers Classify based on prior weight of class and conditional parameter for what each word says: c NB argmax log P(c j ) log P(x i c j ) c j C i positions Training is done by counting and dividing: P(c j ) N c j N Don t forget to smooth P(x k c j ) T c j x k xi V [T c j x i ] 2

3 Vector SPACE text classification Today: Vector space methods for Text Classification Rocchio classification K Nearest Neighbors Linear classifier and non-linear classifier Classification with more than two classes 3

4 Vector SPACE text classification Vector space methods for Text Classification 4

5 VECTOR SPACE CLASSIFICATION Vector Space Representation Each document is a vector, one component for each term (= word). Normally normalize vectors to unit length. High-dimensional vector space: Terms are axes 10, dimensions, or even 100, Docs are vectors in this space How can we do classification in this space? 5

6 VECTOR SPACE CLASSIFICATION As before, the training set is a set of documents, each labeled with its class (e.g., topic) In vector space classification, this set corresponds to a labeled set of points (or, equivalently, vectors) in the vector space Hypothesis 1: Documents in the same class form a contiguous region of space Hypothesis 2: Documents from different classes don t overlap We define surfaces to delineate classes in the space 6

7 Documents in a vector space Government Science Arts 7

8 Test document: which class? Government Science Arts 8

9 Test document = government Government Science Arts Our main topic today is how to find good separators 9 9

10 Vector SPACE text classification Rocchio text classification i 10

11 Rocchio text classification Rocchio Text Classification Use standard tf-idf weighted vectors to represent text documents For training documents in each category, compute a prototype vector by summing the vectors of the training documents in the category Prototype = centroid of fmembers m of class Assign test documents to the category with the closest prototype vector based on cosine similarity 11

12 DEFINITION OF CENTROID (c) 1 v (d) D c d Dc Where D c is the set of all documents that belong to class c and v (d) is the vector space representation of d. Note that centroid will in general not be a unit vector even when the inputs are unit vectors. 12

13 ROCCHIO TEXT CLASSIFICATION r r1 r2 r3 b1 b2 t=b b 13

14 ROCCHIO PROPERTIES Forms a simple generalization of the examples in each class (a prototype). Prototype vector does not need to be averaged or otherwise normalized for length since cosine similarity is insensitive to vector length. Classification is based on similarity to class prototypes. Does not guarantee classifications are consistent with the given training data. 14

15 ROCCHIO ANOMALY Prototype models have problems with polymorphic (disjunctive) categories. r r1 r2 t=r b b1 b2 r3 r4 15

16 Vector SPACE text classification k Nearest Neighbor Classification i 16

17 K NEAREST NEIGHBOR CLASSIFICATION knn = k Nearest Neighbor To classify document d into class c: Define k-neighborhood N as k nearest neighbors of d Count number of documents i in N that belong to c Estimate P(c d) as i/k Choose as class argmax c P(c d) [ = majority class] 17

18 K NEAREST NEIGHBOR CLASSIFICATION Unlike Rocchio, knn classification determines the decision boundary locally. For 1NN (k=1), we assign each document to the class of its closest neighbor. For knn, we assign each document to the majority class of its k closest neighbors. K here is a parameter. The rationale of knn: contiguity hypothesis. We expect a test document d to have the same label as the training documents located nearly. 18

19 Knn: k=1 19

20 Knn: k=1 1,5,

21 KNN: weighted sum voting 21

22 K NEAREST NEIGHBOR CLASSIFICATION Test Government Science Arts 22

23 K NEAREST NEIGHBOR CLASSIFICATION Learning is just storing the representations of the training examples in D. Testing instance x (under 1NN): Compute similarity between x and all examples in D. Assign x the category of the most similar example in D. Does not explicitly compute a generalization or category prototypes. Also called: Case-based learning Memory-based learning Lazy learning Rationale of knn: contiguity hypothesis 23

24 SIMILARITY METRICS Nearest neighbor method depends on a similarity (or distance) metric. Simplest for continuous m-dimensional m instance space is Euclidean distance. Simplest for m-dimensional m binary instance space is Hamming distance (number of feature values that differ). For text, cosine similarity of tf.idf weighted vectors is typically most effective. 24

25 An Example: consine similarity r1 r2 r3 t=b b2 b1 25

26 Knn discusion Functional definition of similarity e.g. cos, Euclidean, kernel functions,... How many neighbors do we consider? Value of k determined empirically (normally 3 or 5 ) Does each neighbor get the same weight? Weighted-sum or not 26

27 Knn discusion (cont.) No feature selection necessary Scales well with large number of classes Dont n t need to train n classifiers s for n classes s Classes can influence each other Small changes to one class can have ripple effect Scores can be hard to convert to probabilities No training i necessary Actually: perhaps not true. (Data editing, etc.) May be more expensive at test time 27

28 Text Classification Linear classifier and non-linear classifier 28

29 Linear Classifier Many common text classifiers are linear classifiers Naïve Bayes Perceptron Rocchio Logistic regression Support vector machines (with linear kernel) Linear regression 29

30 Linear Classifier: 2d In two dimensions, a linear classifier is a line. These lines have functional form w 1 x 1 w2x2 b The classification rules: if w1 x1 w2x2 b, => c if w x w x b, => not-c Here: T ( x 1, x2) : 2D vector representation of the document T ( w 1, w2 ) : the parameter vector that t defines the boundary 30

31 Linear Classifier We can generalize this 2D linear classifier to higher dimensions by defining a hyperplane: The classification rules is then: Why Rocchio and Naive Bayes classifiers are linear classfiers. 31

32 Non-linear classifier Non-linear classifier: k-nn A linear classifier e.g. Naive Bayes does badly on the task: knn will do very well (assuming enough training data) 32

33 Text classification Classification i with more than two classes 33

34 Classification with more than two classes Classification for classes that are not mutually exclusive is called any-of classification problem. Classification for classes that are mutually exclusive is called one-of classification problem. We have learned two-class linear classifiers. linear classifier that can classify d to c or not-c. How can we extend the two-class linear classifiers to J>2 classes. to classify a document d to one of or any of classes c1, c2, c3 34

35 Classification with more than two classes For one-of of classification tasks: 1. Build a classifier for each class, where the training set consists of the set of documents in the class and its complement. 2. Given the test document, apply each classifier separately. 3. Assign the document to the class with the maximum score the maximum confidence value or the maximum probability 35

36 Classification with more than two classes For any-of classification tasks: 1. Build a classifier for each class, where the training set consists of the set of documents in the class and its compement. 2. Given the test document, apply each classifier separately. The decision of one classifier has no influence on the decisions of the other classfier. 36

37 THE TEXT CLASSIFICATION PROBLEM An example: Document d with only a sentance: London is planning to organize the 2012 Olympics. We have six classes: <UK>, <China>, <car>, <coffee>, <elections>, <sports> Determined: <UK> and <sports> p(uk)p(d UK) > t1 p(sports)p(d sports) > t2 Naive Bayes Text Classification 37

38 summary Vector space methods for Text Classification Rocchio classification K Nearest Neighbors Linear classifier and non-linear classifier Classification with more than two classes 38

### Information Retrieval

Introduction to Information Retrieval CS276: Information Retrieval and Web Search Text Classification 1 Chris Manning and Pandu Nayak Ch. 13 Standing queries The path from IR to text classification: You

### Distribution-free Predictive Approaches

Distribution-free Predictive Approaches The methods discussed in the previous sections are essentially model-based. Model-free approaches such as tree-based classification also exist and are popular for

### Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

### Multi-Stage Rocchio Classification for Large-scale Multilabeled

Multi-Stage Rocchio Classification for Large-scale Multilabeled Text data Dong-Hyun Lee Nangman Computing, 117D Garden five Tools, Munjeong-dong Songpa-gu, Seoul, Korea dhlee347@gmail.com Abstract. Large-scale

### Multi-Class Logistic Regression and Perceptron

Multi-Class Logistic Regression and Perceptron Instructor: Wei Xu Some slides adapted from Dan Jurfasky, Brendan O Connor and Marine Carpuat MultiClass Classification Q: what if we have more than 2 categories?

### Classification & Clustering. Hadaiq Rolis Sanabila

Classification & Clustering Hadaiq Rolis Sanabila hadaiq@cs.ui.ac.id Natural Language Processing and Text Mining Pusilkom UI 22 26 Maret 2016 CLASSIFICATION 2 Categorization/Classification Given: A description

### Machine Learning Classifiers and Boosting

Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve

### MS1b Statistical Data Mining Part 3: Supervised Learning Nonparametric Methods

MS1b Statistical Data Mining Part 3: Supervised Learning Nonparametric Methods Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Supervised Learning: Nonparametric

### Machine Learning: Think Big and Parallel

Day 1 Inderjit S. Dhillon Dept of Computer Science UT Austin CS395T: Topics in Multicore Programming Oct 1, 2013 Outline Scikit-learn: Machine Learning in Python Supervised Learning day1 Regression: Least

### Support Vector Machines

Support Vector Machines About the Name... A Support Vector A training sample used to define classification boundaries in SVMs located near class boundaries Support Vector Machines Binary classifiers whose

### Support Vector Machines + Classification for IR

Support Vector Machines + Classification for IR Pierre Lison University of Oslo, Dep. of Informatics INF3800: Søketeknologi April 30, 2014 Outline of the lecture Recap of last week Support Vector Machines

### PV211: Introduction to Information Retrieval

PV211: Introduction to Information Retrieval http://www.fi.muni.cz/~sojka/pv211 IIR 15-1: Support Vector Machines Handout version Petr Sojka, Hinrich Schütze et al. Faculty of Informatics, Masaryk University,

### Clustering & Classification (chapter 15)

Clustering & Classification (chapter 5) Kai Goebel Bill Cheetham RPI/GE Global Research goebel@cs.rpi.edu cheetham@cs.rpi.edu Outline k-means Fuzzy c-means Mountain Clustering knn Fuzzy knn Hierarchical

### Text Categorization (I)

CS473 CS-473 Text Categorization (I) Luo Si Department of Computer Science Purdue University Text Categorization (I) Outline Introduction to the task of text categorization Manual v.s. automatic text categorization

### These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

Machine Learning Algorithms (IFT6266 A7) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

### INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering

INF4820, Algorithms for AI and NLP: Evaluating Classifiers Clustering Erik Velldal University of Oslo Sept. 18, 2012 Topics for today 2 Classification Recap Evaluating classifiers Accuracy, precision,

### Classification. 1 o Semestre 2007/2008

Classification Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 Single-Class

### Text Analytics. Text Clustering. Ulf Leser

Text Analytics Text Clustering Ulf Leser Text Classification Given a set D of docs and a set of classes C. A classifier is a function f: D C Problem: Finding a good classifier A good classifier assigns

### Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers

Tour-Based Mode Choice Modeling: Using An Ensemble of (Un-) Conditional Data-Mining Classifiers James P. Biagioni Piotr M. Szczurek Peter C. Nelson, Ph.D. Abolfazl Mohammadian, Ph.D. Agenda Background

### Model Selection Introduction to Machine Learning. Matt Gormley Lecture 4 January 29, 2018

10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Model Selection Matt Gormley Lecture 4 January 29, 2018 1 Q&A Q: How do we deal

### Supervised vs unsupervised clustering

Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful

### COMPARATIVE ANALYSIS OF PARTICLE SWARM OPTIMIZATION ALGORITHMS FOR TEXT FEATURE SELECTION

San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Spring 5-13-2015 COMPARATIVE ANALYSIS OF PARTICLE SWARM OPTIMIZATION ALGORITHMS FOR TEXT FEATURE SELECTION

### CS570: Introduction to Data Mining

CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8 & 9 Han, Chapters 4 & 5 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data Mining.

### Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008 Lecture 25: Kernels and Clustering 12/2/2008 Dan Klein UC Berkeley Case-Based Reasoning Similarity for classification Case-based reasoning Predict an instance

### Statistical Methods for NLP

Statistical Methods for NLP Text Similarity, Text Categorization, Linear Methods of Classification Sameer Maskey Announcement Reading Assignments Bishop Book 6, 62, 7 (upto 7 only) J&M Book 3, 45 Journal

### Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions

### Jarek Szlichta

Jarek Szlichta http://data.science.uoit.ca/ Approximate terminology, though there is some overlap: Data(base) operations Executing specific operations or queries over data Data mining Looking for patterns

### Introduction to Clustering

Introduction to Clustering Ref: Chengkai Li, Department of Computer Science and Engineering, University of Texas at Arlington (Slides courtesy of Vipin Kumar) What is Cluster Analysis? Finding groups of

### CSE 494/598 Lecture-11: Clustering & Classification

CSE 494/598 Lecture-11: Clustering & Classification LYDIA MANIKONDA HT TP://WWW.PUBLIC.ASU.EDU/~LMANIKON / **With permission, content adapted from last year s slides and from Intro to DM dmbook@cs.umn.edu

### Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set.

Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?

### K-Nearest Neighbour Classifier. Izabela Moise, Evangelos Pournaras, Dirk Helbing

K-Nearest Neighbour Classifier Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Reminder Supervised data mining Classification Decision Trees Izabela

### Machine Learning: Algorithms and Applications Mockup Examination

Machine Learning: Algorithms and Applications Mockup Examination 14 May 2012 FIRST NAME STUDENT NUMBER LAST NAME SIGNATURE Instructions for students Write First Name, Last Name, Student Number and Signature

### Content-based Recommender Systems

Recuperação de Informação Doutoramento em Engenharia Informática e Computadores Instituto Superior Técnico Universidade Técnica de Lisboa Bibliography Pasquale Lops, Marco de Gemmis, Giovanni Semeraro:

### Part 12: Advanced Topics in Collaborative Filtering. Francesco Ricci

Part 12: Advanced Topics in Collaborative Filtering Francesco Ricci Content Generating recommendations in CF using frequency of ratings Role of neighborhood size Comparison of CF with association rules

### DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS

DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes and a class attribute

### Linear Regression: One-Dimensional Case

Linear Regression: One-Dimensional Case Given: a set of N input-response pairs The inputs (x) and the responses (y) are one dimensional scalars Goal: Model the relationship between x and y (CS5350/6350)

### 6.867 Machine Learning

6.867 Machine Learning Problem set - solutions Thursday, October What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove. Do not

### IBL and clustering. Relationship of IBL with CBR

IBL and clustering Distance based methods IBL and knn Clustering Distance based and hierarchical Probability-based Expectation Maximization (EM) Relationship of IBL with CBR + uses previously processed

### A Dendrogram. Bioinformatics (Lec 17)

A Dendrogram 3/15/05 1 Hierarchical Clustering [Johnson, SC, 1967] Given n points in R d, compute the distance between every pair of points While (not done) Pick closest pair of points s i and s j and

### Data Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.

Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Output: Knowledge representation Tables Linear models Trees Rules

### CLUSTERING. JELENA JOVANOVIĆ Web:

CLUSTERING JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is clustering? Application domains K-Means clustering Understanding it through an example The K-Means algorithm

### A Feature Selection Method to Handle Imbalanced Data in Text Classification

A Feature Selection Method to Handle Imbalanced Data in Text Classification Fengxiang Chang 1*, Jun Guo 1, Weiran Xu 1, Kejun Yao 2 1 School of Information and Communication Engineering Beijing University

### MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge

### CS 8520: Artificial Intelligence. Machine Learning 2. Paula Matuszek Fall, CSC 8520 Fall Paula Matuszek

CS 8520: Artificial Intelligence Machine Learning 2 Paula Matuszek Fall, 2015!1 Regression Classifiers We said earlier that the task of a supervised learning system can be viewed as learning a function

### An Improvement of Centroid-Based Classification Algorithm for Text Classification

An Improvement of Centroid-Based Classification Algorithm for Text Classification Zehra Cataltepe, Eser Aygun Istanbul Technical Un. Computer Engineering Dept. Ayazaga, Sariyer, Istanbul, Turkey cataltepe@itu.edu.tr,

### CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data

### Instance-Based Learning.

Instance-Based Learning www.biostat.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts k-nn classification k-nn regression edited nearest neighbor k-d trees for nearest

### Nearest Neighbor Classification

Nearest Neighbor Classification Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms January 11, 2017 1 / 48 Outline 1 Administration 2 First learning algorithm: Nearest

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Computer Science 591Y Department of Computer Science University of Massachusetts Amherst February 3, 2005 Topics Tasks (Definition, example, and notes) Classification

### Information Retrieval. Lecture 5 - The vector space model. Introduction. Overview. Term weighting. Wintersemester 2007

Information Retrieval Lecture 5 - The vector space model Seminar für Sprachwissenschaft International Studies in Computational Linguistics Wintersemester 2007 1/ 28 Introduction Boolean model: all documents

### Introduction to Information Retrieval

Introduction to Information Retrieval http://informationretrieval.org IIR 6: Flat Clustering Hinrich Schütze Center for Information and Language Processing, University of Munich 04-06- /86 Overview Recap

### Performance Analysis of Data Mining Classification Techniques

Performance Analysis of Data Mining Classification Techniques Tejas Mehta 1, Dr. Dhaval Kathiriya 2 Ph.D. Student, School of Computer Science, Dr. Babasaheb Ambedkar Open University, Gujarat, India 1 Principal

### Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods

### An Empirical Performance Comparison of Machine Learning Methods for Spam Categorization

An Empirical Performance Comparison of Machine Learning Methods for Spam E-mail Categorization Chih-Chin Lai a Ming-Chi Tsai b a Dept. of Computer Science and Information Engineering National University

### Information Management course

Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 20: 10/12/2015 Data Mining: Concepts and Techniques (3 rd ed.) Chapter

### CS 179 Lecture 16. Logistic Regression & Parallel SGD

CS 179 Lecture 16 Logistic Regression & Parallel SGD 1 Outline logistic regression (stochastic) gradient descent parallelizing SGD for neural nets (with emphasis on Google s distributed neural net implementation)

### Density estimation. In density estimation problems, we are given a random from an unknown density. Our objective is to estimate

Density estimation In density estimation problems, we are given a random sample from an unknown density Our objective is to estimate? Applications Classification If we estimate the density for each class,

### Feature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.

CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your

### Object and Action Detection from a Single Example

Object and Action Detection from a Single Example Peyman Milanfar* EE Department University of California, Santa Cruz *Joint work with Hae Jong Seo AFOSR Program Review, June 4-5, 29 Take a look at this:

### Region-based Segmentation

Region-based Segmentation Image Segmentation Group similar components (such as, pixels in an image, image frames in a video) to obtain a compact representation. Applications: Finding tumors, veins, etc.

### CHAPTER 6 EXPERIMENTS

CHAPTER 6 EXPERIMENTS 6.1 HYPOTHESIS On the basis of the trend as depicted by the data Mining Technique, it is possible to draw conclusions about the Business organization and commercial Software industry.

### Information Retrieval. Techniques for Relevance Feedback

Information Retrieval Techniques for Relevance Feedback Introduction An information need may be epressed using different keywords (synonymy) impact on recall eamples: ship vs boat, aircraft vs airplane

### A Survey on Postive and Unlabelled Learning

A Survey on Postive and Unlabelled Learning Gang Li Computer & Information Sciences University of Delaware ligang@udel.edu Abstract In this paper we survey the main algorithms used in positive and unlabeled

### Summarizing Public Opinion on a Topic

Summarizing Public Opinion on a Topic 1 Abstract We present SPOT (Summarizing Public Opinion on a Topic), a new blog browsing web application that combines clustering with summarization to present an organized,

### Accelerated Machine Learning Algorithms in Python

Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals

### CSCI567 Machine Learning (Fall 2014)

CSCI567 Machine Learning (Fall 2014) Drs. Sha & Liu {feisha,yanliu.cs}@usc.edu September 9, 2014 Drs. Sha & Liu ({feisha,yanliu.cs}@usc.edu) CSCI567 Machine Learning (Fall 2014) September 9, 2014 1 / 47

### KTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn

KTH ROYAL INSTITUTE OF TECHNOLOGY Lecture 14 Machine Learning. K-means, knn Contents K-means clustering K-Nearest Neighbour Power Systems Analysis An automated learning approach Understanding states in

### Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time:

English Student no:... Page 1 of 14 Contact during the exam: Geir Solskinnsbakk Phone: 735 94218/ 93607988 Exam in course TDT4215 Web Intelligence - Solutions and guidelines - Wednesday June 4, 2008 Time:

### Introduction to Artificial Intelligence

Introduction to Artificial Intelligence COMP307 Machine Learning 2: 3-K Techniques Yi Mei yi.mei@ecs.vuw.ac.nz 1 Outline K-Nearest Neighbour method Classification (Supervised learning) Basic NN (1-NN)

### QED Q: Why is it called the triangle inequality? A: Analogue with euclidean distance in the plane: picture Defn: Minimum Distance of a code C:

Lecture 3: Lecture notes posted online each week. Recall Defn Hamming distance: for words x = x 1... x n, y = y 1... y n of the same length over the same alphabet, d(x, y) = {1 i n : x i y i } i.e., d(x,

### modern database systems lecture 4 : information retrieval

modern database systems lecture 4 : information retrieval Aristides Gionis Michael Mathioudakis spring 2016 in perspective structured data relational data RDBMS MySQL semi-structured data data-graph representation

### Classification Algorithms in Data Mining

August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

### Weight adjustment schemes for a centroid based classifier

Weight adjustment schemes for a centroid based classifier Shrikanth Shankar and George Karypis University of Minnesota, Department of Computer Science Minneapolis, MN 55455 Technical Report: TR 00-035

### Semi-supervised Learning

Semi-supervised Learning Piyush Rai CS5350/6350: Machine Learning November 8, 2011 Semi-supervised Learning Supervised Learning models require labeled data Learning a reliable model usually requires plenty

### Semi-supervised learning and active learning

Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners

### Document Clustering: Comparison of Similarity Measures

Document Clustering: Comparison of Similarity Measures Shouvik Sachdeva Bhupendra Kastore Indian Institute of Technology, Kanpur CS365 Project, 2014 Outline 1 Introduction The Problem and the Motivation

### Introduction to Machine Learning. Xiaojin Zhu

Introduction to Machine Learning Xiaojin Zhu jerryzhu@cs.wisc.edu Read Chapter 1 of this book: Xiaojin Zhu and Andrew B. Goldberg. Introduction to Semi- Supervised Learning. http://www.morganclaypool.com/doi/abs/10.2200/s00196ed1v01y200906aim006

### Lecture 7: Relevance Feedback and Query Expansion

Lecture 7: Relevance Feedback and Query Expansion Information Retrieval Computer Science Tripos Part II Ronan Cummins Natural Language and Information Processing (NLIP) Group ronan.cummins@cl.cam.ac.uk

### Cluster Analysis. Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX April 2008 April 2010

Cluster Analysis Prof. Thomas B. Fomby Department of Economics Southern Methodist University Dallas, TX 7575 April 008 April 010 Cluster Analysis, sometimes called data segmentation or customer segmentation,

### Artificial Neural Networks (Feedforward Nets)

Artificial Neural Networks (Feedforward Nets) y w 03-1 w 13 y 1 w 23 y 2 w 01 w 21 w 22 w 02-1 w 11 w 12-1 x 1 x 2 6.034 - Spring 1 Single Perceptron Unit y w 0 w 1 w n w 2 w 3 x 0 =1 x 1 x 2 x 3... x

### This lecture: IIR Sections Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring

This lecture: IIR Sections 6.2 6.4.3 Ranked retrieval Scoring documents Term frequency Collection statistics Weighting schemes Vector space scoring 1 Ch. 6 Ranked retrieval Thus far, our queries have all

### Data Science Tutorial

Eliezer Kanal Technical Manager, CERT Daniel DeCapria Data Scientist, ETC Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213 2017 SEI SEI Data Science in in Cybersecurity Symposium

### SVM cont d. Applications face detection [IEEE INTELLIGENT SYSTEMS]

SVM cont d A method of choice when examples are represented by vectors or matrices Input space cannot be readily used as attribute-vector (e.g. too many attrs) Kernel methods: map data from input space

### Mining Web Data. Lijun Zhang

Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

### Clustering. Bruno Martins. 1 st Semester 2012/2013

Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 Motivation Basic Concepts

### Machine Learning for NLP

Machine Learning for NLP Support Vector Machines Aurélie Herbelot 2018 Centre for Mind/Brain Sciences University of Trento 1 Support Vector Machines: introduction 2 Support Vector Machines (SVMs) SVMs

### Instance-Based Learning. Goals for the lecture

Instance-Based Learning Mar Craven and David Page Computer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from materials developed

### Unsupervised Learning. Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi

Unsupervised Learning Presenter: Anil Sharma, PhD Scholar, IIIT-Delhi Content Motivation Introduction Applications Types of clustering Clustering criterion functions Distance functions Normalization Which

### Automatic Summarization

Automatic Summarization CS 769 Guest Lecture Andrew B. Goldberg goldberg@cs.wisc.edu Department of Computer Sciences University of Wisconsin, Madison February 22, 2008 Andrew B. Goldberg (CS Dept) Summarization

### Chapter 4: Non-Parametric Techniques

Chapter 4: Non-Parametric Techniques Introduction Density Estimation Parzen Windows Kn-Nearest Neighbor Density Estimation K-Nearest Neighbor (KNN) Decision Rule Supervised Learning How to fit a density

### Function Algorithms: Linear Regression, Logistic Regression

CS 4510/9010: Applied Machine Learning 1 Function Algorithms: Linear Regression, Logistic Regression Paula Matuszek Fall, 2016 Some of these slides originated from Andrew Moore Tutorials, at http://www.cs.cmu.edu/~awm/tutorials.html

### Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University

Cluster Evaluation and Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University Kinds of Clustering Sequential Fast Cost Optimization Fixed number of clusters Hierarchical

### Using a genetic algorithm for editing k-nearest neighbor classifiers

Using a genetic algorithm for editing k-nearest neighbor classifiers R. Gil-Pita 1 and X. Yao 23 1 Teoría de la Señal y Comunicaciones, Universidad de Alcalá, Madrid (SPAIN) 2 Computer Sciences Department,

### Louis Fourrier Fabien Gaie Thomas Rolf

CS 229 Stay Alert! The Ford Challenge Louis Fourrier Fabien Gaie Thomas Rolf Louis Fourrier Fabien Gaie Thomas Rolf 1. Problem description a. Goal Our final project is a recent Kaggle competition submitted

### Spam Classification Documentation

Spam Classification Documentation What is SPAM? Unsolicited, unwanted email that was sent indiscriminately, directly or indirectly, by a sender having no current relationship with the recipient. Objective:

### Clustering. Unsupervised Learning

Clustering. Unsupervised Learning Maria-Florina Balcan 04/06/2015 Reading: Chapter 14.3: Hastie, Tibshirani, Friedman. Additional resources: Center Based Clustering: A Foundational Perspective. Awasthi,

### Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Introduction to Information Retrieval Ethan Phelps-Goodman Some slides taken from http://www.cs.utexas.edu/users/mooney/ir-course/ Information Retrieval (IR) The indexing and retrieval of textual documents.