CAMCOS Report Day, December 9th, 2015, San Jose State University. Project Theme: Classification

On Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions. Team 1 (data set: Handwritten Digits): Wilson A. Florero-Salinas, Carson Sprook, Dan Li, Abhirupa Sen. Team 2 (data set: Springleaf): Xiaoyan Chong, Minglu Ma, Yue Wang, Sha Li. Team Advisor: Dr. Guanliang Chen

Outline 1. What is Classification? 2. The two Kaggle Competitions 3. Data Preprocessing 4. Classification Algorithms 5. Summary 6. Conclusions 7. Future Work

What is Classification? We start with a data set whose categories are already known; this data is called the training set. New data then becomes available whose categories are unknown; this data set is called the test set. Goal: use the training set to predict the label of each new data point.
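A minimal sketch of this train/test workflow, assuming Python with scikit-learn (the slides do not specify a language) and using the small built-in digits data set as a stand-in for MNIST:

# Hypothetical illustration of the training/test workflow described above.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)                      # feature vectors with known labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)                   # any classifier could be plugged in here
clf.fit(X_train, y_train)                                 # learn from the training set
print("accuracy on the test set:", clf.score(X_test, y_test))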

The Kaggle Competition. Kaggle is an international platform that hosts data prediction competitions, in which students and experts in data science compete. Our CAMCOS teams entered two competitions. Team 1: Digit Recognizer (ends December 31st). Team 2: Springleaf Marketing Response (ended October 19th).

Team 1: The MNIST data set (a subset of data collected by NIST, the US National Institute of Standards and Technology)

Potential Applications. Banking: check deposits. Surveillance: license plates. Shipping: envelopes and packages.

Initial Challenges and Workarounds. The data set is high-dimensional: each image is stored in a vector, which makes computation expensive. In addition, digits are written differently by different people (e.g., left-handed vs. right-handed writers). Workarounds: preprocess the data set to reduce its dimension and increase computation speed, and apply transformations to the images to enhance the features important for classification.

Data Preprocessing Methods. In our experiments we have used the following methods: Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), 2D LDA, Nonparametric Discriminant Analysis (NDA), Kernel PCA, and t-distributed Stochastic Neighbor Embedding (t-SNE), including parametric t-SNE and kernel t-SNE.

Deskewing (before and after images of a deskewed digit)

Principal Component Analysis (PCA). Using too many dimensions (784) can be computationally expensive. PCA uses variance as its dimensionality-reduction criterion: the directions with the least amount of variance are thrown away.
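A minimal PCA sketch (scikit-learn assumed; the number of retained components below is illustrative, not the team's setting):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # stand-in for the 784-dimensional MNIST vectors
pca = PCA(n_components=30)                 # keep the 30 directions with the most variance
X_reduced = pca.fit_transform(X)           # project each image onto those directions
print(X_reduced.shape, "variance retained:", pca.explained_variance_ratio_.sum())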

Linear Discriminant Analysis

LDA and PCA on pairwise data (figures comparing the PCA and LDA projections)

Classification Methods. In our experiments we have used the following methods: Nearest Neighbors methods (instance based), Naive Bayes (NB), Maximum a Posteriori (MAP), Logistic Regression, Support Vector Machines (linear classifier), Neural Networks, Random Forests, and XGBoost.

K Nearest Neighbors (kNN). The k nearest neighbors decide the group to which a test point belongs. Example with k = 5: the neighborhood of the test point includes both class 1 and class 2, but the majority of the neighbors belong to class 1, so the prediction is class 1. The best result for our data set, with k = 8, is a 2.76% misclassification rate.
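A small from-scratch sketch of the majority vote described above (NumPy assumed; k = 5 as in the illustration, not the tuned k = 8):

import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Predict the class of one test point by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)     # distance from x to every training point
    nearest = np.argsort(dists)[:k]                 # indices of the k closest training points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                # the most common label in the neighborhood wins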

K means. Situation 1: the data is well separated and each class has a centroid (average); the test point is closest to centroid 3, so it is predicted to be from class 3. Situation 2: the test point actually belongs to cluster 2, but it is closer to the global centroid of class 1, so it is (incorrectly) predicted to belong to class 1.
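The rule in Situation 1 is essentially a nearest-centroid classifier; a minimal scikit-learn version (assumed, not the team's code) is:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = NearestCentroid()                     # one global centroid per class
clf.fit(X_train, y_train)                   # a test point gets the class of its closest centroid
print("nearest-centroid accuracy:", clf.score(X_test, y_test))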

Solution: local k means. For every class, local centroids are calculated around the test point, and the class whose local centroid is closest to the test point is the prediction. Results: some of the best early results came from local k means. With k = 14 the misclassification rate was 1.75%; local PCA + local k means gave 1.53%; with deskewing this misclassification could be reduced to 1.14%.
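A minimal sketch of the local k means rule just described (NumPy assumed; k = 14 as in the reported result):

import numpy as np

def local_kmeans_predict(X_train, y_train, x, k=14):
    """Assign x to the class whose local centroid (mean of that class's k training
    points nearest to x) is closest to x."""
    best_class, best_dist = None, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]                          # training points of class c
        dists = np.linalg.norm(Xc - x, axis=1)
        local_centroid = Xc[np.argsort(dists)[:k]].mean(axis=0)
        d = np.linalg.norm(local_centroid - x)
        if d < best_dist:
            best_class, best_dist = c, d
    return best_class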

Support Vector Machines (SVM). Suppose we are given a data set in which the classes are known; we want to construct a linear decision boundary that classifies new observations.

Support Vector Machines (SVM). The decision boundary is chosen to maximize the separation (margin) m between the classes.

SVM with multiple classes. SVM is a binary classifier; how can we extend SVM to more than two classes? One method is one-vs-rest: construct one SVM model for each class, where each SVM separates one class from the rest.
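A minimal one-vs-rest sketch (scikit-learn assumed; a linear SVM is used as the base binary classifier):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One binary SVM per digit; each separates that digit from the remaining nine.
ovr = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000))
ovr.fit(X_train, y_train)
print("one-vs-rest SVM accuracy:", ovr.score(X_test, y_test))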

Support Vector Machines (SVM). What if the data cannot be separated by a line? Use a kernel SVM.

Parameter selection for SVMs. It is not obvious what parameters to use when training multiple models. How can we obtain an approximate range of parameters for training? Within each class, compute a corresponding gamma; this gives a starting point for parameter selection.
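One way such a per-class starting point for gamma could be computed, sketched under the assumption (suggested by the later kernel SVM slide) that average distances to the k = 5 nearest neighbors within each class are used; this is an illustration, not necessarily the team's exact formula:

import numpy as np
from sklearn.neighbors import NearestNeighbors

def per_class_gamma(X_train, y_train, k=5):
    """Estimate an RBF gamma for each class from average within-class neighbor distances
    (one common heuristic; assumed here, not taken from the slides)."""
    gammas = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        nn = NearestNeighbors(n_neighbors=k + 1).fit(Xc)    # +1: each point is its own nearest neighbor
        dists, _ = nn.kneighbors(Xc)
        sigma = dists[:, 1:].mean()                         # average distance to the k nearest neighbors
        gammas[c] = 1.0 / (2.0 * sigma ** 2)
    return gammas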

An alternative approach. Our approach: separately apply PCA to each digit space; this extracts the patterns from each digit class, and we can use different parameters for each digit group. For digits 0 through 9, Models 1 through 10 are each built as a class-specific PCA followed by a one-vs-all SVM.
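A minimal sketch of how such a per-digit PCA + one-vs-all SVM pipeline could look (scikit-learn assumed; the number of components and kernel settings are illustrative, not the team's):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def fit_per_digit_models(X_train, y_train, n_components=30):
    """For each digit: fit PCA on that digit's images only, project all training data
    into that space, and train a one-vs-all SVM for that digit."""
    models = []
    for d in np.unique(y_train):
        pca = PCA(n_components=n_components).fit(X_train[y_train == d])
        svm = SVC(kernel="rbf", gamma="scale").fit(pca.transform(X_train), y_train == d)
        models.append((d, pca, svm))
    return models

def predict_per_digit(models, X_test):
    """Each model scores the test points in its own PCA space; the digit with the highest score wins."""
    scores = np.column_stack([svm.decision_function(pca.transform(X_test))
                              for _, pca, svm in models])
    digits = np.array([d for d, _, _ in models])
    return digits[np.argmax(scores, axis=1)]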

Some challenges for kernel SVM. Using kNN with k = 5 to set the gamma values for each class, an error of 1.25% is obtained when each model uses a different gamma, while an error of 1.2% is achieved with the averaged gamma.

Known results using PCA and kernel SVM:
Method                Error    Time
SVM                   1.40%    743 sec
PCA + SVM             1.25%    234 sec
digit PCA + SVM       1.20%    379 sec
LDA + local k means   7.77%    660 sec

Neural Networks. Inspired by biology, they mimic the way humans learn. Best result: 1.30% error.
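A minimal neural-network sketch (scikit-learn's MLPClassifier assumed; the team's actual architecture is not given on the slides):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One hidden layer of artificial neurons; the weights are learned by backpropagation.
net = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=0)
net.fit(X_train, y_train)
print("neural network accuracy:", net.score(X_test, y_test))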

Neural Networks: Artificial Neuron

Neural Networks: Learning

Summary of Results

Conclusion UNDER CONSTRUCTION

Future work UNDER CONSTRUCTION