Naïve Bayes, Gaussian Distributions, Practical Applications


Naïve Bayes, Gaussian Distributions, Practical Applications. Required reading: Mitchell draft chapter, sections 1 and 2 (available on the class website). Machine Learning 10-601, Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University. Groundhog Day, 2009.

Overview. Recently: learn P(Y | X) instead of a deterministic f: X → Y; Bayes rule; MLE and MAP estimates for the parameters of P; conditional independence; classification with Naïve Bayes. Today: text classification with Naïve Bayes; Gaussian distributions for continuous X; the Gaussian Naïve Bayes classifier; image classification with Naïve Bayes.

Learning to classify text documents. Classify which emails are spam? Which emails promise an attachment? Which web pages are student home pages? How shall we represent text documents for Naïve Bayes?

Baseline: Bag of Words approach. Represent each document as a vector of per-word counts over a fixed vocabulary, e.g.: aardvark 0, about 2, all 2, Africa 1, apple 0, anxious 0, ..., gas 1, ..., oil 1, ..., Zaire 0.
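
A minimal sketch of this representation in Python (my own illustration; the vocabulary and example sentence below are made up):

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Map a document to a vector of per-word counts over a fixed vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["aardvark", "about", "africa", "all", "apple", "oil", "zaire"]
doc = "All about Africa: all the oil, and more about the oil."
print(bag_of_words(doc, vocab))   # -> [0, 2, 1, 2, 0, 2, 0]
```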

Naïve Bayes in a Nutshell. Bayes rule:

$$P(Y = y_k \mid X_1, \ldots, X_n) = \frac{P(Y = y_k)\, P(X_1, \ldots, X_n \mid Y = y_k)}{\sum_j P(Y = y_j)\, P(X_1, \ldots, X_n \mid Y = y_j)}$$

Assuming conditional independence among the X_i:

$$P(Y = y_k \mid X_1, \ldots, X_n) = \frac{P(Y = y_k) \prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j) \prod_i P(X_i \mid Y = y_j)}$$

So the classification rule for X^new = <X_1, ..., X_n> is:

$$Y^{new} \leftarrow \arg\max_{y_k}\; P(Y = y_k) \prod_i P(X_i^{new} \mid Y = y_k)$$
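
A hedged illustration of this classification rule for binary features, written as an argmax over classes in log space; the priors and conditional probabilities below are invented for illustration, not estimates from any real data:

```python
import math

def naive_bayes_predict(x, priors, cond_probs):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y).

    priors: dict class -> P(y)
    cond_probs: dict class -> list of Bernoulli parameters P(X_i = 1 | y)
    x: list of binary features
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for x_i, p in zip(x, cond_probs[y]):
            score += math.log(p if x_i == 1 else 1.0 - p)
        if score > best_score:
            best_class, best_score = y, score
    return best_class

priors = {"spam": 0.4, "ham": 0.6}                              # made-up numbers
cond_probs = {"spam": [0.8, 0.6, 0.1], "ham": [0.2, 0.3, 0.5]}  # made-up numbers
print(naive_bayes_predict([1, 0, 1], priors, cond_probs))       # -> "ham"
```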

For code and data, see www.cs.cmu.edu/~tom/mlbook.html and click on "Software and Data".

What if we have continuous X_i? E.g., image classification: X_i is the real-valued i-th pixel.

What if we have continuous X_i? E.g., image classification: X_i is the real-valued i-th pixel. Naïve Bayes requires P(X_i | Y = y_k), but X_i is real-valued (continuous). Common approach: assume P(X_i | Y = y_k) follows a normal (Gaussian) distribution.

Gaussian Distribution (also known as the Normal distribution):

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

p(x) is a probability density function, whose integral (not sum) is 1.
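
As a quick sanity-check sketch (not from the slides), the density can be evaluated and numerically integrated in a few lines of Python:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Crude numerical check that the density integrates (not sums) to 1, here for mu=0, sigma=2.
xs = np.linspace(-20.0, 20.0, 100_001)
print((gaussian_pdf(xs, mu=0.0, sigma=2.0) * (xs[1] - xs[0])).sum())   # ~1.0
```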

What if we have continuous X_i? E.g., image classification: X_i is the i-th pixel. Gaussian Naïve Bayes (GNB): assume

$$P(X_i = x \mid Y = y_k) = \frac{1}{\sigma_{ik}\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu_{ik})^2}{2\sigma_{ik}^2}\right)$$

Sometimes we assume the variance is independent of Y (i.e., σ_i), or independent of X_i (i.e., σ_k), or both (i.e., σ).

Gaussian Naïve Bayes Algorithm: continuous X_i (but still discrete Y). Train Naïve Bayes (examples): for each value y_k, estimate* the class prior π_k ≡ P(Y = y_k); for each attribute X_i, estimate the class-conditional mean μ_ik and variance σ_ik. Classify (X^new):

$$Y^{new} \leftarrow \arg\max_{y_k}\; \pi_k \prod_i P(X_i^{new} \mid Y = y_k)$$

* probabilities must sum to 1, so we need to estimate only k−1 of the k class priors...
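
A compact sketch of these train/classify steps in NumPy (my own illustration, not the course code; the two-class, two-feature data set is invented):

```python
import numpy as np

def train_gnb(X, y):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    variances = {c: X[y == c].var(axis=0) + 1e-9 for c in classes}  # small floor for stability
    return priors, means, variances

def classify_gnb(x_new, priors, means, variances):
    """Return the class maximizing log pi_k + sum_i log N(x_i; mu_ik, sigma_ik^2)."""
    def log_posterior(c):
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances[c])
                                + (x_new - means[c]) ** 2 / variances[c])
        return np.log(priors[c]) + log_lik
    return max(priors, key=log_posterior)

# Toy data: two roughly Gaussian classes in 2 dimensions (made up for illustration).
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [4.0, 0.5], [4.2, 0.7], [3.8, 0.4]])
y = np.array([0, 0, 0, 1, 1, 1])
priors, means, variances = train_gnb(X, y)
print(classify_gnb(np.array([1.1, 2.1]), priors, means, variances))   # expect class 0
```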

Estimating Parameters: Y discrete, X_i continuous. Maximum likelihood estimates (j indexes training examples, i indexes features, k indexes classes; δ(z) = 1 if z is true, else 0):

$$\hat{\mu}_{ik} = \frac{\sum_j X_i^{(j)}\, \delta(Y^{(j)} = y_k)}{\sum_j \delta(Y^{(j)} = y_k)} \qquad \hat{\sigma}_{ik}^2 = \frac{\sum_j \bigl(X_i^{(j)} - \hat{\mu}_{ik}\bigr)^2\, \delta(Y^{(j)} = y_k)}{\sum_j \delta(Y^{(j)} = y_k)}$$
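
The same estimates written out with the indicator δ made explicit, as a short sketch on synthetic numbers:

```python
import numpy as np

X = np.array([[1.0, 2.0], [1.2, 1.8], [4.0, 0.5], [4.2, 0.7]])   # rows = examples j, cols = features i
y = np.array([0, 0, 1, 1])

k = 0                                    # class of interest
delta = (y == k).astype(float)           # delta(Y^j = y_k) for every example j
n_k = delta.sum()
mu_k = (X * delta[:, None]).sum(axis=0) / n_k                      # hat-mu_{ik} for all i
sigma2_k = (((X - mu_k) ** 2) * delta[:, None]).sum(axis=0) / n_k  # hat-sigma^2_{ik} (MLE form)
print(mu_k, sigma2_k)
```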

How many parameters must we estimate for Gaussian Naïve Bayes if Y has k possible values and X = <X_1, ..., X_n>?
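
One way to count them (my own worked tally, assuming a separate mean and variance for every feature-class pair and no tied-variance simplification):

$$\underbrace{kn}_{\text{means }\mu_{ik}} \;+\; \underbrace{kn}_{\text{variances }\sigma^2_{ik}} \;+\; \underbrace{k-1}_{\text{priors }\pi_k} \;=\; 2kn + k - 1$$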

What is the form of the decision surface for the Gaussian Naïve Bayes classifier? E.g., if the distributions are spherical and all attributes have the same variance (σ_ik = σ)?

What is the form of the decision surface for the Naïve Bayes classifier?

GNB Example: classify a person's cognitive activity based on a brain image. Reading a sentence or viewing a picture? Reading the word "Hammer" or "Apartment"? Viewing a vertical or horizontal line? Answering the question, or getting confused?

Stimuli for our study (e.g., "ant", shown as a word or picture): 60 distinct exemplars, presented 6 times each.

[Figure: fMRI voxel means for "bottle", i.e., the means defining P(X_i | Y = "bottle"); the mean fMRI activation over all stimuli; and "bottle" minus the mean activation. The color scale runs from below-average to high fMRI activation.]

Training classifiers over fMRI sequences. Learn the classifier function Mean(fMRI(t+4), ..., fMRI(t+7)) → WordCategory, evaluated with leave-one-out cross-validation. Preprocessing: adjust for head motion; convert each image x to a standard-normal image. Learning algorithms tried: kNN (spatial correlation), SVM, SVDM, Gaussian Naïve Bayes, regularized logistic regression. Feature selection methods tried: logistic regression weights, voxel stability, activity relative to fixation, ...
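
A minimal sketch of this evaluation protocol, leave-one-out cross-validation around a Gaussian Naïve Bayes classifier, using scikit-learn; the random "voxel" features below are synthetic stand-ins, not real fMRI data:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 500                 # synthetic stand-in for fMRI examples
y = np.repeat([0, 1], n_trials // 2)         # e.g., 0 = "tool", 1 = "building"
X = rng.normal(size=(n_trials, n_voxels))
X[y == 1, :20] += 0.8                        # give class 1 a weak signal in a few "voxels"

scores = cross_val_score(GaussianNB(), X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", scores.mean())
```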

Classification task: is the person viewing a tool or a building? Classification accuracy is statistically significant at p < 0.05.
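
The slide does not say which significance test was used; one common choice for checking accuracy against the 50% chance level is an exact binomial test, sketched here with invented counts:

```python
from scipy.stats import binomtest

n_trials, n_correct = 42, 30   # invented counts, for illustration only
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print("accuracy:", n_correct / n_trials, "p-value:", result.pvalue)
```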

Where in the brain is the activity that distinguishes tools vs. buildings? Accuracy of a radius-one "searchlight" classifier centered at each voxel:

Voxel clusters ("searchlights"): accuracies of cubical 27-voxel classifiers centered at each significant voxel (accuracies in the range 0.7-0.8).

Are classifiers detecting neural representations of meaning, or of perceptual features? ML question: can we train on word stimuli, then decode picture stimuli? YES: we can train classifiers while presenting English words, then decode the category of picture stimuli, or of Portuguese words. [Figure: accuracies when testing on words vs. testing on pictures.] Therefore, the learned neural activation patterns must capture how the brain represents the meaning of the input stimulus.

Are representations similar across different people? ML question: can we train a classifier on data from a collection of people, then decode stimuli for a new person? YES: we can train on one group of people and classify fMRI images of a new person. [Figure: rank accuracy when classifying which of 60 items was presented.] Therefore, we can seek a theory of neural representations common to all of us (and of how we vary).
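
As a hedged aside, "rank accuracy" is commonly computed as the normalized rank of the correct item under the classifier's scores (1.0 = ranked first, 0.0 = ranked last); a small illustration with made-up scores:

```python
import numpy as np

def rank_accuracy(scores, true_index):
    """Normalized rank of the correct item among all candidates."""
    order = np.argsort(-np.asarray(scores))           # best score first
    rank = int(np.where(order == true_index)[0][0])   # 0-based rank of the true item
    return 1.0 - rank / (len(scores) - 1)

scores = np.random.default_rng(1).normal(size=60)     # made-up classifier scores for 60 items
scores[17] += 2.0                                      # pretend item 17 is the true stimulus
print(rank_accuracy(scores, true_index=17))
```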

What you should know: training and using classifiers based on Bayes rule; conditional independence (what it is, why it's important); Naïve Bayes (what it is, why we use it so much); training using MLE and MAP estimates; discrete variables (Bernoulli) and continuous variables (Gaussian).

Questions to think about: Can you use Naïve Bayes for a combination of discrete and real-valued X_i? How can we easily model just 2 of the n attributes as dependent? What does the decision surface of a Naïve Bayes classifier look like? How would you select a subset of the X_i?