
Lecture 2: Image Classification pipeline

Administrative: Piazza. For questions about the midterm, poster session, projects, etc., use Piazza! SCPD students: use your @stanford.edu address to register for Piazza; contact scpd-customerservice@stanford.edu for help.

Administrative: Assignment 1. Out yesterday, due 4/18 11:59pm. Covers: K-Nearest Neighbor; linear classifiers (SVM, Softmax); two-layer neural network; image features.

Administrative: Friday Discussion Sections. Fridays 12:30pm - 1:20pm in Skilling Auditorium: hands-on tutorials with more practical detail than the main lecture. Check the course website for the schedule: http://cs231n.stanford.edu/syllabus.html This Friday: Python / numpy / Google Cloud setup.

Administrative: Python + Numpy: http://cs231n.github.io/python-numpy-tutorial/

Administrative: Google Cloud. We will be using Google Cloud in this class and will be distributing coupons to all enrolled students. See our tutorial for a walkthrough of the Google Cloud setup: http://cs231n.github.io/gce-tutorial/

Image Classification: a core task in Computer Vision. Given an input image and a fixed set of discrete labels, e.g. {dog, cat, truck, plane, ...}, the classifier assigns one label to the image (here: cat).

The Problem: Semantic Gap. What the computer sees: an image is just a big grid of numbers between 0 and 255, e.g. 800 x 600 x 3 (3 RGB channels).
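To make this concrete, here is a minimal numpy sketch of what that grid of numbers looks like to a program; a randomly generated array stands in for a real photo, which you would normally load with PIL or imageio.

```python
import numpy as np

# An RGB image is just an array of integers in [0, 255].
# A random array stands in here for a real photo.
img = np.random.randint(0, 256, size=(600, 800, 3), dtype=np.uint8)

print(img.shape)   # (600, 800, 3): height x width x 3 RGB channels
print(img[0, 0])   # the three channel values of the top-left pixel
```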

Challenges: Viewpoint variation. All pixels change when the camera moves!

Challenges: Illumination.

Challenges: Deformation.

Challenges: Occlusion.

Challenges: Background Clutter.

Challenges: Intraclass variation.

An image classifier: unlike, say, sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.

Attempts have been made: find edges, then find corners? (John Canny, "A Computational Approach to Edge Detection," IEEE TPAMI 1986.)

Machine Learning: Data-Driven Approach. 1. Collect a dataset of images and labels. 2. Use machine learning to train a classifier. 3. Evaluate the classifier on new images. (Figure: example training set.)
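The data-driven approach boils down to a two-function API. A minimal, runnable sketch (the memorizing "model" here is purely illustrative, not the slides' code):

```python
def train(images, labels):
    # Machine learning happens here. The simplest possible "model"
    # just memorizes all of the training data.
    return {"images": images, "labels": labels}

def predict(model, test_images):
    # Use the model to predict labels for new images. This stub
    # predicts the first training label for everything; a real
    # classifier (e.g. nearest neighbor, below) replaces this.
    return [model["labels"][0] for _ in test_images]
```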

First classifier: Nearest Neighbor. Train: memorize all data and labels. Predict: output the label of the most similar training image.

Example Dataset: CIFAR10. 10 classes, 50,000 training images, 10,000 testing images. (Figure: test images and their nearest neighbors.) Alex Krizhevsky, "Learning Multiple Layers of Features from Tiny Images," Technical Report, 2009.

Distance Metric to compare images. L1 distance: d1(I1, I2) = Σ_p |I1^p − I2^p|, i.e. take pixel-wise absolute-value differences, then add them all up.
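In numpy the L1 distance is one line; a small sketch with fabricated 4x4 grayscale images standing in for real ones:

```python
import numpy as np

# Signed integers avoid uint8 overflow when subtracting.
I1 = np.random.randint(0, 256, size=(4, 4)).astype(np.int32)
I2 = np.random.randint(0, 256, size=(4, 4)).astype(np.int32)

# Pixel-wise absolute differences, then add them all up.
d1 = np.sum(np.abs(I1 - I2))
print(d1)
```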

Nearest Neighbor classifier. Train: memorize the training data. Predict: for each test image, find the closest training image and predict the label of that nearest image.

Q: With N examples, how fast are training and prediction? A: Train O(1), predict O(N). This is bad: we want classifiers that are fast at prediction; slow training is OK.
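A minimal numpy sketch of the full classifier, in the spirit of the code shown on the original slides (L1 distance; note the O(1) train and O(N) predict):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # X is N x D, one flattened training image per row; y holds
        # the N labels. Training just memorizes the data: O(1).
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        # For each test image, scan all N training images: O(N).
        num_test = X.shape[0]
        y_pred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distances from test image i to every training image.
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            y_pred[i] = self.ytr[np.argmin(distances)]  # copy nearest label
        return y_pred
```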

What does this look like?

K-Nearest Neighbors: instead of copying the label from the single nearest neighbor, take a majority vote among the K closest points (shown for K=1, K=3, K=5).
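A sketch of the voting rule for a single test example (assumes nonnegative integer labels so np.bincount can tally the votes; knn_predict_one is a hypothetical helper name):

```python
import numpy as np

def knn_predict_one(Xtr, ytr, x, k=5):
    distances = np.sum(np.abs(Xtr - x), axis=1)  # L1 distance to all training points
    nearest = np.argsort(distances)[:k]          # indices of the k closest
    votes = np.bincount(ytr[nearest])            # tally the labels
    return np.argmax(votes)                      # majority vote
```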

What does this look like?

K-Nearest Neighbors: Distance Metric. L1 (Manhattan) distance vs. L2 (Euclidean) distance.
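The two metrics written out (standard definitions; the originals appeared as images on the slide):

```latex
d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|
\qquad
d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}
```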

K-Nearest Neighbors: Distance Metric. (Figure: K=1 decision boundaries under the L1 (Manhattan) vs. the L2 (Euclidean) distance.)

K-Nearest Neighbors: Demo Time. http://vision.stanford.edu/teaching/cs231n-demos/knn/

Hyperparameters: what is the best value of k to use? What is the best distance to use? These are hyperparameters: choices about the algorithm that we set rather than learn. They are very problem-dependent; you must try them out and see what works best.

Setting Hyperparameters.
Idea #1: choose hyperparameters that work best on the full dataset. BAD: K = 1 always works perfectly on training data.
Idea #2: split the data into train and test; choose hyperparameters that work best on the test data. BAD: no idea how the algorithm will perform on new data.
Idea #3: split the data into train, val, and test; choose hyperparameters on val and evaluate on test. Better!
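A sketch of Idea #3 in code: sweep candidate k values, score each on the held-out validation split, and keep the best. It reuses the hypothetical knn_predict_one helper sketched above; the candidate list is illustrative.

```python
import numpy as np

def choose_k(Xtr, ytr, Xval, yval, candidates=(1, 3, 5, 7, 10, 20)):
    best_k, best_acc = None, -1.0
    for k in candidates:
        preds = np.array([knn_predict_one(Xtr, ytr, x, k=k) for x in Xval])
        acc = np.mean(preds == yval)   # validation accuracy for this k
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc            # touch the test set only once, at the end
```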

Setting Hyperparameters. Idea #4: Cross-Validation. Split the data into folds (say fold 1 through fold 5, plus a held-out test set), try each fold as the validation set in turn, and average the results. Useful for small datasets, but not used too frequently in deep learning.
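A sketch of k-fold cross-validation for a single candidate k, again reusing the hypothetical knn_predict_one helper (illustrative, not the course's reference code):

```python
import numpy as np

def cross_validate_k(X, y, k, num_folds=5):
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    accs = []
    for i in range(num_folds):
        # Fold i is the validation set; the rest are training data.
        Xval, yval = X_folds[i], y_folds[i]
        Xtr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
        ytr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
        preds = np.array([knn_predict_one(Xtr, ytr, x, k=k) for x in Xval])
        accs.append(np.mean(preds == yval))
    return np.mean(accs), np.std(accs)  # mean/std across folds, as in the plot
```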

Setting Hyperparameters. (Figure: example of 5-fold cross-validation for the value of k. Each point is a single outcome; the line goes through the mean and the bars indicate the standard deviation. It seems that k ≈ 7 works best for this data.)

k-Nearest Neighbor on raw pixels is never used in practice: it is very slow at test time, and distance metrics on pixels are not informative. (Figure: an original image next to boxed, shifted, and tinted versions; all three have the same L2 distance to the original.)

k-Nearest Neighbor on images is also never used because of the curse of dimensionality: the number of training points needed to cover the space densely grows exponentially with dimension. With 4 points per axis: dimensions = 1, points = 4; dimensions = 2, points = 4^2 = 16; dimensions = 3, points = 4^3 = 64.

K-Nearest Neighbors: Summary. In image classification we start with a training set of images and labels, and must predict labels on the test set. The K-Nearest Neighbors classifier predicts labels based on the nearest training examples. The distance metric and K are hyperparameters. Choose hyperparameters using the validation set; only run on the test set once, at the very end!

Linear Classification

Neural Network: linear classifiers are the basic building blocks that neural networks stack together.

(Example image captions: "Two young girls are playing with lego toy." "Boy is doing backflip on wakeboard." "Man in black shirt is playing guitar." "Construction worker in orange safety vest is working on road." Karpathy and Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions," CVPR 2015. Figures copyright IEEE, 2015; reproduced for educational purposes.)

Recall CIFAR10: 50,000 training images and 10,000 test images; each image is 32x32x3.

Parametric Approach: Linear Classifier. f(x,W) = Wx + b. The input image is an array of 32x32x3 numbers (3072 numbers total), stretched into a 3072x1 column vector x. W is a 10x3072 matrix of parameters (weights) and b is a 10x1 bias vector, so f(x,W) is 10 numbers giving the class scores.
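A minimal numpy sketch of these shapes; W and b are random stand-ins for values that would normally be learned:

```python
import numpy as np

x = np.random.randint(0, 256, size=(3072,)).astype(np.float64)  # 32*32*3 image, flattened
W = np.random.randn(10, 3072) * 0.0001                          # 10x3072 weights
b = np.random.randn(10)                                         # 10 biases

scores = W.dot(x) + b      # f(x,W) = Wx + b
print(scores.shape)        # (10,): one score per class
print(np.argmax(scores))   # index of the highest-scoring class
```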

Example with an image with 4 pixels and 3 classes (cat/dog/ship). Stretch the pixels into a column: x = (56, 231, 24, 2). The rows of W are (0.2, -0.5, 0.1, 2.0) for cat, (1.5, 1.3, 2.1, 0.0) for dog, and (0, 0.25, 0.2, -0.3) for ship, with b = (1.1, 3.2, -1.2). Wx + b gives the scores on the slide: -96.8 (cat), 437.9 (dog), 61.95 (ship).
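As an arithmetic check, the cat row works out as follows (the other rows are computed the same way):

```latex
0.2 \cdot 56 - 0.5 \cdot 231 + 0.1 \cdot 24 + 2.0 \cdot 2 + 1.1
  = 11.2 - 115.5 + 2.4 + 4.0 + 1.1
  = -96.8
```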

Example with an image with 4 pixels and 3 classes (cat/dog/ship), Algebraic Viewpoint: f(x,W) = Wx. (Figure: the same W, b, and scores as above, laid out as a matrix multiplication.)

Interpreting a Linear Classifier: Visual Viewpoint.

Interpreting a Linear Classifier: Geometric Viewpoint. f(x,W) = Wx + b, viewing each image (an array of 32x32x3 numbers, 3072 numbers total) as a point in a 3072-dimensional space.

Hard cases for a linear classifier:
- Class 1: first and third quadrants; Class 2: second and fourth quadrants.
- Class 1: 1 <= L2 norm <= 2; Class 2: everything else.
- Class 1: three modes; Class 2: everything else.

Linear Classifier: Three Viewpoints. Algebraic Viewpoint: f(x,W) = Wx. Visual Viewpoint: one template per class. Geometric Viewpoint: hyperplanes cutting up space.

So far: defined a (linear) score function f(x,W) = Wx + b. How can we tell whether this W is good or bad? (Figure: example class scores over the 10 classes for three images, for some W. Cat image: -3.45, -8.87, 0.09, 2.9, 4.48, 8.02, 3.78, 1.06, -0.36, -0.72. Car image: -0.51, 6.04, 5.31, -4.22, -4.19, 3.58, 4.49, -4.37, -2.09, -2.93. Frog image: 3.42, 4.64, 2.65, 5.1, 2.64, 5.55, -4.34, -1.5, -4.79, 6.14.)

Coming up, with f(x,W) = Wx + b: loss functions (quantifying what it means to have a good W), optimization (start with a random W and find a W that minimizes the loss), and ConvNets! (tweak the functional form of f).