
Lecture 2: Image Classification pipeline

Administrative: Piazza. For questions about the midterm, poster session, projects, etc., use Piazza! SCPD students: use your @stanford.edu address to register for Piazza; contact scpd-customerservice@stanford.edu for help.

Administrative: Assignment 1. Out yesterday, due 4/17 11:59pm:
- K-Nearest Neighbor
- Linear classifiers: SVM, Softmax
- Two-layer neural network
- Image features

Administrative: Friday Discussion Sections. (Some) Fridays 12:30pm - 1:20pm in Gates B03: hands-on tutorials, with more practical detail than the main lecture. We may not have discussion sections every Friday; check the syllabus: http://cs231n.stanford.edu/syllabus.html. This Friday: Python / numpy / Google Cloud setup.

Administrative: Course Project. Project proposal due 4/24.

Administrative: Python + Numpy. http://cs231n.github.io/python-numpy-tutorial/

Administrative: Google Cloud. We will be using Google Cloud in this class and will be distributing coupons to all enrolled students. See our tutorial here for a walkthrough of Google Cloud setup: https://github.com/cs231n/gcloud

Image Classification: a core task in Computer Vision. Given an image and a fixed set of discrete labels, e.g. {dog, cat, truck, plane, ...}, assign one label to the image; here, "cat". [Image by Nikita, CC-BY 2.0.]

The Problem: Semantic Gap. What the computer sees: an image is just a big grid of numbers between [0, 255], e.g. 800 x 600 x 3 (3 RGB channels). [Image by Nikita, CC-BY 2.0.]

Challenges: Viewpoint variation. All pixels change when the camera moves! [Image by Nikita, CC-BY 2.0.]

Challenges: Background clutter. [Images are CC0 1.0 public domain.]

Challenges: Illumination. [Images are CC0 1.0 public domain.]

Challenges: Deformation. [Images by Umberto Salvagnin, sare bear, and Tom Thai, CC-BY 2.0.]

Challenges: Occlusion. [Images are CC0 1.0 public domain; one by jonsson, CC-BY 2.0.]

Challenges: Intraclass variation. [Image is CC0 1.0 public domain.]

An image classifier. Unlike, e.g., sorting a list of numbers, there is no obvious way to hard-code an algorithm for recognizing a cat or any other class.
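The slide's point is easy to see as a stub; a hypothetical sketch of what such a function would even look like (nothing here is a real API):

```python
def classify_image(image):
    """Hypothetical hard-coded classifier.

    Unlike sorting, there is no obvious sequence of rules to write
    here that maps raw pixel values to a label like "cat".
    """
    # Some magic here?
    raise NotImplementedError("no obvious hard-coded rule exists")
```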

Attempts have been made: find edges, then find corners? Such hand-crafted rules are brittle and must be redesigned for every new class. (John Canny, "A Computational Approach to Edge Detection", IEEE TPAMI 1986.)

Machine Learning: Data-Driven Approach
1. Collect a dataset of images and labels
2. Use Machine Learning to train a classifier
3. Evaluate the classifier on new images
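In code, the data-driven approach replaces one impossible function with two; a minimal sketch of that interface (the names and the memorize-everything model are illustrative):

```python
def train(images, labels):
    """Step 2: build a model from training images and labels."""
    model = {"images": images, "labels": labels}  # e.g. just memorize everything
    return model

def predict(model, test_images):
    """Step 3: use the model to predict labels for new images."""
    # Placeholder rule; a real classifier compares test images to the model.
    return [model["labels"][0] for _ in test_images]
```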

First classifier: Nearest Neighbor. Training: memorize all data and labels. Prediction: predict the label of the most similar training image.

Example Dataset: CIFAR10. 10 classes, 50,000 training images, 10,000 testing images. (Alex Krizhevsky, "Learning Multiple Layers of Features from Tiny Images", Technical Report, 2009.)

The same dataset, visualized: test images alongside their nearest neighbors in the training set.

Distance metric to compare images. L1 distance: d_1(I_1, I_2) = sum over pixels p of |I_1^p - I_2^p|, i.e. add up the absolute pixel-wise differences.
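A concrete numpy illustration of "add up the absolute pixel-wise differences" (the 2x2 "images" here are made up for the example):

```python
import numpy as np

# Two tiny 2x2 "images" with integer pixel values in [0, 255].
I1 = np.array([[56, 32],
               [10, 25]])
I2 = np.array([[10, 20],
               [24, 17]])

# L1 distance: sum of absolute pixel-wise differences.
d1 = np.sum(np.abs(I1 - I2))
print(d1)  # |56-10| + |32-20| + |10-24| + |25-17| = 80
```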

Nearest Neighbor classifier. Training: memorize the training data. Prediction: for each test image, find the closest training image and predict the label of that nearest image.
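Putting the two steps together, a minimal numpy implementation in the spirit of the slides (L1 distance; names and array layout are illustrative):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # Training is O(1): just memorize the data. X is N x D
        # (one flattened image per row), y is a vector of N labels.
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        # Prediction is O(N) per test image: compare against every training row.
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            # L1 distance from test image i to all training images.
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            Ypred[i] = self.ytr[np.argmin(distances)]  # label of the closest
        return Ypred
```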

Nearest Neighbor classifier. Q: With N examples, how fast are training and prediction? A: Training is O(1), prediction is O(N). This is bad: we want classifiers that are fast at prediction; slow training is OK.

Nearest Neighbor classifier. Many methods exist for fast / approximate nearest neighbor search (beyond the scope of 231N!). A good implementation: https://github.com/facebookresearch/faiss (Johnson et al., "Billion-scale similarity search with GPUs", arXiv 2017).

What does this look like? [Figure: decision regions of a Nearest Neighbor classifier on a 2D point set.]

K-Nearest Neighbors: instead of copying the label from the nearest neighbor, take a majority vote from the K closest points. [Figures: decision regions for K=1, K=3, K=5.]
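Extending the nearest-neighbor sketch above from 1 neighbor to K is a small change; a minimal sketch of the majority vote (function name and arguments are illustrative):

```python
import numpy as np

def predict_knn(Xtr, ytr, x, k=5):
    # L1 distances from one test image x to every training image.
    distances = np.sum(np.abs(Xtr - x), axis=1)
    # Indices of the k closest training images.
    nearest = np.argsort(distances)[:k]
    # Majority vote over their labels.
    labels, counts = np.unique(ytr[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```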


K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: d_1(I_1, I_2) = sum_p |I_1^p - I_2^p|
L2 (Euclidean) distance: d_2(I_1, I_2) = sqrt(sum_p (I_1^p - I_2^p)^2)
Even with K=1, the choice of metric changes the shape of the decision boundaries. [Figures: K=1 decision regions under L1 vs. L2.]
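For the same pair of made-up images, the two metrics give different numbers, and in a KNN classifier they can pick different neighbors; a quick sketch:

```python
import numpy as np

I1 = np.array([56., 32., 10., 25.])  # flattened example image
I2 = np.array([10., 20., 24., 17.])

d1 = np.sum(np.abs(I1 - I2))          # L1 (Manhattan): 80.0
d2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2 (Euclidean): ~50.2
print(d1, d2)
```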

K-Nearest Neighbors: Demo Time. http://vision.stanford.edu/teaching/cs231n-demos/knn/

Hyperparameters. What is the best value of K to use? What is the best distance metric to use? These are hyperparameters: choices about the algorithm that we set rather than learn. They are very problem-dependent; you must try them all out and see what works best.

Setting Hyperparameters
Idea #1: Choose hyperparameters that work best on the full dataset. BAD: K = 1 always works perfectly on training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data. BAD: no idea how the algorithm will perform on new data.
Idea #3: Split data into train, validation, and test; choose hyperparameters on the validation set and evaluate on the test set. Better!

Setting Hyperparameters. Idea #4: Cross-Validation: split the data into folds, try each fold as validation, and average the results. [Figure: five folds taking turns as the validation set, with a held-out test set.] Useful for small datasets, but not used too frequently in deep learning.
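A minimal sketch of 5-fold cross-validation for choosing K, reusing the predict_knn sketch above (the fold handling is illustrative):

```python
import numpy as np

def cross_validate_k(Xtr, ytr, k_choices, num_folds=5):
    X_folds = np.array_split(Xtr, num_folds)
    y_folds = np.array_split(ytr, num_folds)
    results = {}
    for k in k_choices:
        fold_accs = []
        for i in range(num_folds):
            # Fold i is the validation set; the rest form the training set.
            X_train = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_train = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = np.array([predict_knn(X_train, y_train, x, k)
                              for x in X_folds[i]])
            fold_accs.append(np.mean(preds == y_folds[i]))
        results[k] = np.mean(fold_accs)  # average accuracy over the folds
    return results
```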

Setting Hyperparameters. Example of 5-fold cross-validation for the value of k: each point is a single outcome, the line goes through the mean, and the bars indicate standard deviation. (It seems that k ~= 7 works best for this data.)

k-Nearest Neighbor on raw pixels is never used: it is very slow at test time, and distance metrics on pixels are not informative. [Figure: an original image alongside boxed, shifted, and tinted copies; all three have the same L2 distance to the original. Original image is CC0 public domain.]

k-Nearest Neighbor on images is never used, also because of the curse of dimensionality: to cover the space with a fixed density of training points, the number of points needed grows exponentially with dimension (dimensions = 1, 2, 3 require 4, 4^2 = 16, 4^3 = 64 points).
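The exponential blow-up is easy to check numerically, under the slide's assumption of 4 samples per axis:

```python
# Points needed to keep 4 samples per axis as dimension grows.
for dim in (1, 2, 3):
    print(dim, 4 ** dim)  # 1 -> 4, 2 -> 16, 3 -> 64
```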

K-Nearest Neighbors: Summary. In image classification we start with a training set of images and labels, and must predict labels on the test set. The K-Nearest Neighbors classifier predicts labels based on the nearest training examples. Distance metric and K are hyperparameters. Choose hyperparameters using the validation set; only run on the test set once, at the very end!

Linear Classification

Neural Network. Linear classifiers are the basic building block that neural networks are assembled from. [Image is CC0 1.0 public domain.]

Example captions: "Two young girls are playing with lego toy." "Boy is doing backflip on wakeboard." "Man in black shirt is playing guitar." "Construction worker in orange safety vest is working on road." (Karpathy and Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015. Figures copyright IEEE, 2015; reproduced for educational purposes.)

Recall CIFAR10: 50,000 training images, each 32x32x3; 10,000 test images.

Parametric Approach: Linear Classifier
f(x, W) = Wx + b
The input image (an array of 32x32x3 numbers, 3072 total) is stretched into a 3072x1 column vector x. W is a 10x3072 matrix of parameters (weights) and b is a 10x1 bias vector, so f(x, W) is 10 numbers giving the class scores.
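The entire classifier is a single matrix-vector multiply plus a bias; a numpy sketch with CIFAR10 shapes (the random W and b are placeholders just to show the shapes):

```python
import numpy as np

D, C = 32 * 32 * 3, 10            # input dimension (3072), number of classes
x = np.random.rand(D)             # a flattened 32x32x3 image
W = 0.01 * np.random.randn(C, D)  # 10 x 3072 weight matrix
b = np.zeros(C)                   # 10 biases

scores = W.dot(x) + b             # f(x, W) = Wx + b
print(scores.shape)               # (10,) -- one score per class
```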

Example with an image with 4 pixels and 3 classes (cat/dog/ship). Stretch the pixels into a column vector x = (56, 231, 24, 2). The slide's weights and biases are:

W = [ 0.2  -0.5   0.1   2.0 ]        b = [  1.1 ]
    [ 1.5   1.3   2.1   0.0 ]            [  3.2 ]
    [ 0.0   0.25  0.2  -0.3 ]            [ -1.2 ]

and f(x, W) = Wx + b gives the class scores listed on the slide: -96.8 (cat), 437.9 (dog), 61.95 (ship). This is the algebraic viewpoint: the classifier is a single matrix-vector multiply plus a bias.

Interpreting a Linear Classifier

Interpreting a Linear Classifier: Visual Viewpoint

Interpreting a Linear Classifier: Geometric Viewpoint. f(x, W) = Wx + b: each image (an array of 32x32x3 = 3072 numbers) is a point in a 3072-dimensional space, and each class score is a linear function over that space. [Plot created using Wolfram Cloud; cat image by Nikita, CC-BY 2.0.]

Hard cases for a linear classifier:
- Class 1: first and third quadrants; Class 2: second and fourth quadrants.
- Class 1: 1 <= L2 norm <= 2; Class 2: everything else.
- Class 1: three modes; Class 2: everything else.

Linear Classifier: Three Viewpoints. Algebraic: f(x, W) = Wx. Visual: one template per class. Geometric: hyperplanes cutting up space.

So far: we defined a (linear) score function f(x, W) = Wx + b. Here are example class scores for 3 images for some W; how can we tell whether this W is good or bad?

Scores (10 per image, in order of appearance):
  cat image:  -3.45, -8.87, 0.09, 2.9, 4.48, 8.02, 3.78, 1.06, -0.36, -0.72
  car image:  -0.51, 6.04, 5.31, -4.22, -4.19, 3.58, 4.49, -4.37, -2.09, -2.93
  frog image: 3.42, 4.64, 2.65, 5.1, 2.64, 5.55, -4.34, -1.5, -4.79, 6.14

(Cat image by Nikita, CC-BY 2.0; car image is CC0 1.0 public domain; frog image is in the public domain.)

f(x, W) = Wx + b
Coming up:
- Loss function (quantifying what it means to have a good W)
- Optimization (start with a random W and find a W that minimizes the loss)
- ConvNets! (tweak the functional form of f)