Templates and Background Subtraction. Prof. D. Stricker Doz. G. Bleser


Surveillance

Video: Example of multiple people tracking http://www.youtube.com/watch?v=inqv34bchem&feature=player_embedded

Object category classification: choice of object representation
- Window-based
- Part-based

Advantages and Disadvantages
Part-based approaches:
- May be better able to deal with moving body parts
- May be able to handle occlusion and overlaps
- Require more complex reasoning
Global approaches:
- Typically simple, i.e. we train a discriminative classifier on top of the feature descriptions
- Work well at small resolutions
- Typically perform detection via classification, i.e. use a binary classifier
Slide credit: Dr. Edgar Seemann

Detection via classification: Main idea
Basic component: a binary classifier. A car/non-car classifier answers either "No, not a car" or "Yes, a car".
Slide credit: K. Grauman, B. Leibe

Detection via classification: Main idea
If the object may be in a cluttered scene, slide a window around the image looking for it, and apply the car/non-car classifier to each window.

Window-based models: Building an object model
Simple holistic descriptions of image content:
- Grayscale / color histogram
- Vector of pixel intensities
Slide credit: Kristen Grauman

Window-based models: Building an object model
- Pixel-based representations are sensitive to small shifts
- Color or grayscale-based appearance descriptions can be sensitive to illumination and intra-class appearance variation

Window-based models: Building an object model
Consider edges, contours, and (oriented) intensity gradients:
- Summarize the local distribution of gradients with a histogram
- Locally orderless: offers invariance to small shifts and rotations
- Contrast normalization: tries to correct for variable illumination

Window-based models: Building an object model
Given the representation, train a binary classifier (car/non-car): "No, not a car" or "Yes, a car".

Discriminative classifier construction
- Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005; ...
- Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
- Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio 2001
- Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006
- Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003
Slide adapted from Antonio Torralba

Influential Works in Detection
- Sung-Poggio (1994, 1998): ~1450 citations. Basic idea of statistical template detection (I think), bootstrapping to get face-like negative examples, multiple whole-face prototypes (in 1994)
- Rowley-Baluja-Kanade (1996-1998): ~2900. Parts at fixed position, non-maxima suppression, simple cascade, rotation, pretty good accuracy, fast
- Schneiderman-Kanade (1998-2000, 2004): ~1250. Careful feature engineering, excellent results, cascade
- Viola-Jones (2001, 2004): ~6500. Haar-like features, AdaBoost as feature selection, hyper-cascade, very fast, easy to implement
- Dalal-Triggs (2005): ~2000. Careful feature engineering, excellent results, HOG feature, online code
- Felzenszwalb-Huttenlocher (2000): ~800. Efficient way to solve part-based detectors
- Felzenszwalb-McAllester-Ramanan (2008)? ~350. Excellent template/parts-based blend
Slide: Derek Hoiem

Basic framework
- Learn an object model from training samples or online
  - Choose a representation
  - Learn or fit parameters of the model / classifier
- Generate candidates in the new image
- Score the candidates

Window-based models: Generating and scoring candidates (car/non-car classifier)

Window-based object detection: recap
Training:
1. Obtain training data
2. Define features
3. Define classifier
Given a new image:
1. Slide window
2. Score by classifier
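The two test-time steps of the recap (slide a window, score it with the classifier) can be sketched as follows. This is only an illustration: the function names, the 64x128 window, the stride of 8 pixels and the score threshold are assumptions, and `score_fn` stands in for whatever trained binary classifier is used.

```python
from typing import Callable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Box]:
    """Yield every window position that fits entirely inside the image."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def detect(img_w: int, img_h: int, score_fn: Callable[[Box], float],
           win_w: int = 64, win_h: int = 128, stride: int = 8,
           threshold: float = 0.0) -> List[Tuple[Box, float]]:
    """Score each window with a binary classifier and keep the positives."""
    hits = []
    for box in sliding_windows(img_w, img_h, win_w, win_h, stride):
        score = score_fn(box)  # hypothetical classifier: higher = more object-like
        if score > threshold:
            hits.append((box, score))
    return hits
```

In practice `score_fn` would extract features from the window and evaluate the classifier, and the whole loop would be repeated over several image scales.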

Gradient Histograms

Slides from Sminchisescu

Gradient Histograms
- Have become extremely popular and successful in the vision community
- Avoid hard decisions, in contrast to edge-based features
- Examples:
  - SIFT (Scale-Invariant Feature Transform)
  - GLOH (Gradient Location and Orientation Histogram)
  - HOG (Histogram of Oriented Gradients)

Computing gradients: recall
One-sided: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$
Two-sided: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x-h)}{2h}$
Filter masks in x-direction: one-sided [-1 1], two-sided [-1 0 1]
Gradient: $s = (s_x, s_y)$
Magnitude: $|s| = \sqrt{s_x^2 + s_y^2}$
Orientation: $\theta = \arctan(s_y / s_x)$
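As a minimal sketch of the two-sided mask, the magnitude and the orientation, assuming a grayscale image stored as a 2D NumPy array: the function name is illustrative, border pixels are simply left at zero, and `arctan2` is used in place of a plain arctan so that s_x = 0 does not cause a division by zero.

```python
import numpy as np


def image_gradients(img: np.ndarray):
    """Apply the [-1 0 1] mask in x and y; return s_x, s_y, magnitude, orientation."""
    img = img.astype(np.float64)
    sx = np.zeros_like(img)
    sy = np.zeros_like(img)
    # Two-sided difference f(x+1) - f(x-1); border pixels are left at zero.
    sx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    sy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.sqrt(sx ** 2 + sy ** 2)
    # Orientation in degrees, in the range [-180, 180].
    orientation = np.degrees(np.arctan2(sy, sx))
    return sx, sy, magnitude, orientation
```

Note that the mask omits the 1/(2h) factor of the two-sided formula, so the values are twice the derivative estimate; this does not change the orientation.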

Histograms
Gradient histograms measure the orientations and strengths of image gradients within an image region.

Example: SIFT descriptor (recall)
- The most popular gradient-based descriptor
- Typically used in combination with an interest point detector
- Region rescaled to a grid of 16x16 pixels
- 4x4 regions = 16 histograms (concatenated)
- Histograms: 8 orientation bins, gradients weighted by gradient magnitude
- Final descriptor has 128 dimensions and is normalized to compensate for illumination differences

Histograms of Oriented Gradients
- Gradient-based feature descriptor developed for people detection
- Authors: Dalal & Triggs (INRIA Grenoble, France)
- Global descriptor for the complete body
- Very high-dimensional: typically ~4000 dimensions

HOG
Very promising results on challenging data sets.
Phases:
1. Learning phase
2. Detection phase

Detector: Learning Phase
- Set of cropped images containing pedestrians in normal environments
- Global descriptor rather than local features
- Uses a linear SVM

Detector: Detection Phase
- Sliding window over each scale
- Simple SVM prediction

Descriptor
1. Compute gradients on an image region of 64x128 pixels
2. Compute histograms on cells of typically 8x8 pixels (i.e. 8x16 cells)
3. Normalize histograms within overlapping blocks of cells (typically 2x2 cells, i.e. 7x15 blocks)
4. Concatenate histograms
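The cell and block counts above, and the resulting descriptor length, can be checked with a few lines of arithmetic. The sketch assumes that blocks advance by one cell at a time and that each cell histogram has 9 orientation bins (the value used for the HOG cell histograms); the function name is illustrative.

```python
def hog_layout(win_w=64, win_h=128, cell=8, block=2, bins=9):
    """Return (cells_x, cells_y), (blocks_x, blocks_y) and the descriptor length."""
    cells_x, cells_y = win_w // cell, win_h // cell
    # Overlapping blocks, assumed to move one cell at a time.
    blocks_x, blocks_y = cells_x - block + 1, cells_y - block + 1
    dims = blocks_x * blocks_y * block * block * bins
    return (cells_x, cells_y), (blocks_x, blocks_y), dims


cells, blocks, dims = hog_layout()
print(cells, blocks, dims)  # (8, 16) (7, 15) 3780
```

The 3780 entries are consistent with the "typically ~4000 dimensions" quoted for the descriptor.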

Gradients
- Convolution with [-1 0 1] filters
- No smoothing
- Compute gradient magnitude and direction
- Per pixel: the color channel with the greatest magnitude gives the final gradient

Gradient: alternatives
- Centered
- Diagonal
- Uncentered
- Cubic-corrected
- Sobel

Cell histograms
- 9 bins for gradient orientations (0-180 degrees)
- Filled with magnitudes
- Interpolated trilinearly:
  - Bilinearly into spatial cells
  - Linearly into orientation bins

Linear and bilinear interpolation for subsampling (figure panels: linear, bilinear)

Histogram interpolation example
Orientation, θ = 85 degrees:
- Distance to bin centers: bin 70 -> 15 degrees, bin 90 -> 5 degrees
- Ratios: 5/20 = 1/4 (bin 70), 15/20 = 3/4 (bin 90)
Spatial position:
- Distance to spatial bin centers: left 2, right 6; top 2, bottom 6
- Ratio left-right: 6/8, 2/8
- Ratio top-bottom: 6/8, 2/8
- Ratios:
  - 6/8 * 6/8 = 36/64 = 9/16
  - 6/8 * 2/8 = 12/64 = 3/16
  - 2/8 * 6/8 = 12/64 = 3/16
  - 2/8 * 2/8 = 4/64 = 1/16
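The numbers in this example follow from one rule applied three times: a vote is split between two neighboring centers in inverse proportion to its distance from each. A small sketch (the helper name and the coordinate convention, with the two centers at 0 and 8, are assumptions made for illustration):

```python
def linear_weights(value, lower, upper):
    """Split a vote between two centers; the closer center gets the larger share."""
    w_upper = (value - lower) / (upper - lower)
    return 1.0 - w_upper, w_upper


# Orientation: 85 degrees lies between the bin centers 70 and 90.
w70, w90 = linear_weights(85, 70, 90)          # 0.25, 0.75

# Spatial: 2 away from the left/top center, 6 away from the right/bottom center.
w_left, w_right = linear_weights(2, 0, 8)      # 0.75, 0.25
w_top, w_bottom = linear_weights(2, 0, 8)      # 0.75, 0.25

spatial = {
    "top-left": w_top * w_left,          # 9/16
    "top-right": w_top * w_right,        # 3/16
    "bottom-left": w_bottom * w_left,    # 3/16
    "bottom-right": w_bottom * w_right,  # 1/16
}

# Trilinear vote: gradient magnitude times spatial weight times orientation weight.
magnitude = 1.0
votes = {(cell, b): magnitude * ws * wo
         for cell, ws in spatial.items()
         for b, wo in ((70, w70), (90, w90))}
```

The eight resulting votes sum to the gradient magnitude, so interpolation redistributes the vote without changing its total weight.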

Blocks
- Overlapping blocks of 2x2 cells
- Cell histograms are concatenated and then normalized
- Note that each cell occurs several times in the final descriptor, each time with a different normalization
Normalization
- Different norms are possible
- We add a normalization epsilon to avoid division by zero
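As one possible instance of such a normalization, the sketch below uses the L2 norm with a small epsilon under the square root. The choice of norm and the epsilon value are assumptions for illustration; other norms work analogously.

```python
import math


def l2_normalize(block, eps=1e-5):
    """Normalize a concatenated block histogram: v / sqrt(||v||^2 + eps^2)."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]


# An empty (all-zero) block would divide by zero without the epsilon.
flat = l2_normalize([0.0] * 36)       # 2x2 cells with 9 bins each
scaled = l2_normalize([3.0, 4.0])     # approximately [0.6, 0.8]
```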

Blocks
- Gradient magnitudes are weighted according to a Gaussian spatial window
- Distant gradients contribute less to the histogram

Final Descriptor
Concatenation of blocks (figure: visualization of the descriptor)

Engineering
- Developing a feature descriptor requires a lot of engineering
  - Testing of parameters (e.g. size of cells, blocks, number of cells in a block, size of overlap)
  - Normalization schemes (e.g. L1 and L2 norms, gamma correction, pixel intensity normalization)
- An extensive evaluation of different choices was performed when the descriptor was proposed
- It's not only the idea, but also the engineering effort

Demo

Can we simplify that? If the camera is static, yes, we can.

Background subtraction
- Simple techniques can do OK with a static camera
- But it is hard to do perfectly
- Widely used:
  - Traffic monitoring (counting vehicles, detecting and tracking vehicles, pedestrians)
  - Human action recognition (run, walk, jump, squat)
  - Human-computer interaction
  - Object tracking

Blue/Green Screen

An example: The Ultimatte system p 1 and p 2

Background subtraction
Goal: separation of foreground (object) and background (everything else = noise)
Result could be:
- A binary image, containing foreground only
- A probability image, containing the likelihood of each pixel being foreground
Useful for further processing, such as using silhouettes: less processing, less room for errors
Approaches:
- Motion-based
- Color-based
Some approaches can learn!

Background Subtraction: a largely unsolved problem
Figure panels: one video frame, estimated background, difference image, thresholded foreground on blue.

Slide credit: Birgi Tamersoy

Why is it Hard?
1. Illumination changes
   - Gradual (evening to night)
   - Sudden (overhead clouds)
2. Changes in the background geometry
   - Parked cars (should become part of the background)
3. Camera-related issues
   - Camera oscillations (shaking)
   - Grainy noise
4. Changes in background objects
   - Tree branches
   - Sea waves

Foreground-Background Segmentation using Motion and Color
Motion-based:
- Image differencing: model-free, no learning
Color-based:
- Background subtraction: background used as a model, no learning
- Advanced background subtraction: background is learned
- Very advanced background subtraction (Gaussian Mixture Models): background is learned

Motion-based: Image Differencing
1. Save image from last frame
2. Capture camera image
3. Subtract image
4. Threshold
5. Delete noise

Step 5: Deleting noise
- Singular pixels are likely to appear: pixel noise
- Apply a median filter: depending on the filter size, bigger spots can be erased
- Alternative: morphological operations

Recall: Median filtering
A median filter operates over a window by selecting the median intensity in the window.
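A deliberately simple (and slow) sketch of a median filter on a 2D NumPy array, assuming an odd window size and edge replication at the borders; the function name and the `uint8` example masks are illustrative. It shows how an isolated noise pixel is erased while a larger blob survives.

```python
import numpy as np


def median_filter(img: np.ndarray, size: int = 3) -> np.ndarray:
    """Replace each pixel by the median of its size x size neighborhood (size odd)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")  # repeat border pixels
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + size, x:x + size])
    return out


# A single noisy pixel in a binary mask disappears after filtering.
noisy = np.zeros((5, 5), dtype=np.uint8)
noisy[2, 2] = 1
cleaned = median_filter(noisy)

# A 3x3 blob keeps its center.
blob = np.zeros((7, 7), dtype=np.uint8)
blob[2:5, 2:5] = 1
kept = median_filter(blob)
```

Increasing `size` erases larger spots, at the cost of also eroding small genuine foreground regions.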

Background Subtraction
Foreground is moving, background is stable.
Algorithm:
1. Capture image containing background
2. Capture camera image
3. Subtract image (= difference = motion)
4. Threshold
5. Delete noise
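Steps 3 and 4 of the algorithm amount to an absolute difference followed by a global threshold. The sketch below assumes 8-bit grayscale images as NumPy arrays; the function name and the threshold of 30 gray levels are arbitrary illustrative choices, and noise removal (step 5) is left out.

```python
import numpy as np


def foreground_mask(frame: np.ndarray, background: np.ndarray,
                    threshold: int = 30) -> np.ndarray:
    """Return a boolean mask that is True where the frame differs from the background."""
    # Cast to a signed type first: subtracting uint8 arrays directly would wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold


background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1, 1] = 200   # bright object pixel
frame[0, 0] = 50    # dark object pixel
frame[2, 2] = 110   # small change, below the threshold
mask = foreground_mask(frame, background)
```

Passing the previous frame instead of a fixed background image turns the same function into simple image differencing.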

Advanced Background Subtraction
What if we have small motion in the background (bushes, leaves, etc.) and noise in the camera or lighting?
Learn the background:
- Initialization over N frames: capture N images with no object present and calculate the average background image
- Update after subtraction: $B_{i+1} = \alpha I_i + (1 - \alpha) B_i$
Algorithm:
1. Calculate average background image
2. Capture camera image
3. Subtract image (= motion)
4. Threshold
5. Delete noise
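A minimal sketch of the initialization and the running-average update, assuming grayscale frames as NumPy arrays. The function names and the default learning rate `alpha = 0.05` are illustrative assumptions; the update line is the formula B_{i+1} = α I_i + (1 - α) B_i.

```python
import numpy as np


def init_background(frames) -> np.ndarray:
    """Average N object-free frames to obtain the initial background image."""
    return np.mean(np.stack(frames).astype(np.float64), axis=0)


def update_background(background: np.ndarray, frame: np.ndarray,
                      alpha: float = 0.05) -> np.ndarray:
    """Blend the current frame into the background: B_new = alpha*I + (1-alpha)*B."""
    return alpha * frame.astype(np.float64) + (1.0 - alpha) * background


frames = [np.full((2, 2), 10, dtype=np.uint8), np.full((2, 2), 20, dtype=np.uint8)]
bg = init_background(frames)                                    # all 15.0
bg_next = update_background(bg, np.full((2, 2), 25, dtype=np.uint8), alpha=0.5)
```

A small `alpha` makes the background adapt slowly (robust to passing objects), while a large `alpha` adapts quickly but absorbs slow-moving foreground into the background.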


Average/Median Image Alyosha Efros, CMU

Background Subtraction (figure: one image minus another image gives the result). Alyosha Efros, CMU

Pros and cons
Advantages:
- Extremely easy to implement and use
- All pretty fast
- The corresponding background models need not be constant; they change over time
Disadvantages:
- Accuracy of frame differencing depends on object speed and frame rate
- Median background model: relatively high memory requirements
- Setting the global threshold Th

Very Advanced Background Subtraction
1) Use the neighborhood relation: compare each pixel with its neighbors and weight them.
2) Learn the background and its variations, e.g. a Gaussian model (mean, variance) for each pixel, or a histogram for each pixel. The more images you train on, the better.
Idea: some pixels may vary more than other pixels.
Algorithm: consider each pixel (x,y) in the input image and check how much it varies with respect to the mean and variance of the learned Gaussian model.
1. Calculate mean and variance for each pixel
2. Capture camera image
3. Subtract image (= motion)
4. Weight the distances (new)
5. Threshold according to variance
6. Delete noise

Weight the Distances: correlation between pixel values
- If one pixel is considered to be a foreground pixel, its neighbor is also likely to be a foreground pixel
- If its neighbor is not considered to be a foreground pixel, one of the two might be wrong
- Neighboring pixels are highly correlated (similar)

Weight the Distances
- What does a pixel say about a foreground pixel that is further away?
- Pixels at increasing distance from each other say less about each other
- The correlation between pixels decreases with distance

Weight the Distances
Use a Gaussian for weighting. To test a pixel I(x,y), center a Gaussian on this pixel and weight the neighboring pixels accordingly => convolution!
1D example:
Signal:           60  48  2  3  1  222  100
Gaussian filter:  0.25  0.5  0.25
Output:           39.5  13.75  2.25  56.75  136.3
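The 1D example can be reproduced with a few lines of plain Python. The helper name is illustrative; only positions where the filter fits completely inside the signal are computed, which is why seven input samples give five outputs. Because the filter is symmetric, convolution and correlation coincide here.

```python
def smooth(signal, kernel):
    """Weighted sum of each sample and its neighbors (valid positions only)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


signal = [60, 48, 2, 3, 1, 222, 100]
kernel = [0.25, 0.5, 0.25]
output = smooth(signal, kernel)
print(output)  # [39.5, 13.75, 2.25, 56.75, 136.25]
```

The last value is exactly 136.25, which rounds to the 136.3 shown in the example.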

Learn the background and its variations
The result is a likelihood image (probability image) containing, for each pixel, the likelihood of being a background or foreground pixel.

A little detour to Statistics

Statistics
Mean (center of gravity of the object):
$x_{mean} = \frac{1}{N} \sum_{i=1}^{N} x_i$, $y_{mean} = \frac{1}{N} \sum_{i=1}^{N} y_i$
Variance (measures the variation of the object pixel positions around the center of gravity):
$x_{var} = \frac{1}{N} \sum_{i=1}^{N} (x_i - x_{mean})^2$, $y_{var} = \frac{1}{N} \sum_{i=1}^{N} (y_i - y_{mean})^2$
N is the number of object pixels.
Figure: an object for which $x_{var}$ is big and $y_{var}$ is small.

Statistics
Standard deviation: $\sigma_x = \sqrt{x_{var}}$
Normal distribution = Gaussian distribution
Range    Samples
±1σ      68.26 %
±2σ      95.44 %
±3σ      99.73 %
±4σ      99.99 %

Statistics: how to use it
- Automatic thresholding based on statistics
- Example: the color of the hand
  - Training => mean color
  - Algorithm: hand pixel if Th_min < pixel < Th_max
- How do we define Th_min and Th_max? Use statistics: Th_min = mean - 2σ and Th_max = mean + 2σ

Very Advanced Background Subtraction
Th should be different for each pixel.

Threshold According to Variance
- The threshold can be chosen depending on the variance: a local threshold
- Standard deviation per pixel: $\sigma(x,y) = \sqrt{var(x,y)}$
- For example, with Th_min = mean - K*σ and Th_max = mean + K*σ, a pixel value between Th_min and Th_max agrees with the learned background; a value outside this range indicates an object pixel
- K is set by you! (K = 2 usually)
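A sketch of a per-pixel threshold, under the assumption that the mean and standard deviation are learned from object-free training frames and that a pixel counts as foreground when it deviates from its own mean by more than K standard deviations. The function names and the `min_sigma` floor (which keeps perfectly constant pixels from getting a zero-width band) are illustrative additions.

```python
import numpy as np


def learn_background(frames):
    """Per-pixel mean and standard deviation over a stack of training frames."""
    stack = np.stack(frames).astype(np.float64)
    return stack.mean(axis=0), stack.std(axis=0)


def gaussian_foreground_mask(frame, mean, sigma, k=2.0, min_sigma=1.0):
    """True where the pixel lies outside mean +/- k*sigma of its own model."""
    sigma = np.maximum(sigma, min_sigma)  # assumed floor for noiseless pixels
    return np.abs(frame.astype(np.float64) - mean) > k * sigma


# Training frames alternate between 100 and 104: mean 102, sigma 2 at every pixel.
train = [np.full((3, 3), 100 + 4 * (i % 2), dtype=np.uint8) for i in range(10)]
mean, sigma = learn_background(train)

frame = np.full((3, 3), 103, dtype=np.uint8)
frame[1, 1] = 120  # deviates by 18 > 2 * 2, so it is flagged
mask = gaussian_foreground_mask(frame, mean, sigma)
```

Pixels that flicker a lot during training get a wide band and are harder to trigger, which is the point of making the threshold local.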

Outlook: Gaussian Mixture Models
- Each pixel is modeled with a mixture of Gaussians
- Flexible enough to handle variations in the background

Tracking

Tracking
- Follow the object(s) over time
- Find the trajectory (the curve connecting the positions over time)
- Simple tracking (figure)
- Advanced tracking: cluttered background and multiple objects

Prediction
Given the position of the object in previous images, where do we predict the object will be in the current image?
- We need a motion model
- The size of the search region depends on:
  - The uncertainty of the prediction
  - The frame rate
  - How fast the object can move at most

Motion Model
Predicted position at time t:
- Brownian motion: according to a Gaussian model
- 0th order: $(x_t, y_t) = (x_{t-1}, y_{t-1})$
- 1st order: $x_t = x_{t-1} + \Delta t \cdot \dot{x}_{t-1}$, where $\dot{x}_{t-1}$ is the velocity in x at time t-1 (similar for y)
- 2nd order: $x_t = x_{t-1} + \Delta t \cdot \dot{x}_{t-1} + \frac{1}{2} \Delta t^2 \cdot \ddot{x}_{t-1}$, where $\ddot{x}_{t-1}$ is the acceleration in x at time t-1 (similar for y)
Many other types exist: look at your application!
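The 0th, 1st and 2nd order models differ only in which terms are non-zero, so a single helper covers all three. This is a sketch for one coordinate; the function name and the default time step of one frame are assumptions, and the Brownian (Gaussian) model is not included.

```python
def predict(x_prev, v_prev=0.0, a_prev=0.0, dt=1.0):
    """Predict a coordinate at time t from position, velocity and acceleration at t-1."""
    return x_prev + dt * v_prev + 0.5 * dt * dt * a_prev


x0 = predict(10.0)             # 0th order: object stays where it was
x1 = predict(10.0, 2.0)        # 1st order: constant velocity
x2 = predict(10.0, 2.0, 4.0)   # 2nd order: constant acceleration
```

The same call with the y values gives the predicted y coordinate; the search region is then centered on the predicted position.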

Readings
- Navneet Dalal and Bill Triggs, Histograms of Oriented Gradients for Human Detection
- Chris Stauffer and W.E.L. Grimson, Adaptive Background Mixture Models for Real-Time Tracking
Thank you!