Object Classification for Video Surveillance


Transcription:

Object Classification for Video Surveillance Rogerio Feris IBM TJ Watson Research Center rsferis@us.ibm.com http://rogerioferis.com 1

Outline Part I: Object Classification in Far-field Video Part II: Large Scale Object Classification (Near-field / Still Images) 2

Part I Object Classification in Far-field Video 3

Motivation Object Classification in far-field video 4

Search for Pedestrians 5


Motivation Search for Trucks with Yellow Color Results: locates DHL delivery trucks delivering mail at the IBM Hawthorne Facility 8

Far-field Object Classification: Why is this a Hard Problem? Illumination changes and shadows Arbitrary camera views Low-resolution imagery (objects are often less than 100 pixels in height; difficult to use e.g., SIFT features or parts-based approaches) Projective image distortion (cameras with large field-of-view) Groups of people may look like cars 9

Two Main Streams of Work for Far-Field Object Classification: 1) Methods that rely on moving object segmentation Use background subtraction to detect and track moving objects 2) Methods that do NOT use background modeling for classification Scan the entire video frame applying specialized detectors (e.g., car and pedestrian detectors) 10

Classification after Object Segmentation Assume static surveillance cameras Three main steps: Background Subtraction to detect moving object Track moving object Classify Object Track into car, person, etc. 11

Classification after Object Segmentation Constrained two-class object classification problem: discriminating vehicles from pedestrians. Key papers: Bose & Grimson, Improving Object Classification in Far-Field Video, CVPR 04 Lisa Brown, View-Independent Vehicle/Person Classification, WVSSN 04 12

Classification after Object Segmentation Simple Shape and Motion Descriptors Scene-dependent Features Foreground blob area (sensitive to perspective distortions) Aspect ratio (cars may have completely different aspect ratios depending on the pose: frontal, side-view, etc.) Speed (cars may move slowly, just like people) Position and Direction of Motion (people and cars tend to occupy specific regions and follow different patterns of motion in different scenes) 13

Classification after Object Segmentation Simple Shape and Motion Descriptors Scene-independent Features Percentage Occupancy (number of silhouette pixels divided by the bounding box area) Direction of motion with respect to major axis direction (cars tend to move along the major axis direction) Shape Deformation (people tend to have larger shape deformations than cars when moving) see recurrent motion image (next slide) 14
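
As an illustration only (not code from any of the cited papers), two of these scene-independent features can be computed from a binary foreground mask roughly as follows; the function name and the use of OpenCV moments are my own choices:

```python
import numpy as np
import cv2

def scene_independent_features(mask):
    """mask: HxW array with 1 for silhouette pixels, 0 for background.
    Illustrative sketch, not the code used in the cited papers."""
    ys, xs = np.nonzero(mask)
    x, y, w, h = cv2.boundingRect(np.column_stack([xs, ys]).astype(np.int32))

    # Percentage occupancy: silhouette pixels divided by bounding-box area.
    occupancy = mask.sum() / float(w * h)

    # Major-axis direction from second-order central moments of the blob;
    # it can be compared against the direction of motion of the track.
    m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
    theta = 0.5 * np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"])

    return {"occupancy": occupancy, "major_axis_angle": theta}
```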

Classification after Object Segmentation Recurrent Motion Image [Omar Javed, ECCV 2002] Binary silhouette image sequence for an object; consecutive silhouettes are combined with an exclusive-or operator and accumulated over time 15
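
As a rough formalization (my notation, not taken from the slides), the recurrent motion image accumulates the exclusive-or of consecutive aligned binary silhouettes I_t over a window of T frames:

```latex
\mathrm{RMI}(x,y) \;=\; \sum_{k=0}^{T-1} I_{t-k}(x,y) \,\oplus\, I_{t-k-1}(x,y)
```

Pixels that change often between consecutive silhouettes (e.g., swinging arms and legs) accumulate large values, while rigid objects such as cars stay close to zero away from their boundary.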

Classification after Object Segmentation Recurrent Motion Image [Omar Javed, ECCV 2002] 16

Classification after Object Segmentation Recurrent Motion Image [Omar Javed, ECCV 2002] Pros: Scene-independent feature (works for multiple camera views) Cons: Objects need to be aligned over the frames (translation and scale compensated) Morphology operations in foreground blobs obtained by background subtraction may complicate analysis of shape deformations 17

Classification after Object Segmentation Common Practice: Adaptation / Scene Transfer 1) Train a classifier based on scene-independent features (works for multiple camera views) 2) Deploy the classifier in a specific camera and let it run for hours 3) Select examples classified with high confidence 4) Use these examples to normalize scene-dependent features and retrain the classifier 18
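
A minimal sketch of this adaptation loop, assuming scikit-learn-style classifiers and placeholder feature files (all names here are illustrative, not from the IBM system):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical arrays: scene-independent features with labels from other cameras,
# plus both feature types (unlabeled) from the newly deployed camera.
X_indep, y = np.load("indep_feats.npy"), np.load("labels.npy")
X_indep_new = np.load("new_cam_indep_feats.npy")   # scene-independent, new camera
X_dep_new = np.load("new_cam_dep_feats.npy")       # scene-dependent, new camera

# 1) Train a generic classifier on scene-independent features.
clf = RandomForestClassifier(n_estimators=100).fit(X_indep, y)

# 2-3) Deploy on the new camera and keep only high-confidence predictions.
proba = clf.predict_proba(X_indep_new)
keep = proba.max(axis=1) > 0.95
pseudo = clf.classes_[proba.argmax(axis=1)][keep]

# 4) Normalize the scene-dependent features on the selected samples and retrain.
mu, sigma = X_dep_new[keep].mean(0), X_dep_new[keep].std(0) + 1e-8
X_adapted = np.hstack([X_indep_new[keep], (X_dep_new[keep] - mu) / sigma])
clf_scene = RandomForestClassifier(n_estimators=100).fit(X_adapted, pseudo)
```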

Classification after Object Segmentation Incorporating appearance features So far we have only considered shape and motion descriptors from the segmented object (foreground blob), which are too limited to handle multiple object classes [Li et al, Real-time object classification in video surveillance based on appearance learning, CVPR 07] Local Binary Patterns + Adaboost learning for appearance classification Moving objects are classified into 6 classes: car, van, truck, person, bike, and group of people A large training set is required 19

IBM System Interactive Interface User specifies Regions of Interest (ROI) for each class User specifies the size of objects in different locations of the image to compensate for projective distortions. 20

IBM System Bayesian Classifier Class node with features Size, Velocity, Position, Shape Deformation The Size P(s|x,C) and Position P(x|C) distributions are initially obtained from the interactive interface All distributions are adapted as new samples are classified 21
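
Reading the slide as a naive-Bayes-style factorization (my reconstruction, consistent with the P(s|x,C) and P(x|C) terms above), the class posterior given size s, velocity v, position x, and shape deformation d would be:

```latex
P(C \mid s, v, x, d) \;\propto\; P(C)\, P(s \mid x, C)\, P(v \mid C)\, P(x \mid C)\, P(d \mid C)
```

Each class C (car, person, etc.) keeps its own distributions, which are updated online as confidently classified samples arrive.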

IBM System Shape Deformation feature Histograms of oriented gradients (HOG) with eight bins corresponding to eight directions Differences of HOGs (or histogram intersection) tell how much the shape was deformed, without requiring precise alignment of bounding boxes (as in the Recurrent Motion Image feature) Histogram intersection is computed across frames, where i is the frame number and j the bin number (see the formula below) 22
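
A plausible form of the per-frame intersection score, reconstructed from the slide's annotation (i = frame number, j = bin number) rather than copied from it:

```latex
S_i \;=\; \sum_{j=1}^{8} \min\!\big(H_i(j),\, H_{i+1}(j)\big)
```

Here H_i is the 8-bin HOG of the object in frame i; a small intersection (equivalently, a large HOG difference) across the track indicates strong shape deformation, which is typical of people rather than vehicles.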

IBM System Shape Deformation feature 23

Two Main Streams of Work for Far-Field Object Classification: 1) Methods that rely on moving object segmentation Use background subtraction to detect and track moving objects 2) Methods that do NOT use background modeling for classification Scan the entire video frame applying specialized detectors (e.g., car and pedestrian detectors) 24

Pedestrian Detection Learning Patterns of Motion and Appearance [Viola et al, ICCV 03] Training Pairs (frame t and t+1) Appearance Filters: rectangle features applied in one of the images (exactly like in the Viola/Jones face detector) Motion Filters: rectangle features applied to difference images computed between the two frames (see the sketch below) Adaboost learning is used to select discriminative appearance+motion filters 25
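
From memory of the Viola-Jones-Snow paper, so treat the exact set as an assumption: the motion filters are rectangle features evaluated on absolute difference images between frame t and shifted copies of frame t+1,

```latex
\Delta = |I_t - I_{t+1}|,\quad
U = |I_t - I_{t+1}\!\uparrow|,\quad
D = |I_t - I_{t+1}\!\downarrow|,\quad
L = |I_t - I_{t+1}\!\leftarrow|,\quad
R = |I_t - I_{t+1}\!\rightarrow|
```

where the arrows denote shifting the second frame by one pixel in the corresponding direction; a large response in a directional difference image indicates motion in that direction.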

Pedestrian Detection Learning Patterns of Motion and Appearance [Viola et al, ICCV 03] Robust to shadows and low-resolution imagery 26

Pedestrian Detection Learning Patterns of Motion and Appearance [Viola et al, ICCV 03] Static Detector (only appearance) Dynamic Detector (appearance+motion) 27

Pedestrian Detection Histograms of Oriented Gradients (HOG) for Human Detection [Dalal and Triggs, CVPR 05] State-of-the-art approach for pedestrian detection Dense grid of cells in the detection window. Compute a HOG for each cell and train a linear SVM on the concatenated vector of cell HOGs (see the sketch below) SOURCE CODE: http://pascal.inrialpes.fr/soft/olt/ 28
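
A minimal sketch of the HOG-plus-linear-SVM recipe using scikit-image and scikit-learn; parameters follow the commonly used 64x128 pedestrian window, and the data-loading names are placeholders:

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(window):
    """Concatenated block-normalized cell histograms for one 128x64 grayscale window."""
    return hog(window,
               orientations=9,            # gradient orientation bins per cell
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2),
               block_norm="L2-Hys")

# Placeholder training data: grayscale 128x64 crops and 0/1 pedestrian labels.
windows = np.load("crops.npy")
labels = np.load("labels.npy")

X = np.stack([hog_descriptor(w) for w in windows])
clf = LinearSVC(C=0.01).fit(X, labels)

# At test time, slide a 64x128 window over the image (and over scales) and score
# each window with clf.decision_function(hog_descriptor(window).reshape(1, -1)).
```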

Pedestrian Detection Histograms of Oriented Gradients (HOG) for Human Detection [Dalal and Triggs, CVPR 05] 29

Pedestrian Detection Configuration estimates Improve Pedestrian Finding [Tran & Forsyth, NIPS 07] Structured Learning for detecting parts 30

Car Detection Local Statistics of Parts [Schneiderman, 2000] See Face Detection class for details about this method! 31

Summary 1) Methods that rely on moving object segmentation Fast and reliable approach for static cameras with few objects. These methods do not work for moving cameras or crowded scenes, where background subtraction results are not meaningful 2) Methods that do NOT use background modeling for classification Useful for crowded scenes and moving cameras. Work better under shadows and lighting changes. Training is very expensive (collecting samples and training time). More false positives and sometimes problems with generalization depending on the training/test set. Difficult to handle multiple object poses. 32

Part II Object Classification in Near-field Video 33

Motivation Object Classification in near-field video License Plate Recognition (LPR) 34

Motivation Object Classification in near-field video Recognizing Products in Retail Stores for Loss Prevention Veggie Vision - http://www.research.ibm.com/ecvg/jhc_proj/veggie.html Loss Prevention in Self-Checkout 35

Motivation Object Classification in near-field video LaneHawk (check Evolution Robotics Retail Company) http://www.evolution.com/products/lanehawk/ 36

Near-Field Object Classification We will briefly cover: 1) Shape-Based Methods 2) Methods Based on Bag of Words 37

Shape-based Approaches Shape Context Matching [Belongie et al, 2000] Source Code: http://www.eecs.berkeley.edu/research/projects/cs/vision/shape/sc_digits.html 38

Shape-based Approaches Shape Context Matching [Belongie et al, 2000] Linear Assignment (Bipartite Graph Matching) for establishing correspondences 39
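
A sketch of the correspondence step only, assuming shape-context histograms have already been computed for both point sets; the chi-squared cost and the Hungarian solver are standard choices, and this is not the released code from Belongie et al.:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_cost(h1, h2, eps=1e-10):
    """Chi-squared distance between two shape-context histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def match_shapes(sc_a, sc_b):
    """sc_a, sc_b: (N, K) arrays of shape-context histograms for shapes A and B.
    Returns the minimum-cost one-to-one correspondence (bipartite matching)."""
    cost = np.array([[chi2_cost(a, b) for b in sc_b] for a in sc_a])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```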

Shape-based Approaches Deformable Shape Matching [Berg et al, CVPR 2005] Quadratic Assignment (approximate solution, since the exact problem is NP-hard) for establishing correspondences 40

Shape-based Approaches Deformable Shape Matching [Berg et al, CVPR 2005] 41

Shape-based Approaches Learning Graph Matching [Caetano et al, ICCV 07] Pairs of labeled matches are used as training data 42

Shape-based Approaches Learning Graph Matching [Caetano et al, ICCV 07] Structured learning problem: given a pair of graphs (shapes), predict a matching matrix that provides the best alignment They show that linear assignment (node-node consistency) with learning can match (or exceed) quadratic assignment (edge-edge consistency) without learning Source Code: Structured SVMs http://svmlight.joachims.org/svm_struct.html 43

Near-Field Object Classification We will briefly cover: 1) Shape-Based Methods 2) Methods Based on Bag of Words 44

Bag of Words Excellent Course: Recognizing and Learning Object Categories http://people.csail.mit.edu/torralba/shortcourserloc/ They have a much more detailed presentation about this topic, including Source Code! 45

Object Bag of words Slide from Fei-Fei Li 46

Analogy to documents Two example passages (one on visual perception in the brain, citing Hubel and Wiesel; one on China's trade surplus) are shown with their salient words highlighted, illustrating how a document can be summarized by the bag of key words it contains. Slide from Fei-Fei Li 47

Slide from Fei-Fei Li 48

Bag-of-words pipeline (learning and recognition): feature detection & representation, codewords dictionary, image representation, category models (and/or) classifiers, category decision. Slide from Fei-Fei Li 49

Bag of Words Feature Extraction (Interest Points / SIFT) Learning / Classification Generative Models Discriminative Classifiers 50
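
A compact sketch of this pipeline using OpenCV ORB features as a stand-in for SIFT, k-means for the codeword dictionary, and a linear SVM as the discriminative classifier; file names and labels below are placeholders:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

K = 200                                          # number of visual words
orb = cv2.ORB_create(nfeatures=500)              # interest points + descriptors

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    return desc if desc is not None else np.empty((0, 32), np.uint8)

train_paths = ["img1.jpg", "img2.jpg"]           # placeholder paths (use a real training set)
train_labels = [0, 1]                            # placeholder class labels

# 1) Codewords dictionary: cluster all training descriptors into K visual words.
all_desc = np.vstack([descriptors(p) for p in train_paths]).astype(np.float32)
codebook = KMeans(n_clusters=K, n_init=4).fit(all_desc)

# 2) Image representation: normalized histogram of visual-word occurrences.
def bow_histogram(path):
    words = codebook.predict(descriptors(path).astype(np.float32))
    hist = np.bincount(words, minlength=K).astype(np.float32)
    return hist / (hist.sum() + 1e-8)

# 3) Category models / classifiers: a linear SVM on the histograms.
X = np.stack([bow_histogram(p) for p in train_paths])
clf = LinearSVC().fit(X, train_labels)
```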


Generative Models based on Bag of Words See [Sivic et al, Discovering Object Categories in Image Collections, 2005] - Probabilistic Latent Semantic Analysis (pLSA) - Latent Dirichlet Allocation (LDA) (Plate-notation diagram: document d, topic z, word w, over N words per document and D documents; example topic: face) 52
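
The plate diagram on the slide (document d, topic z, word w) corresponds to the standard pLSA factorization, written here in its usual form rather than copied from the slide:

```latex
P(w \mid d) \;=\; \sum_{z} P(w \mid z)\, P(z \mid d)
```

In the visual setting, d is an image, w is a visual word, and z is a latent topic/category such as "face"; LDA additionally places Dirichlet priors on the topic mixtures.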

Slide from Fei-Fei Li 53

Discriminative methods based on bag of words Grauman & Darrell, 2005, 2006: SVM w/ Pyramid Match kernels Others: Csurka, Bray, Dance & Fan, 2004; Serre & Poggio, 2005 54

Pyramid match kernel optimal partial matching between sets of features Grauman & Darrell, 2005. Slide credit: Kristen Grauman 55

Pyramid Match (Grauman & Darrell 2005) Histogram intersection Slide credit: Kristen Grauman 56

Pyramid Match (Grauman & Darrell 2005) Histogram intersection counts the matches at each level; the difference between the histogram intersections at this level and the previous level counts the number of new pairs matched. Slide credit: Kristen Grauman 57

Pyramid match kernel over histogram pyramids: weights, inversely proportional to bin size, reflect the difficulty of a match at level i and multiply the number of newly matched pairs at level i; kernel values are normalized to avoid favoring large sets. Slide credit: Kristen Grauman 58
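
In Grauman and Darrell's formulation (reconstructed from memory, so check the paper for the exact constants), with multi-resolution histograms H_i at level i and histogram intersection I, the kernel is approximately:

```latex
K_{\Delta}(P, Q) \;=\; \sum_{i=0}^{L} w_i \, N_i,
\qquad
N_i \;=\; \mathcal{I}\big(H_i(P), H_i(Q)\big) \;-\; \mathcal{I}\big(H_{i-1}(P), H_{i-1}(Q)\big),
\qquad
w_i \;\propto\; \frac{1}{2^{i}}
```

Coarse levels, where matches are easy, contribute with small weights, and the kernel is normalized by the self-similarities K(P,P) and K(Q,Q) so that large feature sets are not favored.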

Example pyramid match Slide credit: Kristen Grauman Level 0 59

Example pyramid match Slide credit: Kristen Grauman Level 1 60

Example pyramid match Slide credit: Kristen Grauman Level 2 61

Example pyramid match Slide credit: Kristen Grauman 62

Summary: The pyramid match kernel approximates the optimal partial matching between sets of features by weighting the number of new matches at level i by the difficulty of a match at that level 63 Slide credit: Kristen Grauman