Multimodal detection and recognition of persons with a static robot


Jaldert Rombouts (rombouts@ai.rug.nl)
Internal advisors: prof. dr. L.R.B. Schomaker and drs. T. van der Zant, Artificial Intelligence, University of Groningen
External advisor: dr. P. E. Rybski, Robotics Institute, Carnegie Mellon University

Overview
- Introduction
- Background and Approach
- Experiments and Results
- Discussion
- Questions

Introduction
- SnackBot (Lee et al., 2009)
- Human-Robot Interaction (HRI)
- Vending machine
- Topic: person detection and recognition

Introduction
- Solutions: ID cards, biometrics (Jain et al., 2004b)
- Disadvantage: close proximity, conscious user effort
- More natural solution?
  - Based on soft biometrics (Jain et al., 2004a)
  - Color, gait, shape (and combinations)
  - Passive (e.g. camera)

Introduction
- Implemented soft-biometric system(s) based on related work
- Evaluated performance:
  - Multiple poses
  - Various distances

Background and Approach
- Segmentation
- Feature extraction
- Data set
- First: robot and sensors

Robot and Sensors
[Figure: plan-view plot with legend entries Model, Foreground and COG; axes X (meters), 0 to 8, and Y (meters), -4 to 4]

Segmentation
Implemented two methods:
1. Background-modeling based (Horprasert et al., 1999)
2. Stereo based (Darrell et al., 2000; Zhao and Thorpe, 2000)
Combined with a laser-based leg detector
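The background-modeling method cited above (Horprasert et al., 1999) separates a brightness change from a chromaticity change for each pixel. A minimal sketch of that idea follows, assuming a per-pixel background mean and standard deviation per RGB channel. The function names and all threshold values are hypothetical, and the per-pixel normalisation and automatic threshold selection of the original paper are omitted, so this is not the implementation used in the talk.

```python
import math

def distortions(pixel, mean, std):
    """Return (brightness distortion alpha, chromaticity distortion cd) for one RGB pixel."""
    # alpha is the scale factor that brings the expected background colour
    # closest to the observed pixel (least squares, weighted by channel noise).
    num = sum(p * m / (s * s) for p, m, s in zip(pixel, mean, std))
    den = sum((m / s) ** 2 for m, s in zip(mean, std))
    alpha = num / den
    # cd is the remaining noise-normalised distance to the scaled background colour.
    cd = math.sqrt(sum(((p - alpha * m) / s) ** 2
                       for p, m, s in zip(pixel, mean, std)))
    return alpha, cd

def classify(pixel, mean, std, cd_thresh=10.0,
             alpha_min=0.5, shadow_max=0.9, highlight_min=1.1):
    """Label a pixel; every threshold here is an illustrative assumption."""
    alpha, cd = distortions(pixel, mean, std)
    if cd > cd_thresh or alpha < alpha_min:
        return "foreground"   # colour changed, or far too dark to be a shadow
    if alpha < shadow_max:
        return "shadow"       # same chromaticity, darker
    if alpha > highlight_min:
        return "highlight"    # same chromaticity, brighter
    return "background"

background_mean = (100.0, 120.0, 140.0)
background_std = (2.0, 2.0, 2.0)
print(classify((70.0, 84.0, 98.0), background_mean, background_std))
```

Because a shadow mostly scales the background colour, it changes alpha but leaves cd small, which is what lets the model keep shadows out of the foreground mask.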

Feature extraction
- Color spaces: (HS)V, nRGB, Y(CrCb), CIE-L(ab)
- Region: torso, or head + torso (spatial histogram)
- Color features:
  1. Mean + standard deviation
  2. 1D and 2D chromaticity histograms with 4, 8, 16 or 32 bins
- Person height: from stereo
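As an illustration of the 2D chromaticity histogram, the sketch below bins the hue and saturation of the pixels in a segmented region and ignores the value channel. It assumes 8-bit RGB input and uniform binning; the function name `hs_histogram` and the flattened output layout are hypothetical choices, not taken from the talk.

```python
import colorsys

def hs_histogram(pixels, bins=32):
    """Normalised 2D hue-saturation histogram, flattened to bins * bins values."""
    hist = [0.0] * (bins * bins)
    for r, g, b in pixels:  # 8-bit RGB triples from the segmented region
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that h == 1.0 or s == 1.0 falls into the last bin.
        hi = min(int(h * bins), bins - 1)
        si = min(int(s * bins), bins - 1)
        hist[hi * bins + si] += 1.0
    total = len(pixels)
    # Normalise so that regions of different size are comparable.
    return [count / total for count in hist] if total else hist

torso_pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
feature_vector = hs_histogram(torso_pixels, bins=4)
print(len(feature_vector))
```

With 32 bins per axis the feature vector has 1024 entries, most of them empty for a single person.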

Data set
- 30 persons (fairly large compared with related work)
- 2 environments, 9 positions, 4 poses
- Repeated recording for validation

[Figure: floor plan of the recording area. Labels: low office dividers, high office dividers, chair, table, desk, drawers, boxes, opening, experimenter; recording positions 1-9; dimension labels 510 meter and 710 meter. Legend: point in cluttered scene, point in sparse scene]

Data set - Poses

Recognition
Main questions:
- What is the best set of features for recognition?
- Robustness against pose and distance?
- Difference between segmentation methods?
- Difference between environments?

Testing method
Classifiers:
- K-Nearest Neighbor (kNN)
- Support Vector Machine (SVM)
- Random Forest (RF) (Breiman, 2001): good performance on large feature vectors with low-information features

Testing method
- Cross-validation
- Classification accuracy (CA) averaged over environments
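The evaluation loop can be sketched as follows, assuming a plain k-fold split and a Euclidean kNN classifier. The slides do not state the fold count, the value of k or the splitting scheme, so those parameters and both function names are hypothetical.

```python
import math

def knn_predict(train, query, k=1):
    """Majority label among the k training samples nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _features, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    # On a tie, the label that was seen first (the nearer sample) wins.
    return max(votes, key=votes.get)

def cross_validated_ca(samples, folds=3, k=1):
    """Classification accuracy (CA) over an interleaved k-fold split."""
    correct = 0
    for f in range(folds):
        test = samples[f::folds]  # every folds-th sample, starting at f
        train = [s for i, s in enumerate(samples) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(samples)

samples = [((0.0, 0.0), "a"), ((10.0, 10.0), "b"),
           ((0.1, 0.2), "a"), ((10.2, 9.9), "b"),
           ((0.3, 0.1), "a"), ((9.8, 10.1), "b")]
print(cross_validated_ca(samples, folds=3))
```

Running this once per environment and taking the mean of the resulting values gives a single averaged CA.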

Recognition
Feature selection:
1. Color space
2. Color feature
3. Combining height
Detailed experiments: environment, location and pose

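Color features (1+2)
- The HSV 2D 32-bin histogram was best
- Features extracted from the torso were slightly better than head + torso
- CA of approximately 0.55 for DS (stereo-based segmentation) and approximately 0.64 for BGM (background-modeling segmentation); baseline ~0.03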

Bin size vs. CA
[Figure: CA (y-axis, 0.46 to 0.64) against bin size (x-axis: 4, 8, 16, 32) for kNN, RF and SVM]

Combining height
- Not trivial: kNN and SVM use distances in feature space
- A single feature has only a small impact
- Idea: make height more important by scaling its axis
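A small worked example shows why the scaling matters for a distance-based classifier. The numbers, names and two-value color feature below are invented for illustration; only the idea of multiplying the height axis by a scaling factor comes from the slide.

```python
import math

def with_scaled_height(color_features, height_m, scale):
    """Append the height, multiplied by a scaling factor, to the color features."""
    return list(color_features) + [scale * height_m]

def nearest_label(query, gallery, scale):
    """Name of the gallery entry nearest to query in the combined feature space."""
    q = with_scaled_height(query[0], query[1], scale)
    return min(gallery,
               key=lambda name: math.dist(q, with_scaled_height(*gallery[name], scale)))

# Hypothetical data: (color features, height in meters).
gallery = {
    "A": ([0.6, 0.5], 1.75),  # closer in color, 5 cm shorter than the query
    "B": ([0.8, 0.5], 1.80),  # farther in color, same height as the query
}
query = ([0.5, 0.5], 1.80)

print(nearest_label(query, gallery, scale=1.0))   # color dominates the distance
print(nearest_label(query, gallery, scale=10.0))  # height dominates the distance
```

With an unscaled height axis the 5 cm difference barely registers next to the color distance, so the nearest neighbour is A; scaling the axis by 10 makes that difference decisive and the match switches to B.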

[Figure: CA (y-axis, 0.55 to 0.8) against scaling factor (x-axis, 0 to 80) for SVM-DS, SVM-BGM, kNN-DS, kNN-BGM, RF-DS and RF-BGM]

Combining height
- SVM and kNN profit from height (gain of about 0.15 in CA)
- RF profits only marginally (about 0.01): overfitting
- Height does not seem important at the training location, but gains importance with distance
- RF cannot make use of the designer's domain knowledge

Detailed experiments
1. Environment
2. Position
3. Pose

Position
Clear influence of distance:
- Close: 0.98-1.0 (DS) and 0.95-1.0 (BGM)
- Medium: 0.65-0.85 (DS) and 0.77-0.88 (BGM)
- Far: 0.42-0.59 (DS) and 0.58-0.70 (BGM)
Scores for BGM and DS are very similar at locations 1-6; BGM is better at locations 7-9

Pose
- Robustness to varying pose
- Train on a single pose (e.g. front), test on all poses
- Per location (all four poses)
- Training on 1 pose vs. 4 poses: a drop of about 0.10 in CA when averaged over all locations

Summary
- Simple features yield good performance
- BGM better than DS (especially at the farther locations)
- Little influence of environment
- Clear influence of position (distance)
- Reasonably robust to pose

Discussion
- Careful with generalization: e.g. HSV might be best in our experimental circumstances, but worse in others
- Fitted to subjects/environments?
- Try more environments/subjects

Discussion
- Intra-day recognition only (Darrell et al., 2000; Harville, 2005)
- Combine with e.g. face, voice
- No unsupervised enrollment of users

Questions?

References
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5-32.
Darrell, T., Gordon, G., Harville, M., and Woodfill, J. (2000). Integrated person tracking using stereo, color, and pattern detection. International Journal of Computer Vision, 37(2):175-185.
Demsar, J., Zupan, B., Leban, G., and Curk, T. (2004). Orange: From Ex-
Harville, M. (2005). Stereo person tracking with short and long term plan-view appearance models of shape and color. In IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS 2005), pages 522-527.
Heikkilä, J. and Silvén, O. (2004). A real-time system for monitoring of cyclists
Horprasert, T., Harwood, D., and Davis, L. S. (1999). A statistical approach for real-time robust background subtraction and shadow detection. In Proc. IEEE ICCV, volume 99, pages 1-19.
Jain, A., Dass, S., and Nandakumar, K. (2004a). Can soft biometric traits assist user recognition? In Jain, A. K. and Ratha, N. K., editors, Biometric Technology for Human Identification, volume 5404, pages 561-572. SPIE.
Jain, A., Ross, A., and Prabhakar, S. (2004b). An introduction to biometric recognition. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):4-20.
Lee, M., Forlizzi, J., Rybski, P., Crabbe, F., Chung, W., Finkle, J., Glaser, E., and Kiesler, S. (2009). The Snackbot: documenting the design of a robot for long-term human-robot interaction. In Proceedings of the 4th ACM/IEEE International Conference on Human-Robot Interaction, pages 7-14. ACM, New York, NY, USA.
Zhao, L. and Thorpe, C. E. (2000). Stereo- and neural network-based pedestrian detection. IEEE Transactions on Intelligent Transportation Systems, 1(3):148-154.