Foveated Vision and Object Recognition on Humanoid Robots


Foveated Vision and Object Recognition on Humanoid Robots Aleš Ude Japan Science and Technology Agency, ICORP Computational Brain Project Jožef Stefan Institute, Dept. of Automatics, Biocybernetics and Robotics

Foveated Humanoid Vision System

Foveated Vision Vision should be able to deal with fast robot movements and occlusions. Motor control should be able to deal with vision failures. Tight integration between perception and motor control is necessary.

Objective Our goal is to integrate peripheral and foveal vision with the motor control and to use foveal vision for the task for which it is suited best: Object recognition in dynamic scenes

Motor Control for Foveated Vision Two goals: - position the object in the fovea of each eye - move smoothly to enable processing of foveal images 3-D stereo vision using actuated, head-mounted cameras is difficult due to inaccurate joint angle readings, delays, and vibrations. Uncalibrated stereo results in smoother movements.

Closed-Loop Control System Network of PD-controllers to exploit the redundancy of our humanoid. The controller network attempts to:
- position the object in the fovea,
- introduce cross-coupling between the eyes to help the eye movements if the object is lost in one view,
- assist preceding joints to maintain a natural posture away from the joint limits.
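A minimal sketch of one node of such a controller network, assuming illustrative gains and sign conventions (none of the constants below are the robot's actual parameters). It combines a tracking term from the eye's own view, a cross-coupled term from the other eye, and a relaxation term that pulls the joint back toward its preferred posture when the target is lost:

```python
def eye_pan_velocity(x_err_own, x_err_other, theta, theta_pref,
                     k_p=1.0, k_rel=0.1, k_c=0.5):
    """One PD-network node: commanded pan velocity for one eye.

    x_err_own / x_err_other: horizontal offset of the tracked blob from the
    image center in this eye's view and in the other eye's view; None means
    the target is lost in that view. theta / theta_pref: current and
    preferred joint angle. All gains are hypothetical placeholders.
    """
    d_own = x_err_own if x_err_own is not None else 0.0      # own tracking error
    d_other = x_err_other if x_err_other is not None else 0.0  # cross-coupling
    relaxation = k_rel * (theta_pref - theta)                # pull to posture
    return k_p * (relaxation + d_own + k_c * d_other)
```

Even when the target is lost in this eye's view, the cross-coupled error from the other eye keeps the eye moving toward the object; with no target at all, only the relaxation term remains and the joint drifts back to its preferred angle.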

Example Controller Tracking and relaxation displacements for each joint:

    D_joint = K_d (θ*_joint − θ_joint)
    D_blob  = K_d (x*_blob − x_blob) − K_v ẋ_blob

Left eye pan:

    θ̇_LEP = K_p [ K_relaxation D_LEP + K_target ( D_LX + K_C D_RX ) ]

where D_LX and D_RX are the horizontal blob displacements in the left and right view and K_C is the cross-coupling gain.

Example Controller (cont.)

    D_joint = K_d (θ*_joint − θ_joint)

Head nod:

    θ̇_HN = K_p [ K_relaxation D_HN + K_ET ( D_LET + D_RET ) ]

where D_LET and D_RET are the displacements of the left and right eye tilt.

Control System Properties Accurate forward kinematics is not needed. The system can automatically compensate for failures in joint movements. When the target is not visible, the system brings the robot back to the preferred posture.

DB head motion

DB2 head motion

Positioning in the fovea Foveal cameras are vertically displaced from the peripheral cameras. The object is therefore positioned with a vertical offset from the central position in the peripheral views so that it appears in the center of the foveal views. A constant offset is sufficient because the peripheral cameras are equipped with very wide-angle lenses.

Is it useful for foveal vision?

Vision System Overview Visual search in peripheral images for object detection that initializes tracking. Results of tracking used to control the robot and normalize the images. Early processing for feature extraction. Learning and classification.

Probabilistic Approach Gaussian color mixture models:

    p(I_u | Θ_l) = Σ_{k=1..K_l} ω_{l,k} / ( (2π)^{d/2} √det(Σ_{l,k}) ) · exp( −½ (I_u − Ī_{l,k})ᵀ Σ_{l,k}⁻¹ (I_u − Ī_{l,k}) ),  d = 2 or 3

Random search to trigger the system to attend a high-probability region. Automatic threshold selection.
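The mixture density above can be evaluated directly. The sketch below, assuming illustrative mixture parameters rather than ones learned from real training images, computes the probability of a pixel color under such a model:

```python
import numpy as np

def mixture_prob(I_u, weights, means, covs):
    """Probability of a pixel value I_u (d = 2 or 3 color channels) under a
    Gaussian color mixture with components (weights, means, covs)."""
    d = len(I_u)
    p = 0.0
    for w, mu, cov in zip(weights, means, covs):
        diff = np.asarray(I_u, float) - mu
        norm = (2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
        p += w * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm
    return p
```

Thresholding this probability over the peripheral image yields the high-probability regions the random search attends to.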

Tracking Probabilistic approach: color distributions and pixel (shape) distributions:

    p(u | Θ_l) = 1 / ( 2π √det(Γ_l) ) · exp( −½ (u − ū_l)ᵀ Γ_l⁻¹ (u − ū_l) )

EM algorithm to maximize the log-likelihood with respect to the shape parameters {ū_l, Γ_l} and the mixture parameters {ω_l}. Real-time implementation (60 Hz).

Tracking and Pursuit

Normalization To compare the images, we must account for changes in object position and scale. The tracker calculates an approximation of the object's position, orientation and scale. This limits our recognition system to objects we can track, but this is a necessary prerequisite to get the object into the fovea anyway.

Affine warping In our system, affine warping accounts for translations, scale changes and planar rotations. As in all viewpoint-dependent models, we must account for rotations in depth by collecting a sufficient amount of training data.

    [u']   [a11 a12 a13] [u]
    [v'] = [a21 a22 a23] [v]
    [1 ]   [ 0   0   1 ] [1]
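A sketch of the normalization step, assuming the tracker supplies the object's center, in-plane rotation angle, and scale (the composition translate → rotate → rescale is one plausible reading of the warp; the actual parameterization is not spelled out on the slide):

```python
import numpy as np

def affine_normalize(points, center, angle, scale):
    """Map image points into a normalized object frame:
    translate by -center, rotate by -angle, divide by scale."""
    c, s = np.cos(-angle), np.sin(-angle)
    A = np.array([[c / scale, -s / scale, 0.0],   # rotation + rescale
                  [s / scale,  c / scale, 0.0],
                  [0.0,        0.0,       1.0]])
    T = np.array([[1.0, 0.0, -center[0]],         # translation to the center
                  [0.0, 1.0, -center[1]],
                  [0.0, 0.0,  1.0]])
    M = A @ T                                     # combined 3x3 affine matrix
    pts = np.hstack([points, np.ones((len(points), 1))])
    return (M @ pts.T).T[:, :2]
```

In practice the same matrix M would be used to warp the whole foveal image patch, not just isolated points.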

Object Representations Geon structural description models: nonaccidental properties (Marr, Biederman, ...) - difficult to build such descriptions. View-based approaches: collections of viewpoint-dependent surfaces and contours (Tarr, Poggio, Bülthoff, ...) - problems with generalization over different instances of a perceptually defined class.

Object Recognition System View-based approach: train the system by showing the object from many viewpoints. Preprocess the images to achieve robustness against changes in position, orientation, scale and brightness and to extract relevant features. Classification using support vector machines.

Training Data Collection

Training Images No accurate turntables to systematically capture all possible viewpoints. Is such training data good enough for recognition?

Feature Extraction Gabor filters are defined as a convolution of the image with a family of Gabor kernels:

    Θ_k(x) = ( ‖k‖² / σ² ) · exp( −‖k‖² ‖x‖² / (2σ²) ) · [ exp(i k·x) − exp(−σ²/2) ]
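The kernel formula can be sampled on a pixel grid. A minimal sketch, assuming σ = 2π (a common choice in the Gabor-jet literature, not stated on the slide) and a 15 x 15 support:

```python
import numpy as np

def gabor_kernel(k, sigma=2 * np.pi, size=15):
    """Sample the complex Gabor kernel on a size x size grid.
    k is the 2-D wave vector; the -exp(-sigma^2/2) term removes the
    DC component of the complex carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k2 = k @ k                                   # squared wave-vector norm
    r2 = x ** 2 + y ** 2                         # squared distance to center
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * r2 / (2 * sigma ** 2))
    carrier = np.exp(1j * (k[0] * x + k[1] * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier
```

The real and imaginary parts of this kernel are the even (cosine) and odd (sine) Gabor wavelets mentioned on the next slide.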

Gabor Kernels Odd Gabor wavelets (Gaussian modulated by sine wave) Even Gabor wavelets (Gaussian modulated by cosine wave)

Properties of Gabor Kernels Biological relevance: they have a similar shape to the receptive fields of simple cells in the visual cortex. Machine vision: best tradeoff between localization in image and frequency space, which yields robustness against small distortions, rotation, and scaling.

Gabor Jets Gabor jets (Wiskott et al.) are calculated at each node. It has been suggested to compute jets with wave vectors:

    k_{m,n} = 2^{−(m+2)/2} π ( cos(nπ/8), sin(nπ/8) ),  m = 0, ..., 4,  n = 0, ..., 7

The new representation consists of the magnitudes of the 40 complex values calculated by convolution with the image patch at each node.
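The 5 scales x 8 orientations grid of wave vectors from the formula above can be enumerated directly:

```python
import numpy as np

def wave_vectors():
    """The 40 wave vectors k_{m,n} = 2^{-(m+2)/2} * pi * (cos(n*pi/8), sin(n*pi/8)),
    m = 0..4 (scales), n = 0..7 (orientations)."""
    ks = []
    for m in range(5):
        mag = 2.0 ** (-(m + 2) / 2) * np.pi      # frequency for this scale
        for n in range(8):
            ks.append(mag * np.array([np.cos(n * np.pi / 8),
                                      np.sin(n * np.pi / 8)]))
    return ks
```

Convolving an image patch with one kernel per wave vector and taking magnitudes yields the 40-dimensional jet at that node.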

Improving Performance Compute convolutions with Gabor filters using the Fourier transform. This makes it possible to calculate Gabor jets at 15 Hz for 40 different orientations and scales, or at 30 Hz for 16 different orientations and scales.
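The speedup comes from transforming the image once and reusing its spectrum for every kernel. A minimal sketch (circular convolution, as NumPy's FFT naturally gives; the slide does not specify the boundary handling):

```python
import numpy as np

def fft_filter_bank(image, kernels):
    """Convolve one image with many kernels via the FFT: one forward
    transform of the image, then one product + inverse transform per kernel."""
    H, W = image.shape
    F = np.fft.fft2(image)                       # image spectrum, computed once
    out = []
    for ker in kernels:
        K = np.fft.fft2(ker, s=(H, W))           # zero-padded kernel spectrum
        out.append(np.fft.ifft2(F * K))          # circular convolution result
    return out
```

For 40 kernels this replaces 40 spatial convolutions with one forward FFT, 40 pointwise products, and 40 inverse FFTs.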

Model Representation with Gabor Jets Images of all objects that need to be included in the database are acquired from many different viewpoints to account for rotations in depth. Conversion into Gabor jets at each lattice node. Initially, we used PCA to extract a reduced number of features.

Learning Machines for Classification Data: observations x_i ∈ Rⁿ and the associated class labels y_i ∈ {−1, 1}. Assumption: the data are drawn from an unknown distribution P(x, y). The task of a learning machine is to learn the mapping x_i → y_i. The machine is defined by a set of possible mappings {f(x, α)}.

Support Vector Machines for Classification SVMs are based on the class of hyperplanes:

    (w · x) + b = 0,  w ∈ Rⁿ,  b ∈ R

corresponding to decision functions:

    f(x) = sgn( (w · x) + b )

Optimal linear SVMs can be calculated by solving a quadratic program.

Nonlinear SVMs Decision functions:

    f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b )

Polynomial kernels:  K(x, y) = ( (x · y) + 1 )^p
RBF kernels:         K(x, y) = exp( −‖x − y‖² / (2σ²) )
Sigmoid kernels:     K(x, y) = tanh( κ (x · y) − δ )
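These kernels and the decision function translate directly into code. The support vectors, multipliers α_i, labels y_i, and bias b below would come from solving the quadratic program; the values used in any call are illustrative:

```python
import numpy as np

def poly_kernel(x, y, p=2):
    """Polynomial kernel (x . y + 1)^p."""
    return (np.dot(x, y) + 1.0) ** p

def rbf_kernel(x, y, sigma=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return np.exp(-(d @ d) / (2 * sigma ** 2))

def svm_decision(x, support, alphas, labels, b, kernel=rbf_kernel):
    """Nonlinear SVM decision: sgn(sum_i alpha_i y_i K(x_i, x) + b)."""
    s = sum(a * y * kernel(xi, x)
            for a, y, xi in zip(alphas, labels, support))
    return 1 if s + b >= 0 else -1
```

Only the support vectors (training points with α_i > 0) contribute to the sum, which is what keeps evaluation fast at run time.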

Recognition with SVMs We considered two different problems:
- Given one object, decide whether this object is in the image or not - can be solved by one SVM.
- Decide which of the objects from the database is in the image - many SVMs organized in a tree structure.
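The multi-object case routes a query through a tree of binary deciders. A minimal sketch, assuming a generic binary-tree layout (the slide does not specify the tree's structure, and the lambdas standing in for trained SVM decision functions are purely illustrative):

```python
def classify_tree(x, node):
    """Walk a tree of binary deciders down to an object label.

    Internal nodes are (decide, left, right) tuples, where decide(x) >= 0
    routes to the left subtree and < 0 to the right; leaves are labels.
    """
    while isinstance(node, tuple):
        decide, left, right = node
        node = left if decide(x) >= 0 else right
    return node
```

With N objects, a balanced tree needs only about log2(N) SVM evaluations per query instead of N one-vs-rest evaluations.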

Implementation Details System implemented on 3 PCs that communicate over Ethernet:
- First PC: detection & tracking
- Second PC: classification
- Third PC: closed-loop kinematic control and communication with DB
Time needed for classification (Gabor jets + RBF SVMs): between 15 and 30 Hz (at 160 x 120 resolution).

Statistical tests (fewer training images)

200 images / object, 120 x 160 pixels, linear SVMs:

                      Teddy Bear 1  Teddy Bear 2  Teddy Bear 3  Toy Dog  Coffee Mug
    False positives       4.5 %         7.8 %         1.4 %      3.9 %      2.1 %
    False negatives       0.5 %         0.3 %         0.9 %      2.7 %      0.7 %

100 images / object, 120 x 160 pixels, linear SVMs:

                      Teddy Bear 1  Teddy Bear 2  Teddy Bear 3  Toy Dog  Coffee Mug
    False positives       9.9 %        13.8 %        13.6 %     11.1 %      2.1 %
    False negatives       0.3 %         0.3 %         0.1 %      2.3 %      0.1 %

Statistical tests (resolution reduction)

200 images / object, 120 x 160 pixels, RBF SVMs:

                      Teddy Bear 1  Teddy Bear 2  Teddy Bear 3  Toy Dog  Coffee Mug
    False positives       5.8 %         6.8 %         3.1 %      2.7 %      0.8 %
    False negatives       0 %           0.3 %         0 %        0.3 %      0 %

200 images / object, 45 x 60 pixels, RBF SVMs:

                      Teddy Bear 1  Teddy Bear 2  Teddy Bear 3  Toy Dog  Coffee Mug
    False positives       9.1 %         9.3 %        10.5 %      8.9 %      9.5 %
    False negatives       0.5 %         1.0 %         0.3 %      3.0 %      0 %

Statistical tests (linear and nonlinear SVMs)

200 images / object, 120 x 160 pixels, RBF SVMs:

                      Teddy Bear 1  Teddy Bear 2  Teddy Bear 3  Toy Dog  Coffee Mug
    False positives       5.8 %         6.8 %         3.1 %      2.7 %      0.8 %
    False negatives       0 %           0.3 %         0 %        0.3 %      0 %

200 images / object, 120 x 160 pixels, linear SVMs:

                      Teddy Bear 1  Teddy Bear 2  Teddy Bear 3  Toy Dog  Coffee Mug
    False positives       4.5 %         7.8 %         1.4 %      3.9 %      2.1 %
    False negatives       0.5 %         0.3 %         0.9 %      2.7 %      0.7 %

Conclusion Gabor jets + RBF-based support vector machines offer a very robust and reliable way to identify and recognize objects in the humanoid's foveal views. Biologically oriented system: foveated vision, Gabor filtering, view-based recognition.