Gaze interaction (2): models and technologies


Gaze interaction (2): models and technologies
Corso di Interazione uomo-macchina II (Human-Machine Interaction II)
Prof. Giuseppe Boccignone
Dipartimento di Scienze dell'Informazione, Università di Milano
boccignone@dsi.unimi.it
http://homes.dsi.unimi.it/~boccignone/l

Gaze interaction
A. Vinciarelli, M. Pantic, H. Bourlard, "Social Signal Processing: Survey of an Emerging Domain," Image and Vision Computing (2008)

Gaze estimation without eye trackers

Problem:
- detect the existence of eyes
- accurately interpret eye positions in the images, using the pupil or iris center
- for video images, track the detected eyes from frame to frame

Gaze estimation: the detected eyes in the images are used to estimate and track where a person is looking in 3D, or alternatively, to determine the 3D line of sight.

//eye models

Identify a model of the eye that is sufficiently expressive to account for the large variability in appearance and dynamics, while also sufficiently constrained to be computationally efficient.

Eyelids may appear straight from one view but highly curved from another, and the iris contour also changes with viewing angle. (In the figure, dashed lines indicate where the eyelids appear straight; the solid yellow lines represent the major axis of the iris ellipse.) Even for the same subject, a relatively small variation in viewing angle can cause significant changes in appearance.

//eye models

The eye image may be characterized by the intensity distribution of the pupil(s), iris, and cornea, and by their shapes. Ethnicity, viewing angle, head pose, color, texture, lighting conditions, the position of the iris within the eye socket, and the state of the eye (open/closed) all heavily influence the appearance of the eye.

The intended application and the available image data lead to different prior eye models. The prior model representation is often applied at different positions, orientations, and scales to reject false candidates.

//eye models

- Shape-based methods: use a prior model of eye shape and surrounding structures (fixed shape or deformable shape)
- Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching with a similarity measure (intensity-based methods, subspace-based methods)
- Hybrid methods: combine feature, shape, and appearance approaches to exploit their respective benefits

//eye models: Shape-Based Approaches

Shape-based methods use a prior model of eye shape and a similarity measure. The prior model covers the eye shape and surrounding structures: the iris and pupil contours and the exterior shape of the eye (eyelids), either simple elliptical or of a more complex nature. The parameters of the geometric model define the allowable template deformations, with parameters for rigid (similarity) transformations and parameters for nonrigid template deformations. This gives the ability to handle shape, scale, and rotation changes.

//eye models: Shape-Based Approaches

Simple Elliptical Shape Models. Example: Valenti and Gevers use isophote (i.e., curves connecting points of equal intensity) properties to infer the center of (semi)circular patterns, which represent the eyes.
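A minimal sketch of isophote-based center voting in the spirit of Valenti and Gevers (the full method also restricts votes by the sign of the isophote curvature so that only dark centers receive votes; function names and parameters here are illustrative):

import cv2
import numpy as np

def isophote_center(eye_gray, sigma=2.0):
    # Smooth and take image derivatives up to second order
    img = cv2.GaussianBlur(eye_gray.astype(np.float32), (0, 0), sigma)
    Lx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
    Ly = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
    Lxx = cv2.Sobel(img, cv2.CV_32F, 2, 0, ksize=3)
    Lxy = cv2.Sobel(img, cv2.CV_32F, 1, 1, ksize=3)
    Lyy = cv2.Sobel(img, cv2.CV_32F, 0, 2, ksize=3)

    # Displacement from each pixel to the center of its isophote:
    # d = -(Lx, Ly) * (Lx^2 + Ly^2) / (Lx^2*Lyy - 2*Lx*Lxy*Ly + Ly^2*Lxx)
    grad2 = Lx**2 + Ly**2
    denom = Lx**2 * Lyy - 2.0 * Lx * Lxy * Ly + Ly**2 * Lxx
    denom[np.abs(denom) < 1e-6] = 1e-6
    dx, dy = -Lx * grad2 / denom, -Ly * grad2 / denom

    # Every pixel votes at its estimated center, weighted by curvedness,
    # so (semi)circular dark structures (pupil/iris) pile up votes
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx = np.clip((xs + dx).astype(int), 0, w - 1)
    cy = np.clip((ys + dy).astype(int), 0, h - 1)
    curvedness = np.sqrt(Lxx**2 + 2.0 * Lxy**2 + Lyy**2)
    acc = np.zeros((h, w), np.float32)
    np.add.at(acc, (cy.ravel(), cx.ravel()), curvedness.ravel())

    acc = cv2.GaussianBlur(acc, (0, 0), sigma)
    r, c = np.unravel_index(np.argmax(acc), acc.shape)
    return c, r  # estimated (x, y) of the eye/iris center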

//eye models: Shape-Based Approaches

Simple Elliptical Shape Models. Example: Webcam-based Visual Gaze Estimation (Valenti et al.):
- uses isophotes (curves connecting points of equal intensity), with each pixel voting for the direction to the center
- requires no head pose information
- uses a scale-space framework for multiresolution detection
- uses simple interpolants for easy calibration

//eye models: Shape-Based Approaches

Complex Shape Models. Example: Yuille's deformable templates.

//eye models: Shape-Based Approaches

Complex Shape Models (deformable templates) have three main drawbacks:
1. they are computationally demanding,
2. they may require high-contrast images, and
3. they usually need to be initialized close to the eye for successful localization.

For large head movements, they consequently need other methods to provide a good initialization; the toy fit below illustrates this sensitivity.
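The following is a drastically reduced sketch of the deformable-template idea, assuming only a circular iris term (the full Yuille template adds parabolic eyelids and several coupled energy terms; the function name and energy weights are illustrative):

import cv2
import numpy as np
from scipy.optimize import minimize

def fit_iris_circle(eye_gray, init_xyr):
    # Precompute an edge-energy map and a "darkness" map
    img = eye_gray.astype(np.float32) / 255.0
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)
    edges = cv2.GaussianBlur(np.hypot(gx, gy), (0, 0), 2.0)
    dark = cv2.GaussianBlur(1.0 - img, (0, 0), 2.0)
    h, w = img.shape
    theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)

    def energy(p):
        cx, cy, r = p
        # Boundary term: the circle should lie on strong edges
        bx = np.clip((cx + r * np.cos(theta)).astype(int), 0, w - 1)
        by = np.clip((cy + r * np.sin(theta)).astype(int), 0, h - 1)
        e_edge = -edges[by, bx].mean()
        # Interior term: the template center should cover dark pixels
        ix = int(np.clip(round(cx), 0, w - 1))
        iy = int(np.clip(round(cy), 0, h - 1))
        return e_edge - dark[iy, ix]

    res = minimize(energy, np.asarray(init_xyr, np.float64),
                   method="Nelder-Mead")
    return res.x  # fitted (cx, cy, r); quality depends on init_xyr

Note how the result depends on init_xyr: with a poor initialization the optimizer settles on an eyebrow or a shadow, which is exactly drawback 3 above.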

//eye models: Feature-Based Shape Methods

Explore the characteristics of the human eye to identify a set of distinctive features around the eyes. The limbus, the pupil (dark/bright pupil images), and corneal reflections are common features used for eye localization.

Local Features by Intensity: the eye region contains several boundaries that may be detected by gray-level differences (Harper et al.), typically combined with a sequential search strategy; a minimal projection-based sketch follows below.

Local Features by Filter Responses: filter responses enhance particular characteristics in the image while suppressing others. A filter bank may therefore enhance desired features of the image and, if appropriately defined, deemphasize irrelevant features.
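As a minimal illustration of gray-level-difference features (not Harper et al.'s actual method), the classic integral projection functions locate the dark iris/pupil region; names are illustrative:

import numpy as np

def locate_iris_by_projection(eye_gray):
    # Integral projection functions: average gray level per row/column.
    # The iris/pupil is darker than sclera and skin, so the intensity
    # minima mark the iris band (rows) and the iris center (columns).
    img = eye_gray.astype(np.float32)
    row_proj = img.mean(axis=1)
    col_proj = img.mean(axis=0)
    return int(np.argmin(col_proj)), int(np.argmin(row_proj))  # (x, y)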

//eye models: Feature-Based Shape Methods

Local Features by Filter Responses. Example, Sirohey and Rosenfeld: edges of the eye's sclera are detected with four Gabor wavelets, and a nonlinear filter is constructed to detect left and right eye-corner candidates. The eye corners are used to determine eye regions for further analysis, with postprocessing steps employed to eliminate spurious corner candidates. A voting method is then used to locate the edge of the iris: since the upper part of the iris may not be visible, votes are accumulated by summing edge pixels in a U-shaped annular region, and the annulus center receiving the most votes is selected as the iris center. Finally, to detect the edge of the upper eyelid, all edge segments in the eye region are examined and fitted to a third-degree polynomial.
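A generic four-orientation Gabor bank conveys the idea (this is a sketch of the filter-response strategy, not Sirohey and Rosenfeld's exact wavelets; kernel parameters are illustrative):

import cv2
import numpy as np

def gabor_edge_energy(eye_gray, ksize=21, sigma=4.0, lambd=10.0):
    # Four-orientation Gabor bank; each kernel responds to edges
    # roughly perpendicular to its orientation
    responses = []
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
        kern = cv2.getGaborKernel((ksize, ksize), sigma, theta,
                                  lambd, gamma=0.5, psi=0.0)
        responses.append(cv2.filter2D(eye_gray.astype(np.float32),
                                      cv2.CV_32F, kern))
    # Max magnitude over orientations: an edge-energy map in which
    # sclera/eyelid boundaries stand out for corner and iris analysis
    return np.abs(np.stack(responses)).max(axis=0)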

//eye models

Appearance-based methods rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching with a similarity measure (intensity-based and subspace-based methods); a minimal matching sketch follows.
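A minimal sketch of the core appearance-based step, assuming normalized cross-correlation as the similarity measure (names are illustrative):

import cv2

def find_eye_by_template(frame_gray, eye_template):
    # Slide the eye patch model over the frame and score each position
    # with normalized cross-correlation
    scores = cv2.matchTemplate(frame_gray, eye_template,
                               cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_xy = cv2.minMaxLoc(scores)
    h, w = eye_template.shape
    return (best_xy[0], best_xy[1], w, h), best_score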

//eye models

Intensity-based methods. Example, Grauman et al.: during the first stage of processing, the eyes are automatically located by searching temporally for "blink-like" motion; see the frame-differencing sketch below.
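A sketch of locating eyes by "blink-like" motion in the spirit of Grauman et al. (the pairing heuristic and all thresholds are illustrative assumptions, not their published values):

import cv2
import numpy as np

def detect_blink_candidates(prev_gray, curr_gray, thresh=25, min_area=30):
    # Temporal differencing highlights the brief eyelid motion
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            np.ones((3, 3), np.uint8))
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    blobs = [centroids[i] for i in range(1, n)
             if stats[i, cv2.CC_STAT_AREA] >= min_area]
    # Two small moving blobs at roughly the same height and a plausible
    # horizontal separation form a blink (eye-pair) candidate
    pairs = []
    for i in range(len(blobs)):
        for j in range(i + 1, len(blobs)):
            c1, c2 = blobs[i], blobs[j]
            if abs(c1[1] - c2[1]) < 10 and 20 < abs(c1[0] - c2[0]) < 120:
                pairs.append((tuple(c1), tuple(c2)))
    return pairs  # candidate (left-eye, right-eye) centroid pairs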

//eye models

Subspace methods (eigeneyes).

//eye models

Subspace methods (eigeneyes): how can we find an efficient representation of such a data set? Rather than storing every image, we might try to represent the images more effectively, e.g., in a lower-dimensional subspace. We seek a linear basis with which each image in the ensemble is approximated as a linear combination of basis images, and we select the basis to minimize the squared reconstruction error. The eigenvectors of the sample covariance matrix of the image data provide the major axes of this subspace.
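A minimal eigeneyes sketch under these definitions (PCA via SVD of the centered data; function names are illustrative):

import numpy as np

def build_eigeneyes(eye_imgs, n_components=20):
    # eye_imgs: (N, H, W) aligned eye patches, flattened to row vectors
    X = eye_imgs.reshape(len(eye_imgs), -1).astype(np.float32)
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data gives the covariance eigenvectors
    # (rows of Vt) without forming the D x D covariance matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]  # mean eye, eigeneye basis

def reconstruction_error(patch, mean, basis):
    # Distance-from-eye-space: project a candidate patch onto the
    # eigeneye subspace and measure the residual; a low error
    # suggests the patch looks like an eye
    x = patch.ravel().astype(np.float32) - mean
    coeffs = basis @ x
    return float(np.linalg.norm(x - basis.T @ coeffs))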

//in summary...

- Shape-based methods: use a prior model of eye shape and surrounding structures (fixed shape, deformable shape)
- Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching by constructing an image patch model and performing eye detection through model matching with a similarity measure (intensity-based methods, subspace-based methods)
- Hybrid methods: combine feature, shape, and appearance approaches to exploit their respective benefits
- Other methods: eye trackers with active (IR) light... we have already considered these

Gaze estimation

Gaze: the gaze direction or the point of regard (PoR, or fixation). Gaze modeling consequently focuses on the relations between the image data and the point of regard/gaze direction.

Gaze estimation //some general problems

1. camera calibration: determining intrinsic camera parameters;
2. geometric calibration: determining the relative locations and orientations of the different units in the setup, such as camera, light sources, and monitor;
3. personal calibration: estimating cornea curvature and the angular offset between visual and optical axes;
4. gaze-mapping calibration: determining the parameters of the eye-gaze mapping functions.

Gaze estimation //methods

IR light and feature extraction:
- 2D regression-based gaze estimation (a toy mapping is sketched after this slide)
- 3D model-based gaze estimation

Appearance-based methods: similarly to the appearance models of the eyes, appearance-based models for gaze estimation do not explicitly extract features, but rather use the image contents as input, with the intention of mapping these directly to screen coordinates (PoR). They do not require calibration of cameras and geometry data, since the mapping is made directly on the image contents.

Natural light methods: natural light approaches face several new challenges, such as lighting changes in the visible spectrum and lower-contrast images, but they are not as sensitive to IR light in the environment and may thus be better suited for outdoor use.
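As an illustration of the 2D regression-based family (a generic sketch, not a specific published system), the classic approach fits a low-order polynomial from calibration features, e.g. pupil-glint vectors, to screen coordinates:

import numpy as np

def fit_gaze_polynomial(features, targets):
    # features: (N, 2) pupil-glint vectors recorded while the user
    # fixates N known calibration points; targets: (N, 2) screen points.
    # Fits s = a0 + a1*x + a2*y + a3*x*y + a4*x^2 + a5*y^2 for each
    # screen coordinate by least squares.
    x, y = features[:, 0], features[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coeffs, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return coeffs  # (6, 2) polynomial coefficients

def predict_gaze(coeffs, feature):
    x, y = feature
    a = np.array([1.0, x, y, x * y, x**2, y**2])
    return a @ coeffs  # estimated (screen_x, screen_y)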

Gaze estimation //methods

Appearance-based methods. Example, K.-H. Tan, D.J. Kriegman, and N. Ahuja: the appearance manifold model. Treat an image as a point in a high-dimensional space: a 20x20-pixel intensity image can be considered a 400-component vector, i.e., a point in a 400-dimensional space (the appearance manifold). Each manifold point s is an image of an eye, labeled with the 2D coordinate of a point on a display; manifold learning then relates new eye images to the labeled points s1, s2, s3, ... on the manifold.
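A minimal nearest-neighbor sketch of this idea (inverse-distance weighting stands in for the paper's locally linear reconstruction weights; names are illustrative):

import numpy as np

def gaze_from_manifold(train_imgs, train_pors, query_img, k=4):
    # train_imgs: (N, D) flattened eye images, each labeled with its
    # point of regard on the display in train_pors: (N, 2)
    X = train_imgs.astype(np.float32)
    q = query_img.astype(np.float32).ravel()
    # k nearest neighbors in appearance space
    d = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(d)[:k]
    # Interpolate the neighbors' labels with inverse-distance weights
    w = 1.0 / (d[idx] + 1e-6)
    w /= w.sum()
    return w @ train_pors[idx]  # interpolated (screen_x, screen_y)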

Gaze estimation //methods

Appearance-based methods. Example, Williams, Blake, and Cipolla: mapping images to continuous output spaces using powerful Bayesian learning techniques.

Gaze estimation //methods

Appearance-based methods. Example, Williams, Blake, and Cipolla (calibration): rather than using raw pixel data, input images are processed to obtain different types of feature. To infer the input-output mapping for unseen inputs in real time, they use a sparse regression model (Gaussian processes). The method is fully Bayesian: output predictions are provided with a measure of uncertainty, and during the learning phase all unknown modelling parameters are inferred from the data as part of the Bayesian framework, so the dynamics do not need to be known a priori.
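A minimal Gaussian-process regression sketch with a squared-exponential kernel, to show where the predictive uncertainty comes from (this is plain GP regression, not the authors' sparse, semi-supervised variant; hyperparameters are illustrative):

import numpy as np

def gp_gaze_regression(X_train, y_train, X_test, ell=1.0, sf=1.0, noise=0.1):
    # X_train: (N, D) feature vectors; y_train: (N,) one output
    # coordinate (run once for screen x, once for screen y)
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf**2 * np.exp(-0.5 * d2 / ell**2)

    K = k(X_train, X_train) + noise**2 * np.eye(len(X_train))
    Ks = k(X_test, X_train)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks @ alpha                       # predictive mean
    v = np.linalg.solve(L, Ks.T)
    var = sf**2 - (v**2).sum(0) + noise**2  # predictive variance
    return mean, var  # the uncertainty estimate comes for free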

Gaze estimation //methods

Appearance-based methods. Example, Williams, Blake, and Cipolla: the same approach can be applied to other contexts.

Gaze estimation //using other cues

Gaze estimation //head-tracking

The Watson head-tracker: a real-time object tracker that uses range and appearance information from a stereo camera to recover the 3D rotation and translation of objects, or of the camera itself. The system can be connected to a face detector and used as an accurate head tracker; additional supporting algorithms can improve the accuracy of the tracker.

Software download: http://groups.csail.mit.edu/vision/vip/watson/index.htm

The Watson head tracker //head pointing

The Watson head tracker //Interactive Kiosk

Shared attention

Shared attention through gaze interactions?

Shared attention //Developmental timeline

Mutual gaze; gaze following.

Shared attention //Developmental timeline

Imperative pointing; declarative pointing (creates shared attention).

Shared attention //Open questions

Shared attention //Models (B. Scassellati, MIT)

Shared attention //Robots that Learn to Converse