Gaze interaction (2): models and technologies
Corso di Interazione uomo-macchina II
Prof. Giuseppe Boccignone
Dipartimento di Scienze dell'Informazione
Università di Milano
boccignone@dsi.unimi.it
http://homes.dsi.unimi.it/~boccignone/l

Gaze interaction
A. Vinciarelli, M. Pantic, H. Bourlard, "Social Signal Processing: Survey of an Emerging Domain", Image and Vision Computing (2008)
Gaze estimation without eye trackers

Problem:
detect the existence of eyes
accurately interpret eye positions in the images, using the pupil or iris center
for video images, track the detected eyes from frame to frame

Gaze estimation: the detected eyes in the images are used to estimate and track where a person is looking in 3D, or alternatively, to determine the 3D line of sight.
//eye models

Identify a model of the eye which is sufficiently expressive to account for the large variability in appearance and dynamics, while also sufficiently constrained to be computationally efficient.

Eyelids may appear straight from one view but highly curved from another, and the iris contour also changes with viewing angle. (In the figure, the dashed lines indicate where the eyelids appear straight; the solid yellow lines represent the major axis of the iris ellipse.) Even for the same subject, a relatively small variation in viewing angle can cause significant changes in appearance.

//eye models

The eye image may be characterized by the intensity distribution of the pupil, iris, and cornea, and by their shapes. Ethnicity, viewing angle, head pose, color, texture, light conditions, the position of the iris within the eye socket, and the state of the eye (i.e., open/closed) are factors that heavily influence the appearance of the eye.

The intended application and the available image data lead to different prior eye models. The prior model representation is often applied at different positions, orientations, and scales to reject false candidates.
//eye models

Shape-based methods: use a prior model of eye shape and surrounding structures
fixed shape
deformable shape

Appearance-based methods: rely on models built directly on the appearance of the eye region: template matching, by constructing an image patch model and performing eye detection through model matching using a similarity measure
intensity-based methods
subspace-based methods

Hybrid methods: combine feature, shape, and appearance approaches to exploit their respective benefits

//eye models: Shape-Based Approaches

Shape-based methods use a prior model of eye shape and a similarity measure.

The prior model covers the eye shape and surrounding structures: the iris and pupil contours and the exterior shape of the eye (eyelids), either simple elliptical models or models of a more complex nature.

The parameters of the geometric model define the allowable template deformations: they contain parameters for rigid (similarity) transformations and parameters for nonrigid template deformations. This gives the ability to handle shape, scale, and rotation changes.
//eye models: Shape-Based Approaches

Simple Elliptical Shape Models:
example: Valenti and Gevers use isophote (i.e., curves connecting points of equal intensity) properties to infer the center of the (semi)circular patterns which represent the eyes.
//eye models: Shape-Based Approaches

Simple Elliptical Shape Models:
example: Webcam-based Visual Gaze Estimation (Valenti et al.) uses isophotes (i.e., curves connecting points of equal intensity); no head-pose estimation is required. Each pixel votes along the direction to the center of curvature of its isophote.
//eye models: Shape-Based Approaches

Simple Elliptical Shape Models:
example: Webcam-based Visual Gaze Estimation (Valenti et al.) uses a scale-space framework for multiresolution processing.
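The isophote-based center voting described above can be sketched in a few lines. This is a simplified, hypothetical implementation (no scale-space pyramid, illustrative smoothing parameter): each pixel votes for the center of curvature of its isophote, and votes are weighted by edge strength.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def isophote_center_votes(img, sigma=2.0):
    """Estimate an eye-center candidate by isophote curvature voting
    (simplified sketch of the Valenti & Gevers idea)."""
    L = gaussian_filter(img.astype(float), sigma)
    Ly, Lx = np.gradient(L)            # np.gradient: axis 0 (rows) first
    Lyy, Lyx = np.gradient(Ly)
    Lxy, Lxx = np.gradient(Lx)
    num = Ly**2 * Lxx - 2 * Lx * Ly * Lxy + Lx**2 * Lyy
    grad2 = Lx**2 + Ly**2
    with np.errstate(divide='ignore', invalid='ignore'):
        # Displacement from each pixel to its isophote's center of curvature.
        Dx = -Lx * grad2 / num
        Dy = -Ly * grad2 / num
    h, w = L.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx = np.where(np.isfinite(Dx), np.rint(xs + Dx), -1).astype(np.int64)
    cy = np.where(np.isfinite(Dy), np.rint(ys + Dy), -1).astype(np.int64)
    # Keep only isophotes curving around a darker interior (intensity
    # increasing outward makes the numerator positive) and in-bounds votes.
    valid = np.isfinite(Dx) & np.isfinite(Dy) & (num > 0)
    valid &= (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
    acc = np.zeros_like(L)
    np.add.at(acc, (cy[valid], cx[valid]), np.sqrt(grad2[valid]))
    return np.unravel_index(np.argmax(acc), acc.shape)   # (row, col)
```

On a synthetic dark disk on a bright background, the accumulator peaks at the disk center; the sign test on the numerator is what restricts votes to dark (pupil-like) interiors.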
//eye models: Shape-Based Approaches

Simple Elliptical Shape Models:
example: Webcam-based Visual Gaze Estimation (Valenti et al.) uses simple interpolants for easy calibration.

//eye models: Shape-Based Approaches

Complex Shape Models:
example: Yuille's deformable templates.
//eye models: Shape-Based Approaches

Complex Shape Models:
1. are computationally demanding,
2. may require high-contrast images, and
3. usually need to be initialized close to the eye for successful localization.
For large head movements, they consequently need other methods to provide a good initialization.
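As a toy illustration of the deformable-template idea (not Yuille's full model, which also fits parabolic eyelid curves against peak and valley fields), one can fit just a circular iris template by minimizing an intensity-based energy; the soft membership functions and their widths below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_iris_template(img, x0):
    """Fit a circular 'iris' template (cx, cy, r) by energy minimization:
    low energy = dark interior and bright surrounding ring. A toy version
    of the deformable-template approach (eyelid parabolas omitted)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]

    def energy(p):
        cx, cy, r = p
        d = np.hypot(xs - cx, ys - cy)
        # Soft (sigmoid / Gaussian) memberships keep the energy smooth in p.
        w_in = 1.0 / (1.0 + np.exp(np.clip(d - r, -50, 50)))   # ~1 inside circle
        w_ring = np.exp(-0.5 * ((d - r - 2.0) / 1.5) ** 2)     # band just outside
        return ((w_in * img).sum() / w_in.sum()
                - (w_ring * img).sum() / w_ring.sum())

    return minimize(energy, x0, method='Nelder-Mead').x   # (cx, cy, r)
```

Note that, exactly as the slide warns, the optimizer must be started close to the eye (`x0`) or it will settle in a spurious local minimum.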
//eye models: Feature-Based Shape Methods

Explore the characteristics of the human eye to identify a set of distinctive features around the eyes. The limbus, pupil (dark/bright pupil images), and corneal reflections are common features used for eye localization.

Local Features by Intensity: the eye region contains several boundaries that may be detected by gray-level differences.

Local Features by Filter Responses: filter responses enhance particular characteristics in the image while suppressing others. A filter bank may therefore enhance desired features of the image and, if appropriately defined, deemphasize irrelevant features.
//eye models: Feature-Based Shape Methods

Local Features by Intensity: the eye region contains several boundaries that may be detected by gray-level differences (Harper et al.); a sequential search strategy can be used.
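A classic intensity-based cue of this kind is the integral projection function: averaging gray levels along each row makes the dark eye/brow band show up as a minimum. A minimal sketch (the upper-half crop is an assumed heuristic, not a specific method from the text):

```python
import numpy as np

def projection_eye_row(face_img):
    """Locate the eye row via the horizontal integral projection of
    intensity: the eyes/brows form a dark band, i.e. a minimum of the
    row-wise mean intensity in the upper half of the face (sketch)."""
    upper = face_img[: face_img.shape[0] // 2, :]   # eyes lie in the upper half
    ipf = upper.mean(axis=1)                        # integral projection per row
    return int(np.argmin(ipf))                      # darkest row ~ eye band
```

The same idea applied column-wise then separates the left and right eye within the detected band.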
//eye models: Feature-Based Shape Methods

Local Features by Filter Responses: filter responses enhance particular characteristics in the image while suppressing others.

Example, Sirohey and Rosenfeld: edges of the eye's sclera are detected with four Gabor wavelets. A nonlinear filter is constructed to detect left and right eye-corner candidates; the eye corners are used to determine eye regions for further analysis, and postprocessing steps eliminate spurious corner candidates. A voting method is used to locate the edge of the iris: since the upper part of the iris may not be visible, the votes are accumulated by summing edge pixels in a U-shaped annular region, and the annulus center receiving the most votes is selected as the iris center. To detect the edge of the upper eyelid, all edge segments in the eye region are examined and fitted to a third-degree polynomial.
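A four-orientation Gabor bank like the one mentioned above can be built directly; the kernel size, bandwidth, and wavelength below are illustrative values, not those used by Sirohey and Rosenfeld.

```python
import numpy as np

def gabor_kernel(ksize=21, sigma=4.0, theta=0.0, lam=10.0, psi=0.0):
    """Real Gabor kernel (sketch): a Gaussian-windowed sinusoid that
    responds to edges/bars at orientation `theta` with wavelength `lam`."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam + psi))

# A four-orientation bank, as in the text:
bank = [gabor_kernel(theta=t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
```

Convolving the image with each kernel and keeping the maximal response per pixel highlights oriented structures such as the sclera boundary.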
//eye models

Appearance-based methods rely on models built directly on the appearance of the eye region: template matching, by constructing an image patch model and performing eye detection through model matching using a similarity measure; intensity-based methods; subspace-based methods.
//eye models

Appearance-based methods, intensity-based (example: Grauman et al.): during the first stage of processing, the eyes are automatically located by searching temporally for "blink-like" motion.
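The blink-like motion cue can be sketched with simple frame differencing; the threshold and the naive left/right split below are assumptions (the original system uses more robust motion analysis and connected components).

```python
import numpy as np

def blink_candidates(prev, curr, thresh=0.3):
    """Locate candidate eye regions via 'blink-like' motion: frame
    differencing highlights the two small regions that change when the
    eyelids close (sketch)."""
    diff = np.abs(curr.astype(float) - prev.astype(float)) > thresh
    ys, xs = np.nonzero(diff)
    if len(xs) == 0:
        return None
    # Split changed pixels into left/right halves and take each centroid
    # as a rough eye candidate (a real system would use connected components).
    left = xs < xs.mean()
    return ((ys[left].mean(), xs[left].mean()),
            (ys[~left].mean(), xs[~left].mean()))
```

Once both candidates are found, their spacing and symmetry can be checked against a face model to reject false detections.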
//eye models

Appearance-based methods, subspace-based (eigeneyes).
//eye models

Appearance-based methods, subspace-based (eigeneyes): how can we find an efficient representation of such a data set? Rather than storing every image, we might try to represent the images more effectively, e.g., in a lower-dimensional subspace. We seek a linear basis with which each image in the ensemble is approximated as a linear combination of basis images; let's select the basis that minimizes the squared reconstruction error.
//eye models

Appearance-based methods, subspace-based (eigeneyes): the eigenvectors of the sample covariance matrix of the image data provide the major axes of the subspace.
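The eigeneyes construction above is ordinary PCA: the basis minimizing squared reconstruction error consists of the top eigenvectors of the sample covariance, which can be obtained from an SVD of the centered data. A sketch, with the reconstruction error ("distance from eye space") used as a detection score:

```python
import numpy as np

def fit_eigeneyes(patches, k=8):
    """Learn an 'eigeneyes' subspace from flattened eye patches (one per
    row): the top-k right singular vectors of the centered data are the
    eigenvectors of the sample covariance with largest eigenvalues."""
    mean = patches.mean(axis=0)
    _, _, Vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, Vt[:k]                     # mean eye + k basis images

def distance_from_eye_space(patch, mean, basis):
    """Reconstruction error of a candidate patch: small for eye-like
    patches, large otherwise -- usable as a detection score."""
    c = (patch - mean) @ basis.T            # coefficients in the subspace
    recon = mean + c @ basis
    return np.linalg.norm(patch - recon)
```

Sliding this score over an image and thresholding it yields eye candidates, exactly the model-matching-by-similarity scheme described in the slides.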
//in summary...

Shape-based methods: use a prior model of eye shape and surrounding structures (fixed shape, deformable shape).
Appearance-based methods: rely on models built directly on the appearance of the eye region, performing eye detection through template matching with a similarity measure (intensity-based methods, subspace-based methods).
Hybrid methods: combine feature, shape, and appearance approaches to exploit their respective benefits.
Other methods: eye trackers with active (IR) light... we have already considered these.
Gaze estimation

Gaze: the gaze direction, or the point of regard (PoR, or fixation). Gaze modeling consequently focuses on the relations between the image data and the point of regard / gaze direction.

Gaze estimation //some general problems

1. camera calibration: determining intrinsic camera parameters;
2. geometric calibration: determining relative locations and orientations of the different units in the setup, such as camera, light sources, and monitor;
3. personal calibration: estimating cornea curvature and the angular offset between visual and optical axes;
4. gaze-mapping calibration: determining the parameters of the eye-gaze mapping functions.
Gaze estimation //methods

IR light and feature extraction:
2D Regression-Based Gaze Estimation
3D Model-Based Gaze Estimation

Appearance-based methods: similarly to the appearance models of the eyes, appearance-based models for gaze estimation do not explicitly extract features, but rather use the image contents as input, with the intention of mapping these directly to screen coordinates (PoR). They do not require calibration of cameras and geometry data, since the mapping is made directly on the image contents.

Natural light methods: natural light approaches face several new challenges, such as light changes in the visible spectrum and lower-contrast images, but they are not as sensitive to IR light in the environment and may thus be potentially better suited for outdoor use.
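The 2D regression-based approach listed above is commonly implemented as a low-order polynomial mapping from a pupil-glint vector to screen coordinates, calibrated by least squares while the user fixates a grid of known points. A sketch (the quadratic feature set is one common, assumed choice, not prescribed by the slides):

```python
import numpy as np

def fit_gaze_mapping(pg, screen):
    """2D regression-based gaze estimation (sketch): fit a second-order
    polynomial mapping from pupil-glint vectors (x, y) to screen
    coordinates via least squares (the classic calibration step)."""
    x, y = pg[:, 0], pg[:, 1]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    W, *_ = np.linalg.lstsq(A, screen, rcond=None)
    return W                                   # 6x2 coefficient matrix

def map_gaze(pg_vec, W):
    """Predict the screen point (sx, sy) for one pupil-glint vector."""
    x, y = pg_vec
    feats = np.array([1.0, x, y, x * y, x**2, y**2])
    return feats @ W
```

A 3x3 grid of calibration targets already determines the six coefficients per axis; more targets simply overdetermine the least-squares fit and average out measurement noise.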
Gaze estimation //methods

Appearance-based methods, example (K.-H. Tan, D.J. Kriegman, and N. Ahuja): an appearance manifold model. Treat an image as a point in a high-dimensional space: a 20 pixel by 20 pixel intensity image can be considered a 400-component vector, or a point in a 400-dimensional space (the appearance manifold). Each manifold point s is an image of an eye, labeled with the 2D coordinate of a point on a display; the gaze mapping is then obtained by manifold learning.
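In the spirit of the appearance-manifold idea (a simplification of Tan, Kriegman & Ahuja, who interpolate within a local simplex on the manifold), the PoR of a new eye image can be interpolated from its nearest labeled neighbours; the inverse-distance weighting below is an assumed, simpler scheme.

```python
import numpy as np

def por_from_manifold(query, samples, labels, k=3):
    """Appearance-manifold gaze estimate (sketch): each training eye
    image (flattened) is a manifold point labeled with a 2D screen
    coordinate; the PoR of a new image is the inverse-distance-weighted
    average of its k nearest neighbours' labels."""
    d = np.linalg.norm(samples - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)      # inverse-distance weights (regularized)
    w /= w.sum()
    return (w[:, None] * labels[idx]).sum(axis=0)   # interpolated 2D PoR
```

Because no camera or geometry calibration enters this computation, it illustrates why appearance-based methods need only labeled image samples.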
Gaze estimation //methods

Appearance-based methods, example (Williams, Blake & Cipolla): mapping images to continuous output spaces using powerful Bayesian learning techniques.
Gaze estimation //methods

Example (Williams, Blake & Cipolla): calibration by mapping images to continuous output spaces with Bayesian learning techniques. Rather than using raw pixel data, input images are processed to obtain different types of features. To infer the input-output mapping for unseen inputs in real time, a sparse regression model (Gaussian processes) is used. The method is fully Bayesian: output predictions are provided with a measure of uncertainty. During the learning phase, all unknown modeling parameters are inferred from the data as part of the Bayesian framework, so known dynamics are not required a priori.
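Their sparse Gaussian-process machinery is beyond a slide, but plain GP regression with an RBF kernel already shows the key property claimed above: every prediction comes with a variance that quantifies uncertainty. A self-contained numpy sketch (kernel and hyperparameters are illustrative, not theirs):

```python
import numpy as np

def gp_predict(X, y, Xs, ell=1.0, sf=1.0, noise=1e-3):
    """Gaussian-process regression sketch (RBF kernel): posterior mean
    and variance of f at test inputs Xs given training data (X, y)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sf**2 * np.exp(-0.5 * d2 / ell**2)
    K = k(X, X) + noise * np.eye(len(X))        # kernel matrix + noise jitter
    Ks = k(Xs, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha                           # posterior mean
    var = sf**2 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mean, var                            # predictive variance per test point
```

Near the training data the variance collapses toward the noise level; far from it the variance reverts to the prior, which is exactly the uncertainty signal a gaze interface can use to reject unreliable PoR estimates.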
Gaze estimation //methods

Example (Williams, Blake & Cipolla): the approach can be applied to other contexts.
Gaze estimation //using other cues

Gaze estimation //head-tracking

The Watson head tracker: a real-time object tracker that uses range and appearance information from a stereo camera to recover the 3D rotation and translation of objects, or of the camera itself. The system can be connected to a face detector and used as an accurate head tracker; additional supporting algorithms can improve the accuracy of the tracker.
Software download: http://groups.csail.mit.edu/vision/vip/watson/index.htm
The Watson head tracker //head pointing
The Watson head tracker //Interactive Kiosk

Shared attention
Shared attention through gaze interactions?
Shared attention //Developmental timeline

Shared attention
Mutual gaze
Gaze following
Shared attention

Imperative pointing
Declarative pointing (creates shared attention)

Shared attention //Open questions
Shared attention //Models (B. Scassellati, MIT)
Shared attention //Robots that Learn to Converse